Inconsistent Workbook Range - XLSX/ODS #3164

Open
opened 2024-07-25 06:00:42 +00:00 by s-ashwin · 5 comments

Consider the below files
NOTE: These files contain the same content, exported in two different formats (.xlsx and .ods)

The workbook consist of two sheets Sheet1 and Sheet2
Both have same contents.

When using .read() on the .xlsx file
The sheet data are as follows

Sheet 1 { "!ref": "A2:C2", "A2": { "h": "a", "r": "<t>a</t>", "t": "s", "v": "a", "w": "a" }, "B2": { "h": "b", "r": "<t>b</t>", "t": "s", "v": "b", "w": "b" }, "C2": { "h": "c", "r": "<t>c</t>", "t": "s", "v": "c", "w": "c" } }

Sheet 2 { "!ref": "A1:C2", "A2": { "h": "a", "r": "<t>a</t>", "t": "s", "v": "a", "w": "a" }, "B2": { "h": "b", "r": "<t>b</t>", "t": "s", "v": "b", "w": "b" }, "C2": { "h": "c", "r": "<t>c</t>", "t": "s", "v": "c", "w": "c" } }

Note the differences in range ("A2:C2" vs "A1:C2")
"A1:C2" is the correct one since the first row is empty and need to be included

And on using .read() on the .ods file,
Both the sheets contain same data

{ "A2": { "t": "s", "v": "a", "w": "a" }, "B2": { "t": "s", "v": "b", "w": "b" }, "C2": { "t": "s", "v": "c", "w": "c" }, "!ref": "A2:C2" }

Here the range is "A2:C2" for both sheets, despite these are the same files just being exported in two different formats (.xlsx and .ods)

This inconsistency affects the calculation being perforemed with this data

Consider the below files NOTE: These files contain the same content, exported in two different formats (.xlsx and .ods) The workbook consist of two sheets Sheet1 and Sheet2 Both have same contents. When using .read() on the .xlsx file The sheet data are as follows `Sheet 1 { "!ref": "A2:C2", "A2": { "h": "a", "r": "<t>a</t>", "t": "s", "v": "a", "w": "a" }, "B2": { "h": "b", "r": "<t>b</t>", "t": "s", "v": "b", "w": "b" }, "C2": { "h": "c", "r": "<t>c</t>", "t": "s", "v": "c", "w": "c" } } ` `Sheet 2 { "!ref": "A1:C2", "A2": { "h": "a", "r": "<t>a</t>", "t": "s", "v": "a", "w": "a" }, "B2": { "h": "b", "r": "<t>b</t>", "t": "s", "v": "b", "w": "b" }, "C2": { "h": "c", "r": "<t>c</t>", "t": "s", "v": "c", "w": "c" } } ` Note the differences in range ("A2:C2" vs "A1:C2") "A1:C2" is the correct one since the first row is empty and need to be included And on using .read() on the .ods file, Both the sheets contain same data `{ "A2": { "t": "s", "v": "a", "w": "a" }, "B2": { "t": "s", "v": "b", "w": "b" }, "C2": { "t": "s", "v": "c", "w": "c" }, "!ref": "A2:C2" }` Here the range is "A2:C2" for both sheets, despite these are the same files just being exported in two different formats (.xlsx and .ods) This inconsistency affects the calculation being perforemed with this data
Owner

At a high level, both ODS and XLSX use XML with elements corresponding to the entire table (similar to HTML TABLE), a single row (HTML TR) and a single cell (HTML TD).

XLSX allows worksheets to specify a range that does not include the first row or first column (e.g. C4:D9), which is why SheetJS worksheet ranges can start from a point other than A1.


Both worksheets in the XLSX file omit the range, so it is calculated from the existing cells.

"Sheet 1" specifies cells A2, B2, and C2, so the range is A2:C2

"Sheet 2" specifies cell B1 as well as A2, B2 and C2, so the final range is A1:C2


You will receive the expected A1:C2 range in this case by passing the option nodim: true to read or readFile. This will force the XLSX parser to look at the cells to determine the range. It pins the start of the worksheet to A1.

At a high level, both ODS and XLSX use XML with elements corresponding to the entire table (similar to HTML `TABLE`), a single row (HTML `TR`) and a single cell (HTML `TD`). XLSX allows worksheets to specify a range that does not include the first row or first column (e.g. `C4:D9`), which is why SheetJS worksheet ranges can start from a point other than A1. --- Both worksheets in the XLSX file omit the range, so it is calculated from the existing cells. "Sheet 1" specifies cells A2, B2, and C2, so the range is `A2:C2` **"Sheet 2" specifies cell B1** as well as A2, B2 and C2, so the final range is `A1:C2` --- You will receive the expected A1:C2 range in this case by passing the option [`nodim: true` to `read` or `readFile`](https://docs.sheetjs.com/docs/api/parse-options#range). This will force the XLSX parser to look at the cells to determine the range. It pins the start of the worksheet to A1.
Author

Setting nodim:true, solves the case for .xlsx files.
But for .ods file, the range is still A2:C2 for both the sheets.

Is there a reason behind this behaviour across different file formats

Setting nodim:true, solves the case for .xlsx files. But for .ods file, the range is still A2:C2 for both the sheets. Is there a reason behind this behaviour across different file formats
Owner

XLSX allows worksheets to specify a range and allows rows to specify an offset. ODS does not.

Like HTML tables or CSV files, an ODS worksheet that starts on row 2 must start with a blank row.

XLSX allows worksheets to specify a range and allows rows to specify an offset. ODS does not. Like HTML tables or CSV files, an ODS worksheet that starts on row 2 must start with a blank row.
Author

Is there a workaround that you would suggest trying,
Like manually setting the ranges to start from A1

Is there a workaround that you would suggest trying, Like manually setting the ranges to start from A1
Owner

nodim: true + forcefully setting the start to A1 will give consistent results across formats.

We'll revisit the range logic for ODS / HTML / CSV / etc. It might make sense to apply an offset when the first rows or columns are blank.

`nodim: true` + forcefully setting the start to A1 will give consistent results across formats. We'll revisit the range logic for ODS / HTML / CSV / etc. It might make sense to apply an offset when the first rows or columns are blank.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#3164
No description provided.