Inconsistent Workbook Range - XLSX/ODS #3164
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#3164
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Consider the below files
NOTE: These files contain the same content, exported in two different formats (.xlsx and .ods)
The workbook consist of two sheets Sheet1 and Sheet2
Both have same contents.
When using .read() on the .xlsx file
The sheet data are as follows
Sheet 1 { "!ref": "A2:C2", "A2": { "h": "a", "r": "<t>a</t>", "t": "s", "v": "a", "w": "a" }, "B2": { "h": "b", "r": "<t>b</t>", "t": "s", "v": "b", "w": "b" }, "C2": { "h": "c", "r": "<t>c</t>", "t": "s", "v": "c", "w": "c" } }
Sheet 2 { "!ref": "A1:C2", "A2": { "h": "a", "r": "<t>a</t>", "t": "s", "v": "a", "w": "a" }, "B2": { "h": "b", "r": "<t>b</t>", "t": "s", "v": "b", "w": "b" }, "C2": { "h": "c", "r": "<t>c</t>", "t": "s", "v": "c", "w": "c" } }
Note the differences in range ("A2:C2" vs "A1:C2")
"A1:C2" is the correct one since the first row is empty and need to be included
And on using .read() on the .ods file,
Both the sheets contain same data
{ "A2": { "t": "s", "v": "a", "w": "a" }, "B2": { "t": "s", "v": "b", "w": "b" }, "C2": { "t": "s", "v": "c", "w": "c" }, "!ref": "A2:C2" }
Here the range is "A2:C2" for both sheets, despite these are the same files just being exported in two different formats (.xlsx and .ods)
This inconsistency affects the calculation being perforemed with this data
At a high level, both ODS and XLSX use XML with elements corresponding to the entire table (similar to HTML
TABLE
), a single row (HTMLTR
) and a single cell (HTMLTD
).XLSX allows worksheets to specify a range that does not include the first row or first column (e.g.
C4:D9
), which is why SheetJS worksheet ranges can start from a point other than A1.Both worksheets in the XLSX file omit the range, so it is calculated from the existing cells.
"Sheet 1" specifies cells A2, B2, and C2, so the range is
A2:C2
"Sheet 2" specifies cell B1 as well as A2, B2 and C2, so the final range is
A1:C2
You will receive the expected A1:C2 range in this case by passing the option
nodim: true
toread
orreadFile
. This will force the XLSX parser to look at the cells to determine the range. It pins the start of the worksheet to A1.Setting nodim:true, solves the case for .xlsx files.
But for .ods file, the range is still A2:C2 for both the sheets.
Is there a reason behind this behaviour across different file formats
XLSX allows worksheets to specify a range and allows rows to specify an offset. ODS does not.
Like HTML tables or CSV files, an ODS worksheet that starts on row 2 must start with a blank row.
Is there a workaround that you would suggest trying,
Like manually setting the ranges to start from A1
nodim: true
+ forcefully setting the start to A1 will give consistent results across formats.We'll revisit the range logic for ODS / HTML / CSV / etc. It might make sense to apply an offset when the first rows or columns are blank.