Support XLSX XML sheet files with missing t attributes #2843

Closed
opened 2022-12-29 17:52:44 +00:00 by kennu · 3 comments

Our bank exports XLSX files with sheet1.xml files that have HYPERLINK formulas to indicate file attachments. They are missing the t-attribute that specifies the field type:

  <c r="M299" s="3">
    <f>HYPERLINK("https://www.dropbox.com/xxx","xxx.pdf")</f>
    <v></v>
  </c>

I propose adding a default case to the t attribute check in 1ca49a50bd/bits/67_wsxml.js (L428) so that a missing t attribute would be treated as t="str", allowing to import these files.

Currently the xlsx module just ignores these fields, resulting in a partial import, and there is no way to access the missing fields.

Our bank exports XLSX files with sheet1.xml files that have HYPERLINK formulas to indicate file attachments. They are missing the t-attribute that specifies the field type: ``` <c r="M299" s="3"> <f>HYPERLINK("https://www.dropbox.com/xxx","xxx.pdf")</f> <v></v> </c> ``` I propose adding a default case to the t attribute check in https://github.com/SheetJS/sheetjs/blob/1ca49a50bd22d22f45196672a26f6262f5304b62/bits/67_wsxml.js#L428 so that a missing t attribute would be treated as t="str", allowing to import these files. Currently the xlsx module just ignores these fields, resulting in a partial import, and there is no way to access the missing fields.
Owner

To ensure the cell is parsed, pass the option sheetStubs: true when reading the file.

As for type deduction, the official schema in ECMA-376 defines the t attribute as follows:

  <xsd:attribute name="t" type="ST_CellType" use="optional" default="n"/>

It might make sense to speculate on the cell type by using the formula (there are a number of formula functions whose return type is textual or logical), but sheetStubs should address the immediate problem.

To ensure the cell is parsed, pass the option `sheetStubs: true` when reading the file. As for type deduction, the official schema in ECMA-376 defines the `t` attribute as follows: ```xml <xsd:attribute name="t" type="ST_CellType" use="optional" default="n"/> ``` It might make sense to speculate on the cell type by using the formula (there are a number of formula functions whose return type is textual or logical), but `sheetStubs` should address the immediate problem.
Author

Thanks for the great tip @sheetjs! I tried most of the other read parameters, but didn't realize sheetStubs would help. Now I can see the original formulas in the imported object. I've also notified the bank that their file export is a bit broken.

Thanks for the great tip @sheetjs! I tried most of the other read parameters, but didn't realize sheetStubs would help. Now I can see the original formulas in the imported object. I've also notified the bank that their file export is a bit broken.
Owner

We'll consider flipping that default in the next release.

We'll consider flipping that default in the next release.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#2843
No description provided.