Parsing xlsx with broken docProps #759
Reference: sheetjs/sheetjs#759
I'm having trouble reading a spreadsheet which has been generated by another company's report server. The file docProps/app.xml contains this invalid-looking OOXML. Note that the vector is of size 2 but only contains one string:

The parser throws when it reaches this line:
https://github.com/SheetJS/js-xlsx/blob/master/bits/22_xmlutils.js#L145
The application name in the doc properties is given as Microsoft Excel, but that may well be untrue.

For our particular case, commenting out the check on the size of the vector allows the parser to carry on successfully. Would it make sense to have a switch to turn errors into warnings, or maybe a switch to disable reading properties altogether? Commenting out this line also lets us parse the file:
https://github.com/SheetJS/js-xlsx/blob/master/bits/85_parsezip.js#L89
Skipping properties seems to work for us:
https://github.com/SheetJS/js-xlsx/compare/master...djbeaumont:skip-props?expand=1
@djbeaumont that's certainly unexpected! Instead of a new option to skip all of the extended properties, let's change that error to only throw in WTF mode. This is the git diff from the current HEAD:
If you could confirm that works, we'd accept it as a PR :)
That works for me 👍 Thanks for the very quick response @SheetJSDev
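For readers landing on this thread, the behavior agreed on above (throw on a vector size mismatch only in WTF mode, otherwise carry on) can be sketched roughly as follows. This is not the actual diff or the SheetJS source: `checkVectorSize`, its return shape, and the sample entry `"Sheet1"` are hypothetical and exist only for illustration. The only thing taken from the thread is the idea of gating the error on a `WTF` flag.

```javascript
// Hypothetical helper, NOT the SheetJS implementation.
// Compares the size a vector declares with the number of entries actually
// parsed, and only throws when the caller opted into strict ("WTF") mode.
function checkVectorSize(declaredSize, entries, opts) {
  const strict = Boolean(opts && opts.WTF);
  if (entries.length === declaredSize) {
    return { ok: true, entries: entries };
  }
  const msg = "Unexpected vector length " + entries.length + " != " + declaredSize;
  if (strict) {
    // Strict mode: surface the malformed file to the caller.
    throw new Error(msg);
  }
  // Lenient mode: keep what was parsed and report the mismatch as a warning.
  return { ok: false, entries: entries, warning: msg };
}

// Illustrative input only: declared size 2, but a single string present.
const lenient = checkVectorSize(2, ["Sheet1"], { WTF: false });
console.log(lenient.ok, lenient.warning);

let threw = false;
try {
  checkVectorSize(2, ["Sheet1"], { WTF: true });
} catch (e) {
  threw = true;
}
console.log(threw);
```

With a shape like this, lenient callers still get the entries that were present, while anyone debugging a suspect file can opt into the strict path to see exactly where it is malformed.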