XLSX.utils.json_to_sheet and XLSX.utils.aoa_to_sheet stuck in infinite loop after certain data size #2989
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#2989
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
We export a dataset with a million rows and 10 columns. Trying out both
XLSX.utils.json_to_sheet
andXLSX.utils.aoa_to_sheet
, we ran into this frustrating behavior, that SheetJS would just get stuck on datasets bigger than ~600k rows.No errors, no crashes, just one Thread pinned 100%, with the tab freezing and never returning to normal.
Also tried using WebWorkers with the exact same result. The UI Tab does not freeze, but the Worker never returns from an infinite state of being being pinned 100%, even after 15mins of waiting, where as normally this command only takes ~2 seconds for 500k rows. Same behavior across Chrome, Edge, Firefox.
What's happening here is some kind of browser memory limit being hit and SheetJS not recovering from that, I guess. Nothing ever throws any kind of error though...
The solution we came up with is the following:
Split our datasets into sheets of 300k rows each
const ws: XLSX.WorkSheet = XLSX.utils.json_to_sheet(xls.slice(i + 1 - maxDataPointsPerSheet, i + 1));
,book_append_sheet()
each of them to one finalXLSX.writeFile(wb, this.data.filename + ".xlsx", { bookType: 'xlsx', compression: true });
. Then we tell the user:Hey, if you want everything in one sheet to output graphs and what not, please copy paste all the sheets together back into one.
This works, but it would be excellent, if there was some kind of way to do this in code. Either by: Somehow memory managing around this problem to output as one sheet, or with the ability to combine multiple sheets into one sheets after the fact.
"xlsx": "^0.18.5",
Try passing
dense: true
to either function. https://docs.sheetjs.com/docs/demos/bigdata/stream#dense-mode describes in more detail.There's definitely room for improvement, but it appears to run.
https://jsfiddle.net/w67scg3o/0/ with no compression, the entire process takes 17 seconds to generate an uncompressed 348MB XLSX file.
https://jsfiddle.net/w67scg3o/1/ with compression, the entire process takes 23 seconds to generate compressed 111MB file. It can be compressed further (#2974 tracking issue).
IF you are hitting the string length limit (~512MB), the browser will crash and Chrome will show the "Oh Snap" message.
PS: https://docs.sheetjs.com/docs/getting-started/installation/standalone you may need to update.
Yes! This worked beautifully! All solved.
I already found out about dense mode here: https://github.com/SheetJS/sheetjs/issues/2750
But what had me confused, is typescript refusing to take this option an listing
dense
as unrecognized. Judging from the age of that issue being last yeat, I thoughtdense
mode was just turned on by default.Indeed we were
"xlsx": "^0.18.5"
and after reading the doc about the legacy endpoint https://docs.sheetjs.com/docs/getting-started/installation/nodejs#legacy-endpoints being the culprit of having an older version, properly upgraded. But here is the thing, thedense
option is still not recognized by typescript as an option and I had to do this hack:Even though the types file for
XLSX
listed dense as a boolean option. So I'm not sure what's going on here.It looks like the type for
aoa_to_sheet
is missing that option (it's included inParsingOptions
, used byread
andreadFile
, but not inAOA2SheetOpts
).You can submit a patch if you like. https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/types/index.d.ts is the relevant type definition.
The best approach is probably to pull out the
dense
parameter into a separate interface:and make
ParsingOptions
,JSON2SheetOpts
andSheetAOAOpts
extendDenseOption
248108b667