XLSX.utils.json_to_sheet and XLSX.utils.aoa_to_sheet stuck in infinite loop after certain data size #2989

Closed
opened 2023-09-14 08:23:54 +00:00 by FrostKiwi · 4 comments

We export a dataset with a million rows and 10 columns. Trying out both XLSX.utils.json_to_sheet and XLSX.utils.aoa_to_sheet, we ran into this frustrating behavior, that SheetJS would just get stuck on datasets bigger than ~600k rows.

No errors, no crashes, just one Thread pinned 100%, with the tab freezing and never returning to normal.
Also tried using WebWorkers with the exact same result. The UI Tab does not freeze, but the Worker never returns from an infinite state of being being pinned 100%, even after 15mins of waiting, where as normally this command only takes ~2 seconds for 500k rows. Same behavior across Chrome, Edge, Firefox.

What's happening here is some kind of browser memory limit being hit and SheetJS not recovering from that, I guess. Nothing ever throws any kind of error though...

The solution we came up with is the following:
Split our datasets into sheets of 300k rows each const ws: XLSX.WorkSheet = XLSX.utils.json_to_sheet(xls.slice(i + 1 - maxDataPointsPerSheet, i + 1));, book_append_sheet() each of them to one final XLSX.writeFile(wb, this.data.filename + ".xlsx", { bookType: 'xlsx', compression: true });. Then we tell the user: Hey, if you want everything in one sheet to output graphs and what not, please copy paste all the sheets together back into one.

This works, but it would be excellent, if there was some kind of way to do this in code. Either by: Somehow memory managing around this problem to output as one sheet, or with the ability to combine multiple sheets into one sheets after the fact.

"xlsx": "^0.18.5",

We export a dataset with a million rows and 10 columns. Trying out both `XLSX.utils.json_to_sheet` and `XLSX.utils.aoa_to_sheet`, we ran into this frustrating behavior, that SheetJS would just get stuck on datasets bigger than ~600k rows. No errors, no crashes, just one Thread pinned 100%, with the tab freezing and never returning to normal. Also tried using WebWorkers with the exact same result. The UI Tab does not freeze, but the Worker never returns from an infinite state of being being pinned 100%, even after 15mins of waiting, where as normally this command only takes ~2 seconds for 500k rows. Same behavior across Chrome, Edge, Firefox. What's happening here is some kind of browser memory limit being hit and SheetJS not recovering from that, I guess. Nothing ever throws any kind of error though... The solution we came up with is the following: Split our datasets into sheets of 300k rows each `const ws: XLSX.WorkSheet = XLSX.utils.json_to_sheet(xls.slice(i + 1 - maxDataPointsPerSheet, i + 1));`, `book_append_sheet()` each of them to one final `XLSX.writeFile(wb, this.data.filename + ".xlsx", { bookType: 'xlsx', compression: true });`. Then we tell the user: `Hey, if you want everything in one sheet to output graphs and what not, please copy paste all the sheets together back into one.` This works, but it would be excellent, if there was some kind of way to do this in code. Either by: Somehow memory managing around this problem to output as one sheet, or with the ability to combine multiple sheets into one sheets after the fact. `"xlsx": "^0.18.5",`
Owner

Try passing dense: true to either function. https://docs.sheetjs.com/docs/demos/bigdata/stream#dense-mode describes in more detail.

There's definitely room for improvement, but it appears to run.

/* create data set of 1M rows and 10 columns. The first column is text */
console.time("data");
var data = Array.from({length:1e6}, (_, i) => Array.from({length: 10}, (_, j) => j == 0 ? ""+i : i+j));
console.timeEnd("data");

/* generate worksheet */
console.time("ws");
const ws = XLSX.utils.aoa_to_sheet(data, {dense: true});
console.timeEnd("ws");

console.time("xlsx");
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, ws, "Export");
XLSX.writeFile(wb, "issue2989.xlsx");
console.timeEnd("xlsx");

https://jsfiddle.net/w67scg3o/0/ with no compression, the entire process takes 17 seconds to generate an uncompressed 348MB XLSX file.

https://jsfiddle.net/w67scg3o/1/ with compression, the entire process takes 23 seconds to generate compressed 111MB file. It can be compressed further (#2974 tracking issue).

IF you are hitting the string length limit (~512MB), the browser will crash and Chrome will show the "Oh Snap" message.

PS: https://docs.sheetjs.com/docs/getting-started/installation/standalone you may need to update.

Try passing `dense: true` to either function. https://docs.sheetjs.com/docs/demos/bigdata/stream#dense-mode describes in more detail. There's definitely room for improvement, but it appears to run. ```js /* create data set of 1M rows and 10 columns. The first column is text */ console.time("data"); var data = Array.from({length:1e6}, (_, i) => Array.from({length: 10}, (_, j) => j == 0 ? ""+i : i+j)); console.timeEnd("data"); /* generate worksheet */ console.time("ws"); const ws = XLSX.utils.aoa_to_sheet(data, {dense: true}); console.timeEnd("ws"); console.time("xlsx"); const wb = XLSX.utils.book_new(); XLSX.utils.book_append_sheet(wb, ws, "Export"); XLSX.writeFile(wb, "issue2989.xlsx"); console.timeEnd("xlsx"); ``` https://jsfiddle.net/w67scg3o/0/ with no compression, the entire process takes 17 seconds to generate an uncompressed 348MB XLSX file. https://jsfiddle.net/w67scg3o/1/ with compression, the entire process takes 23 seconds to generate compressed 111MB file. It can be compressed further (#2974 tracking issue). IF you are hitting the string length limit (~512MB), the browser will crash and Chrome will show the "Oh Snap" message. PS: https://docs.sheetjs.com/docs/getting-started/installation/standalone you may need to update.
Author

Try passing dense: true to either function. https://docs.sheetjs.com/docs/demos/bigdata/stream#dense-mode describes in more detail.

Yes! This worked beautifully! All solved.
I already found out about dense mode here: https://github.com/SheetJS/sheetjs/issues/2750
But what had me confused, is typescript refusing to take this option an listing dense as unrecognized. Judging from the age of that issue being last yeat, I thought dense mode was just turned on by default.

PS: https://docs.sheetjs.com/docs/getting-started/installation/standalone you may need to update.

Indeed we were "xlsx": "^0.18.5" and after reading the doc about the legacy endpoint https://docs.sheetjs.com/docs/getting-started/installation/nodejs#legacy-endpoints being the culprit of having an older version, properly upgraded. But here is the thing, the dense option is still not recognized by typescript as an option and I had to do this hack:

const ws = XLSX.utils.aoa_to_sheet(aoa, { dense: true } as any);

Even though the types file for XLSX listed dense as a boolean option. So I'm not sure what's going on here.

> Try passing `dense: true` to either function. https://docs.sheetjs.com/docs/demos/bigdata/stream#dense-mode describes in more detail. Yes! This worked beautifully! All solved. I already found out about dense mode here: https://github.com/SheetJS/sheetjs/issues/2750 But what had me confused, is typescript refusing to take this option an listing `dense` as unrecognized. Judging from the age of that issue being last yeat, I thought `dense` mode was just turned on by default. > PS: https://docs.sheetjs.com/docs/getting-started/installation/standalone you may need to update. Indeed we were `"xlsx": "^0.18.5"` and after reading the doc about the legacy endpoint https://docs.sheetjs.com/docs/getting-started/installation/nodejs#legacy-endpoints being the culprit of having an older version, properly upgraded. But here is the thing, the `dense` option is still not recognized by typescript as an option and I had to do this hack: ```typescript const ws = XLSX.utils.aoa_to_sheet(aoa, { dense: true } as any); ``` Even though the types file for `XLSX` listed dense as a boolean option. So I'm not sure what's going on here.
Owner

It looks like the type for aoa_to_sheet is missing that option (it's included in ParsingOptions, used by read and readFile, but not in AOA2SheetOpts).

You can submit a patch if you like. https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/types/index.d.ts is the relevant type definition.

The best approach is probably to pull out the dense parameter into a separate interface:

export interface DenseOption {
    /** If true, generate dense-mode worksheets */
    dense?: boolean;
}

and make ParsingOptions, JSON2SheetOpts and SheetAOAOpts extend DenseOption

It looks like the type for `aoa_to_sheet` is missing that option (it's included in `ParsingOptions`, used by `read` and `readFile`, but not in `AOA2SheetOpts`). You can submit a patch if you like. https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/types/index.d.ts is the relevant type definition. The best approach is probably to pull out the `dense` parameter into a separate interface: ```ts export interface DenseOption { /** If true, generate dense-mode worksheets */ dense?: boolean; } ``` and make `ParsingOptions`, `JSON2SheetOpts` and `SheetAOAOpts` extend `DenseOption`
Owner
https://git.sheetjs.com/sheetjs/sheetjs/commit/248108b667a79f105fb3029c88186f09696418ba
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#2989
No description provided.