json_to_sheet hanging when using large JSON file #2750

Closed
opened 2022-07-28 16:00:26 +00:00 by mcbergsma · 11 comments
mcbergsma commented 2022-07-28 16:00:26 +00:00 (Migrated from github.com)

I've got a fairly large JSON file I'm trying to write to a sheet. The file is ~353MB, with a length (when stringified) of 361,514,182.

It just hangs. I can get it working by chunking the Object and writing it to different sheets, but I can't write it all to a single sheet. I've also tried appending the chunks to the bottom of the sheet, but that also fails once it get's to the "latter" chunks.

I've just tried moving from v0.17 to 0.18 to see if that helps, but also no luck.

Running Node 16.15.1, Windows 11. Any ideas or workarounds? It's annoying having to manually open the files and paste the sheets back together.

Perhaps also worth noting that since the move to 0.18, the filesize has gone from 37MB to 98MB.

I've got a fairly large JSON file I'm trying to write to a sheet. The file is ~353MB, with a length (when stringified) of 361,514,182. It just hangs. I can get it working by chunking the Object and writing it to different sheets, but I can't write it all to a single sheet. I've also tried appending the chunks to the bottom of the sheet, but that also fails once it get's to the "latter" chunks. I've just tried moving from v0.17 to 0.18 to see if that helps, but also no luck. Running Node 16.15.1, Windows 11. Any ideas or workarounds? It's annoying having to manually open the files and paste the sheets back together. Perhaps also worth noting that since the move to 0.18, the filesize has gone from 37MB to 98MB.
SheetJSDev commented 2022-07-30 18:12:48 +00:00 (Migrated from github.com)

You might be running into https://bugs.chromium.org/p/v8/issues/detail?id=6696

Try passing the option dense: true to json_to_sheet or aoa_to_sheet.

You might be running into https://bugs.chromium.org/p/v8/issues/detail?id=6696 Try passing the option `dense: true` to `json_to_sheet` or `aoa_to_sheet`.
mcbergsma commented 2022-07-30 19:17:06 +00:00 (Migrated from github.com)

Thanks for the suggestion! Tried this let ws = XLSX.utils.json_to_sheet(fullResult, { dense: true }) but it still hangs.
I don't even see a reference to dense in the docs! I moved back to v0.17.5
on account of the file size issue, is that an option for 0.18 only?

On Sat, Jul 30, 2022 at 2:12 PM SheetJSDev @.***> wrote:

You might be running into
https://bugs.chromium.org/p/v8/issues/detail?id=6696

Try passing the option dense: true to json_to_sheet or aoa_to_sheet.


Reply to this email directly, view it on GitHub
https://github.com/SheetJS/sheetjs/issues/2750#issuecomment-1200268577,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AA3TZY6K27TSTWUJFMJPGN3VWVWCVANCNFSM545Z3FZQ
.
You are receiving this because you authored the thread.Message ID:
@.***>

Thanks for the suggestion! Tried this `let ws = XLSX.utils.json_to_sheet(fullResult, { dense: true })` but it still hangs. I don't even see a reference to dense in the docs! I moved back to v0.17.5 on account of the file size issue, is that an option for 0.18 only? On Sat, Jul 30, 2022 at 2:12 PM SheetJSDev ***@***.***> wrote: > You might be running into > https://bugs.chromium.org/p/v8/issues/detail?id=6696 > > Try passing the option dense: true to json_to_sheet or aoa_to_sheet. > > — > Reply to this email directly, view it on GitHub > <https://github.com/SheetJS/sheetjs/issues/2750#issuecomment-1200268577>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AA3TZY6K27TSTWUJFMJPGN3VWVWCVANCNFSM545Z3FZQ> > . > You are receiving this because you authored the thread.Message ID: > ***@***.***> >
SheetJSDev commented 2022-07-30 19:30:51 +00:00 (Migrated from github.com)

The worksheet representation has a long history. We compared the hash vs array of arrays back in 2012 and found that IE6 performed much better with the Object. V8 also performed better with the object approach until some major changes under the hood heavily penalized large objects.

After V8 regressed, with no sign of improvement from Google's side, we revisited the array of arrays approach. That's what dense enables. Changing the default representation will require a major version bump.

If you are using compression: true and seeing a jump in file size, it's because we fully switched the ZIP engine and the sliding window hasn't been optimized

Can you give a rough estimate for the number of rows in the object? If you could find a way to share the data, that would also be helpful

The worksheet representation has a long history. We compared the hash vs array of arrays back in 2012 and found that IE6 performed much better with the Object. V8 also performed better with the object approach until some major changes under the hood heavily penalized large objects. After V8 regressed, with no sign of improvement from Google's side, we revisited the array of arrays approach. That's what `dense` enables. Changing the default representation will require a major version bump. If you are using `compression: true` and seeing a jump in file size, it's because we fully switched the ZIP engine and the sliding window hasn't been optimized Can you give a rough estimate for the number of rows in the object? If you could find a way to share the data, that would also be helpful
mcbergsma commented 2022-08-03 01:21:56 +00:00 (Migrated from github.com)

Ahh IE6, to blame for so many things! ;)

In this particular case I really can't share the data because it's confidential, but I'll try and provide as much detail as possible outside the actual data.

It's a data pull from a number of data sources that are processed and put together as an array of objects. The most recent one, run just this afternoon is 269496 rows total - and array of that many objects.

While not every object contains every one of these keys, most do. I've changed the actual name in most cases, but this is a list of name and type:

UniqueID	number
Study Name	string
Modified By	string
Country	string
Language	string
Target	string
Category	string
Parent Category	string
Segment	string
Filter Type	string
Filter	string
Product	string
Description	string (longer)
Image URL	string
File Name	string
File Size	string
Tags	string
KnownTag	string
KeyTag	string
Population	number
Group	string
Status	string
Created	date
productId	number
result1	decimal
result2	decimal
result3	decimal
quad	string
index	number
keyscore	number
keyscore2	number
check	string

To be clear, I saw the jump in size (roughly x3) with compression: true in v0.18. I dropped back to .17 and it went back to normal size - approx 36MB.

Ahh IE6, to blame for so many things! ;) In this particular case I really can't share the data because it's confidential, but I'll try and provide as much detail as possible outside the actual data. It's a data pull from a number of data sources that are processed and put together as an array of objects. The most recent one, run just this afternoon is 269496 rows total - and array of that many objects. While not every object contains every one of these keys, most do. I've changed the actual name in most cases, but this is a list of name and type: ```Study ID number UniqueID number Study Name string Modified By string Country string Language string Target string Category string Parent Category string Segment string Filter Type string Filter string Product string Description string (longer) Image URL string File Name string File Size string Tags string KnownTag string KeyTag string Population number Group string Status string Created date productId number result1 decimal result2 decimal result3 decimal quad string index number keyscore number keyscore2 number check string ``` To be clear, I saw the jump in size (roughly x3) with `compression: true` in v0.18. I dropped back to .17 and it went back to normal size - approx 36MB.
MHamzaRajput commented 2022-10-02 06:43:58 +00:00 (Migrated from github.com)

Got the same error on json_to_sheet function with 421060 rows and 20 columns. it stuck almost an hour with no output
Any help?

Got the same error on json_to_sheet function with 421060 rows and 20 columns. it stuck almost an hour with no output Any help?
MHamzaRajput commented 2022-10-02 06:45:14 +00:00 (Migrated from github.com)

Is there any debug mode so that I can able to see any progress and where it is stuck?

Is there any debug mode so that I can able to see any progress and where it is stuck?

json_to_sheet( data, { dense:true } ) hangs with 766734 rows(11 columns). on 0.19.1

Unless there's a workaround or a fix for this issue, are the limitations mentioned somewhere?

json_to_sheet( data, { dense:true } ) hangs with 766734 rows(11 columns). on 0.19.1 Unless there's a workaround or a fix for this issue, are the limitations mentioned somewhere?

https://docs.sheetjs.com/docs/miscellany/errors#aw-snap-or-oops-an-error-has-occurred

That said, the current limits apply on a per-worksheet level. The limit is driven by how the worksheet is written (a large string is constructed).

For the maximum worksheet rows (1048576 rows), you can write 10 columns (https://jsfiddle.net/L462gdjy/ demo)

For sheets with a small number of strings repeated many times, bookSST reduces size by de-duplicating text using the shared string table.

https://docs.sheetjs.com/docs/miscellany/errors#aw-snap-or-oops-an-error-has-occurred That said, the current limits apply on a per-worksheet level. The limit is driven by how the worksheet is written (a large string is constructed). For the maximum worksheet rows (1048576 rows), you can write 10 columns (https://jsfiddle.net/L462gdjy/ demo) For sheets with a small number of strings repeated many times, `bookSST` reduces size by de-duplicating text using the shared string table.

can confirm this is on serverside(nestjs) and simply hanging at worksheet creation (not workbook) with 766,734 rows and 11 columns. can't find anything related in the link provided

can confirm this is on serverside(nestjs) and simply hanging at worksheet creation (not workbook) with 766,734 rows and 11 columns. can't find anything related in the link provided

Let's try the equivalent of the demo in NodeJS:

var XLSX = require("xlsx");
const data = Array.from({length:1048576}, (_,i) => Array.from({length:12}, (_,j) => j == 0 ? "" + i : i+j));
console.time('write');
const wb = XLSX.utils.book_new();
const ws = XLSX.utils.aoa_to_sheet(data, {dense: true});
XLSX.utils.book_append_sheet(wb, ws, "SheetJS");
XLSX.writeFile(wb, "issue2750.xlsx");
console.timeEnd('write');
console.log(process.memoryUsage());

Locally. the file is created and the process prints:

write: 20.393s
{
  rss: 2727473152,
  heapTotal: 1370050560,
  heapUsed: 1310882368,
  external: 1301035504,
  arrayBuffers: 1300701688
}
Let's try the equivalent of the demo in NodeJS: ```js var XLSX = require("xlsx"); const data = Array.from({length:1048576}, (_,i) => Array.from({length:12}, (_,j) => j == 0 ? "" + i : i+j)); console.time('write'); const wb = XLSX.utils.book_new(); const ws = XLSX.utils.aoa_to_sheet(data, {dense: true}); XLSX.utils.book_append_sheet(wb, ws, "SheetJS"); XLSX.writeFile(wb, "issue2750.xlsx"); console.timeEnd('write'); console.log(process.memoryUsage()); ``` Locally. the file is created and the process prints: ``` write: 20.393s { rss: 2727473152, heapTotal: 1370050560, heapUsed: 1310882368, external: 1301035504, arrayBuffers: 1300701688 } ```

During the processing, it will seem to hang the server.

Since it takes a while to write, if you are doing this in the server, it is recommended to use a worker thread or subprocess. An example of worker threads is now in the docs: https://docs.sheetjs.com/docs/demos/server#worker-threads

#2834 for more context on the example.

During the processing, it will seem to hang the server. Since it takes a while to write, if you are doing this in the server, it is recommended to use a worker thread or subprocess. An example of worker threads is now in the docs: https://docs.sheetjs.com/docs/demos/server#worker-threads https://git.sheetjs.com/sheetjs/sheetjs/issues/2834 for more context on the example.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#2750
No description provided.