XLSX.write() taking too long #77

Open
opened 2014-06-28 22:05:46 +00:00 by irfanyounas · 39 comments
irfanyounas commented 2014-06-28 22:05:46 +00:00 (Migrated from github.com)

Hi,
I have tried to write 3 columns and N number of rows, below we can see the time taken (on my i3 machine):

| N | Time (s) |
| --- | --- |
| 10000 | 9 |
| 20000 | 35 |
| 30000 | 90 |
| 65000 | 453 |

We can see that as we increase the number of rows, the time taken by the write() function increases dramatically (it's not linear); it's taking too long for 65000 records.
Is there a more efficient way to do this?

SheetJSDev commented 2014-06-28 22:37:21 +00:00 (Migrated from github.com)

I am not happy with the performance at all (granted, our first goal was to be correct). There are quite a few slow/inefficient operations that can be improved.

When I last investigated, the runtime was dominated by functions from the ZIP library (we are currently using https://github.com/Stuk/jszip), so I'm working on a new component right now (https://github.com/SheetJS/js-crc32 and https://github.com/SheetJS/js-adler32 are the first few pieces of the puzzle).

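For reference, the new checksum modules expose one-shot APIs (a minimal sketch of js-crc32 usage in node; the module is published as "crc-32"):

var CRC32 = require("crc-32");

console.log(CRC32.str("sheetjs")); // signed 32-bit CRC of a JS string
console.log(CRC32.buf([1, 2, 3])); // CRC of an array of byte values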
irfanyounas commented 2014-07-01 02:51:32 +00:00 (Migrated from github.com)

Thanks, good work. I hope we will have an efficient solution soon.

SheetJSDev commented 2014-07-01 03:33:50 +00:00 (Migrated from github.com)

@irfanyounas just so we are on the same page, can you share the script you used?

irfanyounas commented 2014-07-01 16:03:17 +00:00 (Migrated from github.com)

I am using the script given below:
src="//cdnjs.cloudflare.com/ajax/libs/xlsx/0.7.6-g/xlsx.core.min.js"

To create the xlsx file, I followed this example:
http://sheetjs.com/demos/writexlsx.html

SheetJSDev commented 2014-07-01 16:14:43 +00:00 (Migrated from github.com)

@irfanyounas how did you create a JS object with 65000 rows? Do you have numbers / text / dates? Are there missing cells (in the example, C2 is missing) or did you fill every cell?

SheetJSDev commented 2014-07-01 16:21:59 +00:00 (Migrated from github.com)

@irfanyounas Here's an example writing numbers in a 3x100001 file: http://sheetjs.com/demos/100001.html

Dates are printed at three points:

  • just before starting to build the workbook
  • after the write
  • after the saveas call

On Chrome 35 it takes 4 seconds to process a much larger test case than what you showed.

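The demo just logs a Date at each boundary; a minimal sketch of the same measurement (make_workbook stands in for the demo's workbook-building code, and saveAs comes from FileSaver):

function s2ab(s) { // binary string -> ArrayBuffer, as in the official demos
    var buf = new ArrayBuffer(s.length), view = new Uint8Array(buf);
    for (var i = 0; i != s.length; ++i) view[i] = s.charCodeAt(i) & 0xFF;
    return buf;
}

console.log("start:", new Date());
var wb = make_workbook(aoa); // hypothetical: build a workbook from an array-of-arrays
var wbout = XLSX.write(wb, {bookType: "xlsx", type: "binary"});
console.log("after write:", new Date());
saveAs(new Blob([s2ab(wbout)], {type: "application/octet-stream"}), "out.xlsx");
console.log("after saveas:", new Date());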
irfanyounas commented 2014-07-01 16:32:49 +00:00 (Migrated from github.com)

Thanks. I have 3 'text' (string) columns; the second column is sometimes empty.
I have used the following script:

function writeToXLSX(data) {
    var ws_name = "AllMembers", wb, ws, wbout;
    wb = new Workbook();
    ws = sheet_from_array_of_json(data, ["address", "name", "status"]);
    wb.SheetNames.push(ws_name);

    wb.Sheets[ws_name] = ws;
    wbout = XLSX.write(wb, {bookType:'xlsx', bookSST:true, type:'binary'});
    postMessage({t:"data", d:wbout});
}

function Workbook() {
    if(!(this instanceof Workbook)) return new Workbook();
    this.SheetNames = [];
    this.Sheets = {};
}

function stringToArrayBuffer(s) {
    var buf = new ArrayBuffer(s.length);
    var view = new Uint8Array(buf);
    for (var i = 0; i != s.length; ++i) view[i] = s.charCodeAt(i) & 0xFF;
    return buf;
}

// datenum (used below for Date cells) as defined in the official write demo
function datenum(v, date1904) {
    if(date1904) v += 1462;
    var epoch = Date.parse(v);
    return (epoch - new Date(Date.UTC(1899, 11, 30))) / (24 * 60 * 60 * 1000);
}

function sheet_from_array_of_json(data, columnNames) {
    var ws = {}, jsonObj, cell, cell_ref;
    var range = {s: {c:0, r:0}, e: {c:0, r:0}};
    for(var R = 0; R != data.length; ++R) {
        jsonObj = data[R];
        if(range.e.r < R) range.e.r = R;
        for(var C = 0; C != columnNames.length; ++C) {
            if(range.e.c < C) range.e.c = C;

            if(!jsonObj.hasOwnProperty(columnNames[C])) continue;
            cell = {v: jsonObj[columnNames[C]]};
            if(cell.v == null) continue;

            cell_ref = XLSX.utils.encode_cell({c:C, r:R});

            if(typeof cell.v === 'number') cell.t = 'n';
            else if(typeof cell.v === 'boolean') cell.t = 'b';
            else if(cell.v instanceof Date) {
                cell.t = 'n'; cell.z = XLSX.SSF._table[14];
                cell.v = datenum(cell.v);
            }
            else cell.t = 's';

            ws[cell_ref] = cell;
        }
    }
    ws['!ref'] = XLSX.utils.encode_range(range);
    return ws;
}
SheetJSDev commented 2014-07-01 16:45:04 +00:00 (Migrated from github.com)

@irfanyounas http://sheetjs.com/demos/write_num.html?n=1000000 is a more general demo (you can specify the number of rows via the n parameter). The runtime appears to be dominated by generating the source array-of-arrays (not from the process that converts it to a workbook, and not from the process that generates the XLSX file).

Here are the times I see using the aforementioned page:

| Number of Rows | Time |
| --- | --- |
| 100K | 4 sec |
| 200K | 8 sec |
| 300K | 15 sec |
| 400K | 18 sec |
| 500K | Chrome crashes ... |
SheetJSDev commented 2014-07-01 16:48:20 +00:00 (Migrated from github.com)

@irfanyounas just saw your last message -- where are you measuring the times? Your code sample doesn't show it. Also, can you put up the main page? (It looks like you shared only the worker logic.)

irfanyounas commented 2014-07-01 17:31:33 +00:00 (Migrated from github.com)

Thanks, I am looking into it; maybe the problem is on my side. Thanks for your help.

SheetJSDev commented 2014-07-01 18:11:53 +00:00 (Migrated from github.com)

If you are using web workers, I recommend also timing the process in the main thread (avoiding web workers altogether). FWIW, I found in some cases that transferring data to and from the worker is significantly slower than just doing the work in the main thread.

irfanyounas commented 2014-07-01 20:58:33 +00:00 (Migrated from github.com)

I have tested it using different kinds of data:

Scenario 1: I create a list of 20,000 records, where each record is a dict
as given: {"status": "active", "name": "", "address": "sms:+45705846525"}.
In this scenario all the records are the same (copies of the dict shown above).

Scenario 2: A list of 20,000 records, where each record is a dict, but
every record is different.

The runtime for Scenario 1 is 4 sec, while for Scenario 2 it is around 14 sec.

My understanding is that the zip module may be taking longer due to the
differing records (not the same text) in Scenario 2.
Can you please try this scenario?


SheetJSDev commented 2014-07-01 21:23:58 +00:00 (Migrated from github.com)

Here's a good example: http://sheetjs.com/demos/write_text.html?n=8000. The length of the data in the B column increases as you go down. I do see a massive performance issue.

irfanyounas commented 2014-07-01 21:34:34 +00:00 (Migrated from github.com)

Yes, though even if we keep the length of the data in the B column fixed, the performance issue will still be there if the data is different, e.g., in my case the data looks like 'sms:+45704055659', 'sms:+457683738392', 'sms:+457049821521', and so on.

SheetJSDev commented 2014-07-01 22:49:07 +00:00 (Migrated from github.com)

I did some more testing by adding some timing statements (simple console.log(new Date().getTime()) all over the place) and about 90% of the runtime is in the zip.generate operation, which suggests that it should be replaced.

irfanyounas commented 2014-07-01 22:54:52 +00:00 (Migrated from github.com)

Yes, that's right. Maybe we can try different zip modules and compare performance. Thanks.

Mithgol commented 2014-07-02 03:44:20 +00:00 (Migrated from github.com)

@SheetJSDev Not to divert you from the main topic, but still: it's easier to type console.time and console.timeEnd instead of explicit console.log(new Date().getTime()).

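For example, wrapping the write call (a minimal sketch, assuming a workbook object wb):

console.time("XLSX.write");
var wbout = XLSX.write(wb, {bookType: "xlsx", type: "binary"});
console.timeEnd("XLSX.write"); // logs something like "XLSX.write: 8421ms"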
SheetJSDev commented 2014-07-02 03:56:30 +00:00 (Migrated from github.com)

@Mithgol this is really neat! I think the MDN article needs to be updated: https://developer.mozilla.org/en-US/docs/Web/API/console.time claims that IE11 supports it, but https://developer.mozilla.org/en-US/docs/Web/API/console.timeEnd has a ? for IE

Mithgol commented 2014-07-02 05:38:23 +00:00 (Migrated from github.com)

MDN is a wiki. If you think that it needs to be updated, update it.

(I would have updated it myself, but for some reason I'm on Windows XP currently, and IE11 is not installable on such an old OS.)

SheetJSDev commented 2014-07-02 05:50:01 +00:00 (Migrated from github.com)

@Mithgol updated :) IE11 actually shows the elapsed time down to the 100-nsec level (milliseconds + 4 decimal places)

sandroboehme commented 2014-07-02 18:39:41 +00:00 (Migrated from github.com)

When I export a large number of rows in Firefox, I get a dialog saying something like "A script is broken or doesn't respond. [Debug Script] [Stop Script] [Continue Script]".

No matter how fast the export becomes, exporting even more data will always make it possible to trigger this message.

If Firefox gets some time slices for processing by adding window.setTimeout(function(){...}, milliseconds); it should solve the issue (as sketched below). I know that it adds time to the export, but as a user I understand that exporting this much data is a long-running task. If I show the user a message saying that an export is in progress, they know why the UI is blocked. It would be even better if a running export could be cancelled.

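A minimal sketch of that time-slicing idea (processInChunks and its callbacks are hypothetical names; the real work would be adding rows to the worksheet):

// process rows in slices, yielding to the event loop between slices so the
// browser can repaint and the "unresponsive script" dialog never fires
function processInChunks(rows, chunkSize, onChunk, onDone) {
    var i = 0;
    (function step() {
        var end = Math.min(i + chunkSize, rows.length);
        for (; i < end; ++i) onChunk(rows[i], i); // e.g. add row i to the sheet
        if (i < rows.length) window.setTimeout(step, 0); // yield, then continue
        else onDone(); // e.g. call XLSX.write and trigger the download
    })();
}

The final XLSX.write call is still a single synchronous block, so this only keeps the UI responsive during the row-building phase.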
SheetJSDev commented 2014-07-02 18:56:27 +00:00 (Migrated from github.com)

@sandroboehme you can perform the write process in a WebWorker (which wouldn't lock up the UI); see the sketch below. For example, http://oss.sheetjs.com/js-xlsx/ (hosted from the gh-pages branch) uses a web worker to read data.

The main reader demo (http://oss.sheetjs.com/, source https://github.com/SheetJS/SheetJS.github.io) shows a spinner when the read process may take a while.

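A minimal sketch of the main-thread side, assuming FileSaver's saveAs and a worker built around the writeToXLSX function shown earlier (the file name xlsxworker.js and the rows variable are illustrative):

var worker = new Worker("xlsxworker.js");
worker.onmessage = function(e) {
    if (e.data.t === "data") { // matches the worker's postMessage({t:"data", d:wbout})
        // e.data.d is a binary string; reuse the stringToArrayBuffer helper
        // from the worker snippet earlier in this thread before making a Blob
        saveAs(new Blob([stringToArrayBuffer(e.data.d)],
                        {type: "application/octet-stream"}), "out.xlsx");
    }
};
worker.postMessage({t: "write", data: rows}); // hand the source rows to the worker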
irfanyounas commented 2014-07-02 21:08:52 +00:00 (Migrated from github.com)

Regarding the XLSX write: I have debugged the code and found that inside the function write_ws_xml_cell(), the following line is creating the performance issue:

v = writetag('v', ''+get_sst_id(opts.Strings, cell.v));

Specifically, the function get_sst_id(opts.Strings, cell.v).

SheetJSDev commented 2014-07-02 21:43:01 +00:00 (Migrated from github.com)

@irfanyounas That function performs a scan in the shared string table. That behavior can be disabled by setting the option bookSST to false.

I agree that a linear scan is horrible here, but from my testing it contributes less than 5% of the runtime at 20K rows. It does, however, explain the superlinear runtime.

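Concretely, with the SST disabled the scan never runs and strings are written inline (a minimal sketch):

// bookSST: false (the default) writes inline strings; no shared-string-table
// lookup happens per cell, so the superlinear scan cost disappears
var wbout = XLSX.write(wb, {bookType: "xlsx", bookSST: false, type: "binary"});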
irfanyounas commented 2014-07-02 21:52:07 +00:00 (Migrated from github.com)

If the text data is the same in all rows then it's true that it will be fast, but if each row has unique data the performance will be poor.
For example, you can test it by creating data using the following code:

var data1 = [];
while(data1.length < 20000) {
    var randomnumber = Math.floor(Math.random() * (799999999 - 700000000 + 1)) + 700000000;
    var sms = "sms:+45" + randomnumber;
    data1[data1.length] = ["active", "abc", sms];
}

irfanyounas commented 2014-07-02 22:06:14 +00:00 (Migrated from github.com)

Thanks. I was wondering: what if we don't perform a scan in the shared string table? Does that affect the compressed file size?

SheetJSDev commented 2014-07-02 23:09:59 +00:00 (Migrated from github.com)

@irfanyounas The original problem is that some readers (notably iOS Numbers) had issues with inline strings; the SST was the only way for strings to be displayed. Testing it now, it appears that the newest version of Numbers does support inline strings, so it might make sense to recommend inline strings.

Adding the shared string table makes files larger when they don't have repeated elements (for example, tables with text labels and numeric bodies) but smaller when they have lots of repeated strings (for example, a table where one column describes a category).

The mystery here (which explains the difference in our understanding) is that as the size of the data increases, the zip time (as a percentage of the total write process) gets progressively worse. Basically, there are two effects: the effect of the linear scan and the effect of having to write a new file in the zip. To see this, add a console.time("zip") just before the switch statement in write_zip and a console.timeEnd("zip") in your script. I generate data this way:

var w = 2048;       // <-- width of each string
var MAXLEN = 10000; // <-- number of rows

var o = new Array(MAXLEN + w);
for(var i = 0; i != MAXLEN + w; ++i) o[i] = String.fromCharCode(i); // safe for values before 0xd000 = 53248
var data = new Array(MAXLEN);
for(var dd = 0; dd != MAXLEN; ++dd) data[dd] = ["foo", o.slice(dd, dd+w).join(""), dd+2];

This ensures v8 doesn't perform any string-sharing magic and that we have distinct strings.

For small w, the zip operations take less than half the total runtime, but as w is increased the zip operations almost completely dominate the runtime (in both cases). According to the v8 profile (you can get this in node by running with the --prof flag and then using the tick processor), the function that takes the most time is the CRC-32 function.

irfanyounas commented 2014-07-03 16:18:45 +00:00 (Migrated from github.com)

Thanks, that's right. You are doing really good work.

turbobuilt commented 2015-07-12 19:49:05 +00:00 (Migrated from github.com)

I'm having this issue too. Is there any form of the xls spec that doesn't require zipping? That might be a workaround till the zip issue is taken care of.

turbobuilt commented 2016-01-05 15:05:34 +00:00 (Migrated from github.com)

The jszip api offers the ability to not compress the zip file. Would that help?

https://stuk.github.io/jszip/documentation/api_jszip/generate.html

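For reference, the generate call linked above takes a compression option (a minimal sketch against the jszip 2.x API; "STORE" writes entries uncompressed):

var JSZip = require("jszip"); // jszip 2.x

var zip = new JSZip();
zip.file("hello.txt", "Hello World");
// "STORE" skips DEFLATE entirely; "DEFLATE" compresses each entry
var out = zip.generate({type: "uint8array", compression: "STORE"});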
kumarr10 commented 2022-02-09 23:31:29 +00:00 (Migrated from github.com)

31 columns and 300K rows from an API: json_to_sheet / writeFile breaks. Please see if this can be addressed.

SheetJSDev commented 2022-03-03 10:11:30 +00:00 (Migrated from github.com)

@kumarr10 live demo https://jsfiddle.net/xgr4sbk1/

var NR = 300000, NC = 31;
console.time("prep");
var ws = XLSX.utils.aoa_to_sheet(Array.from({length:NR}, (_,i) => Array.from({length: NC}, (_,j) => j == 0 ? `Row${i+1}` : i+j)), {dense: true});
var wb = XLSX.utils.book_new(); XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
console.timeEnd("prep");
console.time("write");
XLSX.writeFile(wb, "out.xlsx");
console.timeEnd("write");

Using version 0.18.3 in Chrome on a 2018 MacBook Pro, it takes ~13 seconds to generate a fully populated 300K x 31 worksheet. You can play with the NR / NC variables. The new limit is the string length cap when generating the worksheet XML.

anandmuthu1978 commented 2022-04-16 13:03:04 +00:00 (Migrated from github.com)

Hi All,

I struggled to export a large number of rows as xlsx. The data is 100 columns x 100K rows of JSON. The browser either freezes or you get an array-length error from hitting the string limitation of V8 in Chrome.

The following is how I resolved this, so hopefully it will shed some light for those having similar issues.

  1. First of all, update the version: 0.18.5 includes some string-conversion performance improvements.
  2. I initially used json_to_sheet to convert to a worksheet. This always fails for large data, most likely hitting a limitation in Chrome or V8; I was unable to solve it. No error was raised in Chrome and the browser tab spins forever. This was resolved by changing the JSON object mapping to direct array mapping. For example, convert
     data = [{a:111, b:333, c:444}, {a:111, b:333, c:444}]
     to
     data = [[111,333,444], [111,333,444]]
  3. Having converted from JSON to an array of arrays, aoa_to_sheet worked great. Use the options dense: true and raw: true. A 100K x 100-column array converted to a worksheet in about 10s.
  4. The next problem was writeFile, which raises a range error in Chrome. Googling tells me we hit the string limit in Chrome: in writeFile, the whole workbook is converted to a string (via a string join) before downloading.
  5. I managed to download successfully by enabling the following options:
     type: 'binary'
     bookSST: true
     compression: true
  6. The most important one is bookSST. Basically, this is like indexing of repeated cell data. One reason we hit the string limit is that the XML gives blank cells an xml:space="preserve" string, so each empty cell carries an XML string of some 20 to 30 characters. With bookSST the string is indexed and only a reference index key is written per cell, which reduces the total string length.
  7. compression reduces the file size.
  8. Finally, I managed to download the file.

These are all based on my understanding of the code and on help from the developers when I raised issues with them; they have been a great help. Hopefully this will help others encountering the same problem.

Thanks.
Anand Muthu

dtslvr commented 2022-04-20 11:22:38 +00:00 (Migrated from github.com)

"bookSST": true,
"compression": true

Thanks for your explanations @anandmuthu1978!

SheetJSDev commented 2022-04-20 16:26:39 +00:00 (Migrated from github.com)

Thanks for sharing @anandmuthu1978 ! bookSST: true is helpful for files containing a large number of repeated strings (text labels for each row, for example) and compression: true obviously reduces the file size, but the tradeoff is increased runtime. Using type: "array" generates a Uint8Array rather than a string (which should be more efficient)

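Putting those options together (a minimal sketch; wb is an existing workbook):

// type: "array" yields a typed array instead of a JS string, sidestepping
// the engine's maximum string length during the final join
var out = XLSX.write(wb, {bookType: "xlsx", type: "array", compression: true});
var blob = new Blob([out], {type: "application/octet-stream"});
// hand `blob` to saveAs or a download link as usual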
goproclube commented 2022-05-10 14:52:43 +00:00 (Migrated from github.com)

@anandmuthu1978 hello

Could you give me an example of how you managed to write a large number of rows?

With my code, I can save up to 60k rows. With 70k, I can't understand why it saves only 15 rows, and it doesn't show any error.

try {

  const ws = xlsx.utils.json_to_sheet(data)
  const wb = xlsx.utils.book_new()

  xlsx.utils.book_append_sheet(wb, ws)

  await xlsx.writeFileSync(wb, fileName)

} catch (e) {

  console.log(e)
}

Thanks

anandmuthu1978 commented 2022-05-15 11:25:12 +00:00 (Migrated from github.com)

@goproclube
Hi.
In my case I converted the data from JSON to an array and used aoa_to_sheet(data).
Most likely in your case 70k is already hitting the memory limit. I noticed json_to_sheet takes a lot of memory, so I converted to the array format. You lose the key-to-column mapping, but as long as you want all the columns it doesn't really matter. For example, convert

data = [{a:111, b:333, c:444},
        {a:111, b:333, c:444}]

to

data = [[111,333,444],
        [111,333,444]]

A sketch of that conversion follows below.

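A minimal sketch of that conversion (assuming every record shares the same keys; records is the JSON array):

// flatten objects to rows in a fixed key order so columns stay aligned
var header = Object.keys(records[0]);
var aoa = records.map(function(rec) {
    return header.map(function(k) { return rec[k]; });
});
aoa.unshift(header); // first row: column names
var ws = xlsx.utils.aoa_to_sheet(aoa, {dense: true});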
farideliyev commented 2022-06-17 10:17:58 +00:00 (Migrated from github.com)

@SheetJSDev is it possible, when writing big files to .xlsx, somehow append worksheets together?
I mean, imagine I have 700k rows, my CHUNK_SIZE = 350k, I create worksheet, append it to workbook and pass it to XLSX.write and get typed array back. On my next step I repeat the same steps, but now with another 350k piece of data. At the end I get 2 typed Arrays. My question is it possible to merge worksheets together in order to get them in one file. If so, we will be able to pass chunked data to xlsx and zipping manipulations will not take so long time.

@SheetJSDev is it possible, when writing big files to .xlsx, somehow append worksheets together? I mean, imagine I have 700k rows, my CHUNK_SIZE = 350k, I create worksheet, append it to workbook and pass it to XLSX.write and get typed array back. On my next step I repeat the same steps, but now with another 350k piece of data. At the end I get 2 typed Arrays. My question is it possible to merge worksheets together in order to get them in one file. If so, we will be able to pass chunked data to xlsx and zipping manipulations will not take so long time.
chy869 commented 2022-09-27 01:53:43 +00:00 (Migrated from github.com)

Thanks, anandmuthu1978.

I faced the performance issue using json_to_sheet; the problem was resolved by using aoa_to_sheet.

Sample code below for reference:

  const wb = xlsx.utils.book_new();
  const outputData = records.map(Object.values);
  outputData.unshift(Object.keys(records[0]));
  const ws = xlsx.utils.aoa_to_sheet(outputData, {dense: true});
  xlsx.utils.book_append_sheet(wb, ws);
  xlsx.writeFile(wb, destination, {compression: true});
