Adding link data severely degrades write performance #2537

Closed
opened 2022-02-28 04:06:13 +00:00 by pqvst · 1 comment
pqvst commented 2022-02-28 04:06:13 +00:00 (Migrated from github.com)

If I add link data to cells, the time it takes to write the file seems to explode exponentially:

1000 rows: 75.151ms
10000 rows: 5.696s
20000 rows: 27.698s
30000 rows: 58.313s
50000 rows: cancelled because it took too long...

In comparison, generating cells without link data, takes only a fraction of the time:

1000 rows: 23.891ms
10000 rows: 66.103ms
20000 rows: 103.742ms
30000 rows: 132.128ms
50000 rows: 222.278ms

Test code:

rows = 10000;
var XLSX = require("xlsx");
var ws = XLSX.utils.aoa_to_sheet(Array.from({length:rows}, (_,i) => ([String(i), i])));
for (let i = 0; i < rows; i++) {
  cell = ws[`A${i+1}`];
  cell.l = { Target: cell.v, Tooltip: 'Link' }; // comment this line to remove link data
}
var wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
var tag = rows;
console.time(tag);
var out = XLSX.write(wb, {bookType: "xlsx", type: "buffer"});
console.timeEnd(tag);
If I add link data to cells, the time it takes to write the file seems to explode exponentially: ``` 1000 rows: 75.151ms 10000 rows: 5.696s 20000 rows: 27.698s 30000 rows: 58.313s 50000 rows: cancelled because it took too long... ``` In comparison, generating cells *without* link data, takes only a fraction of the time: ``` 1000 rows: 23.891ms 10000 rows: 66.103ms 20000 rows: 103.742ms 30000 rows: 132.128ms 50000 rows: 222.278ms ``` Test code: ``` rows = 10000; var XLSX = require("xlsx"); var ws = XLSX.utils.aoa_to_sheet(Array.from({length:rows}, (_,i) => ([String(i), i]))); for (let i = 0; i < rows; i++) { cell = ws[`A${i+1}`]; cell.l = { Target: cell.v, Tooltip: 'Link' }; // comment this line to remove link data } var wb = XLSX.utils.book_new(); XLSX.utils.book_append_sheet(wb, ws, "Sheet1"); var tag = rows; console.time(tag); var out = XLSX.write(wb, {bookType: "xlsx", type: "buffer"}); console.timeEnd(tag); ```
SheetJSDev commented 2022-02-28 04:42:02 +00:00 (Migrated from github.com)

Feel free to send a PR, the modifications are to bits/31_rels.js:

@@ -61,7 +61,9 @@ var RELS_EXTERN = [RELS.HLINK, RELS.XPATH, RELS.XMISS];
 function add_rels(rels, rId/*:number*/, f, type, relobj, targetmode/*:?string*/)/*:number*/ {
        if(!relobj) relobj = {};
        if(!rels['!id']) rels['!id'] = {};
-       if(rId < 0) for(rId = 1; rels['!id']['rId' + rId]; ++rId){/* empty */}
+       if(!rels['!idx']) rels['!idx'] = 1;
+       if(rId < 0) for(rId = rels['!idx']; rels['!id']['rId' + rId]; ++rId){/* empty */}
+       rels['!idx'] = rId + 1;
        relobj.Id = 'rId' + rId;
        relobj.Type = type;
        relobj.Target = f;

To summarize: each link must be generated and assigned a relationship ID within the worksheet relationships file. Not anticipating a huge number of links, the code does a simple linear scan when a new link is requested. The patch just caches the next expected target ID. Please confirm the output is as expected and the performance is a lot more reasonable, then send a PR.

It's not using an array because theoretically a relationship id can start with something other than rId.

PS: there is a limit of 65530 links per worksheet that is not currently enforced -- should the library throw an error or do something if there are too many links?

Feel free to send a PR, the modifications are to `bits/31_rels.js`: ```diff @@ -61,7 +61,9 @@ var RELS_EXTERN = [RELS.HLINK, RELS.XPATH, RELS.XMISS]; function add_rels(rels, rId/*:number*/, f, type, relobj, targetmode/*:?string*/)/*:number*/ { if(!relobj) relobj = {}; if(!rels['!id']) rels['!id'] = {}; - if(rId < 0) for(rId = 1; rels['!id']['rId' + rId]; ++rId){/* empty */} + if(!rels['!idx']) rels['!idx'] = 1; + if(rId < 0) for(rId = rels['!idx']; rels['!id']['rId' + rId]; ++rId){/* empty */} + rels['!idx'] = rId + 1; relobj.Id = 'rId' + rId; relobj.Type = type; relobj.Target = f; ``` To summarize: each link must be generated and assigned a relationship ID within the worksheet relationships file. Not anticipating a huge number of links, the code does a simple linear scan when a new link is requested. The patch just caches the next expected target ID. Please confirm the output is as expected and the performance is a lot more reasonable, then send a PR. It's not using an array because theoretically a relationship id can start with something other than `rId`. PS: there is a limit of 65530 links per worksheet that is not currently enforced -- should the library throw an error or do something if there are too many links?
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#2537
No description provided.