Write performance issue & fix (rebuild_cfb) #12

Closed
opened 2021-09-03 20:56:52 +00:00 by rossj · 2 comments
rossj commented 2021-09-03 20:56:52 +00:00 (Migrated from github.com)

Hi there,

I discovered today some unusually slow operations when trying to create a rather large .msg file with roughly 63k data nodes. I tracked the slowness down to a nested loop in rebuild_cfb() that is ensuring that each stream node has a parent storage node.

I made some modifications to track the names in a in a JS object instead, and the rebuild_cfb() time dropped from about 1 minute to only 30 ms. I decided to use a plain object instead of a Set just to maintain maximum compatibility with old browsers.

Below are my changes with the added fullPaths object and the removed nested loop.

	// Used to track which names exist
	var fullPaths = Object.create(null);
	var data/*:Array<[string, CFBEntry]>*/ = [];
	for(i = 0; i < cfb.FullPaths.length; ++i) {
		fullPaths[cfb.FullPaths[i]] = true;
		if(cfb.FileIndex[i].type === 0) continue;
		data.push([cfb.FullPaths[i], cfb.FileIndex[i]]);
	}
	for(i = 0; i < data.length; ++i) {
		var dad = dirname(data[i][0]);
		s = fullPaths[dad];
		if(!s) {
			data.push([dad, ({
				name: filename(dad).replace("/",""),
				type: 1,
				clsid: HEADER_CLSID,
				ct: now, mt: now,
				content: null
			}/*:any*/)]);
			// Add name to set
			fullPaths[dad] = true;
		}
	}
Hi there, I discovered today some unusually slow operations when trying to create a rather large .msg file with roughly 63k data nodes. I tracked the slowness down to a [nested loop in rebuild_cfb()](https://github.com/SheetJS/js-cfb/blob/master/cfb.flow.js#L767) that is ensuring that each stream node has a parent storage node. I made some modifications to track the names in a in a JS object instead, and the `rebuild_cfb()` time dropped from about 1 minute to only 30 ms. I decided to use a plain object instead of a `Set` just to maintain maximum compatibility with old browsers. Below are my changes with the added `fullPaths` object and the removed nested loop. ```javascript // Used to track which names exist var fullPaths = Object.create(null); var data/*:Array<[string, CFBEntry]>*/ = []; for(i = 0; i < cfb.FullPaths.length; ++i) { fullPaths[cfb.FullPaths[i]] = true; if(cfb.FileIndex[i].type === 0) continue; data.push([cfb.FullPaths[i], cfb.FileIndex[i]]); } for(i = 0; i < data.length; ++i) { var dad = dirname(data[i][0]); s = fullPaths[dad]; if(!s) { data.push([dad, ({ name: filename(dad).replace("/",""), type: 1, clsid: HEADER_CLSID, ct: now, mt: now, content: null }/*:any*/)]); // Add name to set fullPaths[dad] = true; } } ```
SheetJSDev commented 2021-09-03 21:55:52 +00:00 (Migrated from github.com)

Thanks for looking into this! Feel free to send a PR. One slight nit: we can't assume Object.create is available so maybe something like Object.create ? Object.create(null) : {} would play nice with IE8 and below

Thanks for looking into this! Feel free to send a PR. One slight nit: we can't assume Object.create is available so maybe something like `Object.create ? Object.create(null) : {}` would play nice with IE8 and below
rossj commented 2021-09-03 22:35:16 +00:00 (Migrated from github.com)

Hmm good point. I was going to say that the <= IE8 version with the plain object then wouldn't work if the root folder happened to be named "__proto__" or "constructor", but I think the dad dirname will always end in a slash anyway (and probably start with "Root Entry/" for that matter).

I'll work on a PR with your suggestion.

Hmm good point. I was going to say that the <= IE8 version with the plain object then wouldn't work if the root folder happened to be named "\_\_proto__" or "constructor", but I think the `dad` dirname will always end in a slash anyway (and probably start with "Root Entry/" for that matter). I'll work on a PR with your suggestion.
Sign in to join this conversation.
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/js-cfb#12
No description provided.