Only first sheet is returning when working with Huge data sets #563

Closed
opened 2017-02-18 05:33:36 +00:00 by gtd57 · 15 comments
gtd57 commented 2017-02-18 05:33:36 +00:00 (Migrated from github.com)

While working or bigger data sets( tested with 2000 cols X 1000 rows) with multiple sheets. In that case, after certain MB, (in my case 40MB), sheet_to_row_object_array is only returning first sheet.

Is it a limitation? Are there any alternatives to get all the sheets regardless of files size?

While working or bigger data sets( tested with 2000 cols X 1000 rows) with multiple sheets. In that case, after certain MB, (in my case 40MB), `sheet_to_row_object_array` is only returning first sheet. Is it a limitation? Are there any alternatives to get all the sheets regardless of files size?
reviewher commented 2017-02-18 05:50:05 +00:00 (Migrated from github.com)

sheet_to_row_object_array, as the name suggests, takes a single worksheet as input. If you want to convert the second or third worksheet in a workbook you need to read the workbook first, then get the sheet name you want from the SheetNames array, then pull from the Sheets object. For example in node, if you want to convert the third worksheet:

var XLSX = require('xlsx')
var wb = XLSX.readFile('test.xlsx');
var name_of_third_worksheet = wb.SheetNames[3-1]; // SheetNames is an array of strings
var ws = wb.Sheets[name_of_third_worksheet];
var output = XLSX.utils.sheet_to_row_object_array(ws);

The browser demo iterates across the sheet names in a similar manner:

	workbook.SheetNames.forEach(function(sheetName) {
		var roa = X.utils.sheet_to_row_object_array(workbook.Sheets[sheetName]);
		if(roa.length > 0){
			result[sheetName] = roa;
		}
	});

If you have an example file where you can read one worksheet but not another, then there's probably a parsing issue with that specific sheet. If you could send us a sample, we'd be glad to take a look. Alternatively, if you can enable dev mode (set the option WTF:1 in the read or readFile options object) you may see a concrete error message.

`sheet_to_row_object_array`, as the name suggests, takes a single worksheet as input. If you want to convert the second or third worksheet in a workbook you need to read the workbook first, then get the sheet name you want from the SheetNames array, then pull from the Sheets object. For example in node, if you want to convert the third worksheet: ```js var XLSX = require('xlsx') var wb = XLSX.readFile('test.xlsx'); var name_of_third_worksheet = wb.SheetNames[3-1]; // SheetNames is an array of strings var ws = wb.Sheets[name_of_third_worksheet]; var output = XLSX.utils.sheet_to_row_object_array(ws); ``` The browser demo [iterates across the sheet names in a similar manner](https://github.com/SheetJS/js-xlsx/blob/gh-pages/index.html#L152-L157): ```js workbook.SheetNames.forEach(function(sheetName) { var roa = X.utils.sheet_to_row_object_array(workbook.Sheets[sheetName]); if(roa.length > 0){ result[sheetName] = roa; } }); ``` If you have an example file where you can read one worksheet but not another, then there's probably a parsing issue with that specific sheet. If you could send us a sample, we'd be glad to take a look. Alternatively, if you can enable dev mode (set the option `WTF:1` in the read or readFile options object) you may see a concrete error message.
gtd57 commented 2017-02-18 11:57:47 +00:00 (Migrated from github.com)

@reviewher Thanks for the quick reply.

continue with my above reply... Actually, the problem is with. let excelData = XLSX.read("<some file>", {type: 'binary'});, then calling excelData.Sheets not returning the object of all sheets, its only returning object with one key which is first sheet name and its data.

Just to mention there are no errors in the sheet. And there is no special data to share with. Just dummy set of columns and rows are fine to test.

One last thing. While doing the testing js-xlsx is not returning any errors.(I will try with WTF:1 option set, and share the details)

@reviewher Thanks for the quick reply. continue with my above reply... Actually, the problem is with. `let excelData = XLSX.read("<some file>", {type: 'binary'});`, then calling `excelData.Sheets` not returning the object of all sheets, its only returning object with one key which is first sheet name and its data. Just to mention there are no errors in the sheet. And there is no special data to share with. Just dummy set of columns and rows are fine to test. One last thing. While doing the testing `js-xlsx` is not returning any errors.(I will try with WTF:1 option set, and share the details)
gtd57 commented 2017-02-22 13:22:58 +00:00 (Migrated from github.com)

Hi @reviewher, any alternative for this? any help is appreciated.

Hi @reviewher, any alternative for this? any help is appreciated.
reviewher commented 2017-02-22 17:51:46 +00:00 (Migrated from github.com)

@gtd57 what does excelData.SheetNames show?

If you can share a file then we can take a look. If you don't want to upload to github you can email a file to dev at sheetjs dot com and someone can take a look

@gtd57 what does `excelData.SheetNames` show? If you can share a file then we can take a look. If you don't want to upload to github you can email a file to dev at sheetjs dot com and someone can take a look
gtd57 commented 2017-02-24 09:52:50 +00:00 (Migrated from github.com)

As per SheetJS/js-xlsx API, excelData.SheetNames must contain sheet name as key for each sheet and it data as value, However in my case it is only showing one key and value of first sheet data only.

I can send you the data, but its is huge file above 50MB. However, that sheet does not contain any special data, You can create data on your side, I just popped some, one word strings in the first column and first row and rest are just numbers and in total I have used 2000 cols in 1200 rows and total of 3 sheets. My requirement is upto 8 sheets

As per SheetJS/js-xlsx API, `excelData.SheetNames` must contain sheet name as key for each sheet and it data as value, However in my case it is only showing one key and value of first sheet data only. I can send you the data, but its is huge file above 50MB. However, that sheet does not contain any special data, You can create data on your side, I just popped some, one word strings in the first column and first row and rest are just numbers and in total I have used 2000 cols in 1200 rows and total of 3 sheets. My requirement is upto 8 sheets
SheetJSDev commented 2017-02-24 11:07:23 +00:00 (Migrated from github.com)

Hello @gtd57 ! Just for you I ran a test: https://gist.github.com/SheetJSDev/205d9d9c14c5203e602da70ed57e0a43

It generates an XLSX workbook with 8 sheets. Each sheet happens to be the same but it doesn't affect the nature of the issue. The worksheet is 2000 columns and 1000 rows. You can run it to generate and to verify.

On a side note: the files are massive! without compression the generated file was 475MB. With compression it was a more manageable 98M, both well beyond the 50M limit.

I tried using the node command line tool to read the data. Turns out you hit a memory snag with the default settings, so you need to override:

$ node --max-old-space-size=4096 bin/xlsx.njs sheetjs.xlsx SheetJSA

After waiting a minute and 15 seconds, finally saw the CSV output!

So if it's only showing you one sheet, I'm guessing there's an issue with the other sheets. I'd like to take a look. If you can post the file to a hosting service like Dropbox, you can email me a link and I can download and take a look.

Hello @gtd57 ! Just for you I ran a test: https://gist.github.com/SheetJSDev/205d9d9c14c5203e602da70ed57e0a43 It generates an XLSX workbook with 8 sheets. Each sheet happens to be the same but it doesn't affect the nature of the issue. The worksheet is 2000 columns and 1000 rows. You can run it to generate and to verify. On a side note: the files are massive! without compression the generated file was 475MB. With compression it was a more manageable 98M, both well beyond the 50M limit. I tried using the node command line tool to read the data. Turns out you hit a memory snag with the default settings, so you need to override: ```bash $ node --max-old-space-size=4096 bin/xlsx.njs sheetjs.xlsx SheetJSA ``` After waiting a minute and 15 seconds, finally saw the CSV output! So if it's only showing you one sheet, I'm guessing there's an issue with the other sheets. I'd like to take a look. If you can post the file to a hosting service like Dropbox, you can email me a link and I can download and take a look.
mtharrison commented 2017-02-26 09:15:01 +00:00 (Migrated from github.com)

I'm getting the same or similar issue. I have a workbook with 7 sheets. SheetNames contains all 7 sheetnames but one of the sheets is missing altogether from Sheets. The missing sheet has 382,000 rows. Do you think I'm hitting some kind of limit in xlsx? Do you have any suggested workaround or if it sounds like I bug can I help to debug?

I'm getting the same or similar issue. I have a workbook with 7 sheets. `SheetNames` contains all 7 sheetnames but one of the sheets is missing altogether from `Sheets`. The missing sheet has 382,000 rows. Do you think I'm hitting some kind of limit in xlsx? Do you have any suggested workaround or if it sounds like I bug can I help to debug?
SheetJSDev commented 2017-02-26 09:24:28 +00:00 (Migrated from github.com)

@mtharrison The limit is dictated by memory, and honestly you would be able to tell in either case when you hit the memory limit (chrome shows "oh snap" and node gives you a long message about GC). Assuming you don't see that, there's something in the file causing trouble.

The simplest thing you could do is pass the option WTF:1 to the reader:

  • If you are using node, and if you installed xlsx globally, you can just run xlsx --dev --perf file_name the --dev enables all errors and the --perf tells the xlsx command not to output anything

  • If you are using XLSX.read, the second argument is the options argument, so just add WTF:1 there. For example, in the browser using the index.html demo in the repo, add that key to the object {type: 'binary'} and {type: 'base64'}

  • If you are using XLSX.readFile, just pass a second argument with {WTF:1} or add that key to the file. For example, the example in the README https://github.com/SheetJS/js-xlsx/#parsing-workbooks would be var workbook = XLSX.readFile('test.xlsx', {WTF:1});

That option will expose any sheet-specific errors. You should get an error message that you can share.

I'd like to see a file if you can share it. Since it probably would exceed the github or email limit, please post somewhere (like dropbox) and email or share a link.

@mtharrison The limit is dictated by memory, and honestly you would be able to tell in either case when you hit the memory limit (chrome shows "oh snap" and node gives you a long message about GC). Assuming you don't see that, there's something in the file causing trouble. The simplest thing you could do is pass the option `WTF:1` to the reader: - If you are using node, and if you installed xlsx globally, you can just run `xlsx --dev --perf file_name` the --dev enables all errors and the --perf tells the xlsx command not to output anything - If you are using XLSX.read, the second argument is the options argument, so just add `WTF:1` there. For example, in the browser using the `index.html` demo in the repo, add that key to the object `{type: 'binary'}` and `{type: 'base64'}` - If you are using XLSX.readFile, just pass a second argument with `{WTF:1}` or add that key to the file. For example, the example in the README https://github.com/SheetJS/js-xlsx/#parsing-workbooks would be `var workbook = XLSX.readFile('test.xlsx', {WTF:1});` That option will expose any sheet-specific errors. You should get an error message that you can share. I'd like to see a file if you can share it. Since it probably would exceed the github or email limit, please post somewhere (like dropbox) and email or share a link.
mtharrison commented 2017-02-26 09:52:05 +00:00 (Migrated from github.com)

Thanks @SheetJSDev for the super-quick response. I tried with the debug option and the error I get is:

Error: Uncaught error: "toString()" failed
    at Buffer.toString (buffer.js:495:11)
    at getdatastr (/REDACTED/node_modules/xlsx/xlsx.js:1318:72)
    at getdata (/REDACTED/node_modules/xlsx/xlsx.js:1336:95)
    at getzipdata (/REDACTED/node_modules/xlsx/xlsx.js:1352:19)
    at safe_parse_ws (/REDACTED/node_modules/xlsx/xlsx.js:12165:26)
    at parse_zip (/REDACTED/node_modules/xlsx/xlsx.js:12268:3)
    at read_zip (/REDACTED/node_modules/xlsx/xlsx.js:12408:9)
    at Object.readSync [as read] (/REDACTED/node_modules/xlsx/xlsx.js:12420:21)

Looks like I'm probably hitting V8's max string size. I logged the buffer size out before toString() is called on it and it's 356mb.

Thanks @SheetJSDev for the super-quick response. I tried with the debug option and the error I get is: ``` Error: Uncaught error: "toString()" failed at Buffer.toString (buffer.js:495:11) at getdatastr (/REDACTED/node_modules/xlsx/xlsx.js:1318:72) at getdata (/REDACTED/node_modules/xlsx/xlsx.js:1336:95) at getzipdata (/REDACTED/node_modules/xlsx/xlsx.js:1352:19) at safe_parse_ws (/REDACTED/node_modules/xlsx/xlsx.js:12165:26) at parse_zip (/REDACTED/node_modules/xlsx/xlsx.js:12268:3) at read_zip (/REDACTED/node_modules/xlsx/xlsx.js:12408:9) at Object.readSync [as read] (/REDACTED/node_modules/xlsx/xlsx.js:12420:21) ``` Looks like I'm probably hitting V8's max string size. I logged the buffer size out before `toString()` is called on it and it's 356mb.
SheetJSDev commented 2017-02-26 10:00:54 +00:00 (Migrated from github.com)

@mtharrison you caught me in a late night session!

If you are running in a recent version of node, can you try calling node with --max-old-space-size=4096? To do this against the global install, run

$ node --max-old-space-size=4096 `which xlsx` filename

If you are running something like node main.js normally, just add the flag:

$ node --max-old-space-size=4096 your_script
@mtharrison you caught me in a late night session! If you are running in a recent version of node, can you try calling node with `--max-old-space-size=4096`? To do this against the global install, run ```bin $ node --max-old-space-size=4096 `which xlsx` filename ``` If you are running something like `node main.js` normally, just add the flag: ``` $ node --max-old-space-size=4096 your_script ```
SheetJSDev commented 2017-02-26 10:10:08 +00:00 (Migrated from github.com)

@mtharrison scratch that.

The nodejs string limit is 16 bytes shy of 256MB: https://github.com/nodejs/node/issues/3175 . I thought the previous write test case would cover it, but the underlying xml files are 60M.

I'm going to resolve this in the next version by trying to catch that particular error and give more informative output.

Can you do one last thing: try saving that file as XLSB in Excel and seeing if you hit the same problem? The XLSB parser keeps data as buffers so it theoretically should not trigger a 256MB string conversion

@mtharrison scratch that. The nodejs string limit is 16 bytes shy of 256MB: https://github.com/nodejs/node/issues/3175 . I thought the previous write test case would cover it, but the underlying xml files are 60M. I'm going to resolve this in the next version by trying to catch that particular error and give more informative output. Can you do one last thing: try saving that file as XLSB in Excel and seeing if you hit the same problem? The XLSB parser keeps data as buffers so it theoretically should not trigger a 256MB string conversion
mtharrison commented 2017-02-26 13:54:19 +00:00 (Migrated from github.com)

@SheetJSDev Using an XLSB does indeed sidestep that issue! Thanks a lot for the help.

@SheetJSDev Using an XLSB does indeed sidestep that issue! Thanks a lot for the help.
Achuta-94 commented 2019-12-17 09:43:51 +00:00 (Migrated from github.com)

ERROR in src/app/app.component.ts(50,20): error TS2339: Property 'rows' does not exist on type 'HTMLElement'.
src/app/app.component.ts(59,27): error TS2339: Property 'rows' does not exist on type 'HTMLElement'.
src/app/app.component.ts(60,15): error TS2339: Property 'rows' does not exist on type 'HTMLElement'.
src/app/app.component.ts(63,25): error TS2339: Property 'rows' does not exist on type 'HTMLElement'.
src/app/app.component.ts(98,20): error TS2339: Property 'rows' does not exist on type 'HTMLElement'.
src/app/app.component.ts(109,27): error TS2339: Property 'rows' does not exist on type 'HTMLElement'.
src/app/app.component.ts(113,26): error TS2339: Property 'rows' does not exist on type 'HTMLElement'.
src/app/app.component.ts(172,31): error TS2339: Property 'sheet_to_row_object_array' does not exist on type 'XLSX$Utils'.
src/app/app.component.ts(186,36): error TS2339: Property 'files' does not exist on type 'HTMLElement'.
please help...

ERROR in src/app/app.component.ts(50,20): error TS2339: Property 'rows' does not exist on type 'HTMLElement'. src/app/app.component.ts(59,27): error TS2339: Property 'rows' does not exist on type 'HTMLElement'. src/app/app.component.ts(60,15): error TS2339: Property 'rows' does not exist on type 'HTMLElement'. src/app/app.component.ts(63,25): error TS2339: Property 'rows' does not exist on type 'HTMLElement'. src/app/app.component.ts(98,20): error TS2339: Property 'rows' does not exist on type 'HTMLElement'. src/app/app.component.ts(109,27): error TS2339: Property 'rows' does not exist on type 'HTMLElement'. src/app/app.component.ts(113,26): error TS2339: Property 'rows' does not exist on type 'HTMLElement'. src/app/app.component.ts(172,31): error TS2339: Property 'sheet_to_row_object_array' does not exist on type 'XLSX$Utils'. src/app/app.component.ts(186,36): error TS2339: Property 'files' does not exist on type 'HTMLElement'. please help...
Achuta-94 commented 2019-12-17 09:45:51 +00:00 (Migrated from github.com)

CODE:
submit()
{
var myTad = document.getElementById('rr');
var len = myTad.rows.length ;
var arr = [];
var values =[];
var key1 ="Task_NO";var key2 ="EFFORT_MONTH";var key3 ="ROLE";var key4 ="LOC"; var key5 ="ESTIMATED_HOURS"; var key6 ="CNTY";
var key7 ="EMPLOYEE_LEVEL";var key8 ="AUTHENTED_HOURS";var key9 ="ESTIMATED_COST"; var key10 ="ACTUAL_COST";
console.log("Row_length:" +len);

		for( var z =0; z < len ; z++ )
		{ 
			for( var y = 0; myTad.rows[z].cells[y]; y++){
				if(myTad.rows[z].cells[y].childNodes[0].getAttribute("type") == 'button'){
						
				}else{
					values.push(myTad.rows[z].cells[y].childNodes[0].value);
			}
		}
		for(var k =0; k < values.length ;k++){
			var dict = {};
			dict[key1] = Number(values[0]);
            dict[key2] =Number(values[1]);
            dict[key3] = (values[2]);
            dict[key4] = (values[3]);
            dict[key5] = Number(values[4]);
            dict[key6] = (values[5]);
            dict[key7] = Number(values[6]);
            dict[key8] = Number(values[7]);
            dict[key9] = Number(values[8]);
            dict[key10] = Number(values[9]);
	
		}
		arr.push(dict);
		dict ={};
		values =[];
		
		}

}

CODE: submit() { var myTad = document.getElementById('rr'); var len = myTad.rows.length ; var arr = []; var values =[]; var key1 ="Task_NO";var key2 ="EFFORT_MONTH";var key3 ="ROLE";var key4 ="LOC"; var key5 ="ESTIMATED_HOURS"; var key6 ="CNTY"; var key7 ="EMPLOYEE_LEVEL";var key8 ="AUTHENTED_HOURS";var key9 ="ESTIMATED_COST"; var key10 ="ACTUAL_COST"; console.log("Row_length:" +len); for( var z =0; z < len ; z++ ) { for( var y = 0; myTad.rows[z].cells[y]; y++){ if(myTad.rows[z].cells[y].childNodes[0].getAttribute("type") == 'button'){ }else{ values.push(myTad.rows[z].cells[y].childNodes[0].value); } } for(var k =0; k < values.length ;k++){ var dict = {}; dict[key1] = Number(values[0]); dict[key2] =Number(values[1]); dict[key3] = (values[2]); dict[key4] = (values[3]); dict[key5] = Number(values[4]); dict[key6] = (values[5]); dict[key7] = Number(values[6]); dict[key8] = Number(values[7]); dict[key9] = Number(values[8]); dict[key10] = Number(values[9]); } arr.push(dict); dict ={}; values =[]; } }
reviewher commented 2021-09-19 21:00:13 +00:00 (Migrated from github.com)

Grouping all related reading issues to #61 please follow up there

Grouping all related reading issues to #61 please follow up there
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#563
No description provided.