Out of memory when dealing with large files #138

Closed
opened 2014-10-25 14:59:24 +00:00 by quentinclems · 14 comments
quentinclems commented 2014-10-25 14:59:24 +00:00 (Migrated from github.com)

Sometimes the browser locks up when dealing with large files, and sometimes nodejs will give an allocation error :(

Sometimes the browser locks up when dealing with large files, and sometimes nodejs will give an allocation error :(
elad commented 2014-10-26 01:38:04 +00:00 (Migrated from github.com)

Could you please post the errors you get along with any stack traces etc.? Also, how big is the file?

Could you please post the errors you get along with any stack traces etc.? Also, how big is the file?
chethan-K commented 2014-10-28 10:09:55 +00:00 (Migrated from github.com)

" Segmentation fault (core dumped) " getting this error at this line.
workbook = XLSX.readFile(xlsxFile)

By updating Node version fixes this problem.
sudo npm cache clean -f
sudo npm install -g n
sudo n stable

" Segmentation fault (core dumped) " getting this error at this line. workbook = XLSX.readFile(xlsxFile) By updating Node version fixes this problem. sudo npm cache clean -f sudo npm install -g n sudo n stable
stevek-pro commented 2015-02-24 14:10:06 +00:00 (Migrated from github.com)

Solution: Read in the excel workbook in chunks of 1000 rows, e.g.:

   workbook = J.readFile(filefullpath, {sheetRows:1000});  

However I am still struggling to determine the total number of rows of a sheet in order to iterate the right amount of iterations over the sheetRows option.

So far I am trying:

    workbook = J.readFile(filefullpath, {bookProps:true});  
    console.log(workbook);

But the info received does not contain the number of rows of a worksheet. Help would be appreciated.

Solution: Read in the excel workbook in chunks of 1000 rows, e.g.: ``` workbook = J.readFile(filefullpath, {sheetRows:1000}); ``` **However** I am still struggling to determine the total number of rows of a sheet in order to iterate the right amount of iterations over the sheetRows option. So far I am trying: ``` workbook = J.readFile(filefullpath, {bookProps:true}); console.log(workbook); ``` But the info received does not contain the number of rows of a worksheet. Help would be appreciated.
SheetJSDev commented 2015-02-24 15:26:20 +00:00 (Migrated from github.com)

@nottinhill This is explained in the worksheet object section of the README but I'll explain it in a bit more detail here since the code is slightly different for J:

tl;dr: the worksheet !fullref key shows you the full range, decode it to find the number of rows

Each worksheet has a !ref property which shows you the range. For example, the RkNumber.xlsx test file has a sheet called RkNumber, so to pull the range you would write:

var filename = "RkNumber.xlsx", sheetname = "RkNumber";
var workbook = J.readFile(filename);
var ref = workbook[1].Sheets[sheetname]["!ref"]; // using J
//var ref = workbook.Sheets[sheetname]["!ref"]; // using xlsx

To actually get the rows from that, you have to parse the reference:

var range = workbook[0].utils.decode_range(ref); // using J
//var range = XLSX.utils.decode_range(ref); // using xlsx

This gives you a range object that looks like

{
  s: { c: 0, r: 0 },
  e: { c: 2, r: 4 } 
}

range.e.r - range.s.r + 1 gives you the total number of rows (the row indices are 0-indexed and the endpoints are inclusive)

When using sheetRows, there are actually two ranges stored. The !ref range shows you the restricted range that you read and the !fullref shows you the full range. These are separate because utilities like the CSV writer rely on the reference (if you used the full range, there would be a bunch of useless empty rows at the end)

@nottinhill This is explained in the worksheet object section of the [README](https://github.com/SheetJS/js-xlsx#worksheet-object) but I'll explain it in a bit more detail here since the code is slightly different for J: tl;dr: the worksheet `!fullref` key shows you the full range, decode it to find the number of rows Each worksheet has a `!ref` property which shows you the range. For example, the [`RkNumber.xlsx` test file](https://github.com/SheetJS/test_files/raw/master/RkNumber.xlsx) has a sheet called `RkNumber`, so to pull the range you would write: ``` var filename = "RkNumber.xlsx", sheetname = "RkNumber"; var workbook = J.readFile(filename); var ref = workbook[1].Sheets[sheetname]["!ref"]; // using J //var ref = workbook.Sheets[sheetname]["!ref"]; // using xlsx ``` To actually get the rows from that, you have to parse the reference: ``` var range = workbook[0].utils.decode_range(ref); // using J //var range = XLSX.utils.decode_range(ref); // using xlsx ``` This gives you a [range object](https://github.com/SheetJS/js-xlsx#general-structures) that looks like ``` { s: { c: 0, r: 0 }, e: { c: 2, r: 4 } } ``` `range.e.r - range.s.r + 1` gives you the total number of rows (the row indices are 0-indexed and the endpoints are inclusive) When using sheetRows, there are actually two ranges stored. The `!ref` range shows you the restricted range that you read and the `!fullref` shows you the full range. These are separate because utilities like the CSV writer rely on the reference (if you used the full range, there would be a bunch of useless empty rows at the end)
SheetJSDev commented 2015-02-24 15:28:43 +00:00 (Migrated from github.com)

@chethan-K There was an issue in node 0.10.31, affecting many projects including js-xlsx: https://github.com/joyent/node/issues/8208. Fortunately they fixed it in 0.10.32

@chethan-K There was an issue in node 0.10.31, affecting many projects including js-xlsx: https://github.com/joyent/node/issues/8208. Fortunately they fixed it in 0.10.32
stevek-pro commented 2015-02-25 11:20:12 +00:00 (Migrated from github.com)

@SheetJSDev: First of all: Thank you and 2nd: wow, the usage of the methods and references with J is quite confusing. Are there any plans to re-name/structure some data structures and interfaces in a logical manner?

e.g. this is quite confusing code:

    var filename = filefullpath;
    var workbookProps = J.readFile(filename, {bookProps:true});
    var sheetname = workbookProps[1].SheetNames[0];;
    var workbook = J.readFile(filename, {sheetRows: 1000});

    var ref = workbook[1].Sheets[sheetname]["!fullref"]; // using J
    var range = workbook[0].utils.decode_range(ref); // using J
    var totalRange = range.e.r - range.s.r + 1;

Now how would I have to read in the workbook's rows range inside a loop, using totalRange as a limes?

    var workbook = J.readFile(filename, {sheetRows: rowsRange});

with e.g. rowsRange = "0-1000", then "1001-2000" and so forth.

Then doing the parsing of the chunks of read in lines with:

    var workbookJSON = J.utils.to_json(workbook);
    var sheetone = getFirstObj(workbookJSON);
    var sheetoneJSON = workbookJSON[sheetone];
@SheetJSDev: First of all: Thank you and 2nd: wow, the usage of the methods and references with J is quite confusing. Are there any plans to re-name/structure some data structures and interfaces in a logical manner? e.g. this is quite confusing code: ``` var filename = filefullpath; var workbookProps = J.readFile(filename, {bookProps:true}); var sheetname = workbookProps[1].SheetNames[0];; var workbook = J.readFile(filename, {sheetRows: 1000}); var ref = workbook[1].Sheets[sheetname]["!fullref"]; // using J var range = workbook[0].utils.decode_range(ref); // using J var totalRange = range.e.r - range.s.r + 1; ``` Now how would I have to read in the workbook's rows range inside a loop, using totalRange as a limes? ``` var workbook = J.readFile(filename, {sheetRows: rowsRange}); ``` with e.g. rowsRange = "0-1000", then "1001-2000" and so forth. Then doing the parsing of the chunks of read in lines with: ``` var workbookJSON = J.utils.to_json(workbook); var sheetone = getFirstObj(workbookJSON); var sheetoneJSON = workbookJSON[sheetone]; ```
elad commented 2015-02-25 16:39:16 +00:00 (Migrated from github.com)

@SheetJSDev: The API is a repeating issue (like cell styles and dates :), how about we start a few wiki pages to keep track of progress? I'm more than willing to help with the code, but we need to decide what the implementation should do about the corner cases and how the interfaces should look like.

@SheetJSDev: The API is a repeating issue (like cell styles and dates :), how about we start a few wiki pages to keep track of progress? I'm more than willing to help with the code, but we need to decide what the implementation should do about the corner cases and how the interfaces should look like.
SheetJSDev commented 2015-02-25 16:53:19 +00:00 (Migrated from github.com)

@elad Something like a wiki or a google doc or IRC chat would help a lot. Any preference?

@elad Something like a wiki or a google doc or IRC chat would help a lot. Any preference?
SheetJSDev commented 2015-02-25 16:59:22 +00:00 (Migrated from github.com)

@nottinhill your point is well taken, and fortunately we are going to rework the API to support this and similar access patterns (so stay tuned :) We're still 0.x for a reason.

@nottinhill your point is well taken, and fortunately we are going to rework the API to support this and similar access patterns (so stay tuned :) We're still 0.x for a reason.
elad commented 2015-02-25 19:38:20 +00:00 (Migrated from github.com)

@SheetJSDev a wiki page, that's persistent, we can reference it and refer to it, and collaborate. Let's start with a proposed API for reading/writing cells, and progress from there to styles and dates.

@SheetJSDev a wiki page, that's persistent, we can reference it and refer to it, and collaborate. Let's start with a proposed API for reading/writing cells, and progress from there to styles and dates.
jnystrom commented 2017-12-04 20:24:39 +00:00 (Migrated from github.com)

Does this idea of chunking work when the type: "buffer"? I am also using a regular CSV file delimited by ;

Does this idea of chunking work when the `type: "buffer"`? I am also using a regular CSV file delimited by `;`
Javirln commented 2018-07-16 04:43:11 +00:00 (Migrated from github.com)

Any updates on trying to allow users to parse data in chunks ?

Any updates on trying to allow users to parse data in chunks ?
BruceAndLee commented 2018-09-12 00:49:03 +00:00 (Migrated from github.com)

why cell style has no effective,such as backgroud color,font color? Whether professional version can resolve this issue?

why cell style has no effective,such as backgroud color,font color? Whether professional version can resolve this issue?
reviewher commented 2021-09-19 21:02:44 +00:00 (Migrated from github.com)

Grouping large file read issues into #61

Grouping large file read issues into #61
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#138
No description provided.