Out of memory when dealing with large files #138

New Issue

2014-10-25T14:59:24Z

quentinclems commented

2014-10-25 14:59:24 +00:00

(Migrated from github.com)

Sometimes the browser locks up when dealing with large files, and sometimes nodejs will give an allocation error :(

elad commented

2014-10-26 01:38:04 +00:00

(Migrated from github.com)

Could you please post the errors you get along with any stack traces etc.? Also, how big is the file?

chethan-K commented

2014-10-28 10:09:55 +00:00

(Migrated from github.com)

" Segmentation fault (core dumped) " getting this error at this line.
workbook = XLSX.readFile(xlsxFile)

By updating Node version fixes this problem.
sudo npm cache clean -f
sudo npm install -g n
sudo n stable

" Segmentation fault (core dumped) " getting this error at this line. workbook = XLSX.readFile(xlsxFile) By updating Node version fixes this problem. sudo npm cache clean -f sudo npm install -g n sudo n stable

stevek-pro commented

2015-02-24 14:10:06 +00:00

(Migrated from github.com)

Solution: Read in the excel workbook in chunks of 1000 rows, e.g.:

   workbook = J.readFile(filefullpath, {sheetRows:1000});

However I am still struggling to determine the total number of rows of a sheet in order to iterate the right amount of iterations over the sheetRows option.

So far I am trying:

    workbook = J.readFile(filefullpath, {bookProps:true});  
    console.log(workbook);

But the info received does not contain the number of rows of a worksheet. Help would be appreciated.

Solution: Read in the excel workbook in chunks of 1000 rows, e.g.: ``` workbook = J.readFile(filefullpath, {sheetRows:1000}); ``` **However** I am still struggling to determine the total number of rows of a sheet in order to iterate the right amount of iterations over the sheetRows option. So far I am trying: ``` workbook = J.readFile(filefullpath, {bookProps:true}); console.log(workbook); ``` But the info received does not contain the number of rows of a worksheet. Help would be appreciated.

SheetJSDev commented

2015-02-24 15:26:20 +00:00

(Migrated from github.com)

@nottinhill This is explained in the worksheet object section of the README but I'll explain it in a bit more detail here since the code is slightly different for J:

tl;dr: the worksheet !fullref key shows you the full range, decode it to find the number of rows

Each worksheet has a !ref property which shows you the range. For example, the RkNumber.xlsx test file has a sheet called RkNumber, so to pull the range you would write:

var filename = "RkNumber.xlsx", sheetname = "RkNumber";
var workbook = J.readFile(filename);
var ref = workbook[1].Sheets[sheetname]["!ref"]; // using J
//var ref = workbook.Sheets[sheetname]["!ref"]; // using xlsx

To actually get the rows from that, you have to parse the reference:

var range = workbook[0].utils.decode_range(ref); // using J
//var range = XLSX.utils.decode_range(ref); // using xlsx

This gives you a range object that looks like

{
  s: { c: 0, r: 0 },
  e: { c: 2, r: 4 } 
}

range.e.r - range.s.r + 1 gives you the total number of rows (the row indices are 0-indexed and the endpoints are inclusive)

When using sheetRows, there are actually two ranges stored. The !ref range shows you the restricted range that you read and the !fullref shows you the full range. These are separate because utilities like the CSV writer rely on the reference (if you used the full range, there would be a bunch of useless empty rows at the end)

@nottinhill This is explained in the worksheet object section of the [README](https://github.com/SheetJS/js-xlsx#worksheet-object) but I'll explain it in a bit more detail here since the code is slightly different for J: tl;dr: the worksheet `!fullref` key shows you the full range, decode it to find the number of rows Each worksheet has a `!ref` property which shows you the range. For example, the [`RkNumber.xlsx` test file](https://github.com/SheetJS/test_files/raw/master/RkNumber.xlsx) has a sheet called `RkNumber`, so to pull the range you would write: ``` var filename = "RkNumber.xlsx", sheetname = "RkNumber"; var workbook = J.readFile(filename); var ref = workbook[1].Sheets[sheetname]["!ref"]; // using J //var ref = workbook.Sheets[sheetname]["!ref"]; // using xlsx ``` To actually get the rows from that, you have to parse the reference: ``` var range = workbook[0].utils.decode_range(ref); // using J //var range = XLSX.utils.decode_range(ref); // using xlsx ``` This gives you a [range object](https://github.com/SheetJS/js-xlsx#general-structures) that looks like ``` { s: { c: 0, r: 0 }, e: { c: 2, r: 4 } } ``` `range.e.r - range.s.r + 1` gives you the total number of rows (the row indices are 0-indexed and the endpoints are inclusive) When using sheetRows, there are actually two ranges stored. The `!ref` range shows you the restricted range that you read and the `!fullref` shows you the full range. These are separate because utilities like the CSV writer rely on the reference (if you used the full range, there would be a bunch of useless empty rows at the end)

SheetJSDev commented

2015-02-24 15:28:43 +00:00

(Migrated from github.com)

@chethan-K There was an issue in node 0.10.31, affecting many projects including js-xlsx: https://github.com/joyent/node/issues/8208. Fortunately they fixed it in 0.10.32

stevek-pro commented

2015-02-25 11:20:12 +00:00

(Migrated from github.com)

@SheetJSDev: First of all: Thank you and 2nd: wow, the usage of the methods and references with J is quite confusing. Are there any plans to re-name/structure some data structures and interfaces in a logical manner?

e.g. this is quite confusing code:

    var filename = filefullpath;
    var workbookProps = J.readFile(filename, {bookProps:true});
    var sheetname = workbookProps[1].SheetNames[0];;
    var workbook = J.readFile(filename, {sheetRows: 1000});

    var ref = workbook[1].Sheets[sheetname]["!fullref"]; // using J
    var range = workbook[0].utils.decode_range(ref); // using J
    var totalRange = range.e.r - range.s.r + 1;

Now how would I have to read in the workbook's rows range inside a loop, using totalRange as a limes?

    var workbook = J.readFile(filename, {sheetRows: rowsRange});

with e.g. rowsRange = "0-1000", then "1001-2000" and so forth.

Then doing the parsing of the chunks of read in lines with:

    var workbookJSON = J.utils.to_json(workbook);
    var sheetone = getFirstObj(workbookJSON);
    var sheetoneJSON = workbookJSON[sheetone];

@SheetJSDev: First of all: Thank you and 2nd: wow, the usage of the methods and references with J is quite confusing. Are there any plans to re-name/structure some data structures and interfaces in a logical manner? e.g. this is quite confusing code: ``` var filename = filefullpath; var workbookProps = J.readFile(filename, {bookProps:true}); var sheetname = workbookProps[1].SheetNames[0];; var workbook = J.readFile(filename, {sheetRows: 1000}); var ref = workbook[1].Sheets[sheetname]["!fullref"]; // using J var range = workbook[0].utils.decode_range(ref); // using J var totalRange = range.e.r - range.s.r + 1; ``` Now how would I have to read in the workbook's rows range inside a loop, using totalRange as a limes? ``` var workbook = J.readFile(filename, {sheetRows: rowsRange}); ``` with e.g. rowsRange = "0-1000", then "1001-2000" and so forth. Then doing the parsing of the chunks of read in lines with: ``` var workbookJSON = J.utils.to_json(workbook); var sheetone = getFirstObj(workbookJSON); var sheetoneJSON = workbookJSON[sheetone]; ```

elad commented

2015-02-25 16:39:16 +00:00

(Migrated from github.com)

@SheetJSDev: The API is a repeating issue (like cell styles and dates :), how about we start a few wiki pages to keep track of progress? I'm more than willing to help with the code, but we need to decide what the implementation should do about the corner cases and how the interfaces should look like.

SheetJSDev commented

2015-02-25 16:53:19 +00:00

(Migrated from github.com)

@elad Something like a wiki or a google doc or IRC chat would help a lot. Any preference?

SheetJSDev commented

2015-02-25 16:59:22 +00:00

(Migrated from github.com)

@nottinhill your point is well taken, and fortunately we are going to rework the API to support this and similar access patterns (so stay tuned :) We're still 0.x for a reason.

elad commented

2015-02-25 19:38:20 +00:00

(Migrated from github.com)

@SheetJSDev a wiki page, that's persistent, we can reference it and refer to it, and collaborate. Let's start with a proposed API for reading/writing cells, and progress from there to styles and dates.

jnystrom commented

2017-12-04 20:24:39 +00:00

(Migrated from github.com)

Does this idea of chunking work when the type: "buffer"? I am also using a regular CSV file delimited by ;

Does this idea of chunking work when the `type: "buffer"`? I am also using a regular CSV file delimited by `;`

Javirln commented

2018-07-16 04:43:11 +00:00

(Migrated from github.com)

Any updates on trying to allow users to parse data in chunks ?

👍 1

BruceAndLee commented

2018-09-12 00:49:03 +00:00