Out of memory when dealing with large files #138
Reference: sheetjs/sheetjs#138
Sometimes the browser locks up when dealing with large files, and sometimes Node.js will give an allocation error :(
Could you please post the errors you get along with any stack traces etc.? Also, how big is the file?
I'm getting a "Segmentation fault (core dumped)" error at this line:
workbook = XLSX.readFile(xlsxFile)
Updating the Node version fixes this problem:
sudo npm cache clean -f
sudo npm install -g n
sudo n stable
Solution: Read in the excel workbook in chunks of 1000 rows, e.g.:
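The example code for this post did not survive in this copy of the thread. As an illustrative sketch only (not the original poster's code), the `sheetRows` parse option limits how many rows js-xlsx parses, and a small helper can compute the chunk boundaries; `chunkBounds` and the commented usage are assumptions for illustration:

```javascript
// Sketch: compute inclusive [start, end] row-index ranges of at most `size`
// rows each, covering rows 0 .. totalRows-1.
function chunkBounds(totalRows, size) {
  const bounds = [];
  for (let start = 0; start < totalRows; start += size) {
    bounds.push([start, Math.min(start + size, totalRows) - 1]);
  }
  return bounds;
}

// Hypothetical usage with js-xlsx (variable names are assumptions).
// Note: sheetRows limits parsing to the FIRST n rows, so each read covers a
// prefix of the sheet rather than an arbitrary window:
// const XLSX = require('xlsx');
// for (const [s, e] of chunkBounds(totalRows, 1000)) {
//   const wb = XLSX.readFile(xlsxFile, { sheetRows: e + 1 });
//   // ... process only rows s..e of the resulting sheet ...
// }

console.log(chunkBounds(2500, 1000)); // [ [ 0, 999 ], [ 1000, 1999 ], [ 2000, 2499 ] ]
```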
However, I am still struggling to determine the total number of rows of a sheet, in order to run the right number of iterations with the sheetRows option.
So far I am trying:
But the info received does not contain the number of rows of a worksheet. Help would be appreciated.
@nottinhill This is explained in the worksheet object section of the README but I'll explain it in a bit more detail here since the code is slightly different for J:

tl;dr: the worksheet `!fullref` key shows you the full range; decode it to find the number of rows.

Each worksheet has a `!ref` property which shows you the range. For example, the `RkNumber.xlsx` test file has a sheet called `RkNumber`, so you would pull the range from that sheet. To actually get the rows, you have to parse the reference. This gives you a range object; `range.e.r - range.s.r + 1` gives you the total number of rows (the row indices are 0-indexed and the endpoints are inclusive).

When using sheetRows, there are actually two ranges stored. The `!ref` range shows you the restricted range that you read, and `!fullref` shows you the full range. These are kept separate because utilities like the CSV writer rely on the reference (if you used the full range, there would be a bunch of useless empty rows at the end).

@chethan-K There was an issue in node 0.10.31, affecting many projects including js-xlsx: https://github.com/joyent/node/issues/8208. Fortunately they fixed it in 0.10.32
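For readers without the library at hand, here is a minimal sketch mirroring the shape of object that `XLSX.utils.decode_range` returns for simple "A1:C10"-style references (the real utility handles more forms); `decodeCell` and `decodeRange` are illustrative names, not the library's API:

```javascript
// Sketch: decode an A1-style cell address into { c, r } (0-indexed).
function decodeCell(addr) {
  const m = addr.match(/^([A-Z]+)(\d+)$/);
  let c = 0;
  for (const ch of m[1]) c = c * 26 + (ch.charCodeAt(0) - 64);
  return { c: c - 1, r: parseInt(m[2], 10) - 1 };
}

// Sketch: decode "A1:C10" into { s: {c, r}, e: {c, r} }, the shape
// js-xlsx uses for ranges.
function decodeRange(ref) {
  const [start, end] = ref.split(':');
  return { s: decodeCell(start), e: decodeCell(end || start) };
}

const range = decodeRange('A1:C10');
// Endpoints are inclusive and 0-indexed, so the row count is:
const totalRows = range.e.r - range.s.r + 1;
console.log(totalRows); // 10
```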
@SheetJSDev: First of all: Thank you and 2nd: wow, the usage of the methods and references with J is quite confusing. Are there any plans to re-name/structure some data structures and interfaces in a logical manner?
e.g. this is quite confusing code:
Now how would I read the workbook's rows inside a loop, using totalRange as a limit, with e.g. rowsRange = "0-1000", then "1001-2000", and so forth?
Then parsing the chunks of read-in lines with:
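The snippet referenced above was not captured in this copy. As a hedged sketch of one way the windowed reads could work (everything here is illustrative, not the original poster's code): since `sheetRows` parses a prefix of the sheet rather than an arbitrary window, "0-1000", "1001-2000", ... can be emulated by growing the prefix and slicing off rows already seen:

```javascript
// Sketch: emulate "0-1000", "1001-2000", ... windows over a prefix-only
// reader. `readPrefix(n)` stands in for XLSX.readFile(file, { sheetRows: n })
// followed by a row-extraction step; here it is faked with an in-memory array.
function* chunkedRows(readPrefix, totalRows, chunkSize) {
  for (let seen = 0; seen < totalRows; seen += chunkSize) {
    const upTo = Math.min(seen + chunkSize, totalRows);
    const rows = readPrefix(upTo); // re-reads rows 0..upTo-1 each time
    yield rows.slice(seen, upTo);  // keep only the rows not yet seen
  }
}

// Demo with fake data standing in for a parsed sheet:
const all = Array.from({ length: 2500 }, (_, i) => ({ row: i }));
const windows = [...chunkedRows(n => all.slice(0, n), all.length, 1000)];
console.log(windows.map(w => w.length)); // [ 1000, 1000, 500 ]
```

Note the quadratic cost of re-reading the prefix on every iteration; that is a limitation of this emulation, not a recommendation.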
@SheetJSDev: The API is a recurring issue (like cell styles and dates :) — how about we start a few wiki pages to keep track of progress? I'm more than willing to help with the code, but we need to decide what the implementation should do about the corner cases and what the interfaces should look like.
@elad Something like a wiki or a google doc or IRC chat would help a lot. Any preference?
@nottinhill your point is well taken, and fortunately we are going to rework the API to support this and similar access patterns (so stay tuned :) We're still 0.x for a reason.
@SheetJSDev a wiki page, that's persistent, we can reference it and refer to it, and collaborate. Let's start with a proposed API for reading/writing cells, and progress from there to styles and dates.
Does this idea of chunking work when the type is `"buffer"`? I am also using a regular CSV file delimited by `;`.
Any updates on trying to allow users to parse data in chunks ?
Why do cell styles (such as background color and font color) have no effect? Can the professional version resolve this issue?
Grouping large file read issues into #61