Unable to load large XLS document, "Sheet1 undefined" #2999

Open
opened 2023-09-26 16:26:01 +00:00 by ssshake · 4 comments

I have a single sheet xlsx document with a side of 79,815,440 bytes and 446,790 rows and 44 columns.

I have failed to load this document using my own app with this XLSX library. But it also fails using the xlsx-cli tool and the WebSQL demo site found in the sheetjs documentation.

It seems like it is failing silently. I do not get any error in my browser console.

After an XLSX.read the wb.SheetNames[] array correctly has "Sheet1" in it. However the wb.Sheets[] array is empty.

My Code Example:

image

const handleFile = (file: File) => {
        const reader = new FileReader()
        setIsProcessingXLSXFile(true)
        reader.onload = (e: any) => {
            const ab = e.target.result
            const wb = XLSX.read(ab, { type: 'array' }) //executes and completes, no errors, but no Sheets data
            const wsname = wb.SheetNames[0] // This array does contain the "Sheet1"
            const ws = wb.Sheets[wsname] //This is undefined, no Sheet1 data
            const tempData = XLSX.utils.sheet_to_json(ws, { header: 0 })

            if (importedDataContainsNoRows(tempData)) {
                handleEmptyFile(file)
                return
            }

            if (importedDataContainsEmptyColumnHeaders(tempData)) {
                handleFileWithEmptyColumnHeaders(file)
                return
            }

            if (
                getHeaderDistance(tempData[0]) <
                getAverageDistance(tempData) * editDistanceThreshold
            ) {
                toast({
                    title: 'Warning',
                    description: 'Column headers may be missing',
                    status: 'warning',
                    ...defaultToastProps,
                })
            }
            const data = XLSX.utils.sheet_to_json(ws, { header: 1 })
            const columns = makeColDefs(data[0])
            data.shift()
            const columnData = formatRows(columns, data)
            setCurrentFileData([...columnData])
            setCols([...columns])
            setIsProcessingXLSXFile(false)
        }
        reader.readAsArrayBuffer(file)
    }

WebSQL Demo Site Comparison

https://sheetjs.com/sql/

This page seems to run into the same issue. Doesn't crash or throw error, just no data in the Sheets array.

SELECT * FROM `Sheet1` LIMIT 30
[object SQLError]
could not prepare statement (1 no such table: Sheet1)

Attempt using XLSX-CLI

xlsx-cli --dev WS_Dataset.xlsx
/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26126
	} catch(e) { if(opts.WTF) throw e; }
	                          ^

Error: Cannot create a string longer than 0x1fffffe8 characters
    at Object.slice (buffer.js:636:37)
    at Buffer.toString (buffer.js:827:14)
    at cc2str (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:3357:14)
    at getdatastr (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:3483:39)
    at getdata (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:3504:95)
    at getzipdata (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:3525:19)
    at safe_parse_sheet (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26090:14)
    at parse_zip (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26290:3)
    at read_zip (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26704:9)
    at readSync (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26782:69) {
  code: 'ERR_STRING_TOO_LONG'
}

Expected Results / Comment

Is it possible to load a xlsx document of this size with this library? Given that it fail with two different browser examples, and a command line example on a modern, powerful computer I think the limitation is something other than compute resources.

Is there a way to only load a subset of the data? I don't need to parse or render 400,000+ rows on Chrome. I really just need to load the document to show a preview of the file. The first 10 lines even.

Thankjs!

I have a single sheet xlsx document with a side of 79,815,440 **bytes** and 446,790 **rows** and 44 **columns**. I have failed to load this document using my own app with this XLSX library. But it also fails using the xlsx-cli tool and the WebSQL demo site found in the sheetjs documentation. It seems like it is failing silently. I do not get any error in my browser console. After an ```XLSX.read``` the wb.SheetNames[] array correctly has "Sheet1" in it. However the wb.Sheets[] array is empty. **My Code Example:** ![image](/attachments/34085b77-29e1-4582-a247-ccaa98783574) ``` const handleFile = (file: File) => { const reader = new FileReader() setIsProcessingXLSXFile(true) reader.onload = (e: any) => { const ab = e.target.result const wb = XLSX.read(ab, { type: 'array' }) //executes and completes, no errors, but no Sheets data const wsname = wb.SheetNames[0] // This array does contain the "Sheet1" const ws = wb.Sheets[wsname] //This is undefined, no Sheet1 data const tempData = XLSX.utils.sheet_to_json(ws, { header: 0 }) if (importedDataContainsNoRows(tempData)) { handleEmptyFile(file) return } if (importedDataContainsEmptyColumnHeaders(tempData)) { handleFileWithEmptyColumnHeaders(file) return } if ( getHeaderDistance(tempData[0]) < getAverageDistance(tempData) * editDistanceThreshold ) { toast({ title: 'Warning', description: 'Column headers may be missing', status: 'warning', ...defaultToastProps, }) } const data = XLSX.utils.sheet_to_json(ws, { header: 1 }) const columns = makeColDefs(data[0]) data.shift() const columnData = formatRows(columns, data) setCurrentFileData([...columnData]) setCols([...columns]) setIsProcessingXLSXFile(false) } reader.readAsArrayBuffer(file) } ``` **WebSQL Demo Site Comparison** https://sheetjs.com/sql/ This page seems to run into the same issue. Doesn't crash or throw error, just no data in the Sheets array. ``` SELECT * FROM `Sheet1` LIMIT 30 [object SQLError] could not prepare statement (1 no such table: Sheet1) ``` Attempt using XLSX-CLI ``` xlsx-cli --dev WS_Dataset.xlsx /Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26126 } catch(e) { if(opts.WTF) throw e; } ^ Error: Cannot create a string longer than 0x1fffffe8 characters at Object.slice (buffer.js:636:37) at Buffer.toString (buffer.js:827:14) at cc2str (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:3357:14) at getdatastr (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:3483:39) at getdata (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:3504:95) at getzipdata (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:3525:19) at safe_parse_sheet (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26090:14) at parse_zip (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26290:3) at read_zip (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26704:9) at readSync (/Users/rich/.nvm/versions/node/v14.19.1/lib/node_modules/xlsx-cli/node_modules/xlsx/xlsx.js:26782:69) { code: 'ERR_STRING_TOO_LONG' } ``` ### Expected Results / Comment Is it possible to load a xlsx document of this size with this library? Given that it fail with two different browser examples, and a command line example on a modern, powerful computer I think the limitation is something other than compute resources. Is there a way to only load a subset of the data? I don't need to parse or render 400,000+ rows on Chrome. I really just need to load the document to show a preview of the file. The first 10 lines even. Thankjs!
203 KiB
Owner

It is noted in the documentation:

https://docs.sheetjs.com/docs/miscellany/errors#aw-snap-or-oops-an-error-has-occurred

Expand "Technical Limitations" for details:

Node 16 and Chrome 106 enforce a limit of 536870888 characters.

We'll add a comment in the docs mentioning ERR_STRING_TOO_LONG and other error messages.

Current SheetJS Process

Currently the whole XML is decoded before processing.

This approach does not lean on advanced features and still works in IE6 and legacy JavaScript engines (including engines used in Photoshop and InDesign, certain NetSuite SuiteScripts deployments, and other extension platforms).

This also means the library is subject to the string length limit.

V8 Limit

The V8 string length limit has changed over the years. https://bugs.chromium.org/p/v8/issues/detail?id=6148 was raised in the V8 bug tracker in 2017.

Note that IE11 did not have the same hardcoded limit. It was able to support longer strings in 2017 than V8 can support in 2023

Future

With modern JS and web APIs there are new ways to work around the limitations, but it would drop compatibility with most browsers and environments (possibly Chromium / NodeJS only).

If NodeJS is the only target, it is possible to unzip the entire XLSX file into a temporary folder and seek/stream individual entries from the filesystem in the correct order. It's possible to do something similar in the browser using deferred Blob operations.

If this is of interest, we can take a closer look again.

It is noted in the documentation: https://docs.sheetjs.com/docs/miscellany/errors#aw-snap-or-oops-an-error-has-occurred Expand "Technical Limitations" for details: > Node 16 and Chrome 106 enforce a limit of 536870888 characters. We'll add a comment in the docs mentioning `ERR_STRING_TOO_LONG` and other error messages. #### Current SheetJS Process Currently the whole XML is decoded before processing. This approach does not lean on advanced features and still works in IE6 and legacy JavaScript engines (including engines used in Photoshop and InDesign, certain NetSuite SuiteScripts deployments, and other extension platforms). This also means the library is subject to the string length limit. #### V8 Limit The V8 string length limit has changed over the years. https://bugs.chromium.org/p/v8/issues/detail?id=6148 was raised in the V8 bug tracker in 2017. Note that IE11 did not have the same hardcoded limit. It was able to support longer strings in 2017 than V8 can support in 2023 #### Future With modern JS and web APIs there are new ways to work around the limitations, but it would drop compatibility with most browsers and environments (possibly Chromium / NodeJS only). If NodeJS is the only target, it is possible to unzip the entire XLSX file into a temporary folder and seek/stream individual entries from the filesystem in the correct order. It's possible to do something similar in the browser using deferred `Blob` operations. If this is of interest, we can take a closer look again.
Owner

A new section has been added to the Troubleshooting page: https://docs.sheetjs.com/docs/miscellany/errors#invalid-string-length-or-err_string_too_long

PS: dense mode will help for worksheets of ~10M cells (that do not trip the string length limit):

            const wb = XLSX.read(ab, { type: 'array', dense: true })
A new section has been added to the Troubleshooting page: https://docs.sheetjs.com/docs/miscellany/errors#invalid-string-length-or-err_string_too_long PS: `dense` mode will help for worksheets of ~10M cells (that do not trip the string length limit): ```js const wb = XLSX.read(ab, { type: 'array', dense: true }) ```
Author

OK thanks for the quick response and the details.

My sheet is well over that 10M cell limit at 19M cells. So it looks like I will have to come up with another solution which is OK because it was never my intent to render the full sheet in chrome anyway. I only wish for a preview so I will have to do some pre-processing on the file.

Regarding the string length error, that only happens with the CLI, not chrome. In Chrome it fails silently and the sheets array is simply empty. So that made the initial investigation confusing. Making more sense now.

OK thanks for the quick response and the details. My sheet is well over that 10M cell limit at 19M cells. So it looks like I will have to come up with another solution which is OK because it was never my intent to render the full sheet in chrome anyway. I only wish for a preview so I will have to do some pre-processing on the file. Regarding the string length error, that only happens with the CLI, not chrome. In Chrome it fails silently and the sheets array is simply empty. So that made the initial investigation confusing. Making more sense now.

I'm also getting the "ERR_STRING_TOO_LONG", I'm using Node 18 (inside Docker, node:18-alpine).
The file is 43.2MB and has slightly over 1M rows and 13 Columns. "dense" Mode also doesn't help.
After saving the file as XLSB it does work correctly, but this will probably not be a feasible option for my users...

I'm also getting the "ERR_STRING_TOO_LONG", I'm using Node 18 (inside Docker, node:18-alpine). The file is 43.2MB and has slightly over 1M rows and 13 Columns. "dense" Mode also doesn't help. After saving the file as XLSB it does work correctly, but this will probably not be a feasible option for my users...
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#2999
No description provided.