Partial xlsx file causes indefinite loop #3215

Open
opened 2024-09-12 10:53:31 +00:00 by Rogier75 · 4 comments

A partial xlsx file smaller than 43 bytes will cause an (almost) indefinite loop.

SYSTEM SETUP:

  1. node-xlsx 0.24.0
  2. xlsx version 0.20.2
  3. compiled against commonjs

The last xlsx version known to work correctly and not exhibit this problem is 0.17.5.

STEPS TO REPRODUCE:

  1. start with a valid xlsx file
  2. truncate the valid xlsx file so that only the first 20 bytes remain (anything below 43 bytes will cause the loop)
  3. run xlsx.parse with the partial file as input (see the sketch below)
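For illustration, here is a minimal Node.js sketch of these reproduction steps. It calls the underlying SheetJS `read` function directly (the same call used later in this thread) rather than the node-xlsx wrapper, and `input.xlsx` is a placeholder file name:

```js
// Sketch only: truncate a valid workbook and feed it to the parser.
// "input.xlsx" is a placeholder for any valid xlsx file.
var fs = require("fs");
var XLSX = require("xlsx");

// keep only the first 20 bytes (anything below 43 bytes reportedly triggers the loop)
var partial = fs.readFileSync("input.xlsx").slice(0, 20);

// expected: an error such as "End of zip file not found"
// observed: the call spins for a very long time before returning
var wb = XLSX.read(partial);
```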

A hexdump of the first 20 bytes might look like:

```
0000000 4b50 0403 0014 0808 0008 5112 592c 0000
0000010 0000 0000
0000014
```

EXPECTED RESULT:

A partial file should trigger an error such as 'End of zip file not found'

ACTUAL RESULT:

We see that the xlsx.parse_zip function gets called. The variable fcnt is set, in this case, to 2056. The for loop then iterates 2056 times, with each iteration taking about 1 second. The entire loop blocks the Node.js event loop, preventing any other work from being processed and effectively hanging the system.
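As an application-side stopgap (not a library fix), one can reject obviously truncated input before handing it to the parser. The sketch below is an assumption on my part: it checks for the 22-byte minimum size of a ZIP end-of-central-directory record and its `PK\x05\x06` signature before calling `read`:

```js
// Heuristic guard: a well-formed ZIP archive is at least 22 bytes long and
// contains the end-of-central-directory signature "PK\x05\x06" (0x06054b50).
var XLSX = require("xlsx");

function readXlsxGuarded(buf) {
  var eocdSig = Buffer.from([0x50, 0x4b, 0x05, 0x06]);
  if (buf.length < 22 || buf.indexOf(eocdSig) === -1) {
    throw new Error("Truncated or invalid xlsx/zip data");
  }
  return XLSX.read(buf);
}
```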

Owner

To be sure, let's start from a sample file https://docs.sheetjs.com/pres.xlsx and start a new project:

```bash
## Sample project
cd /tmp
mkdir i3215
cd i3215
npm init -y

## Install SheetJS NodeJS module
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz

## Download test file
curl -LO https://docs.sheetjs.com/pres.xlsx
```
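As a quick sanity check (my addition, assuming the install succeeded), the module can be loaded and its version printed from the project directory:

```js
// Verify the SheetJS module resolves from ./node_modules and report its version.
var XLSX = require("xlsx");
console.log(XLSX.version); // expected to print "0.20.3" for the tarball installed above
```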

We can inspect the contents using xxd:

```bash
## Show first 32 bytes
xxd pres.xlsx | head -n 2
```

The output appears byte-swapped compared to your output, likely because your tool was interpreting values as little-endian 16-bit integers and xxd prints the bytes as they appear in the file:

```
00000000: 504b 0304 1400 0600 0800 0000 2100 62ee  PK..........!.b.
00000010: 9d68 5e01 0000 9004 0000 1300 0802 5b43  .h^...........[C
```
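To make the byte-order point concrete, here is a small sketch (an illustration, not part of the original reply) showing that reading the raw leading bytes `50 4b` ("PK") as a little-endian 16-bit integer produces the swapped-looking word `4b50` from the earlier dump:

```js
// The file begins with the raw bytes 0x50 0x4b 0x03 0x04 ("PK\x03\x04").
var header = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// xxd prints bytes in file order: "504b 0304"
console.log(header.toString("hex")); // "504b0304"

// A dump tool that groups bytes into little-endian 16-bit words shows "4b50 0403"
console.log(header.readUInt16LE(0).toString(16)); // "4b50"
console.log(header.readUInt16LE(2).toString(16)); // "403" (i.e. 0x0403)
```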

Following your procedure, the following node invocation should hang:

```bash
node -pe 'require("xlsx").read(fs.readFileSync("pres.xlsx").slice(0,20))'
```

This fails fairly quickly in local testing with the error `Bad compressed size: 0 != 350`.


To test against your output, we need to recover the original Uint8Array. The following should be run in the NodeJS REPL:

```js
/* run these commands in a NodeJS REPL session */
var i16 = "4b50 0403 0014 0808 0008 5112 592c 0000 0000 0000".split(" ").map(s => parseInt(s, 16));
var buf = Buffer.from(i16.flatMap(n => [n&0x255, n >> 8]))
var wb = require("xlsx").read(buf);
```

This command immediately fails with `Error: Unsupported ZIP file`.


Can you test the aforementioned steps locally and compare the performance? If you agree with the analysis, can you reinstall the library using the ["NodeJS" installation guide](https://docs.sheetjs.com/docs/getting-started/installation/nodejs/)?

Author

Thank you for your response. I have gone through the steps you posted above.

The result is that with pres.xlsx the expected exception is thrown:

Error: Bad compressed size: 0 != 350

With the file that I have the method call hangs as reported in this ticket.

I will paste the outcome of the pres.xlsx file and my test file below to easily compare the two:

```bash
## pres.xlsx
## show first 32 bytes
xxd pres.xlsx | head -n 2
```

It prints the bytes below, similar to what you expected.

```
00000000: 504b 0304 1400 0600 0800 0000 2100 62ee  PK..........!.b.
00000010: 9d68 5e01 0000 9004 0000 1300 0802 5b43  .h^...........[C
```

Now I use my test file, which was created by LibreOffice Calc. It is an empty sheet, and the file on disk is 5094 bytes, but we are only interested in the first 20 bytes. We print them below:

```bash
## myfile.xlsx
## show first 32 bytes
xxd myfile.xlsx | head -n 2
```

```
00000000: 504b 0304 1400 0808 0800 1251 2c59 0000  PK.........Q,Y..
00000010: 0000 0000 0000 0000 0000 1a00 0000 786c  ..............xl
```

Running the above partial file (the first 20 bytes) through the read method causes the function to seemingly hang (a very long loop).

Owner

Thanks for following up! Can confirm the slowdown with the following script based on your file data:

```js
var buf = Buffer.from("50 4b 03 04 14 00 08 08 08 00 12 51 2c 59 00 00 00 00 00 00".split(" ").map(x => parseInt(x, 16))); require("xlsx").read(buf)
```
Author

> Thanks for following up! Can confirm the slowdown with the following script based on your file data:
>
> ```js
> var buf = Buffer.from("50 4b 03 04 14 00 08 08 08 00 12 51 2c 59 00 00 00 00 00 00".split(" ").map(x => parseInt(x, 16))); require("xlsx").read(buf)
> ```

That I can confirm on my side as well.

Reference: sheetjs/sheetjs#3215