XLSX.stream.to_json doesn't ends when a blank row is the last row in Excel sheet. #1779

Closed
opened 2020-02-28 10:12:18 +00:00 by rohandhamapurkar · 12 comments
rohandhamapurkar commented 2020-02-28 10:12:18 +00:00 (Migrated from github.com)

sheetjs/bits/97_node.js

		stream._read = function() {
 			if(R > r.e.r) return stream.push(null);
			while(R <= r.e.r) {
				//if ((rowinfo[R-1]||{}).hidden) continue;
				var row = make_json_row(sheet, r, R, cols, header, hdr, dense, o);
				++R;
				if((row.isempty === false) || (header === 1 ? o.blankrows !== false : !!o.blankrows)) {
					stream.push(row.row);
					break;
				}
			}
		};
		return stream;

Hi @SheetJSDev !

This piece of code has the R > r.e.r checking outside of the loop but when a blank row is inserted at the end of a given sheet then the if((row.isempty === false) || (header === 1 ? o.blankrows !== false : !!o.blankrows)) condition inside of the while loop isn't executed and the stream does not end.

Ill be submitting a pull request for the same for a fix for this issue that I have found.
Please verify and merge the changes as soon as possible as I am using this library in production in my system.

### sheetjs/bits/97_node.js ``` stream._read = function() { if(R > r.e.r) return stream.push(null); while(R <= r.e.r) { //if ((rowinfo[R-1]||{}).hidden) continue; var row = make_json_row(sheet, r, R, cols, header, hdr, dense, o); ++R; if((row.isempty === false) || (header === 1 ? o.blankrows !== false : !!o.blankrows)) { stream.push(row.row); break; } } }; return stream; ``` Hi @SheetJSDev ! This piece of code has the ```R > r.e.r``` checking outside of the loop but when a blank row is inserted at the end of a given sheet then the ```if((row.isempty === false) || (header === 1 ? o.blankrows !== false : !!o.blankrows)) ``` condition inside of the while loop isn't executed and the stream does not end. Ill be submitting a pull request for the same for a fix for this issue that I have found. Please verify and merge the changes as soon as possible as I am using this library in production in my system.
highflying commented 2020-03-31 11:35:45 +00:00 (Migrated from github.com)

I'm seeing the same issue, for now I'm having to specify the range to skip processing the blank lines at the end.

I'm seeing the same issue, for now I'm having to specify the range to skip processing the blank lines at the end.
rohandhamapurkar commented 2020-03-31 17:58:03 +00:00 (Migrated from github.com)

@highflying can you please elaborate more when you say
"specify the range to skip processing the blank lines"

@highflying can you please elaborate more when you say "specify the range to skip processing the blank lines"
highflying commented 2020-03-31 18:14:47 +00:00 (Migrated from github.com)

@highflying can you please elaborate more when you say
"specify the range to skip processing the blank lines"

I know the range of the sheet that has data in it, so I am specifying it manually, e.g.

  const stream = XLSX.stream.to_json(sheet, {
    range: "A1:BN3",
  });

When doing this the end event gets emitted, without it no end event happens.

> @highflying can you please elaborate more when you say > "specify the range to skip processing the blank lines" I know the range of the sheet that has data in it, so I am specifying it manually, e.g. ``` const stream = XLSX.stream.to_json(sheet, { range: "A1:BN3", }); ``` When doing this the `end` event gets emitted, without it no `end` event happens.
rohandhamapurkar commented 2020-03-31 18:18:36 +00:00 (Migrated from github.com)

Oh makes sense thanks a lot.
Please do make this change live soon I use this in production.
Thanks again for your response.

Oh makes sense thanks a lot. Please do make this change live soon I use this in production. Thanks again for your response.
ev45ive commented 2020-05-16 23:50:27 +00:00 (Migrated from github.com)

Ok. If i have many files with variable number of rows and I want to stream it as those are big files.

How do I end the stream then?

Ok. If i have many files with variable number of rows and I want to stream it as those are big files. How do I end the stream then?
rohandhamapurkar commented 2020-05-23 11:35:04 +00:00 (Migrated from github.com)

Ok. If i have many files with variable number of rows and I want to stream it as those are big files.

How do I end the stream then?

@ev45ive
Every worksheet object has "!range" property which gives the range for each worksheet which can be passed in to the
XLSX.stream.to_json function and you can do it dynamically.

> Ok. If i have many files with variable number of rows and I want to stream it as those are big files. > > How do I end the stream then? @ev45ive Every worksheet object has "!range" property which gives the range for each worksheet which can be passed in to the ```XLSX.stream.to_json``` function and you can do it dynamically.
ev45ive commented 2020-05-25 19:24:53 +00:00 (Migrated from github.com)

@highflying this is interesting. When I use sheet['!ref'] ( which is "A1:O252" ) stream doesn't end.

const stream = xlsx.stream.to_json(sheet, {
  range: sheet['!ref'] // :"A1:O252"
})

but when I make it LESS THAN that (by one or more) then it works fine. gets all results and ends stream

const stream = xlsx.stream.to_json(sheet, {
  range: "A1:O251"  // sheet['!ref'] minus one row
})

Seems like a bug to me, unless I missed something

@highflying this is interesting. When I use sheet['!ref'] ( which is "A1:O252" ) stream doesn't end. ``` const stream = xlsx.stream.to_json(sheet, { range: sheet['!ref'] // :"A1:O252" }) ``` but when I make it LESS THAN that (by one or more) then it works fine. gets all results and ends stream ``` const stream = xlsx.stream.to_json(sheet, { range: "A1:O251" // sheet['!ref'] minus one row }) ``` Seems like a bug to me, unless I missed something
SheetJSDev commented 2020-05-25 19:26:58 +00:00 (Migrated from github.com)

What version of node are you testing against and can you share the file?

What version of node are you testing against and can you share the file?
ev45ive commented 2020-05-25 19:38:12 +00:00 (Migrated from github.com)
const range = sheet['!ref']!
const stream = xlsx.stream.to_json(sheet, {
  range: range.slice(0, -1) + (parseInt(range.slice(-1)) - 1) //  :-O
})

@SheetJSDev
node -v
v12.16.2
"xlsx": "^0.16.0"

Here is file (removed private data, left jsut amounts and dates)
zakup — kopia.xlsx

Now i see there are totals in row 252. But it should still produce messed up record and then end or error instead of silently failing, right?

``` const range = sheet['!ref']! const stream = xlsx.stream.to_json(sheet, { range: range.slice(0, -1) + (parseInt(range.slice(-1)) - 1) // :-O }) ``` @SheetJSDev node -v v12.16.2 "xlsx": "^0.16.0" Here is file (removed private data, left jsut amounts and dates) [zakup — kopia.xlsx](https://github.com/SheetJS/sheetjs/files/4678628/zakup.kopia.xlsx) Now i see there are totals in row 252. But it should still produce messed up record and then end or error instead of silently failing, right?
ev45ive commented 2020-05-25 19:40:55 +00:00 (Migrated from github.com)

updated to 0.16.1 . The same issue

const stream = xlsx.stream.to_json(sheet, {
  range: sheet['!ref'] 
})

stream never ends or errors after last row (totals - see file included above)

updated to 0.16.1 . The same issue ``` const stream = xlsx.stream.to_json(sheet, { range: sheet['!ref'] }) ``` stream never ends or errors after last row (totals - see file included above)
mpgo13 commented 2020-07-29 02:26:06 +00:00 (Migrated from github.com)

I'm having the same issue. I noticed that setting blankrows: true will emit the end event. Here my reproducible problem:

const xlsx = require('xlsx');
const workbook = xlsx.utils.book_new();
const worksheet = xlsx.utils.json_to_sheet([
  {
    a: '1',
    b: '2',
  },
  {},
  {
    a: '11',
    b: '22',
  },
  {},
  {},
]);
xlsx.utils.book_append_sheet(workbook, worksheet);

const stream = xlsx.stream.to_json(workbook.Sheets[workbook.SheetNames[0]], {
  //blankrows: true, // WORKS with blankrows = true
  header: ['a', 'b'],
});

let closed = false;

stream.on('data', console.log);
stream.on('close', () => console.log('CLOSE'));
stream.on('end', err => {
  console.log('END', err);

  closed = true;
});

// for the event loop
const interval = setInterval(() => {
  if (closed) {
    clearInterval(interval);
  }
}, 100);

I'm having the same issue. I noticed that setting `blankrows: true` will emit the end event. Here my reproducible problem: ```js const xlsx = require('xlsx'); const workbook = xlsx.utils.book_new(); const worksheet = xlsx.utils.json_to_sheet([ { a: '1', b: '2', }, {}, { a: '11', b: '22', }, {}, {}, ]); xlsx.utils.book_append_sheet(workbook, worksheet); const stream = xlsx.stream.to_json(workbook.Sheets[workbook.SheetNames[0]], { //blankrows: true, // WORKS with blankrows = true header: ['a', 'b'], }); let closed = false; stream.on('data', console.log); stream.on('close', () => console.log('CLOSE')); stream.on('end', err => { console.log('END', err); closed = true; }); // for the event loop const interval = setInterval(() => { if (closed) { clearInterval(interval); } }, 100); ```
SheetJSDev commented 2020-07-29 18:07:00 +00:00 (Migrated from github.com)

Can someone check if the following patch fixes the problem (and if so, please feel free to send a PR):

https://github.com/SheetJS/sheetjs/blob/master/bits/97_node.js#L111-L112

                stream._read = function() {
-                       if(R > r.e.r) return stream.push(null);
                        while(R <= r.e.r) {
                                //if ((rowinfo[R-1]||{}).hidden) continue;
                                var row = make_json_row(sheet, r, R, cols, header, hdr, dense, o);
                                ++R;
                                if((row.isempty === false) || (header === 1 ? o.blankrows !== false : !!o.blankrows)) {
                                        stream.push(row.row);
-                                       break;
+                                       return;
                                }
                        }
+                       return stream.push(null); // <-- ADD THIS LINE
                };
                return stream;
        };

The most obvious guess is that in object mode, Readable streams are always expected to push a value. The null, which would trigger the end of stream, would not be pushed by the function if the last row is empty AND blank rows are skipped.

@rohandhamapurkar this is slightly different from your proposal in that this approach will always push null if there is no more data.

Can someone check if the following patch fixes the problem (and if so, please feel free to send a PR): https://github.com/SheetJS/sheetjs/blob/master/bits/97_node.js#L111-L112 ```diff stream._read = function() { - if(R > r.e.r) return stream.push(null); while(R <= r.e.r) { //if ((rowinfo[R-1]||{}).hidden) continue; var row = make_json_row(sheet, r, R, cols, header, hdr, dense, o); ++R; if((row.isempty === false) || (header === 1 ? o.blankrows !== false : !!o.blankrows)) { stream.push(row.row); - break; + return; } } + return stream.push(null); // <-- ADD THIS LINE }; return stream; }; ``` The most obvious guess is that in object mode, Readable streams are always expected to push a value. The `null`, which would trigger the end of stream, would not be pushed by the function if the last row is empty AND blank rows are skipped. @rohandhamapurkar this is slightly different from your proposal in that this approach will always push null if there is no more data.
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#1779
No description provided.