Is it possible to know the rows length without XLSX.read? #459

Closed
opened 2016-08-17 07:52:32 +00:00 by ronilitman · 3 comments
ronilitman commented 2016-08-17 07:52:32 +00:00 (Migrated from github.com)

Hi,

I have to know the rows length without reading the file at all - because reading the file in order to get the rows length takes me a lot of time.. Is there such a possibility?

Edit: I have found out that it takes me A LOT more time if the my excel file has macros in it.

Is there any possibility to send a flag to not pay attention to the macros?

Thanks

Hi, I have to know the rows length without reading the file at all - because reading the file in order to get the rows length takes me a lot of time.. Is there such a possibility? Edit: I have found out that it takes me A LOT more time if the my excel file has macros in it. Is there any possibility to send a flag to not pay attention to the macros? Thanks
reviewher commented 2017-01-06 19:58:46 +00:00 (Migrated from github.com)

@ronilitman As I understand it, the worksheet self-reports its range. XLSX stores the cell range in the <dimension> tag: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L16

The range may not be correct. Excel will "do the right thing" by ignoring the dimension field, but that requires reading the whole sheet to get the correct range.

Related issues https://github.com/SheetJS/js-xlsx/issues/189 https://github.com/SheetJS/js-xlsx/issues/82

@sheetjsdev is it theoretically possible to scan the entire sheet and get the addresses without having to generate a cell object for every cell?

@ronilitman As I understand it, the worksheet self-reports its range. XLSX stores the cell range in the `<dimension>` tag: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L16 The range may not be correct. Excel will "do the right thing" by ignoring the dimension field, but that requires reading the whole sheet to get the correct range. Related issues https://github.com/SheetJS/js-xlsx/issues/189 https://github.com/SheetJS/js-xlsx/issues/82 @sheetjsdev is it theoretically possible to scan the entire sheet and get the addresses without having to generate a cell object for every cell?
VN666 commented 2019-03-16 11:36:00 +00:00 (Migrated from github.com)

@ronilitman As I understand it, the worksheet self-reports its range. XLSX stores the cell range in the <dimension> tag: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L16

The range may not be correct. Excel will "do the right thing" by ignoring the dimension field, but that requires reading the whole sheet to get the correct range.

Related issues #189 #82

@SheetJSDev is it theoretically possible to scan the entire sheet and get the addresses without having to generate a cell object for every cell?

@ronilitman by the way, how to get the progress when reading the file ?

> @ronilitman As I understand it, the worksheet self-reports its range. XLSX stores the cell range in the `<dimension>` tag: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L16 > > The range may not be correct. Excel will "do the right thing" by ignoring the dimension field, but that requires reading the whole sheet to get the correct range. > > Related issues #189 #82 > > @SheetJSDev is it theoretically possible to scan the entire sheet and get the addresses without having to generate a cell object for every cell? @ronilitman by the way, how to get the progress when reading the file ?
SheetJSDev commented 2022-02-14 07:29:20 +00:00 (Migrated from github.com)

The technical answer depends on file format:

Some formats like CSV don't report the range anywhere and have variable sized rows, so the only way to know the total number of records is to effectively parse the whole thing.

Other formats like DBF have readily computable record counts based on the size since the header tells you how large each row payload must be.

The interesting formats generally have a way of self-reporting ranges but these are self-reported. A number of third party generators are known to hack around this. Third party hacks have made the data source unreliable, and resolving #1601 will involve changing the behavior anyway.

So the complete and unfortunate answer is "no, it's not possible to correctly determine the number of rows without scanning the entire worksheet".

As @reviewher mentioned, it is possible to just avoid generating cells, but it's unclear if the payoff is worth it (especially if the file will have to be re-parsed to actually extract the data)

@VN666 #632 is tracking "progress" related issues

The technical answer depends on file format: Some formats like CSV don't report the range anywhere and have variable sized rows, so the only way to know the total number of records is to effectively parse the whole thing. Other formats like DBF have readily computable record counts based on the size since the header tells you how large each row payload must be. The interesting formats generally have a way of self-reporting ranges but these are self-reported. A number of third party generators are known to hack around this. Third party hacks have made the data source unreliable, and resolving #1601 will involve changing the behavior anyway. So the complete and unfortunate answer is "no, it's not possible to correctly determine the number of rows without scanning the entire worksheet". As @reviewher mentioned, it is possible to just avoid generating cells, but it's unclear if the payoff is worth it (especially if the file will have to be re-parsed to actually extract the data) @VN666 #632 is tracking "progress" related issues
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#459
No description provided.