Is it possible to get the number of rows without XLSX.read? #459
Reference: sheetjs/sheetjs#459
Hi,
I need to know the number of rows without reading the whole file, because reading the file just to get the row count takes a lot of time. Is there a way to do that?
Edit: I have found that it takes A LOT more time if my Excel file has macros in it.
Is there a way to pass a flag to ignore the macros?
Thanks
@ronilitman As I understand it, the worksheet self-reports its range. XLSX stores the cell range in the `<dimension>` tag: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L16
The range may not be correct. Excel will "do the right thing" by ignoring the dimension field, but that requires reading the whole sheet to get the correct range.
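A minimal sketch of reading that self-reported range, assuming you already have one worksheet's XML (for example `xl/worksheets/sheet1.xml`) extracted from the ZIP container as a string; unzipping is out of scope here. `decodeCell` and `rowsFromDimension` are hypothetical helpers, not SheetJS API. Because the value is self-reported, treat the result as a hint rather than a guarantee.

```javascript
// Convert an A1-style address such as "C10" (optionally "$C$10") into
// zero-based row/column indices.
function decodeCell(addr) {
  const m = /^\$?([A-Z]+)\$?(\d+)$/.exec(addr.toUpperCase());
  if (!m) throw new Error(`bad cell address: ${addr}`);
  let col = 0;
  for (const ch of m[1]) col = col * 26 + (ch.charCodeAt(0) - 64); // A=1 ... Z=26
  return { r: parseInt(m[2], 10) - 1, c: col - 1 };
}

// Read the range from <dimension ref="..."> and return how many rows it
// claims to span, or null when the tag is absent. This is only what the
// writer reported; it is NOT verified against the actual cells.
function rowsFromDimension(sheetXml) {
  const m = /<dimension\b[^>]*\bref="([^"]+)"/.exec(sheetXml);
  if (!m) return null;
  const [start, end = start] = m[1].split(":"); // "A1" alone means one cell
  return decodeCell(end).r - decodeCell(start).r + 1;
}
```

For example, a sheet whose XML contains `<dimension ref="A1:C10"/>` reports 10 rows, even if a third-party writer put the wrong range there.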
Related issues: https://github.com/SheetJS/js-xlsx/issues/189 https://github.com/SheetJS/js-xlsx/issues/82
@sheetjsdev is it theoretically possible to scan the entire sheet and get the addresses without having to generate a cell object for every cell?
@ronilitman By the way, how do you get progress updates while reading the file?
The technical answer depends on file format:
Some formats, like CSV, don't report the range anywhere and have variable-sized rows, so the only way to know the total number of records is to effectively parse the whole thing.
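To illustrate why CSV has no shortcut, here is a minimal record counter, assuming RFC 4180-style quoting (`""` escapes a quote inside a quoted field). `countCsvRecords` is a hypothetical helper. A plain newline count would be wrong because quoted fields may contain line breaks, so every character has to be examined. Skipping blank lines is a choice made by this sketch, not a rule of the format.

```javascript
// Count CSV records with a single pass over the text. Line breaks inside
// double-quoted fields do not end a record.
function countCsvRecords(text) {
  let records = 0;
  let inQuotes = false;
  let sawData = false; // does the current record contain any characters yet?
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '"') {
      inQuotes = !inQuotes; // an escaped "" toggles twice: net no change
      sawData = true;
    } else if ((ch === "\n" || ch === "\r") && !inQuotes) {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      if (sawData) records++; // blank lines are not counted
      sawData = false;
    } else {
      sawData = true;
    }
  }
  if (sawData) records++; // final record without a trailing newline
  return records;
}
```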
Other formats, like DBF, have readily computable record counts: the header tells you how large each row payload must be, so the count follows from the file size.
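A minimal sketch of that computation, assuming the caller supplies at least the first 12 bytes of the file plus its total size (for example from `fs.statSync(path).size` in Node). `dbfRecordCounts` is a hypothetical helper; the offsets follow the classic dBase header layout. It returns both the count the header declares and the count implied by the file size, so the two can be cross-checked.

```javascript
// dBase (DBF) header fields used below (all little-endian):
//   bytes 4-7    number of records    (uint32)
//   bytes 8-9    header size in bytes (uint16)
//   bytes 10-11  record size in bytes (uint16)
function dbfRecordCounts(bytes, fileSize = bytes.length) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const declared = view.getUint32(4, true);
  const headerLen = view.getUint16(8, true);
  const recordLen = view.getUint16(10, true);
  // Records are fixed width, so the count also follows from the file size.
  // Many writers append a single 0x1A end-of-file byte; floor() absorbs it.
  const computed = Math.floor((fileSize - headerLen) / recordLen);
  return { declared, computed };
}
```

Only a few header bytes are needed, so this stays cheap no matter how many records the file holds.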
The more interesting formats generally store the range in the file itself, but that range is self-reported, and a number of third-party generators are known to hack around this field. Those hacks have made the stored range an unreliable data source, and resolving #1601 will involve changing the behavior anyway.
So the complete and unfortunate answer is "no, it's not possible to correctly determine the number of rows without scanning the entire worksheet".
As @reviewher mentioned, it is possible to skip generating cell objects, but it's unclear whether the payoff is worth it, especially if the file will have to be re-parsed anyway to actually extract the data.
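A minimal sketch of the "scan without generating cells" idea, assuming the worksheet XML is already available as a string and uses unprefixed tag names (no `x:` namespace prefix). `lastRowByScan` is a hypothetical helper, not part of SheetJS. It still reads every byte of the sheet, which is the cost discussed in this thread, but allocates nothing per cell. Rows that exist only for formatting (no cells) are still counted.

```javascript
// Scan raw worksheet XML for <row> start tags and return the highest
// 1-based row number seen, without building any cell objects.
function lastRowByScan(sheetXml) {
  const rowTag = /<row\b([^>]*)>/g; // \b keeps <rowBreaks> from matching
  const rAttr = /\br="(\d+)"/;
  let current = 0; // row number of the most recent <row>
  let max = 0;
  let m;
  while ((m = rowTag.exec(sheetXml)) !== null) {
    const r = rAttr.exec(m[1]);
    // The r attribute is optional in the schema; when it is absent,
    // the row directly follows the previous one.
    current = r ? parseInt(r[1], 10) : current + 1;
    if (current > max) max = current;
  }
  return max;
}
```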
@VN666 #632 is tracking progress-related issues.