Future support of reading first columns along/instead of first rows? #2489

Open
opened 2022-01-12 04:11:54 +00:00 by laurelgr · 5 comments
laurelgr commented 2022-01-12 04:11:54 +00:00 (Migrated from github.com)

Hello!
I currently extract the first row and column of my worksheet by parsing through the cells, but I would like to limit the file reading so that only the needed data (first column and/or row) is extracted.
I already know about using `XLSX.read(myFile, { sheetRows: 1 });`, but I couldn't find any issue discussing a column variant such as `sheetColumns`. Is such a feature planned for later, or not at all?
Thanks for your time! :)

SheetJSDev commented 2022-01-12 05:35:33 +00:00 (Migrated from github.com)

It's definitely interesting but lower priority. Some notes:

Most file formats store data in row-major order. For example, in the CSV format:

```
A1,B1,C1,D1,...
A2,B2,C2,D2,...
A3,B3,C3,D3,...
A4,B4,C4,D4,...
...
A1000000,B1000000,C1000000,D1000000,...
```

The file starts with the first row of data, followed by the second row, then the third, and so on. For `sheetRows: 1`, the parser can stop at the end of the first row.

To extract the data from the first column, the parser would have to scan almost the entire file (stopping at cell B1000000 in the last row).
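The difference can be sketched with a toy CSV scanner. This is only an illustration under simplifying assumptions (no quoted fields, `\n` line endings); `firstRow` and `firstColumn` are hypothetical helpers, not SheetJS internals. Each one reports roughly how many characters it had to look at.

```javascript
// Illustrative sketch only: a naive CSV scan (no quoted fields), not SheetJS code.

function firstRow(csv) {
  // Row-major layout: the first row ends at the first newline, so scanning stops there.
  const end = csv.indexOf("\n");
  const line = end === -1 ? csv : csv.slice(0, end);
  return { cells: line.split(","), scanned: line.length };
}

function firstColumn(csv) {
  // The first column is spread across every row, so every line has to be located.
  const cells = [];
  let scanned = 0;
  let pos = 0;
  while (pos < csv.length) {
    let nl = csv.indexOf("\n", pos);
    if (nl === -1) nl = csv.length;
    const comma = csv.indexOf(",", pos);
    const cellEnd = comma === -1 || comma > nl ? nl : comma;
    cells.push(csv.slice(pos, cellEnd));
    scanned += nl - pos; // the rest of the line must still be skipped to find the next row
    pos = nl + 1;
  }
  return { cells, scanned };
}

const csv = "A1,B1,C1\nA2,B2,C2\nA3,B3,C3";
console.log(firstRow(csv));    // cells: A1, B1, C1; scanned: 8
console.log(firstColumn(csv)); // cells: A1, A2, A3; scanned: 24
```

The row reader touches only the first line, while the column reader walks every line of the input even though it keeps just one cell per row.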

The dual `sheetCols` parameter is interesting because some formats store data in column-major order. The XLS file from issue #2432 is in column-major order. BIFF8 XLS also has optional special records that let you jump around the file. This approach would not work for XLSB or XLSX.

laurelgr commented 2022-01-12 09:30:16 +00:00 (Migrated from github.com)

If scanning the whole file is OK and the parser only needs to *store* the first column and ignore the others, would that be easier to implement? I have no idea where to start looking in the code, however 🤔
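For reference, a stopgap outside the library might look like the sketch below. It prunes an already-parsed worksheet down to column A, so it does not reduce parsing time or peak memory; it only shrinks what is kept afterwards. `keepFirstColumn` is a hypothetical helper, not part of the SheetJS API, and it assumes the default worksheet shape: a plain object keyed by A1-style addresses, with metadata under keys that start with `!`.

```javascript
// Hypothetical helper, not part of the SheetJS API: keep only column A of a parsed sheet.
// Assumes the default (sparse) worksheet object keyed by A1-style cell addresses.
// Other metadata such as "!merges" or "!cols" is intentionally dropped.
function keepFirstColumn(ws) {
  const out = {};
  let minRow = Infinity;
  let maxRow = 0;
  for (const key of Object.keys(ws)) {
    if (key[0] === "!") continue;      // skip metadata keys like "!ref"
    const m = /^A(\d+)$/.exec(key);    // matches A1, A2, ... but not AA1 or B1
    if (!m) continue;
    out[key] = ws[key];
    const row = Number(m[1]);
    minRow = Math.min(minRow, row);
    maxRow = Math.max(maxRow, row);
  }
  // Recompute the range so it covers only the cells that were kept.
  if (maxRow > 0) out["!ref"] = `A${minRow}:A${maxRow}`;
  return out;
}

// Stand-in for a sheet taken from a parsed workbook.
const ws = {
  "!ref": "A1:B2",
  A1: { t: "s", v: "id" }, B1: { t: "s", v: "name" },
  A2: { t: "n", v: 1 },    B2: { t: "s", v: "x" },
};
console.log(keepFirstColumn(ws)); // keeps A1, A2 and "!ref": "A1:A2"
```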

reviewher commented 2022-02-28 18:07:48 +00:00 (Migrated from github.com)

Dupe of #335

laurelgr commented 2022-03-09 10:39:28 +00:00 (Migrated from github.com)

@reviewher related but not really a duplicate: my issue is not with the order of the JSON output (row-major) but with limiting the parsed (or stored) data to the initial columns.

reviewher commented 2022-03-09 10:45:38 +00:00 (Migrated from github.com)

drop a comment in #475 that a column limit feature should also be supported

Reference: sheetjs/sheetjs#2489