Only parse enough to get headers #409

Closed
opened 2016-05-03 15:38:08 +00:00 by loriensleafs · 4 comments
loriensleafs commented 2016-05-03 15:38:08 +00:00 (Migrated from github.com)

Hey,

There are a couple options for the read functions such as 'bookProps' and 'bookSheets' that when set to true limit the scope of the parse. These are great because when I use them the read takes 60% less time, which makes sense because of the limited parse scope.

My question is, is there a way to have the read only parse enough to return the headers, as I don't actually need more than that, and it's a lot of extra time spent on data I'm not actually using. If there isn't could you point me in the direction of how to customize?

Thanks for any help.

Hey, There are a couple options for the read functions such as 'bookProps' and 'bookSheets' that when set to true limit the scope of the parse. These are great because when I use them the read takes 60% less time, which makes sense because of the limited parse scope. My question is, is there a way to have the read only parse enough to return the headers, as I don't actually need more than that, and it's a lot of extra time spent on data I'm not actually using. If there isn't could you point me in the direction of how to customize? Thanks for any help.
reviewher commented 2017-03-13 02:35:24 +00:00 (Migrated from github.com)

The sheetRows option limits the number of rows that the parser will consider when generating cells. If you just want the first row, set the option sheetRows:1 in the options object when you call read or readFile. For example with the AutoFilter.xlsb test file:

> require('./').readFile('AutoFilter.xlsb', {sheetRows:1}).Sheets['GT']
{ A1: { t: 's', v: 'Code', r: '<t>Code</t>', w: 'Code' },
  B1: { t: 's', v: 'Format', r: '<t>Format</t>', w: 'Format' },
  C1: { t: 's', v: 'Library', r: '<t>Library</t>', w: 'Library' },
  D1: { t: 's', v: 'Notes', r: '<t>Notes</t>', w: 'Notes' },
  E1:
   { t: 's',
     v: 'Importance',
     r: '<t>Importance</t>',
     w: 'Importance' },
  '!ref': 'A1:E1', // <-- this is the range of the extracted cells
  '!fullref': 'A1:E22' } // <-- this is the full workbook range
The `sheetRows` option limits the number of rows that the parser will consider when generating cells. If you just want the first row, set the option `sheetRows:1` in the options object when you call read or readFile. For example with the [AutoFilter.xlsb test file](https://github.com/SheetJS/test_files/blob/master/AutoFilter.xlsb?raw=true): ```js > require('./').readFile('AutoFilter.xlsb', {sheetRows:1}).Sheets['GT'] { A1: { t: 's', v: 'Code', r: '<t>Code</t>', w: 'Code' }, B1: { t: 's', v: 'Format', r: '<t>Format</t>', w: 'Format' }, C1: { t: 's', v: 'Library', r: '<t>Library</t>', w: 'Library' }, D1: { t: 's', v: 'Notes', r: '<t>Notes</t>', w: 'Notes' }, E1: { t: 's', v: 'Importance', r: '<t>Importance</t>', w: 'Importance' }, '!ref': 'A1:E1', // <-- this is the range of the extracted cells '!fullref': 'A1:E22' } // <-- this is the full workbook range ```
elgalu commented 2018-05-03 07:03:19 +00:00 (Migrated from github.com)

Does this option work on the browser? we need a browser-only in which users may upload big files up to 6GB but we would like to preview the first few lines as soon as possible.

Does this option work on the browser? we need a browser-only in which users may upload big files up to 6GB but we would like to preview the first few lines as soon as possible.
adamjaffeback commented 2018-08-04 06:11:10 +00:00 (Migrated from github.com)

Per the example in https://github.com/SheetJS/js-xlsx/issues/409#issuecomment-286002080, I'm not showing that the !fullref shows for all file types. If I do XLSX.read('someFile.csv', {sheetRows:1}), for example, I just get the !ref bounded by 1, but !fullref is undefined.

What file types does this !fullref functionality not appear for?

Per the example in https://github.com/SheetJS/js-xlsx/issues/409#issuecomment-286002080, I'm not showing that the `!fullref` shows for all file types. If I do `XLSX.read('someFile.csv', {sheetRows:1})`, for example, I just get the !ref bounded by 1, but !fullref is undefined. What file types does this !fullref functionality not appear for?
SheetJSDev commented 2021-09-13 06:44:31 +00:00 (Migrated from github.com)

sheetRows: 1 will pull the first row from each sheet.

@elgalu yes the parameter merely says to stop processing cells after that point in the worksheet.

@adamjaffeback formats that self-report worksheet ranges (XLS / XLSX / XLSB / ODS) will have !fullref. They are the self-reported ranges and may not be accurate (some writers cheat by intentionally reporting a range like A1:C1 or A1:C1048576). For CSV, getting the full range means determining the number of rows and columns, which would require processing the the entire file.

`sheetRows: 1` will pull the first row from each sheet. @elgalu yes the parameter merely says to stop processing cells after that point in the worksheet. @adamjaffeback formats that self-report worksheet ranges (XLS / XLSX / XLSB / ODS) will have `!fullref`. They are the self-reported ranges and may not be accurate (some writers cheat by intentionally reporting a range like A1:C1 or A1:C1048576). For CSV, getting the full range means determining the number of rows and columns, which would require processing the the entire file.
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#409
No description provided.