Parsing Options

The exported read and readFile functions accept an options argument:

| Option Name | Default | Description |
| :---------- | ------: | :---------- |
| type | | Input data encoding (see Input Type below) |
| cellFormula | true | Save formulae to the .f field ** |
| cellHTML | true | Parse rich text and save HTML to the .h field |
| cellNF | false | Save number format string to the .z field |
| cellStyles | false | Save style/theme info to the .s field |
| cellDates | false | Store dates as type d (default is n) ** |
| sheetStubs | false | Create cell objects of type z for stub cells |
| sheetRows | 0 | If >0, read the first sheetRows rows ** |
| bookDeps | false | If true, parse calculation chains |
| bookFiles | false | If true, add raw files to book object ** |
| bookProps | false | If true, only parse enough to get book metadata ** |
| bookSheets | false | If true, only parse enough to get the sheet names |
| bookVBA | false | If true, expose vbaProject.bin to vbaraw field ** |
| password | "" | If defined and file is encrypted, use password ** |
| WTF | false | If true, throw errors on unexpected file features ** |

  • The cellFormula option only applies to formats that require extra processing to parse formulae (XLS/XLSB).
  • Even if cellNF is false, formatted text will be generated and saved to .w
  • In some cases, sheets may be parsed even if bookSheets is false.
  • bookSheets and bookProps combine to give both sets of information
  • Deps will be an empty object if bookDeps is falsy
  • bookFiles behavior depends on file type:
    • keys array (paths in the ZIP) for ZIP-based formats
    • files hash (mapping paths to objects representing the files) for ZIP-based formats
    • cfb object for formats using CFB containers
  • sheetRows-1 rows appear in the JSON object output, because the header row counts as one of the sheetRows rows read during parsing.
  • bookVBA merely exposes the raw vba object. It does not parse the data.
  • cellDates currently does not convert numerical dates to JS dates.
  • Currently only XOR encryption is supported. An "unsupported" error is thrown for files employing other encryption methods.
  • WTF is mainly for development. By default, the parser suppresses read errors on single worksheets, allowing you to read from the worksheets that do parse properly. Setting WTF:1 forces those errors to be thrown.

The defaults are enumerated in bits/84_defaults.js
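
As a minimal sketch of how these options are passed, the helpers below combine bookSheets with bookProps to fetch only metadata, and use sheetRows with WTF to read a bounded number of rows while surfacing parse errors. The helper names readMetadata and readFirstRows are hypothetical, and the XLSX argument is assumed to be the library module (injected as a parameter so the sketch has no hard dependency).

```javascript
// Hypothetical helper: read only sheet names and book metadata.
// `XLSX` is assumed to be the library module, passed in by the caller.
function readMetadata(XLSX, data, type) {
  var wb = XLSX.read(data, {
    type: type,        // "base64", "binary", "buffer", "array" or "file"
    bookSheets: true,  // only parse enough to get the sheet names
    bookProps: true    // only parse enough to get book metadata
  });
  return { sheetNames: wb.SheetNames, props: wb.Props };
}

// Hypothetical helper: read the first `n` rows of each sheet and let
// errors on unexpected file features propagate instead of being suppressed.
function readFirstRows(XLSX, data, type, n) {
  return XLSX.read(data, { type: type, sheetRows: n, WTF: true });
}
```

With the library installed, the first argument would typically be the result of require("xlsx"), for example readMetadata(require("xlsx"), data, "binary").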

Input Type

Strings can be interpreted in multiple ways. The type parameter for read tells the library how to parse the data argument:

| type | expected input |
| :--- | :------------- |
| "base64" | string: base64 encoding of the file |
| "binary" | string: binary string (n-th byte is data.charCodeAt(n)) |
| "buffer" | nodejs Buffer |
| "array" | array: array of 8-bit unsigned int (n-th byte is data[n]) |
| "file" | string: filename that will be read and processed (nodejs only) |
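
To make the in-memory forms concrete, the sketch below expresses the same bytes in each representation from the table, using only the Node.js Buffer API. The function name toInputForms is hypothetical; each property of its result is the kind of value that would be paired with the matching type string.

```javascript
// Hypothetical helper: express one byte sequence in each in-memory form.
function toInputForms(bytes) {
  var buf = Buffer.from(bytes);
  return {
    buffer: buf,                     // type: "buffer"
    base64: buf.toString("base64"),  // type: "base64"
    binary: buf.toString("binary"),  // type: "binary", one char per byte
    array: Array.from(buf)           // type: "array", one number per byte
  };
}

var forms = toInputForms([0x50, 0x4b, 0x03, 0x04]);
// In the "binary" form, the n-th byte is recovered with charCodeAt(n).
console.log(forms.binary.charCodeAt(0) === forms.array[0]);
```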

Guessing File Type

Excel and other spreadsheet tools read the first few bytes and apply other heuristics to determine a file type. This enables file type punning: renaming a file to use the .xls extension tells your computer to open it with Excel, and Excel will still know how to handle the file regardless of its actual format. This library applies similar logic:

| Byte 0 | Raw File Type | Spreadsheet Types |
| :----- | :------------ | :---------------- |
| 0xD0 | CFB Container | BIFF 5/8 or password-protected XLSX/XLSB |
| 0x09 | BIFF Stream | BIFF 2/3/4/5 |
| 0x3C | XML/HTML | SpreadsheetML or Flat ODS or UOS1 or HTML |
| 0x50 | ZIP Archive | XLSB or XLSX/M or ODS or UOS2 |
| 0xFE | UTF8 Text | SpreadsheetML or Flat ODS or UOS1 |
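
The first-byte lookup can be sketched as a simple switch. This is only an illustration of the table, not the library's actual detection code, which applies further heuristics to choose among the spreadsheet types in each row. The function name guessRawType is hypothetical.

```javascript
// Hypothetical helper mirroring the table: map byte 0 to a raw file type.
function guessRawType(firstByte) {
  switch (firstByte) {
    case 0xD0: return "CFB Container";
    case 0x09: return "BIFF Stream";
    case 0x3C: return "XML/HTML";
    case 0x50: return "ZIP Archive";
    case 0xFE: return "UTF8 Text";
    default:   return "unknown";  // bytes not listed in the table
  }
}

// Example: ZIP-based files such as XLSX start with "PK" (0x50 0x4B).
console.log(guessRawType(Buffer.from("PK")[0]));
```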