Document option

2024-05-13 22:43:07 -04:00
commit e41febc653
parent c11146f21a
1 changed file with 92 additions and 43 deletions
--- a/docz/docs/08-api/03-parse-options.md
+++ b/docz/docs/08-api/03-parse-options.md
@@ -1,23 +1,37 @@
 ---
+title: Reading Files
 sidebar_position: 3
 hide_table_of_contents: true
-title: Reading Files
 ---

-**`XLSX.read(data, options)`**
+The main SheetJS method for reading files is `read`. It expects developers to
+supply the actual data in a supported representation.
+
+The `readFile` helper method accepts a filename and tries to read the specified
+file using standard APIs. *It does not work in web browsers!*
+
+**Parse file data and generate a SheetJS workbook object**
+
+```js
+var wb = XLSX.read(data, opts);
+```

 `read` attempts to parse `data` and return [a workbook object](/docs/csf/book)

-The [`type`](#input-type) of the `options` object determines how `data` is
+The [`type`](#input-type) property of the `opts` object controls how `data` is
 interpreted. For string data, the default interpretation is Base64.

-**`XLSX.readFile(filename, options)`**
+**Read a specified file and generate a SheetJS workbook object**
+
+```js
+var wb = XLSX.readFile(filename, opts);
+```

 `readFile` attempts to read a local file with specified `filename`.

 :::caution pass

-This method only works in specific environments. It does not work in browsers!
+`readFile` works only on specific platforms. **It does not work in web browsers!**

 The [NodeJS installation note](/docs/getting-started/installation/nodejs#usage)
 includes additional instructions for non-standard use cases.
@@ -29,32 +43,33 @@ includes additional instructions for non-standard use cases.
 The read functions accept an options argument:

 | Option Name | Default | Description                                          |
-| :---------- | ------: | :--------------------------------------------------- |
-|`type`       |         | Input data encoding (see Input Type below)           |
-|`raw`        | false   | If true, plain text parsing will not parse values ** |
-|`dense`      | false   | If true, use a dense worksheet representation **     |
+|:------------|:--------|:-----------------------------------------------------|
+|`type`       |         | [Input data representation](#input-type)             |
+|`raw`        | `false` | If true, plain text parsing will not parse values ** |
+|`dense`      | `false` | If true, use a dense worksheet representation **     |
 |`codepage`   |         | If specified, use code page when appropriate **      |
-|`cellFormula`| true    | Save formulae to the .f field                        |
-|`cellHTML`   | true    | Parse rich text and save HTML to the `.h` field      |
-|`cellNF`     | false   | Save number format string to the `.z` field          |
-|`cellStyles` | false   | Save style/theme info to the `.s` field              |
-|`cellText`   | true    | Generated formatted text to the `.w` field           |
-|`cellDates`  | false   | Store dates as type `d` (default is `n`)             |
+|`cellFormula`| `true`  | Save [formulae to the `.f` field](#formulae)         |
+|`cellHTML`   | `true`  | Parse rich text and save HTML to the `.h` field      |
+|`cellNF`     | `false` | Save number format string to the `.z` field          |
+|`cellStyles` | `false` | Save style/theme info to the `.s` field              |
+|`cellText`   | `true`  | Generate formatted text to the `.w` field            |
+|`cellDates`  | `false` | Store dates as type `d` (default is `n`)             |
 |`dateNF`     |         | If specified, use the string for date code 14 **     |
-|`sheetStubs` | false   | Create cell objects of type `z` for stub cells       |
-|`sheetRows`  | 0       | If >0, read the first `sheetRows` rows **            |
-|`bookDeps`   | false   | If true, parse calculation chains                    |
-|`bookFiles`  | false   | If true, add raw files to book object **             |
-|`bookProps`  | false   | If true, only parse enough to get book metadata **   |
-|`bookSheets` | false   | If true, only parse enough to get the sheet names    |
-|`bookVBA`    | false   | If true, copy VBA blob to `vbaraw` field **          |
-|`password`   | ""      | If defined and file is encrypted, use password **    |
-|`WTF`        | false   | If true, throw errors on unexpected file features ** |
+|`sheetStubs` | `false` | Create cell objects of type `z` for stub cells       |
+|`sheetRows`  | `0`     | If >0, read the [specified number of rows](#range)   |
+|`bookDeps`   | `false` | If true, parse calculation chains                    |
+|`bookFiles`  | `false` | If true, add raw files to book object **             |
+|`bookProps`  | `false` | If true, only parse enough to get book metadata **   |
+|`bookSheets` | `false` | If true, only parse enough to get the sheet names    |
+|`bookVBA`    | `false` | If true, generate [VBA blob](#vba)                   |
+|`password`   | `""`    | If defined and file is encrypted, use password **    |
+|`WTF`        | `false` | If true, throw errors on unexpected file features ** |
 |`sheets`     |         | If specified, only parse specified sheets **         |
-|`PRN`        | false   | If true, allow parsing of PRN files **               |
-|`xlfn`       | false   | If true, preserve `_xlfn.` prefixes in formulae **   |
+|`nodim`      | `false` | If true, calculate [worksheet ranges](#range)        |
+|`PRN`        | `false` | If true, allow parsing of PRN files **               |
+|`xlfn`       | `false` | If true, [preserve prefixes](#formulae) in formulae  |
 |`FS`         |         | DSV Field Separator override                         |
-|`UTC`        | true    | If explicitly false, parse text dates in local time  |
+|`UTC`        | `true`  | If explicitly false, parse text dates in local time  |

 - Even if `cellNF` is false, formatted text will be generated and saved to `.w`
 - In some cases, sheets may be parsed even if `bookSheets` is false.
@@ -66,23 +81,15 @@ The read functions accept an options argument:
    * `keys` array (paths in the ZIP) for ZIP-based formats
    * `files` hash (mapping paths to objects representing the files) for ZIP
    * `cfb` object for formats using CFB containers
- `sheetRows-1` rows will be generated when looking at the JSON object output
-  (since the header row is counted as a row when parsing the data)
 - By default all worksheets are parsed.  `sheets` restricts based on input type:
    * number: zero-based index of worksheet to parse (`0` is first worksheet)
    * string: name of worksheet to parse (case insensitive)
    * array of numbers and strings to select multiple worksheets.
- `bookVBA` merely exposes the raw VBA CFB object.  It does not parse the data.
-  XLSM and XLSB store the VBA CFB object in `xl/vbaProject.bin`. BIFF8 XLS mixes
-  the VBA entries alongside the core Workbook entry, so the library generates a
-  new blob from the XLS CFB container that works in XLSM and XLSB files.
 - `codepage` is applied to BIFF2 - BIFF5 files without `CodePage` records and to
  CSV files without BOM in `type:"binary"`.  BIFF8 XLS always defaults to 1200.
 - `PRN` affects parsing of text files without a common delimiter character.
 - Currently only XOR encryption is supported.  Unsupported error will be thrown
  for files employing other encryption methods.
- Newer Excel functions are serialized with the `_xlfn.` prefix, hidden from the
-  user. SheetJS will strip `_xlfn.` normally. The `xlfn` option preserves them.
 - `WTF` is mainly for development.  By default, the parser will suppress read
  errors on single worksheets, allowing you to read from the worksheets that do
  parse properly. Setting `WTF:true` forces those errors to be thrown.
@@ -93,10 +100,52 @@ The read functions accept an options argument:
  the parsers will assume the files are specified in local time. By default, as
  is the case for other file formats, dates and times are interpreted in UTC.

+#### Range
+
+Some file formats, including XLSX and XLS, can self-report worksheet ranges. The
+self-reported ranges are used by default.
+
+If the `sheetRows` option is set, up to `sheetRows` rows will be parsed from the
+worksheets. `sheetRows-1` rows will be generated when looking at the JSON object
+output (since the header row is counted as a row when parsing the data).
+
+The `nodim` option instructs the parser to ignore self-reported ranges and use
+the actual cells in the worksheet to determine the range. This addresses known
+issues with non-compliant third-party exporters.
+
+#### Formulae
+
+For some file formats, the `cellFormula` option must be explicitly enabled to
+ensure that formulae are extracted.
+
+Newer Excel functions are serialized with the `_xlfn.` prefix, hidden from the
+user. SheetJS will strip `_xlfn.` normally. The `xlfn` option preserves them.
+[The "Formulae" docs](/docs/csf/features/formulae#prefixed-future-functions)
+cover this in more detail.
+
+["Formulae"](/docs/csf/features/formulae) covers the features in more detail.
+
+#### VBA
+
+When a macro-enabled file is parsed, if the `bookVBA` option is `true`, the raw
+VBA blob will be stored in the `vbaraw` property of the workbook.
+
+["VBA and Macros"](/docs/csf/features/vba) covers the features in more detail.
+
+<details>
+  <summary><b>Implementation Details</b> (click to show)</summary>
+
+The `bookVBA` option merely exposes the raw VBA CFB object. It does not parse the data.
+
+XLSM and XLSB store the VBA CFB object in `xl/vbaProject.bin`. BIFF8 XLS mixes
+the VBA entries alongside the core Workbook entry, so the library generates a
+new blob from the XLS CFB container that works in XLSM and XLSB files.
+
+</details>
+
 ### Input Type

-Strings can be interpreted in multiple ways.  The `type` parameter for `read`
-tells the library how to parse the data argument:
+The `type` parameter for `read` controls how data is interpreted:

 | `type`     | expected input                                                  |
 |:-----------|:----------------------------------------------------------------|
@@ -151,7 +200,7 @@ Plain text format guessing follows the priority order:
 | Format | Test                                                                |
 |:-------|:--------------------------------------------------------------------|
 | XML    | `<?xml` appears in the first 1024 characters                        |
-| HTML   | starts with `<` and HTML tags appear in the first 1024 characters * |
+| HTML   | starts with `<` and HTML tags appear in the first 1024 characters   |
 | XML    | starts with `<` and the first tag is valid                          |
 | RTF    | starts with `{\rt`                                                  |
 | DSV    | starts with `sep=` followed by field delimiter and line separator   |
@@ -163,17 +212,17 @@ Plain text format guessing follows the priority order:
 | PRN    | `PRN` option is set to true                                         |
 | CSV    | (fallback)                                                          |

- HTML tags include: `html`, `table`, `head`, `meta`, `script`, `style`, `div`
+HTML tags include `html`, `table`, `head`, `meta`, `script`, `style`, `div`

 </details>

 <details open>
  <summary><b>Why are random text files valid?</b> (click to hide)</summary>

-Excel is extremely aggressive in reading files.  Adding an XLS extension to any
-display text file  (where the only characters are ANSI display chars) tricks
-Excel into thinking that the file is potentially a CSV or TSV file, even if it
-is only one column!  This library attempts to replicate that behavior.
+Excel is extremely aggressive in reading files. Adding the XLS extension to any
+text file (where the only characters are ANSI display chars) tricks Excel into
+processing the file as if it were a CSV or TSV file, even if the result is not
+useful!  This library attempts to replicate that behavior.

 The best approach is to validate the desired worksheet and ensure it has the
 expected number of rows or columns.  Extracting the range is extremely simple: