Document option
This commit is contained in:
parent
c11146f21a
commit
e41febc653
@ -1,23 +1,37 @@
|
||||
---
|
||||
title: Reading Files
|
||||
sidebar_position: 3
|
||||
hide_table_of_contents: true
|
||||
title: Reading Files
|
||||
---
|
||||
|
||||
**`XLSX.read(data, options)`**
|
||||
The main SheetJS method for reading files is `read`. It expects developers to
|
||||
supply the actual data in a supported representation.
|
||||
|
||||
The `readFile` helper method accepts a filename and tries to read the specified
|
||||
file using standard APIs. *It does not work in web browsers!*
|
||||
|
||||
**Parse file data and generate a SheetJS workbook object**
|
||||
|
||||
```js
|
||||
var wb = XLSX.read(data, opts);
|
||||
```
|
||||
|
||||
`read` attempts to parse `data` and return [a workbook object](/docs/csf/book)
|
||||
|
||||
The [`type`](#input-type) of the `options` object determines how `data` is
|
||||
The [`type`](#input-type) property of the `opts` object controls how `data` is
|
||||
interpreted. For string data, the default interpretation is Base64.
|
||||
|
||||
**`XLSX.readFile(filename, options)`**
|
||||
**Read a specified file and generate a SheetJS workbook object**
|
||||
|
||||
```js
|
||||
var wb = XLSX.readFile(filename, opts);
|
||||
```
|
||||
|
||||
`readFile` attempts to read a local file with specified `filename`.
|
||||
|
||||
:::caution pass
|
||||
|
||||
This method only works in specific environments. It does not work in browsers!
|
||||
`readFile` works in specific platforms. **It does not support web browsers!**
|
||||
|
||||
The [NodeJS installation note](/docs/getting-started/installation/nodejs#usage)
|
||||
includes additional instructions for non-standard use cases.
|
||||
@ -29,32 +43,33 @@ includes additional instructions for non-standard use cases.
|
||||
The read functions accept an options argument:
|
||||
|
||||
| Option Name | Default | Description |
|
||||
| :---------- | ------: | :--------------------------------------------------- |
|
||||
|`type` | | Input data encoding (see Input Type below) |
|
||||
|`raw` | false | If true, plain text parsing will not parse values ** |
|
||||
|`dense` | false | If true, use a dense worksheet representation ** |
|
||||
|:------------|:--------|:-----------------------------------------------------|
|
||||
|`type` | | [Input data representation](#input-type) |
|
||||
|`raw` | `false` | If true, plain text parsing will not parse values ** |
|
||||
|`dense` | `false` | If true, use a dense worksheet representation ** |
|
||||
|`codepage` | | If specified, use code page when appropriate ** |
|
||||
|`cellFormula`| true | Save formulae to the .f field |
|
||||
|`cellHTML` | true | Parse rich text and save HTML to the `.h` field |
|
||||
|`cellNF` | false | Save number format string to the `.z` field |
|
||||
|`cellStyles` | false | Save style/theme info to the `.s` field |
|
||||
|`cellText` | true | Generated formatted text to the `.w` field |
|
||||
|`cellDates` | false | Store dates as type `d` (default is `n`) |
|
||||
|`cellFormula`| `true` | Save [formulae to the `.f` field](#formulae) |
|
||||
|`cellHTML` | `true` | Parse rich text and save HTML to the `.h` field |
|
||||
|`cellNF` | `false` | Save number format string to the `.z` field |
|
||||
|`cellStyles` | `false` | Save style/theme info to the `.s` field |
|
||||
|`cellText` | `true` | Generated formatted text to the `.w` field |
|
||||
|`cellDates` | `false` | Store dates as type `d` (default is `n`) |
|
||||
|`dateNF` | | If specified, use the string for date code 14 ** |
|
||||
|`sheetStubs` | false | Create cell objects of type `z` for stub cells |
|
||||
|`sheetRows` | 0 | If >0, read the first `sheetRows` rows ** |
|
||||
|`bookDeps` | false | If true, parse calculation chains |
|
||||
|`bookFiles` | false | If true, add raw files to book object ** |
|
||||
|`bookProps` | false | If true, only parse enough to get book metadata ** |
|
||||
|`bookSheets` | false | If true, only parse enough to get the sheet names |
|
||||
|`bookVBA` | false | If true, copy VBA blob to `vbaraw` field ** |
|
||||
|`password` | "" | If defined and file is encrypted, use password ** |
|
||||
|`WTF` | false | If true, throw errors on unexpected file features ** |
|
||||
|`sheetStubs` | `false` | Create cell objects of type `z` for stub cells |
|
||||
|`sheetRows` | `0` | If >0, read the [specified number of rows](#range) |
|
||||
|`bookDeps` | `false` | If true, parse calculation chains |
|
||||
|`bookFiles` | `false` | If true, add raw files to book object ** |
|
||||
|`bookProps` | `false` | If true, only parse enough to get book metadata ** |
|
||||
|`bookSheets` | `false` | If true, only parse enough to get the sheet names |
|
||||
|`bookVBA` | `false` | If true, generate [VBA blob](#vba) |
|
||||
|`password` | `""` | If defined and file is encrypted, use password ** |
|
||||
|`WTF` | `false` | If true, throw errors on unexpected file features ** |
|
||||
|`sheets` | | If specified, only parse specified sheets ** |
|
||||
|`PRN` | false | If true, allow parsing of PRN files ** |
|
||||
|`xlfn` | false | If true, preserve `_xlfn.` prefixes in formulae ** |
|
||||
|`nodim` | `false` | If true, calculate [worksheet ranges](#range) |
|
||||
|`PRN` | `false` | If true, allow parsing of PRN files ** |
|
||||
|`xlfn` | `false` | If true, [preserve prefixes](#formulae) in formulae |
|
||||
|`FS` | | DSV Field Separator override |
|
||||
|`UTC` | true | If explicitly false, parse text dates in local time |
|
||||
|`UTC` | `true` | If explicitly false, parse text dates in local time |
|
||||
|
||||
- Even if `cellNF` is false, formatted text will be generated and saved to `.w`
|
||||
- In some cases, sheets may be parsed even if `bookSheets` is false.
|
||||
@ -66,23 +81,15 @@ The read functions accept an options argument:
|
||||
* `keys` array (paths in the ZIP) for ZIP-based formats
|
||||
* `files` hash (mapping paths to objects representing the files) for ZIP
|
||||
* `cfb` object for formats using CFB containers
|
||||
- `sheetRows-1` rows will be generated when looking at the JSON object output
|
||||
(since the header row is counted as a row when parsing the data)
|
||||
- By default all worksheets are parsed. `sheets` restricts based on input type:
|
||||
* number: zero-based index of worksheet to parse (`0` is first worksheet)
|
||||
* string: name of worksheet to parse (case insensitive)
|
||||
* array of numbers and strings to select multiple worksheets.
|
||||
- `bookVBA` merely exposes the raw VBA CFB object. It does not parse the data.
|
||||
XLSM and XLSB store the VBA CFB object in `xl/vbaProject.bin`. BIFF8 XLS mixes
|
||||
the VBA entries alongside the core Workbook entry, so the library generates a
|
||||
new blob from the XLS CFB container that works in XLSM and XLSB files.
|
||||
- `codepage` is applied to BIFF2 - BIFF5 files without `CodePage` records and to
|
||||
CSV files without BOM in `type:"binary"`. BIFF8 XLS always defaults to 1200.
|
||||
- `PRN` affects parsing of text files without a common delimiter character.
|
||||
- Currently only XOR encryption is supported. Unsupported error will be thrown
|
||||
for files employing other encryption methods.
|
||||
- Newer Excel functions are serialized with the `_xlfn.` prefix, hidden from the
|
||||
user. SheetJS will strip `_xlfn.` normally. The `xlfn` option preserves them.
|
||||
- `WTF` is mainly for development. By default, the parser will suppress read
|
||||
errors on single worksheets, allowing you to read from the worksheets that do
|
||||
parse properly. Setting `WTF:true` forces those errors to be thrown.
|
||||
@ -93,10 +100,52 @@ The read functions accept an options argument:
|
||||
the parsers will assume the files are specified in local time. By default, as
|
||||
is the case for other file formats, dates and times are interpreted in UTC.
|
||||
|
||||
#### Range
|
||||
|
||||
Some file formats, including XLSX and XLS, can self-report worksheet ranges. The
|
||||
self-reported ranges are used by default.
|
||||
|
||||
If the `sheetRows` option is set, up to `sheetRows` rows will be parsed from the
|
||||
worksheets. `sheetRows-1` rows will be generated when looking at the JSON object
|
||||
output (since the header row is counted as a row when parsing the data).
|
||||
|
||||
The `nodim` option instructs the parser to ignore self-reported ranges and use
|
||||
the actual cells in the worksheet to determine the range. This addresses known
|
||||
issues with non-compliant third-party exporters.
|
||||
|
||||
#### Formulae
|
||||
|
||||
For some file formats, the `cellFormula` option must be explicitly enabled to
|
||||
ensure that formulae are extracted.
|
||||
|
||||
Newer Excel functions are serialized with the `_xlfn.` prefix, hidden from the
|
||||
user. SheetJS will strip `_xlfn.` normally. The `xlfn` option preserves them.
|
||||
[The "Formulae" docs](/docs/csf/features/formulae#prefixed-future-functions)
|
||||
covers this in more detail.
|
||||
|
||||
["Formulae"](/docs/csf/features/formulae) covers the features in more detail.
|
||||
|
||||
#### VBA
|
||||
|
||||
When a macro-enabled file is parsed, if the `bookVBA` option is `true`, the raw
|
||||
VBA blob will be stored in the `vbaraw` property of the workbook.
|
||||
|
||||
["VBA and Macros"](/docs/csf/features/vba) covers the features in more detail.
|
||||
|
||||
<details>
|
||||
<summary><b>Implementation Details</b> (click to show)</summary>
|
||||
|
||||
The `bookVBA` merely exposes the raw VBA CFB object. It does not parse the data.
|
||||
|
||||
XLSM and XLSB store the VBA CFB object in `xl/vbaProject.bin`. BIFF8 XLS mixes
|
||||
the VBA entries alongside the core Workbook entry, so the library generates a
|
||||
new blob from the XLS CFB container that works in XLSM and XLSB files.
|
||||
|
||||
</details>
|
||||
|
||||
### Input Type
|
||||
|
||||
Strings can be interpreted in multiple ways. The `type` parameter for `read`
|
||||
tells the library how to parse the data argument:
|
||||
The `type` parameter for `read` controls how data is interpreted:
|
||||
|
||||
| `type` | expected input |
|
||||
|:-----------|:----------------------------------------------------------------|
|
||||
@ -151,7 +200,7 @@ Plain text format guessing follows the priority order:
|
||||
| Format | Test |
|
||||
|:-------|:--------------------------------------------------------------------|
|
||||
| XML | `<?xml` appears in the first 1024 characters |
|
||||
| HTML | starts with `<` and HTML tags appear in the first 1024 characters * |
|
||||
| HTML | starts with `<` and HTML tags appear in the first 1024 characters |
|
||||
| XML | starts with `<` and the first tag is valid |
|
||||
| RTF | starts with `{\rt` |
|
||||
| DSV | starts with `sep=` followed by field delimiter and line separator |
|
||||
@ -163,17 +212,17 @@ Plain text format guessing follows the priority order:
|
||||
| PRN | `PRN` option is set to true |
|
||||
| CSV | (fallback) |
|
||||
|
||||
- HTML tags include: `html`, `table`, `head`, `meta`, `script`, `style`, `div`
|
||||
HTML tags include `html`, `table`, `head`, `meta`, `script`, `style`, `div`
|
||||
|
||||
</details>
|
||||
|
||||
<details open>
|
||||
<summary><b>Why are random text files valid?</b> (click to hide)</summary>
|
||||
|
||||
Excel is extremely aggressive in reading files. Adding an XLS extension to any
|
||||
display text file (where the only characters are ANSI display chars) tricks
|
||||
Excel into thinking that the file is potentially a CSV or TSV file, even if it
|
||||
is only one column! This library attempts to replicate that behavior.
|
||||
Excel is extremely aggressive in reading files. Adding the XLS extension to any
|
||||
text file (where the only characters are ANSI display chars) tricks Excel into
|
||||
processing the file as if it were a CSV or TSV file, even if the result is not
|
||||
useful! This library attempts to replicate that behavior.
|
||||
|
||||
The best approach is to validate the desired worksheet and ensure it has the
|
||||
expected number of rows or columns. Extracting the range is extremely simple:
|
||||
|
Loading…
Reference in New Issue
Block a user