browser tests and doc cleanup [ci skip]

SheetJS 2017-04-30 16:37:53 -04:00
parent b9bc0a1627
commit 810c447bf9
31 changed files with 1164 additions and 270 deletions

337
README.md

@ -22,11 +22,18 @@ enhancements and additional features by request.
![circo graph of format support](formats.png)
<details>
<summary>Graph Legend</summary>
![graph legend](legend.png)
</details>
## Table of Contents
<details>
<summary>Expand to show Table of Contents</summary>
<!-- toc -->
- [Installation](#installation)
@ -34,9 +41,12 @@ enhancements and additional features by request.
* [Optional Modules](#optional-modules)
* [ECMAScript 5 Compatibility](#ecmascript-5-compatibility)
- [Parsing Workbooks](#parsing-workbooks)
* [Complete Examples](#complete-examples)
* [Note on Streaming Read](#note-on-streaming-read)
- [Working with the Workbook](#working-with-the-workbook)
* [Complete Examples](#complete-examples-1)
- [Writing Workbooks](#writing-workbooks)
* [Complete Examples](#complete-examples-2)
* [Streaming Write](#streaming-write)
- [Interface](#interface)
* [Parsing functions](#parsing-functions)
@ -108,6 +118,8 @@ enhancements and additional features by request.
<!-- tocstop -->
</details>
## Installation
With [npm](https://www.npmjs.org/package/xlsx):
@ -187,7 +199,8 @@ To use the shim, add the shim before the script tag that loads xlsx.js:
For parsing, the first step is to read the file. This involves acquiring the
data and feeding it into the library. Here are a few common scenarios:
- node readFile:
<details>
<summary><b>nodejs read a file</b> (click to show)</summary>
```js
if(typeof require !== 'undefined') XLSX = require('xlsx');
@ -195,15 +208,23 @@ var workbook = XLSX.readFile('test.xlsx');
/* DO SOMETHING WITH workbook HERE */
```
- Browser DOM Table element:
</details>
<details>
<summary><b>Browser read TABLE element from page</b> (click to show)</summary>
```js
var workbook = XLSX.utils.table_to_book(document.getElementById('tableau'));
/* DO SOMETHING WITH workbook HERE */
```
- ajax (for a more complete example that works in older browsers, check the demo
at <http://oss.sheetjs.com/js-xlsx/ajax.html>):
</details>
<details>
<summary><b>Browser download file (ajax)</b> (click to show)</summary>
Note: for a more complete example that works in older browsers, check the demo
at <http://oss.sheetjs.com/js-xlsx/ajax.html>.
```js
/* set up XMLHttpRequest */
@ -230,9 +251,14 @@ oReq.onload = function(e) {
oReq.send();
```
- HTML5 drag-and-drop using readAsBinaryString or readAsArrayBuffer:
note: readAsBinaryString and readAsArrayBuffer may not be available in every
browser. Use dynamic feature tests to determine which method to use.
</details>
<details>
<summary><b>Browser drag-and-drop</b> (click to show)</summary>
Drag-and-drop uses FileReader with readAsBinaryString or readAsArrayBuffer.
Note: readAsBinaryString and readAsArrayBuffer may not be available in every
browser. Use dynamic feature tests to determine which method to use.
```js
/* processing array buffers, only required for readAsArrayBuffer */
@ -276,7 +302,10 @@ function handleDrop(e) {
drop_dom_element.addEventListener('drop', handleDrop, false);
```
- HTML5 input file element using readAsBinaryString or readAsArrayBuffer:
</details>
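The dynamic feature test mentioned in the drag-and-drop note could be sketched
roughly as follows. `has_rABS` is a hypothetical helper name (not part of the
library); it only checks that the reader constructor exposes
`readAsBinaryString`, and the result can be used as the `rABS` flag:

```js
/* sketch of a dynamic feature test; has_rABS is a hypothetical helper */
function has_rABS(FR) {
	/* FR is the FileReader constructor, or undefined outside of browsers */
	return typeof FR !== 'undefined' &&
		typeof FR.prototype !== 'undefined' &&
		typeof FR.prototype.readAsBinaryString === 'function';
}

/* pass the global FileReader only if the environment defines it */
var rABS = has_rABS(typeof FileReader !== 'undefined' ? FileReader : undefined);
/* rABS true: use reader.readAsBinaryString(f) */
/* rABS false: fall back to reader.readAsArrayBuffer(f) */
```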
<details>
<summary><b>Browser file upload form element</b> (click to show)</summary>
```js
/* fixdata and rABS are defined in the drag and drop example */
@ -308,7 +337,10 @@ function handleFile(e) {
input_dom_element.addEventListener('change', handleFile, false);
```
**Complete examples:**
</details>
### Complete Examples
- <http://oss.sheetjs.com/js-xlsx/> HTML5 File API / Base64 Text / Web Workers
@ -361,7 +393,7 @@ var desired_cell = worksheet[address_of_cell];
var desired_value = (desired_cell ? desired_cell.v : undefined);
```
**Complete examples:**
### Complete Examples
- <https://github.com/SheetJS/js-xlsx/blob/master/bin/xlsx.njs> node
@ -382,7 +414,8 @@ For writing, the first step is to generate output data. The helper functions
dissemination. The second step is to actually share the data with the end point.
Assuming `workbook` is a workbook object:
- nodejs write to file:
<details>
<summary><b>nodejs write a file</b> (click to show)</summary>
```js
/* output format determined by filename */
@ -390,11 +423,16 @@ XLSX.writeFile(workbook, 'out.xlsx');
/* at this point, out.xlsx is a file that you can distribute */
```
- browser generate binary blob and "download" to client
(using [FileSaver.js](https://github.com/eligrey/FileSaver.js/) for download):
</details>
<details>
<summary><b>Browser download file</b> (click to show)</summary>
Note: browser generates binary blob and forces a "download" to client. This
example uses [FileSaver.js](https://github.com/eligrey/FileSaver.js/):
```js
/* bookType can be 'xlsx' or 'xlsm' or 'xlsb' or 'ods' */
/* bookType can be any supported output type */
var wopts = { bookType:'xlsx', bookSST:false, type:'binary' };
var wbout = XLSX.write(workbook,wopts);
@ -409,12 +447,12 @@ function s2ab(s) {
/* the saveAs call downloads a file on the local machine */
saveAs(new Blob([s2ab(wbout)],{type:"application/octet-stream"}), "test.xlsx");
```
</details>
**Complete examples:**
### Complete Examples
- <http://sheetjs.com/demos/writexlsx.html> generates a simple file
- <http://git.io/WEK88Q> writing an array of arrays in nodejs
- <http://sheetjs.com/demos/table.html> exporting an HTML table
- <http://sheetjs.com/demos/writexlsx.html> generates a simple file
### Streaming Write
@ -426,6 +464,7 @@ stream. They are only exposed in node.
- `XLSX.stream.to_html` is the streaming version of the HTML output type.
<https://github.com/sheetjs/sheetaki> pipes write streams to nodejs response.
## Interface
`XLSX` is the exposed variable in the browser and the exported node variable
@ -535,6 +574,9 @@ Type `b` is the Boolean type. `v` is interpreted according to JS truth tables.
Type `e` is the Error type. `v` holds the number and `w` holds the common name:
<details>
<summary><b>Error values and interpretation</b> (click to show)</summary>
| Value | Error Meaning |
| -----: | :-------------- |
| `0x00` | `#NULL!` |
@ -546,6 +588,8 @@ Type `e` is the Error type. `v` holds the number and `w` holds the common name:
| `0x2A` | `#N/A` |
| `0x2B` | `#GETTING_DATA` |
</details>
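As a rough illustration of how `v` and `w` pair up for error cells, the sketch
below builds a cell object from a numeric code. The lookup table uses the
standard Excel error codes (the rows not visible in the table excerpt are filled
in from those codes), and `make_error_cell` is a hypothetical helper rather than
a library function:

```js
/* numeric error codes and their common names */
var ERROR_NAMES = {
	0x00: '#NULL!',
	0x07: '#DIV/0!',
	0x0F: '#VALUE!',
	0x17: '#REF!',
	0x1D: '#NAME?',
	0x24: '#NUM!',
	0x2A: '#N/A',
	0x2B: '#GETTING_DATA'
};

/* hypothetical helper: build a type `e` cell for a known error code */
function make_error_cell(code) {
	var name = ERROR_NAMES[code];
	if(typeof name === 'undefined') throw new Error('unknown error code ' + code);
	return { t:'e', v:code, w:name };
}

var cell = make_error_cell(0x2A); /* v holds the number, w holds the name */
```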
Type `n` is the Number type. This includes all forms of data that Excel stores
as numbers, such as dates/times and Boolean fields. Excel exclusively uses data
that can be fit in an IEEE754 floating point number, just like JS Number, so the
@ -585,7 +629,7 @@ Each key that does not start with `!` maps to a cell (using `A-1` notation)
`sheet[address]` returns the cell object for the specified address.
Special sheet keys (accessible as `sheet[key]`, each starting with `!`):
**Special sheet keys (accessible as `sheet[key]`, each starting with `!`):**
- `sheet['!ref']`: A-1 based range representing the sheet range. Functions that
work with sheets should use this parameter to determine the range. Cells that
@ -605,6 +649,9 @@ Special sheet keys (accessible as `sheet[key]`, each starting with `!`):
follow Excel's "normal" preset. Excel also has a "wide" and a "narrow" preset
but they are stored as raw measurements. The main properties are listed below:
<details>
<summary><b>Page margin details</b> (click to show)</summary>
| key | description | "normal" | "wide" | "narrow" |
|----------|------------------------|:---------|:-------|:-------- |
| `left` | left margin (inches) | `0.7` | `1.0` | `0.25` |
@ -622,6 +669,7 @@ sheet["!margins"] = { left:1.0, right:1.0, top:1.0, bottom:1.0, header:0.5, foot
/* Set worksheet sheet to "narrow" */
sheet["!margins"] = { left:0.25, right:0.25, top:0.75, bottom:0.75, header:0.3, footer:0.3 }
```
</details>
#### Worksheet Object
@ -647,6 +695,9 @@ In addition to the base sheet keys, worksheets also add:
keys control the sheet protection -- set to `false` to enable a feature when
sheet is locked or set to `true` to disable a feature:
<details>
<summary><b>Worksheet Protection Details</b> (click to show)</summary>
| key | feature (true=disabled / false=enabled) | default |
|:----------------------|:----------------------------------------|:-----------|
| `selectLockedCells` | Select locked cells | enabled |
@ -664,6 +715,7 @@ In addition to the base sheet keys, worksheets also add:
| `pivotTables` | Use PivotTable reports | disabled |
| `objects` | Edit objects | enabled |
| `scenarios` | Edit scenarios | enabled |
</details>
- `ws['!autofilter']`: AutoFilter object following the schema:
@ -703,6 +755,9 @@ standard, XLS parsing stores core properties in both places.
The various file formats use different internal names for file properties. The
workbook `Props` object normalizes the names:
<details>
<summary><b>File Properties</b> (click to show)</summary>
| JS Name | Excel Description |
|:------------|:-------------------------------|
| Title | Summary tab "Title" |
@ -715,6 +770,7 @@ workbook `Props` object normalizes the names:
| Comments | Summary tab "Comments" |
| LastAuthor | Statistics tab "Last saved by" |
| CreatedDate | Statistics tab "Created" |
</details>
For example, to set the workbook title property:
@ -745,12 +801,16 @@ XLSX.write(wb, {Props:{Author:"SheetJS"}});
`wb.Workbook.Names` is an array of defined name objects which have the keys:
<details>
<summary><b>Defined Name Properties</b> (click to show)</summary>
| Key | Description |
|:----------|:-----------------------------------------------------------------|
| `Sheet` | Name scope. Sheet Index (0 = first sheet) or `null` (Workbook) |
| `Name` | Case-sensitive name. Standard rules apply ** |
| `Ref` | A1-style Reference (e.g. `"Sheet1!$A$1:$D$20"`) |
| `Comment` | Comment (only applicable for XLS/XLSX/XLSB) |
</details>
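A minimal sketch of populating `wb.Workbook.Names` with the keys from the table
is shown below. `add_defined_name` and the names `AllData` and `FirstRow` are
made up for illustration; the helper is not part of the library and performs no
validation of the name rules:

```js
/* hypothetical helper: append a defined name object to the workbook */
function add_defined_name(wb, name, ref, sheet_index) {
	if(!wb.Workbook) wb.Workbook = {};
	if(!wb.Workbook.Names) wb.Workbook.Names = [];
	var entry = {
		/* numeric index scopes the name to a sheet, null means workbook scope */
		Sheet: (typeof sheet_index === 'number' ? sheet_index : null),
		Name: name,
		Ref: ref
	};
	wb.Workbook.Names.push(entry);
	return entry;
}

var wb = { SheetNames: ['Sheet1'], Sheets: { Sheet1: { '!ref': 'A1:D20' } } };
add_defined_name(wb, 'AllData', 'Sheet1!$A$1:$D$20');    /* workbook scope */
add_defined_name(wb, 'FirstRow', 'Sheet1!$A$1:$D$1', 0); /* first sheet only */
```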
Excel allows two sheet-scoped defined names to share the same name. However, a
sheet-scoped name cannot collide with a workbook-scope name. Workbook writers
@ -770,7 +830,8 @@ file formats store the formulae in different ways, the formats are translated.
Even though some formats store formulae with a leading equal sign, CSF formulae
do not start with `=`.
The worksheet representation of A1=1, A2=2, A3=A1+A2:
<details>
<summary><b>Representation of A1=1, A2=2, A3=A1+A2</b> (click to show)</summary>
```js
{
@ -780,6 +841,7 @@ The worksheet representation of A1=1, A2=2, A3=A1+A2:
A3: { t:'n', v:3, f:'A1+A2' }
}
```
</details>
Shared formulae are decompressed and each cell has the formula corresponding to
its cell. Writers generally do not attempt to generate shared formulae.
@ -788,6 +850,9 @@ Cells with formula entries but no value will be serialized in a way that Excel
and other spreadsheet tools will recognize. This library will not automatically
compute formula results! For example, to compute `BESSELJ` in a worksheet:
<details>
<summary><b>Formula without known value</b> (click to show)</summary>
```js
{
"!ref": "A1:A3",
@ -796,6 +861,7 @@ compute formula results! For example, to compute `BESSELJ` in a worksheet:
A3: { t:'n', f:'BESSELJ(A1,A2)' }
}
```
</details>
**Array Formulae**
@ -803,6 +869,9 @@ Array formulae are stored in the top-left cell of the array block. All cells
of an array formula have a `F` field corresponding to the range. A single-cell
array formula can be distinguished from a plain formula by the presence of the
`F` field.
<details>
<summary><b>Array Formula examples</b> (click to show)</summary>
For example, setting the cell `C1` to the array formula `{=SUM(A1:A3*B1:B3)}`:
```js
@ -818,18 +887,23 @@ worksheet['D2'] = { t:'n', F:"D1:D3" };
worksheet['D3'] = { t:'n', F:"D1:D3" };
```
</details>
Utilities and writers are expected to check for the presence of a `F` field and
ignore any possible formula element `f` in cells other than the starting cell.
They are not expected to perform validation of the formulae!
**Formula Output**
<details>
<summary><b>Formula Output Utility Function</b> (click to show)</summary>
The `sheet_to_formulae` method generates one line per formula or array formula.
Array formulae are rendered in the form `range=formula` while plain cells are
rendered in the form `cell=formula or value`. Note that string literals are
prefixed with an apostrophe `'`, consistent with Excel's formula bar display.
</details>
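The output rules can be illustrated with a simplified re-implementation. This
is a sketch and not the library's code: `formulae_sketch` is a hypothetical
name, it walks keys in object order rather than by range, and it assumes that
only the first cell of an array formula carries the `f` field:

```js
/* simplified illustration of the output format; not the library's code */
function formulae_sketch(ws) {
	var out = [];
	Object.keys(ws).forEach(function(addr) {
		if(addr.charAt(0) === '!') return; /* skip special sheet keys */
		var cell = ws[addr];
		if(cell.F) {
			/* array formula: one `range=formula` line from the starting cell */
			if(cell.f) out.push(cell.F + '=' + cell.f);
			return;
		}
		if(cell.f) out.push(addr + '=' + cell.f);
		else if(cell.t === 's') out.push(addr + "='" + cell.v); /* string literal */
		else if(typeof cell.v !== 'undefined') out.push(addr + '=' + cell.v);
	});
	return out;
}

var ws = {
	'!ref': 'A1:C3',
	A1: { t:'n', v:1 },
	A2: { t:'n', v:2 },
	A3: { t:'n', v:3, f:'A1+A2' },
	B1: { t:'s', v:'total' },
	C1: { t:'n', F:'C1:C2', f:'A1:A2*2' },
	C2: { t:'n', F:'C1:C2' }
};
var lines = formulae_sketch(ws);
```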
**Formulae File Format Details**
<details>
<summary><b>Formulae File Format Details</b> (click to show)</summary>
| Storage Representation | Formats | Read | Write |
|:-----------------------|:-------------------------|:-----:|:-----:|
@ -842,13 +916,36 @@ Since Excel prohibits named cells from colliding with names of A1 or RC style
cell references, a (not-so-simple) regex conversion is possible. BIFF Parsed
formulae have to be explicitly unwound. OpenFormula formulae can be converted
with regexes for the most part.
</details>
#### Column Properties
The `!cols` array in each worksheet, if present, is a collection of `ColInfo`
objects which have the following properties:
```typescript
type ColInfo = {
/* visibility */
hidden?:boolean; // if true, the column is hidden
/* column width is specified in one of the following ways: */
wpx?:number; // width in screen pixels
width:number; // width in Excel's "Max Digit Width", width*256 is integral
wch?:number; // width in characters
/* other fields for preserving features from files */
MDW?:number; // Excel's "Max Digit Width" unit, always integral
};
```
Excel internally stores column widths in a nebulous "Max Digit Width" form. The
Max Digit Width is the width of the largest digit when rendered. The internal
width must be an integer multiple of the width divided by 256. ECMA-376
describes a formula for converting between pixels and the internal width.
Max Digit Width is the width of the largest digit when rendered (generally the
"0" character is the widest). The internal width must be an integer multiple of
the width divided by 256. ECMA-376 describes a formula for converting
between pixels and the internal width.
<details>
<summary><b>Implementation details</b> (click to show)</summary>
Given the constraints, it is possible to determine the MDW without actually
inspecting the font! The parsers guess the pixel width by converting from width
@ -856,48 +953,43 @@ to pixels and back, repeating for all possible MDW and selecting the MDW that
minimizes the error. XLML actually stores the pixel width, so the guess works
in the opposite direction.
The `!cols` array in each worksheet, if present, is a collection of `ColInfo`
objects which have the following properties:
```typescript
type ColInfo = {
MDW?:number; // Excel's "Max Digit Width" unit, always integral
width:number; // width in Excel's "Max Digit Width", width*256 is integral
wpx?:number; // width in screen pixels
wch?:number; // intermediate character calculation
hidden?:boolean; // if true, the column is hidden
};
```
Even though all of the information is made available, writers are expected to
follow the priority order:
1) use `width` field if available
2) use `wpx` pixel width if available
3) use `wch` character count if available
</details>
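The priority order for column widths might be applied as in the sketch below.
`pick_col_width` is a hypothetical helper: it only reports which field wins and
does not attempt the ECMA-376 unit conversion:

```js
/* hypothetical helper: choose the width field following the priority order */
function pick_col_width(col) {
	if(typeof col.width === 'number') return { field:'width', value:col.width };
	if(typeof col.wpx === 'number') return { field:'wpx', value:col.wpx };
	if(typeof col.wch === 'number') return { field:'wch', value:col.wch };
	return null; /* no width information: leave the application default */
}

var cols = [ { wpx:120, wch:20 }, { wch:12 }, { hidden:true } ];
var picked_cols = cols.map(pick_col_width);
```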
#### Row Properties
Excel internally stores row heights in points. The default resolution is 72 DPI
or 96 PPI, so the pixel and point size should agree. For different resolutions
they may not agree, so the library separates the concepts.
The `!rows` array in each worksheet, if present, is a collection of `RowInfo`
objects which have the following properties:
```typescript
type RowInfo = {
/* visibility */
hidden?:boolean; // if true, the row is hidden
/* row height is specified in one of the following ways: */
hpx?:number; // height in screen pixels
hpt?:number; // height in points
hidden?:boolean; // if true, the row is hidden
};
```
<details>
<summary><b>Implementation details</b> (click to show)</summary>
Excel internally stores row heights in points. The default resolution is 72 DPI
or 96 PPI, so the pixel and point size should agree. For different resolutions
they may not agree, so the library separates the concepts.
Even though all of the information is made available, writers are expected to
follow the priority order:
1) use `hpx` pixel height if available
2) use `hpt` point height if available
</details>
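For rows, the same idea applies with two fields. The sketch below assumes a
`!rows` array of `RowInfo` objects; `pick_row_height` is a hypothetical helper
and performs no pixel/point conversion:

```js
/* hypothetical helper: choose the height field following the priority order */
function pick_row_height(row) {
	if(typeof row.hpx === 'number') return { field:'hpx', value:row.hpx };
	if(typeof row.hpt === 'number') return { field:'hpt', value:row.hpt };
	return null; /* no height information: leave the application default */
}

var rows = [ { hpx:24, hpt:18 }, { hpt:30 }, { hidden:true } ];
var picked_rows = rows.map(pick_row_height);
```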
#### Number Formats
@ -911,6 +1003,9 @@ Custom tools should ensure that the local table has each used format string
somewhere in the table. Excel convention mandates that the custom formats start
at index 164. The following example creates a custom format from scratch:
<details>
<summary><b>New worksheet with custom format</b> (click to show)</summary>
```js
var tbl = {};
XLSX.SSF.init_table(tbl); // <-- load builtin formats
@ -928,6 +1023,7 @@ var wb = {
}
}
```
</details>
The rules are slightly different from how Excel displays custom number formats.
In particular, literal characters must be wrapped in double quotes or preceded
@ -935,6 +1031,9 @@ by a backslash. For more info, see the Excel documentation article
[`Create or delete a custom number format`](https://support.office.com/en-us/article/78f2a361-936b-4c03-8772-09fab54be7f4)
or ECMA-376 18.8.31 (Number Formats)
<details>
<summary><b>Default Number Formats</b> (click to show)</summary>
The default formats are listed in ECMA-376 18.8.30:
| ID | Format |
@ -968,6 +1067,8 @@ The default formats are listed in ECMA-376 18.8.30:
| 48 | `##0.0E+0` |
| 49 | `@` |
</details>
Format 14 (`m/d/yy`) is localized by Excel: even though the file specifies that
number format, it will be drawn differently based on system settings. It makes
sense when the producer and consumer of files are in the same locale, but that
@ -1015,8 +1116,10 @@ the file but the UI does not readily make it available. Standard hidden sheets
are revealed in the unhide menu. Excel also has "very hidden" sheets which
cannot be revealed in the menu. They are only accessible in the VB Editor!
The visibility setting is stored in the `Hidden` property of the sheet props
array. The values are:
The visibility setting is stored in the `Hidden` property of the sheet props
array.
<details>
<summary><b>More details</b> (click to show)</summary>
| Value | Definition |
|:-----:|:------------|
@ -1038,6 +1141,7 @@ if a sheet is visible is to check if the `Hidden` property is logical truth:
> wb.Workbook.Sheets.map(function(x) { return [x.name, !x.Hidden] })
[ [ 'Visible', true ], [ 'Hidden', false ], [ 'VeryHidden', false ] ]
```
</details>
## Parsing Options
@ -1097,6 +1201,9 @@ tells the library how to parse the data argument:
### Guessing File Type
<details>
<summary><b>Implementation Details</b> (click to show)</summary>
Excel and other spreadsheet tools read the first few bytes and apply other
heuristics to determine a file type. This enables file type punning: renaming
files with the `.xls` extension will tell your computer to use Excel to open the
@ -1126,6 +1233,7 @@ Plaintext format guessing follows the priority order:
| TSV | one of the first 1024 characters is a tab char `"\t"` |
| CSV | one of the first 1024 characters is a comma char `","` |
| PRN | (default) |
</details>
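The last three rows of the plaintext table could be sketched as follows. This
covers only the TSV, CSV and PRN tests shown in the table excerpt and ignores
any checks that come earlier in the priority order; `guess_plaintext` is a
hypothetical name, not the library's internal function:

```js
/* sketch of the TSV / CSV / PRN tests only; guess_plaintext is hypothetical */
function guess_plaintext(str) {
	var head = str.slice(0, 1024); /* only the first 1024 characters matter */
	if(head.indexOf('\t') > -1) return 'tsv';
	if(head.indexOf(',') > -1) return 'csv';
	return 'prn'; /* default when neither delimiter is found */
}

var guesses = [ 'a\tb,c', 'a,b,c', 'a  b  c' ].map(guess_plaintext);
```

Because the tab test runs first, a file containing both tabs and commas in its
first 1024 characters is treated as TSV.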
## Writing Options
@ -1224,6 +1332,9 @@ other values are stored as strings. The function takes an options argument:
| cellDates | false | Store dates as type `d` (default is `n`) |
| sheetStubs | false | Create cell objects of type `z` for `null` values |
<details>
<summary><b>Examples</b> (click to show)</summary>
To generate the example sheet:
```js
@ -1233,6 +1344,7 @@ var ws = XLSX.utils.aoa_to_sheet([
[2,3,4,5,6,7,8]
]);
```
</details>
### HTML Table Input
@ -1242,6 +1354,9 @@ as strings.
`XLSX.utils.table_to_book` produces a minimal workbook based on the worksheet.
<details>
<summary><b>Examples</b> (click to show)</summary>
To generate the example sheet, start with the HTML table:
```html
@ -1258,6 +1373,7 @@ To process the table:
var tbl = document.getElementById('sheetjs');
var wb = XLSX.utils.table_to_book(tbl);
```
</details>
Note: `XLSX.read` can handle HTML represented as strings.
@ -1266,13 +1382,19 @@ Note: `XLSX.read` can handle HTML represented as strings.
`XLSX.utils.sheet_to_formulae` generates an array of commands that represent
how a person would enter data into an application. Each entry is of the form
`A1-cell-address=formula-or-value`. String literals are prefixed with a `'` in
accordance with Excel. For the example sheet:
accordance with Excel.
<details>
<summary><b>Examples</b> (click to show)</summary>
For the example sheet:
```js
> var o = XLSX.utils.sheet_to_formulae(ws);
> o.filter(function(v, i) { return i % 5 === 0; });
[ 'A1=\'S', 'F1=\'J', 'D2=4', 'B3=3', 'G3=8' ]
```
</details>
### Delimiter-Separated Output
@ -1290,6 +1412,9 @@ produces CSV output. The function takes an options argument:
- `strip` will remove trailing commas from each line under default `FS/RS`
- blankrows must be set to `false` to skip blank lines.
<details>
<summary><b>Examples</b> (click to show)</summary>
For the example sheet:
```js
@ -1304,6 +1429,7 @@ S h e e t J S
> console.log(XLSX.utils.sheet_to_csv(ws,{FS:":",RS:"|"}));
S:h:e:e:t:J:S|1:2:3:4:5:6:7|2:3:4:5:6:7:8|
```
</details>
#### UTF-16 Unicode Text
@ -1360,6 +1486,9 @@ generate different types of JS objects. The function takes an options argument:
If header is not `1`, the row object will contain the non-enumerable property
`__rowNum__` that represents the row of the sheet corresponding to the entry.
<details>
<summary><b>Examples</b> (click to show)</summary>
For the example sheet:
```js
@ -1395,6 +1524,7 @@ Example showing the effect of `raw`:
[ 1, 2, 3, 4, 5, 6, 7 ], // <-- A2 uses the raw value
[ 2, 3, 4, 5, 6, 7, 8 ] ]
```
</details>
## File Formats
@ -1429,6 +1559,9 @@ Despite the library name `xlsx`, it supports numerous spreadsheet file formats:
### Excel 2007+ XML (XLSX/XLSM)
<details>
<summary>(click to show)</summary>
XLSX and XLSM files are ZIP containers containing a series of XML files in
accordance with the Open Packaging Conventions (OPC). The XLSM filetype, almost
identical to XLSX, is used for files containing macros.
@ -1437,8 +1570,13 @@ The format is standardized in ECMA-376 and later in ISO/IEC 29500. Excel does
not follow the specification, and there are additional documents discussing how
Excel deviates from the specification.
</details>
### Excel 2.0-95 (BIFF2/BIFF3/BIFF4/BIFF5)
<details>
<summary>(click to show)</summary>
BIFF 2/3 XLS are single-sheet streams of binary records. Excel 4 introduced
the concept of a workbook (`XLW` files) but also had single-sheet `XLS` format.
The structure is largely similar to the Lotus 1-2-3 file formats. BIFF5/8/12
@ -1449,8 +1587,13 @@ files in these formats, so record lengths and fields were backsolved by writing
in all of the supported formats and comparing files. Excel 2016 can generate
BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2.
</details>
### Excel 97-2004 Binary (BIFF8)
<details>
<summary>(click to show)</summary>
BIFF8 exclusively uses the Compound File Binary container format, splitting some
content into streams within the file. At its core, it still uses an extended
version of the binary record format from older versions of BIFF.
@ -1458,15 +1601,25 @@ version of the binary record format from older versions of BIFF.
The `MS-XLS` specification covers the basics of the file format, and other
specifications expand on serialization of features like properties.
</details>
### Excel 2003-2004 (SpreadsheetML)
<details>
<summary>(click to show)</summary>
Predating XLSX, SpreadsheetML files are simple XML files. There is no official
and comprehensive specification, although MS has released whitepapers on the
format. Since Excel 2016 can generate SpreadsheetML files, backsolving is
pretty straightforward.
</details>
### Excel 2007+ Binary (XLSB, BIFF12)
<details>
<summary>(click to show)</summary>
Introduced in parallel with XLSX, the XLSB filetype combines BIFF architecture
with the content separation and ZIP container of XLSX. For the most part nodes
in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.
@ -1474,8 +1627,13 @@ in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.
The `MS-XLSB` specification covers the basics of the file format, and other
specifications expand on serialization of features like properties.
</details>
### Delimiter-Separated Values (CSV/TXT)
<details>
<summary>(click to show)</summary>
Excel CSV deviates from RFC4180 in a number of important ways. The generated
CSV files should generally work in Excel although they may not work in RFC4180
compatible readers. The parser should generally understand Excel CSV. The
@ -1483,44 +1641,74 @@ writer proactively generates cells for formulae if values are unavailable.
Excel TXT uses tab as the delimiter and codepage 1200.
</details>
### Other Workbook Formats
<details>
<summary>(click to show)</summary>
Support for other formats generally lags far behind XLS/XLSB/XLSX support, due in large
part to a lack of publicly available documentation. Test files were produced in
the respective apps and compared to their XLS exports to determine structure.
The main focus is data extraction.
</details>
#### Lotus 1-2-3 (WKS/WK1/WK2/WK3/WK4/123)
<details>
<summary>(click to show)</summary>
The Lotus formats consist of binary records similar to the BIFF structure. Lotus
did release a whitepaper decades ago covering the original WK1 format. Other
features were deduced by producing files and comparing to Excel support.
</details>
#### Quattro Pro (WQ1/WQ2/WB1/WB2/WB3/QPW)
<details>
<summary>(click to show)</summary>
The Quattro Pro formats use binary records in the same way as BIFF and Lotus.
Some of the newer formats (namely WB3 and QPW) use a CFB enclosure just like
BIFF8 XLS.
</details>
#### OpenDocument Spreadsheet (ODS/FODS)
<details>
<summary>(click to show)</summary>
ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to
SpreadsheetML. Both are detailed in the OASIS standard, but tools like LO/OO
add undocumented extensions. The parsers and writers do not implement the full
standard, instead focusing on parts necessary to extract and store raw data.
</details>
#### Uniform Office Spreadsheet (UOS1/2)
<details>
<summary>(click to show)</summary>
UOS is a very similar format, and it comes in 2 varieties corresponding to ODS
and FODS respectively. For the most part, the difference between the formats
lies in the names of tags and attributes.
</details>
### Other Single-Worksheet Formats
Many older formats supported only one worksheet:
#### dBASE and Visual FoxPro (DBF)
<details>
<summary>(click to show)</summary>
DBF is really a typed table format: each column can only hold one data type and
each record omits type information. The parser generates a header row and
inserts records starting at the second row of the worksheet.
@ -1528,22 +1716,37 @@ inserts records starting at the second row of the worksheet.
Multi-file extensions like external memos and tables are currently unsupported,
limited by the general ability to read arbitrary files in the web browser.
</details>
#### Symbolic Link (SYLK)
<details>
<summary>(click to show)</summary>
There is no real documentation. All knowledge was gathered by saving files in
various versions of Excel to deduce the meaning of fields. Notes:
- Plain formulae are stored in the RC form.
- Column widths are rounded to integral characters.
</details>
#### Lotus Formatted Text (PRN)
<details>
<summary>(click to show)</summary>
There is no real documentation, and in fact Excel treats PRN as an output-only
file format. Nevertheless we can guess the column widths and reverse-engineer
the original layout. Excel's 240-character width limitation is not enforced.
</details>
#### Data Interchange Format (DIF)
<details>
<summary>(click to show)</summary>
There is no unified definition. Visicalc DIF differs from Lotus DIF, and both
differ from Excel DIF. Where ambiguous, the parser/writer follows the expected
behavior from Excel. In particular, Excel extends DIF in incompatible ways:
@ -1555,16 +1758,26 @@ behavior from Excel. In particular, Excel extends DIF in incompatible ways:
- DIF technically has no support for formulae, but Excel will automatically
convert plain formulae. Array formulae are not preserved.
</details>
#### HTML
<details>
<summary>(click to show)</summary>
Excel HTML worksheets include special metadata encoded in styles. For example,
`mso-number-format` is a localized string containing the number format. Despite
the metadata the output is valid HTML, although it does accept bare `&` symbols.
</details>
## Testing
### Node
<details>
<summary>(click to show)</summary>
`make test` will run the node-based tests. By default it runs tests on files in
every supported format. To test a specific file type, set `FMTS` to the format
you want to test. Feature-specific tests are available with `make test_misc`
@ -1586,15 +1799,20 @@ $ make test # run full tests
$ WTF=1 make test # enable all error messages
```
Flow and JSHint/JSCS checks are available:
Flow and eslint checks are available:
```bash
$ make lint # JSHint and JSCS checks
$ make lint # eslint checks
$ make flow # make lint + Flow checking
```
</details>
### Browser
<details>
<summary>(click to show)</summary>
The core in-browser tests are available at `tests/index.html` within this repo.
Start a local server and navigate to that directory to run the tests.
`make ctestserv` will start a server on port 8000.
@ -1612,6 +1830,8 @@ $ cd ../SheetJS.github.io
$ simplehttpserver # or "python -mSimpleHTTPServer" or "serve"
$ open -a Chromium.app http://localhost:8000/stress.html
```
</details>
### Tested Environments
- NodeJS 0.8, 0.9, 0.10, 0.11, 0.12, 4.x, 5.x, 6.x, 7.x
@ -1642,6 +1862,9 @@ important to ensure code is cleanroom. Consult CONTRIBUTING.md
### Tests
<details>
<summary>(click to show)</summary>
The `test_misc` target (`make test_misc` on Linux/OSX / `make misc` on Windows)
runs the targeted feature tests. It should take 5-10 seconds to perform feature
tests without testing against the entire test battery. New features should be
@ -1658,9 +1881,13 @@ and verifying that the feature is preserved.
For tests involving a new write feature without an existing read ability, please
add a feature test to the kitchen sink `tests/write.js`.
</details>
### OSX/Linux
<details>
<summary>(click to show)</summary>
The xlsx.js file is constructed from the files in the `bits` subdirectory. The
build script (run `make`) will concatenate the individual bits to produce the
script. Before submitting a contribution, ensure that running make will produce
@ -1675,9 +1902,13 @@ $ git diff xlsx.js
To produce the dist files, run `make dist`. The dist files are updated in each
version release and *should not be committed between versions*.
</details>
### Windows
<details>
<summary>(click to show)</summary>
The included `make.cmd` script will build `xlsx.js` from the `bits` directory.
Building is as simple as:
@ -1710,6 +1941,7 @@ Latest test files snapshot:
<https://github.com/SheetJS/test_files/releases/download/20170409/test_files.zip>
Download and unzip to the `test_files` subdirectory.
</details>
## License
@ -1719,9 +1951,8 @@ granted by the Apache 2.0 License are reserved by the Original Author.
## References
ISO/IEC 29500:2012(E) "Information technology — Document description and processing languages — Office Open XML File Formats"
OSP-covered specifications:
<details>
<summary><b>OSP-covered Specifications</b> (click to show)</summary>
- [MS-XLSB]: Excel (.xlsb) Binary File Format
- [MS-XLSX]: Excel (.xlsx) Extensions to the Office Open XML SpreadsheetML File Format
@ -1742,9 +1973,11 @@ OSP-covered specifications:
- [XLS]: Microsoft Office Excel 97-2007 Binary File Format Specification
- [MS-OI29500]: Office Implementation Information for ISO/IEC 29500 Standards Support
Open Document Format for Office Applications Version 1.2 (29 September 2011)
</details>
Worksheet File Format (From Lotus) December 1984
- ISO/IEC 29500:2012(E) "Information technology — Document description and processing languages — Office Open XML File Formats"
- Open Document Format for Office Applications Version 1.2 (29 September 2011)
- Worksheet File Format (From Lotus) December 1984
## Badges

@ -775,6 +775,7 @@ function stringify_formula(formula/*Array<any>*/, range, cell/*:any*/, supbooks,
case 'PtgRef3d': // TODO: lots of stuff
type = f[1][0]; ixti = /*::Number(*/f[1][1]/*::)*/; c = shift_cell_xls(f[1][2], _range, opts);
sname = supbooks.SheetNames[ixti];
var w = sname; /* IE9 fails on defined names */
stack.push(sname + "!" + encode_cell_xls(c));
break;

@ -22,6 +22,10 @@ enhancements and additional features by request.
![circo graph of format support](formats.png)
<details>
<summary>Graph Legend</summary>
![graph legend](legend.png)
</details>

@ -1,4 +1,8 @@
## Table of Contents
<!-- toc -->
<details>
<summary>Expand to show Table of Contents</summary>
<!-- toc -->
</details>

@ -3,7 +3,8 @@
For parsing, the first step is to read the file. This involves acquiring the
data and feeding it into the library. Here are a few common scenarios:
- node readFile:
<details>
<summary><b>nodejs read a file</b> (click to show)</summary>
```js
if(typeof require !== 'undefined') XLSX = require('xlsx');
@ -11,15 +12,23 @@ var workbook = XLSX.readFile('test.xlsx');
/* DO SOMETHING WITH workbook HERE */
```
- Browser DOM Table element:
</details>
<details>
<summary><b>Browser read TABLE element from page</b> (click to show)</summary>
```js
var workbook = XLSX.utils.table_to_book(document.getElementById('tableau'));
/* DO SOMETHING WITH workbook HERE */
```
- ajax (for a more complete example that works in older browsers, check the demo
at <http://oss.sheetjs.com/js-xlsx/ajax.html>):
</details>
<details>
<summary><b>Browser download file (ajax)</b> (click to show)</summary>
Note: for a more complete example that works in older browsers, check the demo
at <http://oss.sheetjs.com/js-xlsx/ajax.html>:
```js
/* set up XMLHttpRequest */
@ -46,9 +55,14 @@ oReq.onload = function(e) {
oReq.send();
```
- HTML5 drag-and-drop using readAsBinaryString or readAsArrayBuffer:
note: readAsBinaryString and readAsArrayBuffer may not be available in every
browser. Use dynamic feature tests to determine which method to use.
</details>
<details>
<summary><b>Browser drag-and-drop</b> (click to show)</summary>
Drag-and-drop uses FileReader with readAsBinaryString or readAsArrayBuffer.
Note: readAsBinaryString and readAsArrayBuffer may not be available in every
browser. Use dynamic feature tests to determine which method to use.
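A minimal sketch of such a feature test follows. The helper name
`pick_read_method` is hypothetical (it is not part of the library), and
preferring `readAsBinaryString` when both methods exist is an assumption:

```js
/* pick_read_method is a hypothetical helper, not part of XLSX */
function pick_read_method(FR) {
  /* FR is the FileReader constructor, if the environment provides one */
  if(!FR || !FR.prototype) return null;
  if(typeof FR.prototype.readAsBinaryString === 'function') return 'readAsBinaryString';
  if(typeof FR.prototype.readAsArrayBuffer === 'function') return 'readAsArrayBuffer';
  return null;
}

/* in a browser, pass the global FileReader (undefined where unsupported) */
var read_method = pick_read_method(typeof FileReader !== 'undefined' ? FileReader : undefined);
```

A `null` result means neither method was detected, in which case the page
should fall back to another way of acquiring the data.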
```js
/* processing array buffers, only required for readAsArrayBuffer */
@ -92,7 +106,10 @@ function handleDrop(e) {
drop_dom_element.addEventListener('drop', handleDrop, false);
```
- HTML5 input file element using readAsBinaryString or readAsArrayBuffer:
</details>
<details>
<summary><b>Browser file upload form element</b> (click to show)</summary>
```js
/* fixdata and rABS are defined in the drag and drop example */
@ -124,7 +141,10 @@ function handleFile(e) {
input_dom_element.addEventListener('change', handleFile, false);
```
**Complete examples:**
</details>
### Complete Examples
- <http://oss.sheetjs.com/js-xlsx/> HTML5 File API / Base64 Text / Web Workers

@ -18,7 +18,7 @@ var desired_cell = worksheet[address_of_cell];
var desired_value = (desired_cell ? desired_cell.v : undefined);
```
**Complete examples:**
### Complete Examples
- <https://github.com/SheetJS/js-xlsx/blob/master/bin/xlsx.njs> node

@ -5,7 +5,8 @@ For writing, the first step is to generate output data. The helper functions
dissemination. The second step is to actually share the data with the end point.
Assuming `workbook` is a workbook object:
- nodejs write to file:
<details>
<summary><b>nodejs write a file</b> (click to show)</summary>
```js
/* output format determined by filename */
@ -13,11 +14,16 @@ XLSX.writeFile(workbook, 'out.xlsx');
/* at this point, out.xlsx is a file that you can distribute */
```
- browser generate binary blob and "download" to client
(using [FileSaver.js](https://github.com/eligrey/FileSaver.js/) for download):
</details>
<details>
<summary><b>Browser download file</b> (click to show)</summary>
Note: the browser generates a binary blob and forces a "download" to the client.
This example uses [FileSaver.js](https://github.com/eligrey/FileSaver.js/):
```js
/* bookType can be 'xlsx' or 'xlsm' or 'xlsb' or 'ods' */
/* bookType can be any supported output type */
var wopts = { bookType:'xlsx', bookSST:false, type:'binary' };
var wbout = XLSX.write(workbook,wopts);
@ -32,10 +38,10 @@ function s2ab(s) {
/* the saveAs call downloads a file on the local machine */
saveAs(new Blob([s2ab(wbout)],{type:"application/octet-stream"}), "test.xlsx");
```
</details>
**Complete examples:**
### Complete Examples
- <http://sheetjs.com/demos/writexlsx.html> generates a simple file
- <http://git.io/WEK88Q> writing an array of arrays in nodejs
- <http://sheetjs.com/demos/table.html> exporting an HTML table
- <http://sheetjs.com/demos/writexlsx.html> generates a simple file

@ -8,3 +8,4 @@ stream. They are only exposed in node.
- `XLSX.stream.to_html` is the streaming version of the HTML output type.
<https://github.com/sheetjs/sheetaki> pipes write streams to nodejs response.

@ -6,6 +6,9 @@ Type `b` is the Boolean type. `v` is interpreted according to JS truth tables.
Type `e` is the Error type. `v` holds the number and `w` holds the common name:
<details>
<summary><b>Error values and interpretation</b> (click to show)</summary>
| Value | Error Meaning |
| -----: | :-------------- |
| `0x00` | `#NULL!` |
@ -17,6 +20,8 @@ Type `e` is the Error type. `v` holds the number and `w` holds the common name:
| `0x2A` | `#N/A` |
| `0x2B` | `#GETTING_DATA` |
</details>
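As a rough illustration, an error cell pairs the numeric code in `v` with the
common name in `w`. The sketch below only uses codes that appear in the table;
`make_error_cell` and `error_names` are hypothetical names, not library API:

```js
/* subset of the error table: only a few of the listed codes */
var error_names = { 0x00: '#NULL!', 0x2A: '#N/A', 0x2B: '#GETTING_DATA' };

/* make_error_cell is a hypothetical helper, not part of XLSX */
function make_error_cell(code) {
  var cell = { t: 'e', v: code };
  /* only attach the common name when the code is known */
  if(error_names[code] !== undefined) cell.w = error_names[code];
  return cell;
}

var na_cell = make_error_cell(0x2A); /* { t:'e', v:42, w:'#N/A' } */
```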
Type `n` is the Number type. This includes all forms of data that Excel stores
as numbers, such as dates/times and Boolean fields. Excel exclusively uses data
that can be fit in an IEEE754 floating point number, just like JS Number, so the

@ -4,7 +4,7 @@ Each key that does not start with `!` maps to a cell (using `A-1` notation)
`sheet[address]` returns the cell object for the specified address.
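A short sketch of how the key convention can be used, with a hand-built
worksheet object (the cell values are made up for illustration):

```js
/* a tiny hand-built worksheet; real worksheets come from a parsed workbook */
var sheet = {
  '!ref': 'A1:B1',
  A1: { t: 's', v: 'hello' },
  B1: { t: 'n', v: 42 }
};

/* collect the cell addresses, skipping special keys that start with "!" */
var addresses = Object.keys(sheet).filter(function(key) {
  return key.charAt(0) !== '!';
});

/* sheet[address] is the cell object; a cell that does not exist is undefined */
var b1_value = sheet['B1'].v;
var c1 = sheet['C1'];
```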
Special sheet keys (accessible as `sheet[key]`, each starting with `!`):
**Special sheet keys (accessible as `sheet[key]`, each starting with `!`):**
- `sheet['!ref']`: A-1 based range representing the sheet range. Functions that
work with sheets should use this parameter to determine the range. Cells that
@ -24,6 +24,9 @@ Special sheet keys (accessible as `sheet[key]`, each starting with `!`):
follow Excel's "normal" preset. Excel also has a "wide" and a "narrow" preset
but they are stored as raw measurements. The main properties are listed below:
<details>
<summary><b>Page margin details</b> (click to show)</summary>
| key | description | "normal" | "wide" | "narrow" |
|----------|------------------------|:---------|:-------|:-------- |
| `left` | left margin (inches) | `0.7` | `1.0` | `0.25` |
@ -41,4 +44,5 @@ sheet["!margins"] = { left:1.0, right:1.0, top:1.0, bottom:1.0, header:0.5, foot
/* Set worksheet sheet to "narrow" */
sheet["!margins"] = { left:0.25, right:0.25, top:0.75, bottom:0.75, header:0.3, footer:0.3 }
```
</details>

@ -22,6 +22,9 @@ In addition to the base sheet keys, worksheets also add:
keys control the sheet protection -- set to `false` to enable a feature when
sheet is locked or set to `true` to disable a feature:
<details>
<summary><b>Worksheet Protection Details</b> (click to show)</summary>
| key | feature (true=disabled / false=enabled) | default |
|:----------------------|:----------------------------------------|:-----------|
| `selectLockedCells` | Select locked cells | enabled |
@ -39,6 +42,7 @@ In addition to the base sheet keys, worksheets also add:
| `pivotTables` | Use PivotTable reports | disabled |
| `objects` | Edit objects | enabled |
| `scenarios` | Edit scenarios | enabled |
</details>
- `ws['!autofilter']`: AutoFilter object following the schema:

@ -3,6 +3,9 @@
The various file formats use different internal names for file properties. The
workbook `Props` object normalizes the names:
<details>
<summary><b>File Properties</b> (click to show)</summary>
| JS Name | Excel Description |
|:------------|:-------------------------------|
| Title | Summary tab "Title" |
@ -15,6 +18,7 @@ workbook `Props` object normalizes the names:
| Comments | Summary tab "Comments" |
| LastAuthor | Statistics tab "Last saved by" |
| CreatedDate | Statistics tab "Created" |
</details>
For example, to set the workbook title property:

@ -6,12 +6,16 @@
`wb.Workbook.Names` is an array of defined name objects which have the keys:
<details>
<summary><b>Defined Name Properties</b> (click to show)</summary>
| Key | Description |
|:----------|:-----------------------------------------------------------------|
| `Sheet` | Name scope. Sheet Index (0 = first sheet) or `null` (Workbook) |
| `Name` | Case-sensitive name. Standard rules apply ** |
| `Ref` | A1-style Reference (e.g. `"Sheet1!$A$1:$D$20"`) |
| `Comment` | Comment (only applicable for XLS/XLSX/XLSB) |
</details>
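For instance, defined names could be added to a workbook object along these
lines. The workbook skeleton and the names `AllData` and `Header` are made up
for illustration:

```js
/* a bare workbook skeleton; real workbooks come from XLSX.read or XLSX.readFile */
var wb = { SheetNames: ['Sheet1'], Sheets: { Sheet1: { '!ref': 'A1:D20' } } };

/* create the Workbook and Names containers if they are missing */
if(!wb.Workbook) wb.Workbook = {};
if(!wb.Workbook.Names) wb.Workbook.Names = [];

/* workbook-scoped name: Sheet is null */
wb.Workbook.Names.push({ Sheet: null, Name: 'AllData', Ref: 'Sheet1!$A$1:$D$20' });

/* sheet-scoped name: Sheet is the index of the sheet (0 = first sheet) */
wb.Workbook.Names.push({ Sheet: 0, Name: 'Header', Ref: 'Sheet1!$A$1:$D$1', Comment: 'first row' });
```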
Excel allows two sheet-scoped defined names to share the same name. However, a
sheet-scoped name cannot collide with a workbook-scope name. Workbook writers

@ -5,7 +5,8 @@ file formats store the formulae in different ways, the formats are translated.
Even though some formats store formulae with a leading equal sign, CSF formulae
do not start with `=`.
The worksheet representation of A1=1, A2=2, A3=A1+A2:
<details>
<summary><b>Representation of A1=1, A2=2, A3=A1+A2</b> (click to show)</summary>
```js
{
@ -15,6 +16,7 @@ The worksheet representation of A1=1, A2=2, A3=A1+A2:
A3: { t:'n', v:3, f:'A1+A2' }
}
```
</details>
Shared formulae are decompressed and each cell has the formula corresponding to
its cell. Writers generally do not attempt to generate shared formulae.
@ -23,6 +25,9 @@ Cells with formula entries but no value will be serialized in a way that Excel
and other spreadsheet tools will recognize. This library will not automatically
compute formula results! For example, to compute `BESSELJ` in a worksheet:
<details>
<summary><b>Formula without known value</b> (click to show)</summary>
```js
{
"!ref": "A1:A3",
@ -31,6 +36,7 @@ compute formula results! For example, to compute `BESSELJ` in a worksheet:
A3: { t:'n', f:'BESSELJ(A1,A2)' }
}
```
</details>
**Array Formulae**
@ -38,6 +44,9 @@ Array formulae are stored in the top-left cell of the array block. All cells
of an array formula have an `F` field corresponding to the range. A single-cell
array formula can be distinguished from a plain formula by the presence of the
`F` field.
<details>
<summary><b>Array Formula examples</b> (click to show)</summary>
For example, setting the cell `C1` to the array formula `{=SUM(A1:A3*B1:B3)}`:
```js
@ -53,18 +62,23 @@ worksheet['D2'] = { t:'n', F:"D1:D3" };
worksheet['D3'] = { t:'n', F:"D1:D3" };
```
</details>
Utilities and writers are expected to check for the presence of an `F` field and
ignore any possible formula element `f` in cells other than the starting cell.
They are not expected to perform validation of the formulae!
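One way a writer might apply this rule is sketched below. Both helper names are
hypothetical, and the sketch assumes `F` is written without `$` markers, as in
the examples:

```js
/* is_array_formula_start is a hypothetical helper, not part of XLSX */
function is_array_formula_start(address, cell) {
  if(!cell || !cell.F) return false;
  /* F is a range such as "D1:D3"; the text before ":" is the starting cell */
  return cell.F.split(':')[0] === address;
}

/* formula_to_write returns the formula a writer should consider, or null */
function formula_to_write(address, cell) {
  if(!cell || cell.f === undefined) return null;
  /* inside an array block, ignore any f outside the starting cell */
  if(cell.F && !is_array_formula_start(address, cell)) return null;
  return cell.f;
}
```

No validation of the formula text is attempted, in line with the note above.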
**Formula Output**
<details>
<summary><b>Formula Output Utility Function</b> (click to show)</summary>
The `sheet_to_formulae` method generates one line per formula or array formula.
Array formulae are rendered in the form `range=formula` while plain cells are
rendered in the form `cell=formula or value`. Note that string literals are
prefixed with an apostrophe `'`, consistent with Excel's formula bar display.
</details>
**Formulae File Format Details**
<details>
<summary><b>Formulae File Format Details</b> (click to show)</summary>
| Storage Representation | Formats | Read | Write |
|:-----------------------|:-------------------------|:-----:|:-----:|
@ -77,4 +91,5 @@ Since Excel prohibits named cells from colliding with names of A1 or RC style
cell references, a (not-so-simple) regex conversion is possible. BIFF Parsed
formulae have to be explicitly unwound. OpenFormula formulae can be converted
with regexes for the most part.
</details>

@ -1,9 +1,31 @@
#### Column Properties
The `!cols` array in each worksheet, if present, is a collection of `ColInfo`
objects which have the following properties:
```typescript
type ColInfo = {
/* visibility */
hidden?:boolean; // if true, the column is hidden
/* column width is specified in one of the following ways: */
wpx?:number; // width in screen pixels
width:number; // width in Excel's "Max Digit Width", width*256 is integral
wch?:number; // width in characters
/* other fields for preserving features from files */
MDW?:number; // Excel's "Max Digit Width" unit, always integral
};
```
Excel internally stores column widths in a nebulous "Max Digit Width" form. The
Max Digit Width is the width of the largest digit when rendered. The internal
width must be an integer multiple of the width divided by 256. ECMA-376
describes a formula for converting between pixels and the internal width.
Max Digit Width is the width of the largest digit when rendered (generally the
"0" character is the widest). The internal width must be an integer multiple of
the width divided by 256. ECMA-376 describes a formula for converting
between pixels and the internal width.
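The conversions can be sketched as follows. The formulas are transcribed from
the ECMA-376 description of column widths as commonly quoted, including a fixed
5 pixel padding; treat them as an approximation and check them against the
standard. The function names are illustrative, and an MDW of 7 is an assumption
about the default font:

```js
/* character count -> internal width (truncated to a multiple of 1/256) */
function char2width(chr, MDW) {
  return Math.floor((chr * MDW + 5) / MDW * 256) / 256;
}

/* internal width -> screen pixels */
function width2px(width, MDW) {
  return Math.floor(((256 * width + Math.floor(128 / MDW)) / 256) * MDW);
}

/* screen pixels -> character count (two decimal places) */
function px2char(px, MDW) {
  return Math.floor((px - 5) / MDW * 100 + 0.5) / 100;
}

var MDW = 7; /* assumed Max Digit Width in pixels */
var width = char2width(8.43, MDW); /* 9.140625 */
var pixels = width2px(width, MDW); /* 64 */
var chars = px2char(pixels, MDW);  /* 8.43 */
```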
<details>
<summary><b>Implementation details</b> (click to show)</summary>
Given the constraints, it is possible to determine the MDW without actually
inspecting the font! The parsers guess the pixel width by converting from width
@ -11,46 +33,41 @@ to pixels and back, repeating for all possible MDW and selecting the MDW that
minimizes the error. XLML actually stores the pixel width, so the guess works
in the opposite direction.
The `!cols` array in each worksheet, if present, is a collection of `ColInfo`
objects which have the following properties:
```typescript
type ColInfo = {
MDW?:number; // Excel's "Max Digit Width" unit, always integral
width:number; // width in Excel's "Max Digit Width", width*256 is integral
wpx?:number; // width in screen pixels
wch?:number; // intermediate character calculation
hidden?:boolean; // if true, the column is hidden
};
```
Even though all of the information is made available, writers are expected to
follow the priority order:
1) use `width` field if available
2) use `wpx` pixel width if available
3) use `wch` character count if available
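The priority order above can be sketched as a small selector. The helper name
`pick_col_width_field` is hypothetical, and returning `null` when no width
information is present is an assumption about how a writer would fall back:

```js
/* pick_col_width_field is a hypothetical helper, not part of XLSX */
function pick_col_width_field(col) {
  if(col.width != null) return 'width'; /* 1) Max Digit Width units */
  if(col.wpx != null) return 'wpx';     /* 2) screen pixels */
  if(col.wch != null) return 'wch';     /* 3) character count */
  return null;                          /* no width information */
}
```

For example, `pick_col_width_field({ wpx: 64, wch: 8.43 })` selects `'wpx'`
because no `width` field is present.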
</details>
#### Row Properties
Excel internally stores row heights in points. The default resolution is 72 DPI
or 96 PPI, so the pixel and point size should agree. For different resolutions
they may not agree, so the library separates the concepts.
The `!rows` array in each worksheet, if present, is a collection of `RowInfo`
objects which have the following properties:
```typescript
type RowInfo = {
/* visibility */
hidden?:boolean; // if true, the row is hidden
/* row height is specified in one of the following ways: */
hpx?:number; // height in screen pixels
hpt?:number; // height in points
hidden?:boolean; // if true, the row is hidden
};
```
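As an illustration, row properties might be attached to a worksheet as shown
below. The worksheet contents and the specific heights are made up, and the
sketch assumes that entry `i` of the array describes row `i + 1`:

```js
/* a tiny hand-built worksheet with three rows */
var ws = {
  '!ref': 'A1:A3',
  A1: { t: 'n', v: 1 },
  A2: { t: 'n', v: 2 },
  A3: { t: 'n', v: 3 }
};

ws['!rows'] = [
  { hpt: 24 },     /* row 1: height in points */
  { hpx: 32 },     /* row 2: height in screen pixels */
  { hidden: true } /* row 3: hidden */
];
```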
<details>
<summary><b>Implementation details</b> (click to show)</summary>
Excel internally stores row heights in points. The default resolution is 72 DPI
or 96 PPI, so the pixel and point size should agree. For different resolutions