docs.sheetjs.com/docz/docs/07-csf/07-features.md

702 lines
20 KiB
Markdown
Raw Normal View History

2022-05-16 03:26:04 +00:00
---
sidebar_position: 7
---
# Spreadsheet Features
Even for basic features like date storage, the official Excel formats store the
same content in different ways. The parsers are expected to convert from the
underlying file format representation to the Common Spreadsheet Format. Writers
are expected to convert from CSF back to the underlying file format.
## Formulae
The A1-style formula string is stored in the `f` field. Even though different
file formats store the formulae in different ways, the formats are translated.
Even though some formats store formulae with a leading equal sign, CSF formulae
do not start with `=`.
<details>
<summary><b>Formulae File Format Support</b> (click to show)</summary>
| Storage Representation | Formats | Read | Write |
|:-----------------------|:-------------------------|:-----:|:-----:|
| A1-style strings | XLSX | ✔ | ✔ |
| RC-style strings | XLML and plain text | ✔ | ✔ |
| BIFF Parsed formulae | XLSB and all XLS formats | ✔ | |
| OpenFormula formulae | ODS/FODS/UOS | ✔ | ✔ |
| Lotus Parsed formulae | All Lotus WK_ formats | ✔ | |
Since Excel prohibits named cells from colliding with names of A1 or RC style
cell references, a (not-so-simple) regex conversion is possible. BIFF Parsed
formulae and Lotus Parsed formulae have to be explicitly unwound. OpenFormula
formulae can be converted with regular expressions.
Shared formulae are decompressed and each cell has the formula corresponding to
its cell. Writers generally do not attempt to generate shared formulae.
</details>
### Single-Cell Formulae
For simple formulae, the `f` key of the desired cell can be set to the actual
formula text. This worksheet represents `A1=1`, `A2=2`, and `A3=A1+A2`:
```js
var worksheet = {
"!ref": "A1:A3",
A1: { t:'n', v:1 },
A2: { t:'n', v:2 },
A3: { t:'n', v:3, f:'A1+A2' }
};
```
Utilities like `aoa_to_sheet` will accept cell objects in lieu of values:
```js
var worksheet = XLSX.utils.aoa_to_sheet([
[ 1 ], // A1
[ 2 ], // A2
[ {t: "n", v: 3, f: "A1+A2"} ] // A3
]);
```
Cells with formula entries but no value will be serialized in a way that Excel
and other spreadsheet tools will recognize. This library will not automatically
compute formula results! For example, the following worksheet will include the
`BESSELJ` function but the result will not be available in JavaScript:
```js
var worksheet = XLSX.utils.aoa_to_sheet([
[ 3.14159, 2 ], // Row "1"
[ { t:'n', f:'BESSELJ(A1,B1)' } ] // Row "2" will be calculated on file open
}
```
If the actual results are needed in JS, [SheetJS Pro](https://sheetjs.com/pro)
offers a formula calculator component for evaluating expressions, updating
values and dependent cells, and refreshing entire workbooks.
### Array Formulae
_Assign an array formula_
```js
XLSX.utils.sheet_set_array_formula(worksheet, range, formula);
```
Array formulae are stored in the top-left cell of the array block. All cells
of an array formula have a `F` field corresponding to the range. A single-cell
formula can be distinguished from a plain formula by the presence of `F` field.
For example, setting the cell `C1` to the array formula `{=SUM(A1:A3*B1:B3)}`:
```js
// API function
XLSX.utils.sheet_set_array_formula(worksheet, "C1", "SUM(A1:A3*B1:B3)");
// ... OR raw operations
worksheet['C1'] = { t:'n', f: "SUM(A1:A3*B1:B3)", F:"C1:C1" };
```
For a multi-cell array formula, every cell has the same array range but only the
first cell specifies the formula. Consider `D1:D3=A1:A3*B1:B3`:
```js
// API function
XLSX.utils.sheet_set_array_formula(worksheet, "D1:D3", "A1:A3*B1:B3");
// ... OR raw operations
worksheet['D1'] = { t:'n', F:"D1:D3", f:"A1:A3*B1:B3" };
worksheet['D2'] = { t:'n', F:"D1:D3" };
worksheet['D3'] = { t:'n', F:"D1:D3" };
```
Utilities and writers are expected to check for the presence of a `F` field and
ignore any possible formula element `f` in cells other than the starting cell.
They are not expected to perform validation of the formulae!
### Dynamic Arrays
_Assign a dynamic array formula_
```js
XLSX.utils.sheet_set_array_formula(worksheet, range, formula, true);
```
Released in 2020, Dynamic Array Formulae are supported in the XLSX/XLSM and XLSB
file formats. They are represented like normal array formulae but have special
cell metadata indicating that the formula should be allowed to adjust the range.
An array formula can be marked as dynamic by setting the cell's `D` property to
true. The `F` range is expected but can be the set to the current cell:
```js
// API function
XLSX.utils.sheet_set_array_formula(worksheet, "C1", "_xlfn.UNIQUE(A1:A3)", 1);
// ... OR raw operations
worksheet['C1'] = { t: "s", f: "_xlfn.UNIQUE(A1:A3)", F:"C1", D: 1 }; // dynamic
```
### Localization
SheetJS operates at the file level. Excel stores formula expressions using the
English (United States) function names. For non-English users, Excel uses a
localized set of function names.
For example, when the computer language and region is set to French (France),
Excel interprets `=SOMME(A1:C3)` as if `SOMME` is the `SUM` function. However,
in the actual file, Excel stores `SUM(A1:C3)`.
**Prefixed "Future Functions"**
Functions introduced in newer versions of Excel are prefixed with `_xlfn.` when
stored in files. When writing formula expressions using these functions, the
prefix is required for maximal compatibility:
```js
// Broadest compatibility
XLSX.utils.sheet_set_array_formula(worksheet, "C1", "_xlfn.UNIQUE(A1:A3)", 1);
// Can cause errors in spreadsheet software
XLSX.utils.sheet_set_array_formula(worksheet, "C1", "UNIQUE(A1:A3)", 1);
```
When reading a file, the `xlfn` option preserves the prefixes.
<details>
<summary><b> Functions requiring `_xlfn.` prefix</b> (click to show)</summary>
This list is growing with each Excel release.
```
ACOT
ACOTH
AGGREGATE
ARABIC
BASE
BETA.DIST
BETA.INV
BINOM.DIST
BINOM.DIST.RANGE
BINOM.INV
BITAND
BITLSHIFT
BITOR
BITRSHIFT
BITXOR
BYCOL
BYROW
CEILING.MATH
CEILING.PRECISE
CHISQ.DIST
CHISQ.DIST.RT
CHISQ.INV
CHISQ.INV.RT
CHISQ.TEST
COMBINA
CONFIDENCE.NORM
CONFIDENCE.T
COT
COTH
COVARIANCE.P
COVARIANCE.S
CSC
CSCH
DAYS
DECIMAL
ERF.PRECISE
ERFC.PRECISE
EXPON.DIST
F.DIST
F.DIST.RT
F.INV
F.INV.RT
F.TEST
FIELDVALUE
FILTERXML
FLOOR.MATH
FLOOR.PRECISE
FORMULATEXT
GAMMA
GAMMA.DIST
GAMMA.INV
GAMMALN.PRECISE
GAUSS
HYPGEOM.DIST
IFNA
IMCOSH
IMCOT
IMCSC
IMCSCH
IMSEC
IMSECH
IMSINH
IMTAN
ISFORMULA
ISOMITTED
ISOWEEKNUM
LAMBDA
LET
LOGNORM.DIST
LOGNORM.INV
MAKEARRAY
MAP
MODE.MULT
MODE.SNGL
MUNIT
NEGBINOM.DIST
NORM.DIST
NORM.INV
NORM.S.DIST
NORM.S.INV
NUMBERVALUE
PDURATION
PERCENTILE.EXC
PERCENTILE.INC
PERCENTRANK.EXC
PERCENTRANK.INC
PERMUTATIONA
PHI
POISSON.DIST
QUARTILE.EXC
QUARTILE.INC
QUERYSTRING
RANDARRAY
RANK.AVG
RANK.EQ
REDUCE
RRI
SCAN
SEC
SECH
SEQUENCE
SHEET
SHEETS
SKEW.P
SORTBY
STDEV.P
STDEV.S
T.DIST
T.DIST.2T
T.DIST.RT
T.INV
T.INV.2T
T.TEST
UNICHAR
UNICODE
UNIQUE
VAR.P
VAR.S
WEBSERVICE
WEIBULL.DIST
XLOOKUP
XOR
Z.TEST
```
</details>
## Row and Column Properties
<details>
<summary><b>Format Support</b> (click to show)</summary>
**Row Properties**: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM, ODS
**Column Properties**: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM
</details>
Row and Column properties are not extracted by default when reading from a file
and are not persisted by default when writing to a file. The option
`cellStyles: true` must be passed to the relevant read or write function.
_Column Properties_
The `!cols` array in each worksheet, if present, is a collection of `ColInfo`
objects which have the following properties:
```typescript
type ColInfo = {
/* visibility */
hidden?: boolean; // if true, the column is hidden
/* column width is specified in one of the following ways: */
wpx?: number; // width in screen pixels
width?: number; // width in Excel's "Max Digit Width", width*256 is integral
wch?: number; // width in characters
/* other fields for preserving features from files */
level?: number; // 0-indexed outline / group level
MDW?: number; // Excel's "Max Digit Width" unit, always integral
};
```
_Row Properties_
The `!rows` array in each worksheet, if present, is a collection of `RowInfo`
objects which have the following properties:
```typescript
type RowInfo = {
/* visibility */
hidden?: boolean; // if true, the row is hidden
/* row height is specified in one of the following ways: */
hpx?: number; // height in screen pixels
hpt?: number; // height in points
level?: number; // 0-indexed outline / group level
};
```
_Outline / Group Levels Convention_
The Excel UI displays the base outline level as `1` and the max level as `8`.
Following JS conventions, SheetJS uses 0-indexed outline levels wherein the base
outline level is `0` and the max level is `7`.
<details>
<summary><b>Why are there three width types?</b> (click to show)</summary>
There are three different width types corresponding to the three different ways
spreadsheets store column widths:
SYLK and other plain text formats use raw character count. Contemporaneous tools
like Visicalc and Multiplan were character based. Since the characters had the
same width, it sufficed to store a count. This tradition was continued into the
BIFF formats.
SpreadsheetML (2003) tried to align with HTML by standardizing on screen pixel
count throughout the file. Column widths, row heights, and other measures use
pixels. When the pixel and character counts do not align, Excel rounds values.
XLSX internally stores column widths in a nebulous "Max Digit Width" form. The
Max Digit Width is the width of the largest digit when rendered (generally the
"0" character is the widest). The internal width must be an integer multiple of
the the width divided by 256. ECMA-376 describes a formula for converting
between pixels and the internal width. This represents a hybrid approach.
Read functions attempt to populate all three properties. Write functions will
try to cycle specified values to the desired type. In order to avoid potential
conflicts, manipulation should delete the other properties first. For example,
when changing the pixel width, delete the `wch` and `width` properties.
</details>
<details>
<summary><b>Implementation details</b> (click to show)</summary>
_Row Heights_
Excel internally stores row heights in points. The default resolution is 72 DPI
or 96 PPI, so the pixel and point size should agree. For different resolutions
they may not agree, so the library separates the concepts.
Even though all of the information is made available, writers are expected to
follow the priority order:
1) use `hpx` pixel height if available
2) use `hpt` point height if available
_Column Widths_
Given the constraints, it is possible to determine the MDW without actually
inspecting the font! The parsers guess the pixel width by converting from width
to pixels and back, repeating for all possible MDW and selecting the MDW that
minimizes the error. XLML actually stores the pixel width, so the guess works
in the opposite direction.
Even though all of the information is made available, writers are expected to
follow the priority order:
1) use `width` field if available
2) use `wpx` pixel width if available
3) use `wch` character count if available
</details>
## Number Formats
The `cell.w` formatted text for each cell is produced from `cell.v` and `cell.z`
format. If the format is not specified, the Excel `General` format is used.
The format can either be specified as a string or as an index into the format
table. Parsers are expected to populate `workbook.SSF` with the number format
table. Writers are expected to serialize the table.
Custom tools should ensure that the local table has each used format string
somewhere in the table. Excel convention mandates that the custom formats start
at index 164. The following example creates a custom format from scratch:
<details>
<summary><b>New worksheet with custom format</b> (click to show)</summary>
```js
var wb = {
SheetNames: ["Sheet1"],
Sheets: {
Sheet1: {
"!ref":"A1:C1",
A1: { t:"n", v:10000 }, // <-- General format
B1: { t:"n", v:10000, z: "0%" }, // <-- Builtin format
C1: { t:"n", v:10000, z: "\"T\"\ #0.00" } // <-- Custom format
}
}
}
```
</details>
The rules are slightly different from how Excel displays custom number formats.
In particular, literal characters must be wrapped in double quotes or preceded
by a backslash. For more info, see the Excel documentation article
`Create or delete a custom number format` or ECMA-376 18.8.31 (Number Formats)
<details>
<summary><b>Default Number Formats</b> (click to show)</summary>
The default formats are listed in ECMA-376 18.8.30:
| ID | Format |
|---:|:---------------------------|
| 0 | `General` |
| 1 | `0` |
| 2 | `0.00` |
| 3 | `#,##0` |
| 4 | `#,##0.00` |
| 9 | `0%` |
| 10 | `0.00%` |
| 11 | `0.00E+00` |
| 12 | `# ?/?` |
| 13 | `# ??/??` |
| 14 | `m/d/yy` (see below) |
| 15 | `d-mmm-yy` |
| 16 | `d-mmm` |
| 17 | `mmm-yy` |
| 18 | `h:mm AM/PM` |
| 19 | `h:mm:ss AM/PM` |
| 20 | `h:mm` |
| 21 | `h:mm:ss` |
| 22 | `m/d/yy h:mm` |
| 37 | `#,##0 ;(#,##0)` |
| 38 | `#,##0 ;[Red](#,##0)` |
| 39 | `#,##0.00;(#,##0.00)` |
| 40 | `#,##0.00;[Red](#,##0.00)` |
| 45 | `mm:ss` |
| 46 | `[h]:mm:ss` |
| 47 | `mmss.0` |
| 48 | `##0.0E+0` |
| 49 | `@` |
</details>
Format 14 (`m/d/yy`) is localized by Excel: even though the file specifies that
number format, it will be drawn differently based on system settings. It makes
sense when the producer and consumer of files are in the same locale, but that
is not always the case over the Internet. To get around this ambiguity, parse
functions accept the `dateNF` option to override the interpretation of that
specific format string.
## Hyperlinks
<details>
<summary><b>Format Support</b> (click to show)</summary>
**Cell Hyperlinks**: XLSX/M, XLSB, BIFF8 XLS, XLML, ODS
**Tooltips**: XLSX/M, XLSB, BIFF8 XLS, XLML
</details>
Hyperlinks are stored in the `l` key of cell objects. The `Target` field of the
hyperlink object is the target of the link, including the URI fragment. Tooltips
are stored in the `Tooltip` field and are displayed when you move your mouse
over the text.
For example, the following snippet creates a link from cell `A3` to
<https://sheetjs.com> with the tip `"Find us @ SheetJS.com!"`:
```js
ws['A1'].l = { Target:"https://sheetjs.com", Tooltip:"Find us @ SheetJS.com!" };
```
Note that Excel does not automatically style hyperlinks -- they will generally
be displayed as normal text.
_Remote Links_
HTTP / HTTPS links can be used directly:
```js
ws['A2'].l = { Target:"https://docs.sheetjs.com/#hyperlinks" };
ws['A3'].l = { Target:"http://localhost:7262/yes_localhost_works" };
```
Excel also supports `mailto` email links with subject line:
```js
ws['A4'].l = { Target:"mailto:ignored@dev.null" };
ws['A5'].l = { Target:"mailto:ignored@dev.null?subject=Test Subject" };
```
_Local Links_
Links to absolute paths should use the `file://` URI scheme:
```js
ws['B1'].l = { Target:"file:///SheetJS/t.xlsx" }; /* Link to /SheetJS/t.xlsx */
ws['B2'].l = { Target:"file:///c:/SheetJS.xlsx" }; /* Link to c:\SheetJS.xlsx */
```
Links to relative paths can be specified without a scheme:
```js
ws['B3'].l = { Target:"SheetJS.xlsb" }; /* Link to SheetJS.xlsb */
ws['B4'].l = { Target:"../SheetJS.xlsm" }; /* Link to ../SheetJS.xlsm */
```
Relative Paths have undefined behavior in the SpreadsheetML 2003 format. Excel
2019 will treat a `..\` parent mark as two levels up.
_Internal Links_
Links where the target is a cell or range or defined name in the same workbook
("Internal Links") are marked with a leading hash character:
```js
ws['C1'].l = { Target:"#E2" }; /* Link to cell E2 */
ws['C2'].l = { Target:"#Sheet2!E2" }; /* Link to cell E2 in sheet Sheet2 */
ws['C3'].l = { Target:"#SomeDefinedName" }; /* Link to Defined Name */
```
## Cell Comments
Cell comments are objects stored in the `c` array of cell objects. The actual
contents of the comment are split into blocks based on the comment author. The
`a` field of each comment object is the author of the comment and the `t` field
is the plain text representation.
For example, the following snippet appends a cell comment into cell `A1`:
```js
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"I'm a little comment, short and stout!"});
```
Note: XLSB enforces a 54 character limit on the Author name. Names longer than
54 characters may cause issues with other formats.
To mark a comment as normally hidden, set the `hidden` property:
```js
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"This comment is visible"});
if(!ws.A2.c) ws.A2.c = [];
ws.A2.c.hidden = true;
ws.A2.c.push({a:"SheetJS", t:"This comment will be hidden"});
```
_Threaded Comments_
Introduced in Excel 365, threaded comments are plain text comment snippets with
author metadata and parent references. They are supported in XLSX and XLSB.
To mark a comment as threaded, each comment part must have a true `T` property:
```js
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"This is not threaded"});
if(!ws.A2.c) ws.A2.c = [];
ws.A2.c.hidden = true;
ws.A2.c.push({a:"SheetJS", t:"This is threaded", T: true});
ws.A2.c.push({a:"JSSheet", t:"This is also threaded", T: true});
```
There is no Active Directory or Office 365 metadata associated with authors in a thread.
## Sheet Visibility
Excel enables hiding sheets in the lower tab bar. The sheet data is stored in
the file but the UI does not readily make it available. Standard hidden sheets
are revealed in the "Unhide" menu. Excel also has "very hidden" sheets which
cannot be revealed in the menu. It is only accessible in the VB Editor!
The visibility setting is stored in the `Hidden` property of sheet props array.
<details>
<summary><b>More details</b> (click to show)</summary>
| Value | Definition |
|:-----:|:------------|
| 0 | Visible |
| 1 | Hidden |
| 2 | Very Hidden |
With <https://rawgit.com/SheetJS/test_files/HEAD/sheet_visibility.xlsx>:
```js
> wb.Workbook.Sheets.map(function(x) { return [x.name, x.Hidden] })
[ [ 'Visible', 0 ], [ 'Hidden', 1 ], [ 'VeryHidden', 2 ] ]
```
Non-Excel formats do not support the Very Hidden state. The best way to test
if a sheet is visible is to check if the `Hidden` property is logical truth:
```js
> wb.Workbook.Sheets.map(function(x) { return [x.name, !x.Hidden] })
[ [ 'Visible', true ], [ 'Hidden', false ], [ 'VeryHidden', false ] ]
```
</details>
## VBA and Macros
VBA Macros are stored in a special data blob that is exposed in the `vbaraw`
property of the workbook object when the `bookVBA` option is `true`. They are
supported in `XLSM`, `XLSB`, and `BIFF8 XLS` formats. The supported format
writers automatically insert the data blobs if it is present in the workbook and
associate with the worksheet names.
<details>
<summary><b>Custom Code Names</b> (click to show)</summary>
The workbook code name is stored in `wb.Workbook.WBProps.CodeName`. By default,
Excel will write `ThisWorkbook` or a translated phrase like `DieseArbeitsmappe`.
Worksheet and Chartsheet code names are in the worksheet properties object at
`wb.Workbook.Sheets[i].CodeName`. Macrosheets and Dialogsheets are ignored.
The readers and writers preserve the code names, but they have to be manually
set when adding a VBA blob to a different workbook.
</details>
<details>
<summary><b>Macrosheets</b> (click to show)</summary>
Older versions of Excel also supported a non-VBA "macrosheet" sheet type that
stored automation commands. These are exposed in objects with the `!type`
property set to `"macro"`.
</details>
<details>
<summary><b>Detecting macros in workbooks</b> (click to show)</summary>
The `vbaraw` field will only be set if macros are present, so testing is simple:
```js
function wb_has_macro(wb/*:workbook*/)/*:boolean*/ {
if(!!wb.vbaraw) return true;
const sheets = wb.SheetNames.map((n) => wb.Sheets[n]);
return sheets.some((ws) => !!ws && ws['!type']=='macro');
}
```
</details>