# Spreadsheet Features

import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';

Even for basic features like date storage, the official Excel formats store the
same content in different ways. The parsers are expected to convert from the
underlying file format representation to the Common Spreadsheet Format. Writers
are expected to serialize SheetJS workbooks in the underlying file format.

The following topics are covered in sub-pages:
<ul>{useCurrentSidebarCategory().items.map((item, index) => {
const listyle = (item.customProps?.icon) ? {
listStyleImage: `url("${item.customProps.icon}")`
} : {};
return (<li style={listyle} {...(item.customProps?.class ? {className: item.customProps.class}: {})}>
<a href={item.href}>{item.label}</a>{item.customProps?.summary && (" - " + item.customProps.summary)}
</li>);
})}</ul>
## Row and Column Properties
<details>
<summary><b>Format Support</b> (click to show)</summary>
**Row Properties**: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM, ODS

**Column Properties**: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM
</details>
Row and Column properties are not extracted by default when reading from a file
and are not persisted by default when writing to a file. The option
`cellStyles: true` must be passed to the relevant read or write function.
_Column Properties_
The `!cols` array in each worksheet, if present, is a collection of `ColInfo`
objects which have the following properties:
```typescript
type ColInfo = {
  /* visibility */
  hidden?: boolean; // if true, the column is hidden

  /* column width is specified in one of the following ways: */
  wpx?:    number;  // width in screen pixels
  width?:  number;  // width in Excel "Max Digit Width", width*256 is integral
  wch?:    number;  // width in characters

  /* other fields for preserving features from files */
  level?:  number;  // 0-indexed outline / group level
  MDW?:    number;  // Excel "Max Digit Width" unit, always integral
};
```
_Row Properties_
The `!rows` array in each worksheet, if present, is a collection of `RowInfo`
objects which have the following properties:
```typescript
type RowInfo = {
  /* visibility */
  hidden?: boolean; // if true, the row is hidden

  /* row height is specified in one of the following ways: */
  hpx?:    number;  // height in screen pixels
  hpt?:    number;  // height in points

  /* other fields for preserving features from files */
  level?:  number;  // 0-indexed outline / group level
};
```
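
As a minimal sketch of how these arrays might be populated, the following snippet builds a small worksheet object by hand and assigns column and row properties. The worksheet contents are placeholders, and the snippet assumes the usual convention that entry `i` of each array describes the column or row at 0-indexed position `i`:

```js
// Placeholder worksheet; a real one would come from a read or utility function
var ws = { "!ref": "A1:C3", A1: { t: "s", v: "Name" } };

// Column properties (assumed: index 0 is column A, index 1 is column B, ...)
ws["!cols"] = [];
ws["!cols"][0] = { wch: 20 };      // column A: 20 characters wide
ws["!cols"][1] = { wpx: 120 };     // column B: 120 screen pixels wide
ws["!cols"][2] = { hidden: true }; // column C: hidden

// Row properties (assumed: index 0 is row 1, index 1 is row 2, ...)
ws["!rows"] = [];
ws["!rows"][0] = { hpt: 24 };                // row 1: 24 points tall
ws["!rows"][2] = { hidden: true, level: 1 }; // row 3: hidden, one level below base
```

Since the properties are not persisted by default, the `cellStyles: true` option would still need to be passed when writing the workbook.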
_Outline / Group Levels Convention_
The Excel UI displays the base outline level as `1` and the max level as `8`.
Following JS conventions, SheetJS uses 0-indexed outline levels wherein the base
outline level is `0` and the max level is `7`.
<details>
<summary><b>Why are there three width types?</b> (click to show)</summary>
There are three different width types corresponding to the three different ways
spreadsheets store column widths:

SYLK and other plain text formats use raw character count. Contemporaneous tools
like VisiCalc and Multiplan were character based. Since the characters had the
same width, it sufficed to store a count. This tradition was continued into the
BIFF formats.

SpreadsheetML (2003) tried to align with HTML by standardizing on screen pixel
count throughout the file. Column widths, row heights, and other measures use
pixels. When the pixel and character counts do not align, Excel rounds values.

XLSX internally stores column widths in a nebulous "Max Digit Width" form. The
Max Digit Width is the width of the largest digit when rendered (generally the
"0" character is the widest). The internal width must be an integer multiple of
1/256 of the Max Digit Width. ECMA-376 describes a formula for converting
between pixels and the internal width. This represents a hybrid approach.

Read functions attempt to populate all three properties. Write functions will
try to cycle specified values to the desired type. In order to avoid potential
conflicts, manipulation should delete the other properties first. For example,
when changing the pixel width, delete the `wch` and `width` properties.
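
The advice about deleting the other properties can be sketched as a small helper. The function name `set_col_wpx` is hypothetical and not part of the library; it only manipulates the plain `!cols` array:

```js
// Hypothetical helper: set the pixel width of 0-indexed column `C` in `ws`
function set_col_wpx(ws, C, wpx) {
  if(!ws["!cols"]) ws["!cols"] = [];
  var col = ws["!cols"][C] || (ws["!cols"][C] = {});
  // remove the competing width fields so a writer cannot pick a stale value
  delete col.wch;
  delete col.width;
  col.wpx = wpx;
  return col;
}

// Column entry as a reader might have populated it (values are illustrative)
var ws = { "!ref": "A1:B2", "!cols": [ { wch: 10, width: 10.7109375, wpx: 75, MDW: 7 } ] };
set_col_wpx(ws, 0, 150);
// ws["!cols"][0] keeps `MDW` and the new `wpx`, but no longer has `wch` or `width`
```
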
</details>
<details>
<summary><b>Implementation details</b> (click to show)</summary>
_Row Heights_
Excel internally stores row heights in points. The default resolution is 72 DPI
or 96 PPI, so the pixel and point size should agree. For different resolutions
they may not agree, so the library separates the concepts.
Even though all of the information is made available, writers are expected to
follow the priority order:
1) use `hpx` pixel height if available
2) use `hpt` point height if available
_Column Widths_

Given the constraints, it is possible to determine the `MDW` without actually
inspecting the font! The parsers guess the pixel width by converting from width
to pixels and back, repeating for all possible `MDW` and selecting the value
that minimizes the error. XLML actually stores the pixel width, so the guess
works in the opposite direction.

Even though all of the information is made available, writers are expected to
follow the priority order:
1) use `width` field if available
2) use `wpx` pixel width if available
3) use `wch` character count if available
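
The two priority orders above can be sketched as lookup helpers. The function names are hypothetical and only illustrate which field a writer following these rules would consult first; they are not the library's internal code:

```js
// Hypothetical helper: which width field would be used for a ColInfo object
function pick_col_width(col) {
  if(col.width != null) return { field: "width", value: col.width };
  if(col.wpx != null) return { field: "wpx", value: col.wpx };
  if(col.wch != null) return { field: "wch", value: col.wch };
  return null; // no width information is present
}

// Hypothetical helper: which height field would be used for a RowInfo object
function pick_row_height(row) {
  if(row.hpx != null) return { field: "hpx", value: row.hpx };
  if(row.hpt != null) return { field: "hpt", value: row.hpt };
  return null; // no height information is present
}

pick_col_width({ wpx: 100, wch: 12 }); // { field: "wpx", value: 100 }
pick_row_height({ hpx: 20, hpt: 15 }); // { field: "hpx", value: 20 }
```
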
</details>
## Number Formats
The `cell.w` formatted text for each cell is produced from the `cell.v` value
and the `cell.z` format. If the format is not specified, the Excel `General`
format is used. The format can either be specified as a string or as an index
into the format table. Readers are expected to populate `workbook.SSF` with the
number format table. Writers are expected to serialize the table.

The following example creates a custom format from scratch:
```js
var wb = {
  SheetNames: ["Sheet1"],
  Sheets: {
    Sheet1: {
      "!ref":"A1:C1",
      A1: { t:"n", v:10000 },                    // <-- General format
      B1: { t:"n", v:10000, z: "0%" },           // <-- Builtin format
      // the doubled backslash keeps a literal backslash in the format string
      C1: { t:"n", v:10000, z: "\"T\"\\ #0.00" } // <-- Custom format
    }
  }
};
```
The rules are slightly different from how Excel displays custom number formats.
In particular, literal characters must be wrapped in double quotes or preceded
by a backslash. For more info, see the Excel documentation article
`Create or delete a custom number format` or ECMA-376 18.8.31 (Number Formats).
<details>
<summary><b>Default Number Formats</b> (click to show)</summary>
The default formats are listed in ECMA-376 18.8.30:
| ID | Format |
|---:|:---------------------------|
| 0 | `General` |
| 1 | `0` |
| 2 | `0.00` |
| 3 | `#,##0` |
| 4 | `#,##0.00` |
| 9 | `0%` |
| 10 | `0.00%` |
| 11 | `0.00E+00` |
| 12 | `# ?/?` |
| 13 | `# ??/??` |
| 14 | `m/d/yy` (see below) |
| 15 | `d-mmm-yy` |
| 16 | `d-mmm` |
| 17 | `mmm-yy` |
| 18 | `h:mm AM/PM` |
| 19 | `h:mm:ss AM/PM` |
| 20 | `h:mm` |
| 21 | `h:mm:ss` |
| 22 | `m/d/yy h:mm` |
| 37 | `#,##0 ;(#,##0)` |
| 38 | `#,##0 ;[Red](#,##0)` |
| 39 | `#,##0.00;(#,##0.00)` |
| 40 | `#,##0.00;[Red](#,##0.00)` |
| 45 | `mm:ss` |
| 46 | `[h]:mm:ss` |
| 47 | `mmss.0` |
| 48 | `##0.0E+0` |
| 49 | `@` |
</details>
Format 14 (`m/d/yy`) is localized by Excel: even though the file specifies that
number format, it will be drawn differently based on system settings. It makes
sense when the producer and consumer of files are in the same locale, but that
is not always the case over the Internet. To get around this ambiguity, parse
functions accept the `dateNF` option to override the interpretation of that
specific format string.
## Cell Comments
<details>
<summary><b>Format Support</b> (click to show)</summary>
**Simple Notes/Comments**: XLSX/M, XLSB, BIFF8 XLS (read only), XLML, ODS (read only)

**Threaded Comments**: XLSX/M, XLSB (read only)
</details>
Cell comments are objects stored in the `c` array of cell objects. The actual
contents of the comment are split into blocks based on the comment author. The
`a` field of each comment object is the author of the comment and the `t` field
is the plain text representation.
For example, the following snippet appends a cell comment into cell `A1`:
```js
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"I'm a little comment, short and stout!"});
```
Note: XLSB enforces a 54 character limit on the Author name. Names longer than
54 characters may cause issues with other formats.
To mark a comment as normally hidden, set the `hidden` property:
```js
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"This comment is visible"});
if(!ws.A2.c) ws.A2.c = [];
ws.A2.c.hidden = true;
ws.A2.c.push({a:"SheetJS", t:"This comment will be hidden"});
```
_Threaded Comments_
Introduced in Excel 365, threaded comments are plain text comment snippets with
author metadata and parent references. They are supported in XLSX and XLSB.
To mark a comment as threaded, each comment part must have a true `T` property:
```js
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"This is not threaded"});
if(!ws.A2.c) ws.A2.c = [];
ws.A2.c.hidden = true;
ws.A2.c.push({a:"SheetJS", t:"This is threaded", T: true});
ws.A2.c.push({a:"JSSheet", t:"This is also threaded", T: true});
```
There is no Active Directory or Office 365 metadata associated with authors in a thread.
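
The snippets above assume that the target cell already exists. As a minimal sketch, a hypothetical helper (`add_comment` is not a library function) could create a stub cell when needed and apply the `hidden` and `T` flags described in this section. The use of cell type `"z"` for the stub is an assumption of this sketch, and the worksheet `!ref` range is not adjusted:

```js
// Hypothetical helper: append a comment part to the cell at address `addr`
function add_comment(ws, addr, author, text, opts) {
  opts = opts || {};
  // assumption: a stub cell (type "z") is enough to hold a comment
  if(!ws[addr]) ws[addr] = { t: "z" };
  if(!ws[addr].c) ws[addr].c = [];
  var part = { a: author, t: text };
  if(opts.threaded) part.T = true;          // mark this part as threaded
  if(opts.hidden) ws[addr].c.hidden = true; // mark the comment as normally hidden
  ws[addr].c.push(part);
  return ws[addr].c;
}

var ws = { "!ref": "A1:A2", A1: { t: "s", v: "hello" } };
add_comment(ws, "A1", "SheetJS", "This is not threaded");
add_comment(ws, "A2", "SheetJS", "This is threaded", { threaded: true, hidden: true });
add_comment(ws, "A2", "JSSheet", "This is also threaded", { threaded: true });
```
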
## Sheet Visibility
<details>
<summary><b>Format Support</b> (click to show)</summary>
**Hidden Sheets**: XLSX/M, XLSB, BIFF8/BIFF5 XLS, XLML

**Very Hidden Sheets**: XLSX/M, XLSB, BIFF8/BIFF5 XLS, XLML
</details>
Excel enables hiding sheets in the lower tab bar. The sheet data is stored in
the file but the UI does not readily make it available. Standard hidden sheets
are revealed in the "Unhide" menu. Excel also has "very hidden" sheets which
cannot be revealed in the menu. They are only accessible in the VB Editor!

The visibility setting is stored in the `Hidden` property of each entry in the
`wb.Workbook.Sheets` array.

| Value | Definition  | VB Editor "Visible" Property |
|:-----:|:------------|:-----------------------------|
|   0   | Visible     | `-1 - xlSheetVisible`        |
|   1   | Hidden      | ` 0 - xlSheetHidden`         |
|   2   | Very Hidden | ` 2 - xlSheetVeryHidden`     |
If the respective Sheet entry does not exist or if the `Hidden` property is not
set, the worksheet is visible.
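
Because the `Workbook` metadata may be absent, code that changes visibility has to create the structure first. The following is a minimal sketch under the assumption that entry `i` of `wb.Workbook.Sheets` corresponds to `wb.SheetNames[i]`. The helper name `set_sheet_visibility` is hypothetical:

```js
// Hypothetical helper: set the visibility of the sheet named `name`
// `value` is 0 (visible), 1 (hidden) or 2 (very hidden)
function set_sheet_visibility(wb, name, value) {
  var idx = wb.SheetNames.indexOf(name);
  if(idx == -1) throw new Error("Sheet not found: " + name);
  // build the optional metadata structures if they are missing
  if(!wb.Workbook) wb.Workbook = {};
  if(!wb.Workbook.Sheets) wb.Workbook.Sheets = [];
  // fill entries up to the target so the array has no holes
  for(var i = 0; i <= idx; ++i) {
    if(!wb.Workbook.Sheets[i]) wb.Workbook.Sheets[i] = { name: wb.SheetNames[i] };
  }
  wb.Workbook.Sheets[idx].Hidden = value;
}

var wb = { SheetNames: ["Data", "Scratch"], Sheets: { Data: {}, Scratch: {} } };
set_sheet_visibility(wb, "Scratch", 1); // hide the second sheet
```
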
**List all worksheets and their visibility settings**
```js
wb.Workbook.Sheets.map(function(x) { return [x.name, x.Hidden] })
// [ [ 'Visible', 0 ], [ 'Hidden', 1 ], [ 'VeryHidden', 2 ] ]
```
**Check if worksheet is visible**
Non-Excel formats do not support the Very Hidden state. The best way to test
if a sheet is visible is to check if the `Hidden` property is logically false:
```js
wb.Workbook.Sheets.map(function(x) { return [x.name, !x.Hidden] })
// [ [ 'Visible', true ], [ 'Hidden', false ], [ 'VeryHidden', false ] ]
```
<details>
<summary><b>Live Example</b> (click to show)</summary>
[This test file](pathname:///files/sheet_visibility.xlsx) has three sheets:
- "Visible" is visible
- "Hidden" is hidden
- "VeryHidden" is very hidden
![Screenshot](pathname:///files/sheet_visibility.png)
**Live demo**
```jsx live
function Visibility(props) {
  const [sheets, setSheets] = React.useState([]);
  const names = [ "Visible", "Hidden", "Very Hidden" ];

  /* effect callbacks should not return a Promise, so wrap the async work */
  React.useEffect(() => { (async() => {
    const f = await fetch("/files/sheet_visibility.xlsx");
    const ab = await f.arrayBuffer();
    const wb = XLSX.read(ab);

    /* State will be set to the `Sheets` property array */
    setSheets(wb.Workbook.Sheets);
  })(); }, []);

  return (<table>
    <thead><tr><th>Name</th><th>Value</th><th>Hidden</th></tr></thead>
    <tbody>{sheets.map((x,i) => (<tr key={i}>
      <td>{x.name}</td>
      <td>{x.Hidden} - {names[x.Hidden]}</td>
      <td>{!x.Hidden ? "No" : "Yes"}</td>
    </tr>))}</tbody></table>);
}
```
</details>
## VBA and Macros
<details>
<summary><b>Format Support</b> (click to show)</summary>
**VBA Modules**: XLSM, XLSB, BIFF8 XLS
</details>
VBA Macros are stored in a special data blob that is exposed in the `vbaraw`
property of the workbook object when the `bookVBA` option is `true`. They are
supported in `XLSM`, `XLSB`, and `BIFF8 XLS` formats. The supported format
writers automatically insert the data blob if it is present in the workbook and
associate it with the worksheet names.

The `vbaraw` property stores raw bytes. [SheetJS Pro](https://sheetjs.com/pro)
offers a special component for extracting macro text from the VBA blob, editing
the VBA project, and exporting new VBA blobs.
#### Round-tripping Macro Enabled Files
In order to preserve macros when reading and writing files, the `bookVBA` option
must be set to true when reading and when writing. In addition, the output file
format must support macros. `XLSX` notably does not support macros, and `XLSM`
should be used in its place:
```js
/* Reading data */
var wb = XLSX.read(data, { bookVBA: true }); // read file and distill VBA blob
var vbablob = wb.vbaraw;
```
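
For the writing half, one possible approach is to derive the write options from the workbook. The helper below is a hypothetical sketch, not a library function. It only builds an options object using the `bookVBA` option described above and the `bookType` write option:

```js
// Hypothetical helper: choose write options that keep macros when they exist
function macro_write_opts(wb) {
  // a VBA blob is present: use a macro-enabled format and request VBA output
  if(wb.vbaraw) return { bookType: "xlsm", bookVBA: true };
  // no blob: a plain XLSX export is sufficient
  return { bookType: "xlsx" };
}

// Stand-in workbook with a fake three-byte blob, for illustration only
var fake_wb = { SheetNames: [], Sheets: {}, vbaraw: new Uint8Array([1, 2, 3]) };
var opts = macro_write_opts(fake_wb);
// opts is { bookType: "xlsm", bookVBA: true }
```

With the library loaded and a real workbook, the resulting object could be passed as the options argument of a write function, for example `XLSX.writeFile(wb, "out.xlsm", opts)`.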
#### Code Names
By default, Excel will use `ThisWorkbook` or a translation (such as
`DieseArbeitsmappe`) as the code name for the workbook. Each worksheet will be
identified using the default `Sheet#` naming pattern even if the worksheet names
have changed.
A custom workbook code name will be stored in `wb.Workbook.WBProps.CodeName`.
For exports, assigning the property will override the default value.

Worksheet and Chartsheet code names are in the worksheet properties object at
`wb.Workbook.Sheets[i].CodeName`. Macrosheets and Dialogsheets are ignored.

The readers and writers preserve the code names, but they have to be manually
set when adding a VBA blob to a different workbook.
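
As a rough sketch of setting the code names manually, the hypothetical helper below copies the blob and the code names from one workbook object to another. It assumes that sheets should be matched by position, which may not suit every workbook:

```js
// Hypothetical helper: copy the VBA blob and code names from `src` into `dst`
function transplant_vba(src, dst) {
  dst.vbaraw = src.vbaraw;
  if(!dst.Workbook) dst.Workbook = {};

  // workbook code name
  var srcprops = (src.Workbook || {}).WBProps || {};
  if(srcprops.CodeName) {
    if(!dst.Workbook.WBProps) dst.Workbook.WBProps = {};
    dst.Workbook.WBProps.CodeName = srcprops.CodeName;
  }

  // sheet code names, matched by position (an assumption of this sketch)
  var srcsheets = (src.Workbook || {}).Sheets || [];
  if(!dst.Workbook.Sheets) dst.Workbook.Sheets = [];
  dst.SheetNames.forEach(function(name, i) {
    if(!dst.Workbook.Sheets[i]) dst.Workbook.Sheets[i] = { name: name };
    var s = srcsheets[i];
    if(s && s.CodeName) dst.Workbook.Sheets[i].CodeName = s.CodeName;
  });
  return dst;
}
```

The source workbook would need to be read with `bookVBA: true` so that `vbaraw` is populated before calling a helper like this.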
#### Macrosheets
Older versions of Excel also supported a non-VBA "macrosheet" sheet type that
stored automation commands. These are exposed in objects with the `!type`
property set to `"macro"`.

Under the hood, Excel treats Macrosheets as normal worksheets with special
interpretation of the function expressions.
#### Detecting Macros in Workbooks
The `vbaraw` field will only be set if macros are present. Macrosheets will be
explicitly flagged. Combining the two checks yields a simple function:
```js
function wb_has_macro(wb/*:workbook*/)/*:boolean*/ {
  if(!!wb.vbaraw) return true;
  const sheets = wb.SheetNames.map((n) => wb.Sheets[n]);
  return sheets.some((ws) => !!ws && ws['!type']=='macro');
}
```