docs.sheetjs.com/docz/docs/07-csf/07-features/06-nf.md
SheetJS 92e3c5aa72 mdx cleanup in preparation for v2
- use autolinks (e.g <https://sheetjs.com> -> https://sheetjs.com)
- move <summary> blocks to separate lines
2024-04-08 00:57:39 -04:00

371 lines
14 KiB
Markdown

---
title: Number Formats
sidebar_position: 6
---
<details>
<summary><b>File Format Support</b> (click to show)</summary>
Modern applications separate "content" from "presentation". A value like `$3.50`
is typically stored as the underlying value (`3.50`) with a format (`$0.00`).
Parsers are expected to render values using the respective number formats.
Text-based file formats like CSV and HTML mix content and presentation. `$3.50`
is stored as the formatted value. The formatted values can be generated from
many different values and number formats. SheetJS parsers expose options to
control value parsing and number format speculation.
| Formats | Basic | Storage Representation |
|:------------------|:-----:|:-----------------------|
| XLSX / XLSM | ✔ | Number Format Code |
| XLSB | ✔ | Number Format Code |
| XLS | ✔ | Number Format Code |
| XLML | ✔ | Number Format Code |
| SYLK | R | Number Format Code |
| ODS / FODS / UOS | ✔ | XML Tokens |
| NUMBERS | | Binary encoding |
| WK1 | + | Fixed set of formats |
| WK3 / WK4 | | Binary encoding |
| WKS Lotus | + | Fixed set of formats |
| WKS Works | + | Fixed set of formats |
| WQ1 | + | Fixed set of formats |
| WQ2 | | Binary encoding |
| WB1 / WB2 / WB3 | | Binary encoding |
| QPW | + | Binary encoding |
| DBF | | Implied by field types |
| HTML | ! | Special override |
| CSV | * | N/A |
| PRN | * | N/A |
| DIF | * | N/A |
| RTF | * | N/A |
(+) mark formats with limited support. The QPW (Quattro Pro Workbooks) parser
supports the built-in date and built-in time formats but does not support
custom number formats. [Date and Time support](/docs/csf/features/dates) in
modern Excel formats requires limited number format support to distinguish date
or time codes from standard numeric data.
Asterisks (*) mark formats that mix content and presentation. Writers will use
formatted values if cell objects include formatted text or number formats.
Parsers may guess number formats for special values.
The letter R (R) marks features parsed but not written in the format.
(!) HTML mixes content and presentation. The HTML DOM parser supports special
[attributes to override number formats](/docs/api/utilities/html#value-override)
</details>
Typically spreadsheets will include formatted text such as currencies (`$3.50`)
or large numbers with thousands separators (`7,262`) or percentages (`2.19%`).
To simplify editing, the applications will store the underlying values and the
number formats separately. For example, `$3.50` will be represented as the value
`3.5` with a number format that mandates a `$` sigil and 2 decimal places.
The `z` property of SheetJS cell objects stores the number format metadata:
```js
/* set the format of cell B2 to "0.00%" */
worksheet["B2"].z = "0.00%";
```
When requested, the formatted text will be stored in the `w` property.
## Live Demo
This example generates a worksheet with common number formats. The number
formats are explicitly assigned:
```js
/* assign number formats */
ws["B2"].z = '"$"#,##0.00_);\\("$"#,##0.00\\)'; // Currency format
ws["B3"].z = '#,##0'; // Number with thousands separator
ws["B4"].z = "0.00%"; // Percentage with up to 2 decimal places
```
`sheet_to_html` uses the number formats and values to compute the formatted text
when generating the HTML table.
The "Export" button will write a workbook with number formats. The file can be
opened in Excel or another spreadsheet editor. The values in column B will be
proper numbers with the assigned number formats.
```jsx live
function SheetJSSimpleNF(props) {
const [ws, setWS] = React.useState();
const [__html, setHTML] = React.useState("");
const fmt = React.useRef(null);
/* when the page is loaded, create worksheet and show table */
React.useEffect(() => {
/* Create worksheet from simple data */
const ws = XLSX.utils.aoa_to_sheet([
["General", 54337 ],
["Currency", 3.5 ],
["Thousands", 7262 ],
["Percent", 0.0219 ],
]);
/* assign number formats */
ws["B2"].z = '"$"#,##0.00_);\\("$"#,##0.00\\)';
ws["B3"].z = '#,##0';
ws["B4"].z = "0.00%";
/* save worksheet object for the export */
setWS(ws);
/* generate the HTML table */
setHTML(XLSX.utils.sheet_to_html(ws));
}, []);
const xport = (fmt) => {
/* Export to file (start a download) */
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, ws, "Formats");
XLSX.writeFile(wb, `SheetJSSimpleNF.${fmt}`);
};
const fmts = ["xlsx", "xls", "csv", "xlsb", "html", "ods"];
return ( <>
<b>File format: </b>
<select ref={fmt}>{fmts.map(f=>(<option value={f}>{f}</option>))}</select>
<br/><button onClick={()=>xport(fmt.current.value)}><b>Export!</b></button>
<div dangerouslySetInnerHTML={{__html}}/>
</> );
}
```
## SheetJS Representation
Number formats and values are attached to cells. The following keys are used:
| Key | Description |
| --- | ---------------------------------------------------------------------- |
| `v` | raw value (number, string, Date object, boolean) |
| `z` | number format string associated with the cell (if requested) |
| `w` | formatted text (if applicable) |
The `cell.w` formatted text for each cell is produced from `cell.v` and `cell.z`
format. If the format is not specified, the Excel `General` format is used.
By default, parsers do not attach number formats to cells. The `cellNF` option
instructs `XLSX.read` or `XLSX.readFile` to save the formats.
### Number Format Strings
The `z` format string follows the Excel persistence rules as described in
ECMA-376 18.8.31 (Number Formats)[^1]
The rules are slightly different from how Excel displays custom number formats.
In particular, literal characters must be wrapped in double quotes or preceded
by a backslash.
The following example prints number formats from a user-specified file:
```jsx live
function SheetJSExtractNF(props) {
const [rows, setRows] = React.useState([])
return ( <>
<input type="file" onChange={async(e) => {
/* parse workbook with cellNF: true */
const wb = XLSX.read(await e.target.files[0].arrayBuffer(), {cellNF: true});
/* look at each cell in each worksheet */
const formats = {};
wb.SheetNames.forEach(n => {
var ws = wb.Sheets[n]; if(!ws || !ws["!ref"]) return;
var ref = XLSX.utils.decode_range(ws["!ref"]);
for(var R = 0; R <= ref.e.r; ++R) for(var C = 0; C <= ref.e.c; ++C) {
var addr = XLSX.utils.encode_cell({r:R,c:C});
if(!ws[addr] || !ws[addr].z || formats[ws[addr].z]) continue;
/* when a new format is found, save the address */
formats[ws[addr].z] = `'${n}'!${addr}`;
setRows(Object.entries(formats));
}
});
}}/>
<table><tr><th>Format</th><th>JSON</th><th>Example Cell</th></tr>
{rows.map((r,R) => ( <tr key={R}>
<td><code>{r[0]}</code></td>
<td><code>{JSON.stringify(r[0])}</code></td>
<td>{r[1]}</td>
</tr> ))}
</table>
</> );
}
```
## Values and Formatting
### Dates and Times
In XLS and other file formats that extended the Lotus 1-2-3 worksheet file
format, dates and times are stored as numeric codes. The application uses the
number format to determine whether the value should be interpreted as a date.
:::note pass
Interpretation of date codes is covered in ["Dates and Times"](/docs/csf/features/dates).
:::
The following repeatable tokens force a date interpretation:
| Tokens | Description |
|:-----------------|:-------------------------------------------------------|
| `Y` | Year |
| `M` | Month or Minute (contextual) |
| `D` | Day |
| `H` | Hours (0-23 normally, but 1-12 if meridiem is present) |
| `S` | Seconds |
| `A/P` or `AM/PM` | Meridiem |
| `[h]` or `[hh]` | Absolute hours (duration) |
| `[m]` or `[mm]` | Absolute minutes (duration) |
| `[s]` or `[ss]` | Absolute seconds (duration) |
| `B1` or `B2` | Use Gregorian Calendar (`B1`) or Hijri Calendar (`B2`) |
| `E` | "Era Year" or standard year depending on locale |
| `G` | "Era" modifier or empty string depending on locale |
If a format is detected to be a date, the decimal tokens `.0`, `.00` and `.000`
represent the sub-second portion of the time.
### Percentages
Percentage formats automatically scale values by 100. Multiple percent symbols
repeat the effect. For example, a cell with value `2.19%` is typically stored as
a numeric cell with value `0.0219` and number format `0.00%`
The following table uses the `en-US` locale (`.` as the decimal point symbol).
Formatted text is rendered using the embedded SheetJS `SSF` formatting library.
```jsx live
function SheetJSPCT() {
const data = [
{ n: 0.0219, z: "0.00%"},
{ n: 2.19, z: "0.00%"},
{ n: 0.0219, z: "0.00%%"},
{ n: 2.19, z: "0.00%%"},
];
return ( <table><tr><th>Number</th><th>Format</th><th>Text</th></tr>
{data.map(r => (<tr>
<td><code>{r.n}</code></td>
<td><code>{r.z}</code></td>
<td><code>{XLSX.SSF.format(r.z, r.n)}</code></td>
</tr>))}
</table> );
}
```
### Fractions
Some applications support displaying numbers in fractional form.
Fractions with a fixed denominator are calculated by scaling and rounding the
fractional part of the number.
Fractions with a variable denominator are typically specified by the number of
digits in the denominator (for example, "Up to one digit").
:::info pass
The optimal solution from a mathematical perspective is the "Mediant" method.
This algorithm can be very slow in the worst case, so spreadsheet applications
tend to use a continued fraction approach.
The common algorithm produces unexpected results for "Up to one digit":
| Value | Mediant | Excel 2019 |
|:------|--------:|-----------:|
| `0.3` | `2/7` | `2/7` |
| `1.3` | `1 2/7` | `1 1/3` |
| `2.3` | `2 2/7` | `2 2/7` |
| `3.3` | `3 2/7` | `3 2/7` |
:::
## Miscellany
The default formats are listed in ECMA-376 18.8.30:
<details>
<summary><b>Default Number Formats</b> (click to show)</summary>
| ID | Format |
|---:|:---------------------------|
| 0 | `General` |
| 1 | `0` |
| 2 | `0.00` |
| 3 | `#,##0` |
| 4 | `#,##0.00` |
| 9 | `0%` |
| 10 | `0.00%` |
| 11 | `0.00E+00` |
| 12 | `# ?/?` |
| 13 | `# ??/??` |
| 14 | `m/d/yy` (see below) |
| 15 | `d-mmm-yy` |
| 16 | `d-mmm` |
| 17 | `mmm-yy` |
| 18 | `h:mm AM/PM` |
| 19 | `h:mm:ss AM/PM` |
| 20 | `h:mm` |
| 21 | `h:mm:ss` |
| 22 | `m/d/yy h:mm` |
| 37 | `#,##0 ;(#,##0)` |
| 38 | `#,##0 ;[Red](#,##0)` |
| 39 | `#,##0.00;(#,##0.00)` |
| 40 | `#,##0.00;[Red](#,##0.00)` |
| 45 | `mm:ss` |
| 46 | `[h]:mm:ss` |
| 47 | `mmss.0` |
| 48 | `##0.0E+0` |
| 49 | `@` |
</details>
Format 14 (`m/d/yy`) is localized by Excel: even though the file specifies that
number format, it will be drawn differently based on system settings. It makes
sense when the producer and consumer of files are in the same locale, but that
is not always the case over the Internet. To get around this ambiguity, parse
functions accept the `dateNF` option to override the interpretation of that
specific format string.
### Excel Format Categories
Excel officially recognizes a small number of formats as "Currency" and another
set of formats as "Accounting". The exact formats in `en-US` are listed below:
#### Currency
| JS String | Decimal Places | Negative Color |
|:-----------------------------------------|:---------------|:---------------|
| `'"$"#,##0_);\\("$"#,##0\\)'` | 0 | Black |
| `'"$"#,##0_);[Red]\\("$"#,##0\\)'` | 0 | Red |
| `'"$"#,##0.00_);\\("$"#,##0.00\\)'` | 2 | Black |
| `'"$"#,##0.00_);[Red]\\("$"#,##0.00\\)'` | 2 | Red |
#### Accounting
| JS String | Decimal | Sigil |
|:-----------------------------------------------------------------|:--------|:------|
| `'_(* #,##0_);_(* \\(#,##0\\);_(* "-"_);_(@_)'` | 0 | |
| `'_("$"* #,##0_);_("$"* \\(#,##0\\);_("$"* "-"_);_(@_)'` | 0 | `$` |
| `'_(* #,##0.00_);_(* \\(#,##0.00\\);_(* "-"??_);_(@_)'` | 2 | |
| `'_("$"* #,##0.00_);_("$"* \\(#,##0.00\\);_("$"* "-"??_);_(@_)'` | 2 | `$` |
For other locales, the formats can be discovered by creating a file with the
desired format and testing with [the Number Format Strings demo](#number-format-strings)
### HTML Override
[**This feature is discussed in the HTML utilities section**](/docs/api/utilities/html#value-override)
### Plaintext Export
Built-in utilities that use formatted text (such as the CSV exporter) will use
the `w` text if available. When programmatically changing values, the `w` text
should be deleted before attempting to export. Utilities will regenerate the `w`
text from the number format (`cell.z`) and the raw value if possible.
[^1]: The ["Review guidelines for customizing a number format"](https://support.microsoft.com/en-us/office/review-guidelines-for-customizing-a-number-format-c0a1d1fa-d3f4-4018-96b7-9c9354dd99f5) page in the Excel documentation covered custom number format minutiae.