From fc8923d20fbea8829d5ad396a5a5d5cb714bb4ca Mon Sep 17 00:00:00 2001
From: SheetJS
Date: Fri, 9 Sep 2022 19:44:12 -0400
Subject: [PATCH] codepage

---
 docz/docs/08-api/07-write-options.md | 31 +++++++++++++++-------------
 docz/docs/09-miscellany/02-errors.md | 20 ++++++++++++++++++
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/docz/docs/08-api/07-write-options.md b/docz/docs/08-api/07-write-options.md
index 1ea8ae3..ddfd609 100644
--- a/docz/docs/08-api/07-write-options.md
+++ b/docz/docs/08-api/07-write-options.md
@@ -21,27 +21,30 @@ If `o` is omitted, the writer will use the third argument as the callback.
 
 The write functions accept an options argument:
 
-| Option Name |  Default | Description                                         |
-| :---------- | -------: | :-------------------------------------------------- |
-|`type`       |          | Output data encoding (see Output Type below)        |
-|`cellDates`  | `false`  | Store dates as type `d` (default is `n`)            |
-|`bookSST`    | `false`  | Generate Shared String Table **                     |
-|`bookType`   | `"xlsx"` | Type of Workbook (see below for supported formats)  |
-|`sheet`      | `""`     | Name of Worksheet for single-sheet formats **       |
-|`compression`| `false`  | Use ZIP compression for ZIP-based formats **        |
-|`Props`      |          | Override workbook properties when writing **        |
-|`themeXLSX`  |          | Override theme XML when writing XLSX/XLSB/XLSM **   |
-|`ignoreEC`   | `true`   | Suppress "number as text" errors **                 |
-|`numbers`    |          | Payload for NUMBERS export **                       |
+| Option Name |  Default | Description                                        |
+| :---------- | -------: | :------------------------------------------------- |
+|`type`       |          | Output data encoding (see Output Type below)       |
+|`cellDates`  | `false`  | Store dates as type `d` (default is `n`)           |
+|`codepage`   |          | If specified, use code page when appropriate **    |
+|`bookSST`    | `false`  | Generate Shared String Table **                    |
+|`bookType`   | `"xlsx"` | Type of Workbook (see below for supported formats) |
+|`sheet`      | `""`     | Name of Worksheet for single-sheet formats **      |
+|`compression`| `false`  | Use ZIP compression for ZIP-based formats **       |
+|`Props`      |          | Override workbook properties when writing **       |
+|`themeXLSX`  |          | Override theme XML when writing XLSX/XLSB/XLSM **  |
+|`ignoreEC`   | `true`   | Suppress "number as text" errors **                |
+|`numbers`    |          | Payload for NUMBERS export **                      |
 
 - `bookSST` is slower and more memory intensive, but has better compatibility
   with older versions of iOS Numbers
-- The raw data is the only thing guaranteed to be saved.  Features not described
+- The raw data is the only thing guaranteed to be saved. Features not described
   in this README may not be serialized.
 - `cellDates` only applies to XLSX output and is not guaranteed to work with
   third-party readers. Excel itself does not usually write cells with type `d`
   so non-Excel tools may ignore the data or error in the presence of dates.
+- `codepage` is applied to legacy formats including DBF. Characters missing
+  from the encoding will be replaced with underscore characters (`_`).
-- `Props` is an object mirroring the workbook `Props` field.  See the table from
+- `Props` is an object mirroring the workbook `Props` field. See the table from
   the [Workbook File Properties](../csf/book#file-properties) section.
 - if specified, the string from `themeXLSX` will be saved as the primary theme
   for XLSX/XLSB/XLSM files (to `xl/theme/theme1.xml` in the ZIP)
diff --git a/docz/docs/09-miscellany/02-errors.md b/docz/docs/09-miscellany/02-errors.md
index acbf706..fbda36a 100644
--- a/docz/docs/09-miscellany/02-errors.md
+++ b/docz/docs/09-miscellany/02-errors.md
@@ -100,6 +100,26 @@ The ESM build, used in tools like Webpack and in Deno, does not include the
 codepage tables by default. The ["Frameworks and Bundlers"](../02-getting-started/01-installation/02-frameworks.md#encoding-support)
 section explains how to load support.
 
+#### DBF files with Chinese or Japanese characters have underscores
+
+As mentioned in the previous answer, codepage tables must be loaded.
+
+When reading legacy files that do not include character set metadata, the
+`codepage` option controls the codepage. Common values:
+
+| `codepage` | Description              |
+|-----------:|:-------------------------|
+|        874 | Windows Thai             |
+|        932 | Japanese Shift-JIS       |
+|        936 | Simplified Chinese GBK   |
+|        950 | Traditional Chinese Big5 |
+|       1200 | UTF-16 Little Endian     |
+|       1252 | Windows Latin 1          |
+
+When writing files in legacy formats like DBF, the default codepage 1252 will
+be used. The codepage option will override the setting. Any characters missing
+from the character set will be replaced with underscores.
+
 #### Worksheet only includes one row of data
 
 Some third-party writer tools will not update the dimensions records in XLSX or