sylk record detail

This commit is contained in:
SheetJS 2022-05-04 20:30:17 -04:00
parent d32a56a766
commit 902537a351

View File

@ -3,14 +3,23 @@
Files start with `ID` (`0x49 0x44`). Files are interpreted as plaintext in the
system ANSI codepage.
This is a native file format of Multiplan and has been supported in all versions
of Excel for Windows (to date). It is also used in the game "Warcraft III" and
various mods including "Defense of the Ancients".
## Basics
**Records**
The file consists of a series of plaintext records. Records are separated by
newline characters (both `\r\n` and `\n` newlines are accepted by newer versions
of Excel, but generated files should prefer CRLF).
### Fields
As stated in the Multiplan manual, "parsers must be prepared to ignore records
and fields that they do not understand". Loosely speaking, software can report
error messages on each unsupported record but should read valid records.
**Fields**
A record consists of a record type and a series of fields. Each part of the
record is separated by a single `;` character.
@ -21,9 +30,31 @@ The literal semicolon is encoded as two consecutive semicolons `;;`. Example:
C;Y1;X1;K"abc;;def"
```
### Global State
The `Y` and `X` fields set the current row / column before processing records.
Parsing is stateful. Records that apply to a specific cell but do not have `X`
or `Y` fields will use the global state:
```sylk
F;M4;Y1;X1 <-- set current cell to A1
// current cell is A1
C;K"A1" <-- set cell value to "A1"
F;M5;X2 <-- set current column to B (no Y -> row is unchanged)
// current cell is B1
C;K"C1";X3 <-- set current column to C, then assign value "C1"
// current cell is C1
C;K"C2";Y2 <-- set current row to 2, then assign value "C2"
// current cell is C2
F;M4 <-- set current cell style
```
This also means that records must be processed in order.
### Encoding
In addition to the escaped semicolon, Excel understand two types of Encodings:
In addition to the escaped semicolon, Excel understand two types of Encodings.
They are not covered in the Multiplan documentation.
#### Raw Byte Trigrams
@ -69,37 +100,340 @@ For example, `\x1BNj` encodes byte `0x8C`
## Record Types
| Record Type | Description |
|:------------|:---------------------|
| `ID` | Header |
| `E` | EOF |
| `B` | Worksheet Dimensions |
| `O` | Options |
| `P` | Number Format |
| `F` | Formatting |
| `C` | Cell |
The following table lists the known record types.
| Type | Description | Vintage |
|:-----|:---------------------------------|:----------|
| `ID` | [Header](#header-id) | Multiplan |
| `P` | [Style](#style-p) | Excel |
| `F` | [Format](#format-f) | Multiplan |
| `B` | [Dimensions](#dimensions-b) | Multiplan |
| `O` | [Options](#options-o) | Excel |
| `NN` | [Defined Name](#defined-name-nn) | Multiplan |
| `C` | [Cell](#cell-c) | Multiplan |
| `E` | [EOF](#eof-e) | Multiplan |
| `W` | Window Layout | Multiplan |
| `NE` | External Link | Multiplan |
| `NU` | Filename Substitution | Multiplan |
| `NL` | Chart External Link | Excel |
## EOF Record (E)
The supported fields for each type are listed in the relevant subsections. Excel
supports every field that Multiplan supports.
There are no fields.
### Header (ID)
Files must start with the `ID` record.
## Cell Record (C)
_Multiplan_
The `P` field specifies the name of the program that generated the file. This
record is not validated, although the typical value `WXL` is used in Excel.
### Comments
### Style (P)
_Undocumented_
The `P` record encodes data for multiple style tables, based on the fields. Each
table is zero-indexed.
```sylk
ID;PWXL;N;E
P;PGeneral
P;P0
P;P0.00
P;P#,##0
```
The 4 `P` records above are number format records. In the number format table,
index 0 will be `General`, index 1 will be `0`, etc.
#### Number Format Table
The `P` field indicates that the record specifies a number format. The value is
an escaped number format similar to XLS encoding. `;;` encodes a semicolon as
used in a multi-part number format. For example:
```sylk
P;P#,##0.00_);;[Red]\(#,##0.00\)
```
corresponds to the XLSX number format `#,##0.00_);[Red]\(#,##0.00\)`
#### Font Table
The four default fonts (normal, bold, italic, bold+italic) are specified with
the `F` field. Other fonts are specified with the `E` field. It appears that
Excel treats the fields as interchangeable, so either field type can be used.
Other supported fields are listed below:
| Field | Interpretation |
|------:|:----------------------------------------|
| `F/E` | Font name |
| `M` | Font size in twips |
| `L` | Indexed color (from 1 to 64) |
| `S` | Font Attributes (see table below) |
The `S` field value is a list of attribute characters:
| Value | Interpretation |
|------:|:---------------|
| `B` | Bold |
| `I` | Italic |
| `U` | Underline |
| `S` | Strikeout |
### Format (F)
This record includes worksheet-level and cell-level formatting properties. The
fields and interpretations vary based on position in the file.
#### Common Value Types
Multiplan "Cell Type" format codes:
| Value | Interpretation | Multiplan name |
|:------|:------------------|:---------------|
| `D` | Default | Def |
| `C` | "Continuous" | Cont |
| `E` | Exponential | Exp |
| `F` | Fixed Point | Fix |
| `G` | General | Gen |
| `$` | Currency | Dollar |
| `*` | Data Bar Cond Fmt | Bar Graph |
| `%` | Percentage | Percent |
Note that there is an error in the `sylksum.doc` documentation: `C` is a normal
format (the spec claims it is "currency")
Multiplan "Horizontal Alignment" format codes:
| Value | Interpretation | XLS HorizAlign |
|:------|:-----------------------------------|:----------------|
| `D` | Default | |
| `G` | General (text left, numbers right) | `0x00 ALCGEN` |
| `L` | Left | `0x01 ALCLEFT` |
| `C` | Center | `0x02 ALCCTR` |
| `R` | Right | `0x03 ALCRIGHT` |
| `X` | Fill | `0x04 ALCFILL` |
| `-` | Unspecified | `0xFF ALCNIL` |
#### Default Styling (immediately after P records)
The records in this area typically define high-level properties including the
default format and column widths.
| Field | Interpretation |
|:---------|:-----------------------------------------------------------------|
| `P#` | Default number format (index into table) |
| `M#` | Default row height in twips |
| `D_#_#` | Default cell type, decimals, horizontal alignment, column width |
For example, the following record sets the default number format to index 0,
the default cell type to "General", the left cell alignment to left, the default
column width to 8 characters, and the default row height to 32 pt:
```sylk
F;P0;DG0L8;M640
```
#### Column Widths (immediately after O record)
The `W` field specifies widths for multiple columns and takes the form:
```sylk
F;W# # # <-- 1-indexed start col, 1-indexed end col, width in characters
```
The first two parameters are the starting and ending column (1-indexed numbers)
and the last parameter is the width as measured in characters. When specifying
a single column width, the start and end should be equal:
```sylk
F;W1 1 11 <-- column "A" is 11 characters wide
F;W2 3 6 <-- columns "B" and "C" are 6 characters wide
```
#### Cell Styling (interspersed with cell records)
Cell level styling is distinguished by the absence of the `W`, `R`, `D` and `C`
fields or the presence of the `X` or `Y` fields.
`X` and `Y` fields modify the global state before applying formatting.
| Field | Interpretation |
|:-------|:---------------------------------------------------------|
| `F_#_` | Simple format: cell type, decimals, horizontal alignment |
| `S...` | Style string (see below) |
| `P#` | Number format (index into format table) |
The style string can include the following attributes:
| Value | Interpretation |
|:------|:---------------|
| `D` | Bold |
| `I` | Italic |
| `M#` | Font index |
| `L` | Left Border |
| `R` | Right Border |
| `T` | Top Border |
| `B` | Bottom Border |
| `S` | Fill "gray125" |
#### Row Heights and Styling (after column widths, before first cell of row)
The `R` field indicates that a format record applies to the specified row. In
addition to the cell styling properties, the row height can be specified with
the `M` field.
For example, the following record sets the height of row 5 to 19 pt and sets
the font to index 78 of the font table:
```sylk
F;R5;SM78;M380 <-- use index 78 of font table and set height to 19 pt for row 5
```
#### Column Styling (after column widths, before first cell of column)
The `C` field indicates that a format record applies to the specified column. As
column widths are handled separately, the supported fields are identical to the
cell-level styling fields:
```sylk
F;C1;SM78 <-- use index 78 of font table for column 1
```
### Dimensions (B)
The bounds are not authoritative, and cells can exist outside of the range.
As with XLSX/XLSB/XLS, Excel ignores this field and uses the actual cell records
to determine the dimensions.
_Multiplan_
The `Y` and `X` fields specify the number of rows and columns respectively.
_Undocumented_
The `D` field specifies the worksheet dimensions, in the order `r c R C` with
zero-indexed values. For example:
```sylk
B;Y5;X3;D3 1 4 2
```
Multiplan will interpret the dimensions based on the `Y` and `X` field, assuming
an origin of `A1`. This would be `A1:C5` in the example.
Excel will use `3 1 4 2` which is `B4:C5` (`3 1` cell `B4` and `4 2` cell `E5`)
### Options (O)
This record includes a number of workbook-level settings
_Excel_
Field interpretations in quotes do not appear to be used in Excel 2019.
| Field | Interpretation |
|:-------|:------------------------------------------------------------|
| `A# #` | XLS CalcIter / CalcDelta (enables iterative calculation) |
| `C` | "Completion test at current cell" |
| `P` | "Sheet is protected (but no password)." |
| `L` | Use A1-style formulae (default is R1C1 formulae) |
| `M` | Manual recalculation (XLS CalcMode 0) |
| `R` | Precision as displayed (XLS CalcPrecision 0) |
| `E` | "File is a macrosheet" |
_Undocumented_
| Field | Interpretation |
|:-------|:-----------------------------------------------------------------|
| `G# #` | XLS CalcIter / CalcDelta (does not enable iterative calculation) |
| `V#` | Date system: (0 = 1900, 1/2/3/4 = 1904) |
| `K#` | currently unknown (Value must be between 1 and 255) |
| `D` | currently unknown |
| `B` | currently unknown |
| `S` | currently unknown (found in Warcraft III files) |
### Defined Name (NN)
The `N` field of the `NN` record is the name of the defined name.
The `E` field is the expression (interpreted as R1C1 or A1-style depending on
the presence or absence of the `L` field in the `O` record.
```sylk
NN;N_rng;ER4C3:R7C4 <-- name "_rng" reference to `$C$4:$D$7`
NN;N_arr;E{"a","b","c";;1,2,3} <-- name "_arr" excel array {"a","b","c";1,2,3}
```
### Cell (C)
`X` and `Y` fields modify the global state before applying cell values.
The `K` field specifies the cell value. Numbers are specified as-is. Text
should be wrapped in double quotes. Logical values are specified as TRUE/FALSE.
Dates should be specified using the date codes after applying the appropriate
number format (behavior identical to XLS):
```sylk
ID;PWXL;N;E
P;PGeneral <-- format 0 is "General"
P;Pm/d/yy <-- format 1 specifies the default Date format
C;Y1;X1;K123 <-- set cell A1 value to the number 123
C;X2;K"123" <-- set cell B1 value to the string "123"
C;X3;KTRUE <-- set cell C1 value to the logical TRUE
F;Y2;P1 <-- move to cell C2, set number format to date
C;K44444 <-- set cell C2 value to the number 44444 (formatted date 9/5/21)
E
```
The `E` field specifies a formula. If the formula is included, it must be
consistent with the worksheet expression style (A1 or R1C1) in the `O` record.
#### Comments
The `A` field of the `C` record can specify plaintext comments. They are encoded
using the same text encoding in `K` fields.
### Shared Formulae
```sylk
C;Y4;X2;AHello! <-- sets comment on cell B4 to "Hello!"
```
[`comment.slk`](./comment.slk) includes a few comments with newline encoding.
#### Shared Formulae
The `S` field of the `C` record signals that a cell is using a shared formula.
The `R` and `C` fields are the 1-indexed row and column indices of the cell with
the formula. The formula should be extracted from the original location and
shifted to the current cell (relative references adjusted by the offset).
```sylk
C;Y1;X1;K1 <-- cell A1=1
C;Y2;K2;ER[-1]C+1 <-- cell B1=A1+1 (both column and row relative)
C;Y3;K3;S;R2;C1 <-- cell C1=B1+1 (shifting formula from B1 +1 row)
C;X2;K3;S;R2;C1 <-- cell C2=B2+1 (shifting formula from B1 +1 row +1 col)
```
[`shared_formula.slk`](./shared_formula.slk) includes a few shared formulae.
### EOF (E)
This must be the last record of the file. There are no fields.
## References
The Multiplan manual (1982) includes an appendix covering the SYLK format.
`sylksum.doc` (1986) with author `MCK, Microsoft` was available on a Microsoft
server. Public references to its existence date back to the 20th century.
Günter Born's "The File Formats Handbook" expands upon `sylksum.doc`. While the
core details are covered in official specs, the chart extension details are not
covered in the public specifications.
[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes)