diff --git a/README.md b/README.md
index f843e97..9a4ebd2 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,10 @@
-# notes
-Various file format notes
+# SheetJS File Format Notes
+
+Various spreadsheet file format notes.
+
+- [Symbolic Link (SLK/SYLK)](/sylk/README.md)
+- [XLSB Short Records](/xlsb_short_records/README.md)
+
+Project sponsored by [SheetJS](https://sheetjs.com)
+
+[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes)
diff --git a/_config.yml b/_config.yml
new file mode 100644
index 0000000..3516067
--- /dev/null
+++ b/_config.yml
@@ -0,0 +1 @@
+title: SheetJS File Format Notes
diff --git a/sylk/README.md b/sylk/README.md
new file mode 100644
index 0000000..d7e1913
--- /dev/null
+++ b/sylk/README.md
@@ -0,0 +1,83 @@
+# Symbolic Link format
+
+Files start with `ID` (`0x49 0x44`). Files are interpreted as plaintext in the
+system ANSI codepage.
+
+
+## Basics
+
+The file consists of a series of plaintext records. Records are separated by
+newline characters (both `\r\n` and `\n` newlines are accepted by newer versions
+of Excel, but generated files should prefer CRLF).
+
+### Fields
+
+A record consists of a record type and a series of fields. Each part of the
+record is separated by a single `;` character.
+
+A literal semicolon is encoded as two consecutive semicolons `;;`. Example:
+
+```
+C;Y1;X1;K"abc;;def"
+```
+
+### Encoding
+
+In addition to the escaped semicolon, Excel understands two types of encodings:
+
+#### Raw Byte Trigrams
+
+Trigrams matching the pattern `\x1B[\x20-\x2F][\x30-\x3F]` are decoded into a
+single byte whose high 4 bits are the low 4 bits of the second character and
+whose low 4 bits are the low 4 bits of the third character.
+
+For example, `"\x1B :" == "\x1B\x20\x3A"` encodes the byte `"\x0A"` (newline).
+
+`"\x1B#;"` encodes a literal semicolon (`0x3B`).
+
+#### Special Escapes
+
+Excel also understands a set of special escapes that start with `\x1BN`. For
+clarity, the `\x1BN` part is not included in the table:
+
+| sequence | text |
+|:---------|:-----|
+| `AA`     | `À`  |
+
+
+## Record Types
+
+| Record Type | Description          |
+|:------------|:---------------------|
+| `ID`        | Header               |
+| `E`         | EOF                  |
+| `B`         | Worksheet Dimensions |
+| `O`         | Options              |
+| `P`         | Number Format        |
+| `F`         | Formatting           |
+| `C`         | Cell                 |
+
+
+## EOF Record (E)
+
+There are no fields.
+
+
+## Cell Record (C)
+
+
+### Comments
+
+The `A` field of the `C` record can specify plaintext comments. They are encoded
+using the same text encoding as `K` fields.
+
+### Shared Formulae
+
+The `S` field of the `C` record signals that a cell is using a shared formula.
+The `R` and `C` fields are the 1-indexed row and column indices of the cell with
+the formula. The formula should be extracted from the original location and
+shifted to the current cell (relative references adjusted by the offset).
+
+
+
+[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes)
diff --git a/sylk/comment.slk b/sylk/comment.slk
new file mode 100644
index 0000000..60dc901
--- /dev/null
+++ b/sylk/comment.slk
@@ -0,0 +1,10 @@
+ID;PWXL;N;E
+P;PGeneral
+F;P0;DG0G10;M320
+B;Y3;X1;D0 0 9 0
+C;Y1;X1;AArthas: :I would gladly bear any curse to save my homeland.
+C;Y2;X2;AMuradin: :Leave it be, Arthas. Forget this business and lead your men home.
+C;Y1;X1;K1
+C;Y1;X2;K2
+C;Y2;X1;K3
+E
diff --git a/sylk/shared_formula.slk b/sylk/shared_formula.slk
new file mode 100644
index 0000000..bd7b733
--- /dev/null
+++ b/sylk/shared_formula.slk
@@ -0,0 +1,8 @@
+ID;PWXL;N;E
+P;PGeneral
+F;P0;DG0G10;M320
+B;Y3;X1;D0 0 9 0
+C;Y1;X1;K1
+C;Y2;K2;ER[-1]C+1
+C;Y3;K3;S;R2;C1
+E
diff --git a/xlsb_short_records/README.md b/xlsb_short_records/README.md
new file mode 100644
index 0000000..14b6c09
--- /dev/null
+++ b/xlsb_short_records/README.md
@@ -0,0 +1,90 @@
+# XLSB Short Records
+
+There are 7 undocumented XLSB records (record types 12-18) that Excel supports.
+They appear to specify cells using a "Short" cell structure.
+
+## Cell Structures
+
+XLSB Cell structures are 8 bytes with the following layout:
+
+```
+column index (4 bytes)
+style index (3 bytes)
+flags (1 byte)
+```
+
+A "Short" structure is 4 bytes and omits the column:
+
+```
+style index (3 bytes)
+flags (1 byte)
+```
+
+The actual column index is understood to be the column after the previous cell.
+For example, if D3 was the last cell, a record using the Short structure
+defines cell E3.
+
+## Cell Records
+
+The various cell records (BrtCellBlank, BrtCellBool, etc.) consist of a Cell
+structure followed by the cell data. The various formula records (BrtFmlaBool,
+BrtFmlaError, etc.) append the formula structure to the base cell record.
+
+The "Short" cell records follow similar patterns but omit the 4-byte column
+field from the cell structure.
+
+For example, record type 18 "BrtShortIsst" is the short form of BrtCellIsst.
+
+BrtCellIsst has the following layout:
+
+```
+column index (4 bytes)
+style index (3 bytes)
+flags (1 byte)
+shared string table index (4 bytes)
+```
+
+BrtShortIsst omits the column index:
+
+```
+style index (3 bytes)
+flags (1 byte)
+shared string table index (4 bytes)
+```
+
+## Records
+
+| Record | Name          | Long Cell Record |
+|-------:|:--------------|:-----------------|
+|   `12` | BrtShortBlank | BrtCellBlank     |
+|   `13` | BrtShortRk    | BrtCellRk        |
+|   `14` | BrtShortError | BrtCellError     |
+|   `15` | BrtShortBool  | BrtCellBool      |
+|   `16` | BrtShortReal  | BrtCellReal      |
+|   `17` | BrtShortSt    | BrtCellSt        |
+|   `18` | BrtShortIsst  | BrtCellIsst      |
+
+Record 13 is informally referred to as "BrtShortRk". It is the short form of
+BrtCellRk. BrtCellRk is a 12-byte structure:
+
+```
+column index (4 bytes)
+style index (3 bytes)
+flags (1 byte)
+value stored as RkNumber (4 bytes)
+```
+
+The short form BrtShortRk is therefore an 8-byte structure:
+
+```
+style index (3 bytes)
+flags (1 byte)
+value stored as RkNumber (4 bytes)
+```
+
+## Test Files
+
+- [`brt_str.xlsb`](./brt_str.xlsb) includes types 12, 13, 14, 15, 16, 17
+- [`brt_sst.xlsb`](./brt_sst.xlsb) includes types 12, 13, 14, 15, 16, 18
+
+[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes)
diff --git a/xlsb_short_records/brt_sst.xlsb b/xlsb_short_records/brt_sst.xlsb
new file mode 100644
index 0000000..0a2f331
Binary files /dev/null and b/xlsb_short_records/brt_sst.xlsb differ
diff --git a/xlsb_short_records/brt_str.xlsb b/xlsb_short_records/brt_str.xlsb
new file mode 100644
index 0000000..a723f97
Binary files /dev/null and b/xlsb_short_records/brt_str.xlsb differ
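The field and escape rules in `sylk/README.md` can be combined into one left-to-right scan over a record. The Python sketch below is illustrative only. The name `split_record` is hypothetical. It handles `;;` and raw byte trigrams but copies the `\x1BN` special escapes through unchanged. It works on raw bytes so the caller can apply whichever ANSI codepage is in effect. Trigrams are tested before separators because `\x1B#;` contains a `;` that must not end the field.

```python
import re
from typing import List

# Raw byte trigram: ESC, one byte in 0x20-0x2F, one byte in 0x30-0x3F.
TRIGRAM = re.compile(rb"\x1B[\x20-\x2F][\x30-\x3F]")


def split_record(line: bytes) -> List[bytes]:
    """Split one SYLK record (without its newline) into its parts.

    Sketch only: handles `;;` and raw byte trigrams. Special escapes
    (ESC followed by `N`) are copied through unchanged.
    """
    parts: List[bytes] = []
    cur = bytearray()
    i = 0
    while i < len(line):
        if TRIGRAM.match(line, i):
            # High nibble from the 2nd byte, low nibble from the 3rd byte.
            cur.append(((line[i + 1] & 0x0F) << 4) | (line[i + 2] & 0x0F))
            i += 3
        elif line[i:i + 2] == b";;":
            cur.append(0x3B)  # escaped literal semicolon
            i += 2
        elif line[i] == 0x3B:
            parts.append(bytes(cur))  # a single `;` ends the current part
            cur.clear()
            i += 1
        else:
            cur.append(line[i])
            i += 1
    parts.append(bytes(cur))
    return parts
```

For example, `split_record(b'C;Y1;X1;K"abc;;def"')` yields the parts `C`, `Y1`, `X1` and `K"abc;def"`. To get text, decode each part with the relevant codepage. For example, use `part.decode("cp1252")` if the system ANSI codepage is assumed to be Windows-1252.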
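Here is a minimal sketch of reading the BrtShortIsst (18) and BrtShortRk (13) payloads laid out in `xlsb_short_records/README.md`. It rests on several assumptions. Record framing (the type and size header) is assumed to be already stripped. Fields are assumed little-endian, as elsewhere in XLSB. The style index is taken from the first three bytes. `ShortCell`, `parse_short_cell` and `decode_rk` are hypothetical names, not SheetJS APIs. RkNumber decoding follows the RkNumber structure in the [MS-XLSB] specification: bit 0 is fX100, bit 1 is fInt, and the upper 30 bits hold the number. Columns are 0-based, so D is 3 and E is 4.

```python
import struct
from dataclasses import dataclass
from typing import Union

# Record types from the table in xlsb_short_records/README.md.
BRT_SHORT_RK = 13
BRT_SHORT_ISST = 18


@dataclass
class ShortCell:  # hypothetical container, not a SheetJS type
    col: int      # 0-based column, inferred from the previous cell
    style: int    # 3-byte style index
    flags: int    # 1-byte flags
    value: Union[int, float]


def decode_rk(rk: int) -> float:
    """Decode a 32-bit RkNumber (bit 0 = fX100, bit 1 = fInt)."""
    if rk & 0x02:
        # fInt: the upper 30 bits are a signed integer.
        (signed,) = struct.unpack("<i", struct.pack("<I", rk))
        value = float(signed >> 2)
    else:
        # Otherwise the upper 30 bits are the top 30 bits of an IEEE double.
        (value,) = struct.unpack("<d", struct.pack("<Q", (rk & 0xFFFFFFFC) << 32))
    if rk & 0x01:
        # fX100: the stored number is 100 times the real value.
        value /= 100
    return value


def parse_short_cell(rec_type: int, payload: bytes, prev_col: int) -> ShortCell:
    """Parse the 8-byte payload of a BrtShortIsst or BrtShortRk record."""
    if len(payload) < 8:
        raise ValueError("both record types carry 8 bytes of payload")
    style = int.from_bytes(payload[0:3], "little")
    flags = payload[3]
    col = prev_col + 1  # short records omit the column index
    (raw,) = struct.unpack_from("<I", payload, 4)
    value: Union[int, float]
    if rec_type == BRT_SHORT_ISST:
        value = raw  # shared string table index
    elif rec_type == BRT_SHORT_RK:
        value = decode_rk(raw)
    else:
        raise NotImplementedError(f"record type {rec_type} not handled here")
    return ShortCell(col, style, flags, value)
```

Suppose D3 (column 3) was the previous cell and a BrtShortIsst record arrives with style 5 and string index 7. Then `parse_short_cell(BRT_SHORT_ISST, payload, 3)` returns a cell in column 4 (E), matching the README's D3 to E3 example. The caller has to carry `prev_col` forward from every cell record in the row, including the long forms.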