sylk and xlsb_short_records

SheetJS 2021-08-18 15:03:20 -04:00
parent a2d9e018bf
commit 89a4acbcdf
8 changed files with 202 additions and 2 deletions


@@ -1,2 +1,10 @@
# notes
Various file format notes
# SheetJS File Format Notes
Various spreadsheet file format notes.
- [Symbolic Link (SLK/SYLK)](/sylk/README.md)
- [XLSB Short Records](/xlsb_short_records/README.md)
Project sponsored by [SheetJS](https://sheetjs.com)
[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes)

1
_config.yml Normal file

@@ -0,0 +1 @@
title: SheetJS File Format Notes

83
sylk/README.md Normal file

@@ -0,0 +1,83 @@
# Symbolic Link format
Files start with `ID` (`0x49 0x44`). Files are interpreted as plaintext in the
system ANSI codepage.
## Basics
The file consists of a series of plaintext records. Records are separated by
newline characters (both `\r\n` and `\n` newlines are accepted by newer versions
of Excel, but generated files should prefer CRLF).
### Fields
A record consists of a record type and a series of fields. Each part of the
record is separated by a single `;` character.
The literal semicolon is encoded as two consecutive semicolons `;;`. Example:
```
C;Y1;X1;K"abc;;def"
```
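The field rules above can be sketched in Python. This is a minimal illustration, not Excel's actual parser; the function name `split_record` is hypothetical, and the sketch assumes that every `;;` pair is an escaped semicolon (so empty fields are not supported).

```python
def split_record(line):
    """Split one SYLK record into its record type and fields.

    A single ';' separates parts; ';;' is a literal semicolon.
    Assumption: every ';;' pair is an escape, never an empty field.
    """
    parts = [""]
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == ";":
            if i + 1 < len(line) and line[i + 1] == ";":
                parts[-1] += ";"      # escaped semicolon
                i += 2
                continue
            parts.append("")          # field separator
        else:
            parts[-1] += ch
        i += 1
    return parts


print(split_record('C;Y1;X1;K"abc;;def"'))
# prints ['C', 'Y1', 'X1', 'K"abc;def"']
```

The first element is the record type; each remaining element starts with its one-letter field name.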
### Encoding
In addition to the escaped semicolon, Excel understands two types of encodings:
#### Raw Byte Trigrams
Trigrams matching the pattern `\x1B[\x20-\x2F][\x30-\x3F]` are decoded into a
single byte. The low four bits of the second character become the high four
bits of the byte, and the low four bits of the third character become its low
four bits.
For example, `"\x1B :" == "\x1B\x20\x3A"` encodes the byte `"\x0A"` (newline) and
`"\x1B#;" == "\x1B\x23\x3B"` encodes the byte `"\x3B"`, a literal semicolon.
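A possible decoder for these trigrams, written in Python as a sketch. The name `decode_trigrams` is hypothetical. The nibble arithmetic (low four bits of each character) is inferred from the two examples above rather than taken from any specification.

```python
import re

# ESC, then one byte in 0x20-0x2F, then one byte in 0x30-0x3F
TRIGRAM = re.compile(rb"\x1b([\x20-\x2f])([\x30-\x3f])")


def decode_trigrams(data):
    """Replace each raw byte trigram with the single byte it encodes."""
    def repl(match):
        high = match.group(1)[0] & 0x0F   # second character -> high nibble
        low = match.group(2)[0] & 0x0F    # third character -> low nibble
        return bytes([(high << 4) | low])
    return TRIGRAM.sub(repl, data)


print(decode_trigrams(b"line one\x1b :line two"))
# prints b'line one\nline two'
```

The sketch works on bytes because the decoded value is a raw byte that still has to be interpreted in the file's codepage.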
#### Special Escapes
Excel also understands a set of special escapes that start with `\x1BN`. For
clarity, the `\x1BN` part is not included in the table:
| sequence | text |
|:---------|:-----|
| `AA` | `À` |
## Record Types
| Record Type | Description |
|:------------|:---------------------|
| `ID` | Header |
| `E` | EOF |
| `B` | Worksheet Dimensions |
| `O` | Options |
| `P` | Number Format |
| `F` | Formatting |
| `C` | Cell |
## EOF Record (E)
There are no fields.
## Cell Record (C)
### Comments
The `A` field of the `C` record can specify plaintext comments. They are encoded
using the same text encoding as `K` fields.
### Shared Formulae
The `S` field of the `C` record signals that a cell is using a shared formula.
The `R` and `C` fields are the 1-indexed row and column indices of the cell with
the formula. The formula should be extracted from the original location and
shifted to the current cell (relative references adjusted by the offset).
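
The lookup and shift can be sketched in Python. All names here (`resolve_formulas`, `to_a1`, `column_name`) are hypothetical. The sketch assumes that a missing `X` or `Y` field carries over from the previous record, as the sample file in this commit suggests, and it ignores `;;` escapes. Because relative R1C1 references are offsets, the formula text itself is copied unchanged; the shift becomes visible only when the references are resolved against the current cell.

```python
import re

# Relative R1C1 references such as R[-1]C or RC[2].
# Sketch only: absolute references (for example R2C1) are left untouched.
REF = re.compile(r"R(?:\[(-?\d+)\])?C(?:\[(-?\d+)\])?")


def column_name(col):
    """Convert a 1-indexed column number to letters (1 -> A, 27 -> AA)."""
    name = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        name = chr(65 + rem) + name
    return name


def to_a1(expr, row, col):
    """Resolve relative R1C1 references against the cell at (row, col)."""
    def repl(m):
        r = row + int(m.group(1) or 0)
        c = col + int(m.group(2) or 0)
        return column_name(c) + str(r)
    return REF.sub(repl, expr)


def resolve_formulas(lines):
    """Return {(row, col): R1C1 formula text} for a list of SYLK records."""
    formulas = {}
    row = col = 1                      # assumption: X/Y carry over when omitted
    for line in lines:
        fields = line.split(";")       # sketch: ignores ';;' escapes
        if fields[0] != "C":
            continue
        expr, shared, src_row, src_col = None, False, None, None
        for field in fields[1:]:
            tag, val = field[:1], field[1:]
            if tag == "Y":
                row = int(val)
            elif tag == "X":
                col = int(val)
            elif tag == "E":
                expr = val
            elif tag == "S":
                shared = True
            elif tag == "R":
                src_row = int(val)
            elif tag == "C":
                src_col = int(val)
        if shared:
            expr = formulas.get((src_row, src_col))   # copy from the source cell
        if expr is not None:
            formulas[(row, col)] = expr
    return formulas


sample = ["C;Y1;X1;K1", "C;Y2;K2;ER[-1]C+1", "C;Y3;K3;S;R2;C1"]
for (r, c), expr in sorted(resolve_formulas(sample).items()):
    print(column_name(c) + str(r), "=", to_a1(expr, r, c))
```

With the records from `shared_formula.slk`, the formula stored at row 2 resolves to `A1+1` and the shared copy at row 3 resolves to `A2+1`.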
[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes)

10
sylk/comment.slk Normal file

@@ -0,0 +1,10 @@
ID;PWXL;N;E
P;PGeneral
F;P0;DG0G10;M320
B;Y3;X1;D0 0 9 0
C;Y1;X1;AArthas: :I would gladly bear any curse to save my homeland.
C;Y2;X2;AMuradin: :Leave it be, Arthas. Forget this business and lead your men home.
C;Y1;X1;K1
C;Y1;X2;K2
C;Y2;X1;K3
E

8
sylk/shared_formula.slk Normal file

@@ -0,0 +1,8 @@
ID;PWXL;N;E
P;PGeneral
F;P0;DG0G10;M320
B;Y3;X1;D0 0 9 0
C;Y1;X1;K1
C;Y2;K2;ER[-1]C+1
C;Y3;K3;S;R2;C1
E


@@ -0,0 +1,90 @@
# XLSB Short Records
There are 7 undocumented XLSB records (record types 12-18) that Excel supports.
They appear to specify cells using a "Short" cell structure.
## Cell Structures
XLSB Cell structures are 8 bytes with the following layout:
```
column index (4 bytes)
style index (3 bytes)
flags (1 byte)
```
A "Short" structure is 4 bytes and omits the column:
```
style index (3 bytes)
flags (1 byte)
```
The actual column index is understood to be the column after the previous cell.
For example, if D3 was the last cell, a record using the Short structure is
defining cell E3.
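
The column rule can be sketched in Python as a pass over the cells of one row. The function name `assign_columns` is hypothetical, columns are 0-indexed, and a Short record is modeled as a cell whose column is `None`. Treating a Short record at the start of a row as column 0 is an assumption of this sketch; the notes do not say what Excel does in that case.

```python
def assign_columns(cells):
    """Fill in the implicit column of Short cells within a single row.

    `cells` is a sequence of (column_or_None, value) pairs in file order.
    """
    last_col = -1                 # assumption: a leading Short cell is column 0
    resolved = []
    for col, value in cells:
        if col is None:           # Short structure: column after the previous cell
            col = last_col + 1
        resolved.append((col, value))
        last_col = col
    return resolved


# D3 (column index 3) followed by two Short cells, which become E3 and F3
print(assign_columns([(3, "d"), (None, "e"), (None, "f")]))
# prints [(3, 'd'), (4, 'e'), (5, 'f')]
```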
## Cell Records
The various cell records (BrtCellBlank, BrtCellBool, etc.) consist of a Cell
structure followed by the cell data. The various formula records (BrtFmlaBool,
BrtFmlaError, etc.) append the formula structure to the base cell record.
The "Short" cell records follow similar patterns but omit the 4-byte column
field from the cell structure.
For example, record type 18 "BrtShortIsst" is the short form of BrtCellIsst.
BrtCellIsst has the following layout:
```
column index (4 bytes)
style index (3 bytes)
flags (1 byte)
shared string table index (4 bytes)
```
BrtShortIsst omits the column index:
```
style index (3 bytes)
flags (1 byte)
shared string table index (4 bytes)
```
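The two layouts can be compared with a short Python sketch. The function names are hypothetical, the payloads are assumed to have the record type and length header already stripped, and all integers are assumed to be little-endian, as elsewhere in XLSB. The bytes in the example are made up for illustration and are not taken from the test files.

```python
import struct


def parse_cell_isst(payload):
    """Parse a 12-byte BrtCellIsst payload."""
    col, = struct.unpack_from("<I", payload, 0)
    style = int.from_bytes(payload[4:7], "little")   # 3-byte style index
    flags = payload[7]
    isst, = struct.unpack_from("<I", payload, 8)
    return {"col": col, "style": style, "flags": flags, "isst": isst}


def parse_short_isst(payload, prev_col):
    """Parse an 8-byte BrtShortIsst payload; the column is prev_col + 1."""
    style = int.from_bytes(payload[0:3], "little")
    flags = payload[3]
    isst, = struct.unpack_from("<I", payload, 4)
    return {"col": prev_col + 1, "style": style, "flags": flags, "isst": isst}


# Illustrative payloads: column 3, style 1, string 5, then a Short cell with string 6
long_cell = parse_cell_isst(bytes.fromhex("03000000 010000 00 05000000"))
short_cell = parse_short_isst(bytes.fromhex("010000 00 06000000"), long_cell["col"])
print(long_cell)
print(short_cell)
```

The Short parser needs the previous column as an argument because the payload no longer carries it.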
## Records
| Record | Name | Long Cell Record |
|-------:|:--------------|:-----------------|
| `12` | BrtShortBlank | BrtCellBlank |
| `13` | BrtShortRk | BrtCellRk |
| `14` | BrtShortError | BrtCellError |
| `15` | BrtShortBool | BrtCellBool |
| `16` | BrtShortReal | BrtCellReal |
| `17` | BrtShortSt | BrtCellSt |
| `18` | BrtShortIsst | BrtCellIsst |
Record 13 is informally referred to as "BrtShortRk". It is the short form of
BrtCellRk. BrtCellRk is a 12-byte structure:
```
column index (4 bytes)
style index (3 bytes)
flags (1 byte)
value stored as RkNumber (4 bytes)
```
The short form BrtShortRk is therefore an 8-byte structure:
```
style index (3 bytes)
flags (1 byte)
value stored as RkNumber (4 bytes)
```
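A Python sketch of reading this 8-byte payload. It assumes the RkNumber layout described in the MS-XLSB specification: bit 0 means "divide by 100", bit 1 means "integer", and the upper 30 bits hold either a signed integer or the top 30 bits of an IEEE 754 double. The function names are hypothetical and the payload is assumed to exclude the record header.

```python
import struct


def decode_rk(raw):
    """Decode a 32-bit RkNumber into a Python number."""
    div100 = raw & 0x01
    is_int = raw & 0x02
    if is_int:
        value = raw >> 2                      # signed 30-bit integer
        if value & 0x20000000:
            value -= 0x40000000
    else:
        bits = (raw & 0xFFFFFFFC) << 32       # top 30 bits of a double
        value = struct.unpack("<d", struct.pack("<Q", bits))[0]
    return value / 100 if div100 else value


def parse_short_rk(payload, prev_col):
    """Parse an 8-byte BrtShortRk payload; the column is prev_col + 1."""
    style = int.from_bytes(payload[0:3], "little")
    flags = payload[3]
    raw, = struct.unpack_from("<I", payload, 4)
    return {"col": prev_col + 1, "style": style, "flags": flags,
            "value": decode_rk(raw)}


# Integer 123 with the "divide by 100" flag set encodes 1.23
payload = bytes(4) + struct.pack("<I", (123 << 2) | 0x03)
print(parse_short_rk(payload, prev_col=3))
```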
## Test Files
- [`brt_str.xlsb`](./brt_str.xlsb) includes types 12,13,14,15,16,17
- [`brt_sst.xlsb`](./brt_sst.xlsb) includes types 12,13,14,15,16,18
[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes)

Binary file not shown.

Binary file not shown.