From 89a4acbcdfc321dec9dec93e82cee682fcd5aa1d Mon Sep 17 00:00:00 2001 From: SheetJS Date: Wed, 18 Aug 2021 15:03:20 -0400 Subject: [PATCH] sylk and xlsb_short_records --- README.md | 12 ++++- _config.yml | 1 + sylk/README.md | 83 +++++++++++++++++++++++++++++ sylk/comment.slk | 10 ++++ sylk/shared_formula.slk | 8 +++ xlsb_short_records/README.md | 90 ++++++++++++++++++++++++++++++++ xlsb_short_records/brt_sst.xlsb | Bin 0 -> 14152 bytes xlsb_short_records/brt_str.xlsb | Bin 0 -> 13763 bytes 8 files changed, 202 insertions(+), 2 deletions(-) create mode 100644 _config.yml create mode 100644 sylk/README.md create mode 100644 sylk/comment.slk create mode 100644 sylk/shared_formula.slk create mode 100644 xlsb_short_records/README.md create mode 100644 xlsb_short_records/brt_sst.xlsb create mode 100644 xlsb_short_records/brt_str.xlsb diff --git a/README.md b/README.md index f843e97..9a4ebd2 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,10 @@ -# notes -Various file format notes +# SheetJS File Format Notes + +Various spreadsheet file format notes. + +- [Symbolic Link (SLK/SYLK)](/sylk/README.md) +- [XLSB Short Records](/xlsb_short_records/README.md) + +Project sponsored by [SheetJS](https://sheetjs.com) + +[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes) diff --git a/_config.yml b/_config.yml new file mode 100644 index 0000000..3516067 --- /dev/null +++ b/_config.yml @@ -0,0 +1 @@ +title: SheetJS File Format Notes diff --git a/sylk/README.md b/sylk/README.md new file mode 100644 index 0000000..d7e1913 --- /dev/null +++ b/sylk/README.md @@ -0,0 +1,83 @@ +# Symbolic Link format + +Files start with `ID` (`0x49 0x44`). Files are interpreted as plaintext in the +system ANSI codepage. + + +## Basics + +The file consists of a series of plaintext records. 
Records are separated by +newline characters (both `\r\n` and `\n` newlines are accepted by newer versions +of Excel, but generated files should prefer CRLF). + +### Fields + +A record consists of a record type and a series of fields. Each part of the +record is separated by a single `;` character. + +The literal semicolon is encoded as two consecutive semicolons `;;`. Example: + +``` +C;Y1;X1;K"abc;;def" +``` + +### Encoding + +In addition to the escaped semicolon, Excel understands two types of encodings: + +#### Raw Byte Trigrams + +Trigrams matching the pattern `\x1B[\x20-\x2F][\x30-\x3F]` are decoded into a +single byte whose high four bits are taken from the second character and whose low +four bits are taken from the third character. + +For example, `"\x1B :" == "\x1B\x20\x3A"` encodes the byte `"\x0A"` (newline). + +`"\x1B#;"` encodes a literal semicolon. + +#### Special Escapes + +Excel also understands a set of special escapes that start with `\x1BN`. For +clarity, the `\x1BN` part is not included in the table: + +| sequence | text | +|:---------|:-----| +| `AA` | `À` | + + +## Record Types + +| Record Type | Description | +|:------------|:---------------------| +| `ID` | Header | +| `E` | EOF | +| `B` | Worksheet Dimensions | +| `O` | Options | +| `P` | Number Format | +| `F` | Formatting | +| `C` | Cell | + + +## EOF Record (E) + +There are no fields. + + +## Cell Record (C) + + +### Comments + +The `A` field of the `C` record can specify plaintext comments. They are encoded +using the same text encoding as `K` fields. + +### Shared Formulae + +The `S` field of the `C` record signals that a cell is using a shared formula. +The `R` and `C` fields are the 1-indexed row and column indices of the cell with +the formula. The formula should be extracted from the original location and +shifted to the current cell (relative references adjusted by the offset). 
+ + + +[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes) diff --git a/sylk/comment.slk b/sylk/comment.slk new file mode 100644 index 0000000..60dc901 --- /dev/null +++ b/sylk/comment.slk @@ -0,0 +1,10 @@ +ID;PWXL;N;E +P;PGeneral +F;P0;DG0G10;M320 +B;Y3;X1;D0 0 9 0 +C;Y1;X1;AArthas: :I would gladly bear any curse to save my homeland. +C;Y2;X2;AMuradin: :Leave it be, Arthas. Forget this business and lead your men home. +C;Y1;X1;K1 +C;Y1;X2;K2 +C;Y2;X1;K3 +E diff --git a/sylk/shared_formula.slk b/sylk/shared_formula.slk new file mode 100644 index 0000000..bd7b733 --- /dev/null +++ b/sylk/shared_formula.slk @@ -0,0 +1,8 @@ +ID;PWXL;N;E +P;PGeneral +F;P0;DG0G10;M320 +B;Y3;X1;D0 0 9 0 +C;Y1;X1;K1 +C;Y2;K2;ER[-1]C+1 +C;Y3;K3;S;R2;C1 +E diff --git a/xlsb_short_records/README.md b/xlsb_short_records/README.md new file mode 100644 index 0000000..14b6c09 --- /dev/null +++ b/xlsb_short_records/README.md @@ -0,0 +1,90 @@ +# XLSB Short Records + +There are 7 undocumented XLSB records (record types 12-18) that Excel supports. +They appear to specify cells using a "Short" cell structure + +## Cell Structures + +XLSB Cell structures are 8 bytes with the following layout: + +``` +column index (4 bytes) +style index (3 bytes) +flags (1 byte) +``` + +A "Short" structure is 4 bytes and omits the column: + +``` +style index (3 bytes) +flags (1 byte) +``` + +The actual column index is understood to be the column after the previous cell. +For example, if D3 was the last cell, a record using the Short structure is +defining cell E3. + +## Cell Records + +The various cell records (BrtCellBlank, BrtCellBool, etc) consist of a Cell +structure followed by the cell data. The various formula records (BrtFmlaBool, +BrtFmlaError, etc) append the formula structure to the base cell record. + +The "Short" cell records follow similar patterns but omit the 4-byte column +field from the cell structure. 
+ +For example, record type 18 "BrtShortIsst" is the short form of BrtCellIsst. + +BrtCellIsst has the following layout: + +``` +column index (4 bytes) +style index (3 bytes) +flags (1 byte) +shared string table index (4 bytes) +``` + +BrtShortIsst omits the column index: + +``` +style index (3 bytes) +flags (1 byte) +shared string table index (4 bytes) +``` + +## Records + +| Record | Name | Long Cell Record | +|-------:|:--------------|:-----------------| +| `12` | BrtShortBlank | BrtCellBlank | +| `13` | BrtShortRk | BrtCellRk | +| `14` | BrtShortError | BrtFmlaError | +| `15` | BrtShortBool | BrtCellBool | +| `16` | BrtShortReal | BrtCellReal | +| `17` | BrtShortSt | BrtCellSt | +| `18` | BrtShortIsst | BrtCellIsst | + +Record 13 is informally referred to as "BrtShortRk". It is the short form of +BrtCellRk. BrtCellRk is a 12 byte structure: + +``` +column index (4 bytes) +style index (3 bytes) +flags (1 byte) +value stored as RkNumber (4 bytes) +``` + +The short form BrtShortRk is therefore an 8 byte structure: + +``` +style index (3 bytes) +flags (1 byte) +value stored as RkNumber (4 bytes) +``` + +## Test Files + +- [`brt_str.xlsb`](./brt_str.xlsb) includes types 12,13,14,15,16,17 +- [`brt_sst.xlsb`](./brt_sst.xlsb) includes types 12,13,14,15,16,18 + +[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/notes?pixel)](https://github.com/SheetJS/notes) diff --git a/xlsb_short_records/brt_sst.xlsb b/xlsb_short_records/brt_sst.xlsb new file mode 100644 index 0000000000000000000000000000000000000000..0a2f331c4620a210aeba0b17f2fcb65080cd7936 GIT binary patch literal 14152 zcmeGjS!^UnwZ`k+giS(DDOn_-)f|yn?D6qn}cK{h2skZ3h!0*RXsIhdv^&6 zq^<7hu6kF!_v+R4yt1^f_aMP*$-Alg;$xq`@lz#2@V*hsOwU*eJU=WOUcmad9DDK( zcxW3e2rbW@(kuM~y2f0?GcC6{rQcm!=o{0uFruzWZO>&>`Zf#oJEpHYIBED3NCe5T zAhKAf0VOw_F#IXK6-E9;xf~iT=FqSoAY7e?71{0n;cmn>oyl!g8g*QqEu_7C502f!E|99V^=Qp-r*4dD^BS zDpTmK&`JR}H#hq?hqxFZf2I7%!?6s1hy41*JP$b{~OVz5>5Q=bPW_8>eG|QeBZVV8iBV> 
zS6hrlcU33L`NGLXM$!3Js1=0MlN-^*1~Vcr&_e5TV5&h~tJ9DnxhZ{v1{QTAAd6t) zgTLE;7zNY!c)?nTJi^H`0Ej~VnupE5Tf@^84gfDQAQBLQK!q$~&5CRmo@%VnAW}$N z$xF;-5b_FfxDEu*ynMXLz^OpHoEf{|U!Qe17Jmhu9Pit}zqai1X4?ER*FvSgZn+fD0Uwe6d8%l|Jy)K3@+i1uqzdhv!3D)U8|vZ5MyHE z^YJK&1%@5)fr?2O9M)W`AVjzw&f0;tLG39W_?X^i4F(2XvuPMX#gsn4FBsn@%XA_} zvZD^TIlKja#dtA-xi-jo2F-fPw6ZWfeq=7C&jIYH9d3Spes;c+(>sFkm?uZy2=}wImGM0@@1F^HECzJh@G672FJym3JD(98(2& z=lU5*)@mzQk1Pe}rIsTBE_=!x}oAdOI^vd^XonY z)x;?!-Osj|B<6|@Gg4UDxGd7J&Rb)+8i;#D>ZSKumZaBZ%Lu&CYed?;mNr8zrAlg6 zU2>!hY0+}PF59X$gQ+FmET7h9JX_W3Ikq7~LWiKXLiif9B{kkj*CXnvNY0kk(~^d$=MN@@0Gd)tvAVZpklqIrUPULJDJ zTbpfB$5+8tEw`12Jsz%si&AaHi&Wstp4>z58|*;cj^;jXG0EIJ2ZFn;rX*74AXu{= zC~SEr^CbVRZrcsYc$9&VE=6j`a=V3X+k&_)IJ_ISr}Semh^acyVj);}f_n#s=sCLX zt}y5o?eXs22>ot^*8k@tbf><2<>PV|iXlqlghb4>`8>nvIPq>+wq1?3ZFVB$bMFua zAael<5s&W|vUDNlx8N0fGBA+fngQiFTHrn(x6j>;NxT z(nh8*_m9MCa|G8RB65T@TJ}s)JcPx`aw8?frE0=%8=XiP4IzQuG8;M)6NR>23bP6d zwQFt_E*cMt61TL;sAAkeM(w9fgThpsFtgEMJ_K4Gg&k*P4Kqn*POU zqPW$fAuVb$VIrwGr_BntUE)G*?aNHJH*Ka`1EuXO1b2(h{Gj17fi^*>1jX z*{-%ZrH_vcj_8^};ZURjD_wYXd@x#wb3tl1;mE~^0^zkg_C?-Fvlfr5y{ZD67acl` z)Q$ZhjOJ+A5;o*jv8ikqISKg(M~2bau1o_I!DM$*42^+{{MV!aQ_H4hjRtJc=Z&2! zLEVYZI1+m6aPCxXnVVYOUJq6&m>+bC?J=zoHY|9D#t+zCLv8aaIkv`;CclTo;Y2yL z{T7X*CGO57Jx+l0c#_J%zjES1+my2N#_hc+*yFEMQ}S;7f7TQn4qyji$V3Ymu(=yh z4SULz?gdfHgDB*;EMp;nai7Ny@D|vig=$bf9MkgOfNkKf=yQSw@7Ayoj8?6t7FaNx zM6H0a6$ne*`*t8!iH9CR(vAsnkIIP~`a(^gsDK z?m_h?&QbFHO^Rgc&?)-B+dXj+of#Tcm`kAlCi+iBOh@-@^b`0TJsBhO4Z(_RmNcC7 z3Tf~gACFmGq6oVgfOBK5h*)hc9vn%811q+-7z)$*j1T)j8a`nBvjW2w=M^}h*1T1S ztr{FZ2>of`)qNP%z=JSAinlth--5DHvfbEwkIGIb zQ8GKJFJ4Jy#~PM&Ey_Bo8Z2AvqEOUG)*;+5{Gcm;7Uk8?fT8DL*MHr{b8~2(4`1SC zCE(sw_ss41)egmgE4&WZFbHY8tE{0}=2o))NHrn4kT@YprpO!`(a_OS#39w9K}L+> z5mk(qDe~aX$SDTr3?Vy3<{N4%0&qSS2X0*20lWyZqSe^B5>46e<)~~I!mCxLT>0!) 
zWn!V&)liq9vS_Qj%3@7{^4c8_PB;S!ORp-AzcAu&R>~{&e!| zZLI%OFS#Hn9KQX}KnG5wd1l3dI18W$Pts=zB{rPR*Wn=BA}{rl-}UxlRzTkzJV?(` zc)-;jOo1)W7wT*OEH9 zhLG2L$wg5&fLSm+c>+)QPx71qhb!)lUh?;Sm-fzDB7}ySP%p&;oEc9zQtG*8t znZnOpUKRnTA#w!gnewwV0J%Gfx-+G8f^DF6i(ab zU0R176!-utYWf>pBtb?pKo>m0}FQlFyrwgPdzFZ(!h4$wufHXg_+3h%q2JOmTQ_BiTdimfDit;B(EK4 zckiW~*beGzOV-8ztIF6z!b?x0U(NTgOB}+b zJ_@UeJWcI5jO4*_b5PgBxVp#8e2I3KJpCFCnhaLzsfCjNFF@x}2{g2OfmF%rvRn#N z0{NmyiPLnD2=cUdtD>DHk|8o-ieVCDDd*J_7M$+f4~{?oGjKS(-v>?y_`>fnge5t* z?+_onq6*GJcxvdhD9jHD9jv{F4k6o8q}|H?Cx!0J*itc0J>Qnjz{i?Qh_-R8gF=R{ zr2C7ZCKDnz^r$?()N>2z{69%|4^B3hUWI7%3tbbJjED8CZV;a?vS@1d6*H z8G%oHM981+c@$BLE8PtBq1%<~-XekGIGYhzDodko5wRF6GsK@)xX>LU=gQK4+#&!E OJV@?`IQ!FK{Q5WcgB9Tb literal 0 HcmV?d00001 diff --git a/xlsb_short_records/brt_str.xlsb b/xlsb_short_records/brt_str.xlsb new file mode 100644 index 0000000000000000000000000000000000000000..a723f97c1b1d32ce265c1ce3339ace88c2750702 GIT binary patch literal 13763 zcmeGjS!^UnwZ`@)WOFA-$sz%*1`&zHGag@STeh?7E8cj$Yk4Lc0zy$ucg;+%`%-uN zLiq@rfO1HIgd79~35gFR5D+Pm_}ToxFJFA{1Io|lAO{N94=7Q1ud1)=sqwW26i8d$ z(_Qu6t5@%;>v?6Vx9=#yYstH%_PeJZxcMt3Lh!yB%1qB#2|Pcn7+%22TaG<>Cp@$b z7KE1PPU+S1fUYsu@J!2XP3ia47y8F^EsUsZQrmOcl)lYE{m$v@k4_r?1QJ1VEQl-? zYCy>iCk%f|Z%2_oQK^JRn>jQr1B45eO)qe01fM~x;!|Udwpe9wU|_W3u!x#8qMWOK zKc%S0BuyhF>8}SiCtw;CW;2JmQCO*#tCb8kVu2GX5qM1w(y^j#ANmxto2P9WqB4cv z3au1yb91x2ImFoj{;QQQFQ2LLuJ&6l%F1ARWqL(Y#N)Rv-agnv2;Mj46+wOf%JfiF zK+5F~()XH8%V2Y!v5o>1N?=f}&s@@VNZFZA+)+{4VhKUN~&Dv>9F{;{O=BzzFO@j^OR-NIktfutyD z5QE%&1D4^pNf>G7zFVYCEFhOq=pa}61ft?~U0?Pj$FM29IuLrU6BWP-yj;)GF&K(B z&d9N(lO4!?i9|^0#v`1L_$iTNymct*z&XN2iNN1~9y=y5ez^~JgZk0X6Xc{6^}@CF9FdE=q5xm$BsuiYPq`RQU@Z} zIUazuIzVliYTi{oW7bBQ!gS^e4)kqtyE)0*Km`mw%06Xf0o1dSbov)_! 
zH~=*a7%u0Qo*Q+uHHTjCf&~CTKpVzwSHn3|lNw;fv(#z82#gKT z2}W}@qR0q4im)v$Gy=RnLLGaly!(vqrE#jCz||5(aGpZ3XH1s4W4W-KMq* zZidOiJ0C#sPyybxenH~3)(+MqOVRRD+mQg5J+5by-FUE`hg+p| zL)G>ybtTo#ulo>G6Qh)5Kig&!n=3X<9$^vUvPi=^Z;j!^J@$yyO7F8QNv_M55qP23 zjI{eKZH8J(k<_h*WJnp(qUC^IwpD!wRZFs2KCjJqwyM{2Y(s_whoH8C`8u;DG2TVj zBkHI~)~RFX1%8G`Z55NX?ZC*x&BIh1T4%PW&6_L?RXFDY=xvZaWWqI$Q%2rP_)Yslb;#*+cLf^g!LN>OOBVN!_~# zf_to%#8PG;ShF4|ba@x^B>Syx+fB-Nl!1^gMQX=#cM99K1#w%n@Nw9l(oex4rgEOe zLa^=z_Yn-yb9CKZVc->=@$TaY{c(iW|K}rgH(x&dxSWMzh|)MA5p!)m&v3dAf%u~C@&M`E=(g6j|wIYJsOd!{HJ!s2APk&@wDwP0h6P9zNtA%VR% zHgqB;3Vl0BV^w5o&#hHCYdk1Qt)*Q?5#wGjN=*&7!_V_+ixH7-Ea zvT9kg347^zW#>vzcH%RRgx)$F5!Kq}rq;06gH;OZ2bp4jOe=&93+~YP0lRA`ZC)kE z);QAS_mDW8D5tjHrg5~y-I>JW1UQc;DG&TB#~$=e$vf}d{=0%6e^_0~oALiyS8zCh z9fTnhHDJKzZa_8klquZ{qP7Q7$ZuQ5LIC4Fj~!qwutN*gAbdDc<-Y;jz+d5Wf(CbM zSO`X|R!a*k7*3*gz}O0eC2oDY5Ua*Rk05Esgt$fJ*bRNbc7ts&yN)q%6l1WiwZYVR zG>O9ihb!8j{2hBxqlI&nyuV42EFC(95B#SmF2b3iQ-!$%+HWHNl*M#(&qhCi&(V`H zGT#xb$Yx2yiC0L2-}rdk)B#!8%K)4kYemHB=<(o4!W?L^v&T@F&S!ks2h#8XqDNXrtKDl zjhyX8-}@AHI*F2-lYH?og&j*+lC>!7Dr(Sd(M6%CnXE&&V)#K<{w&ICTmVJS!LI+h zjYr(jJs-Zr%SynltKpg3@v9w*0asWZu3-?;c2`+bv&^kz{gFySxR5v@Nv6mg8qv_u zQp6$EqCrNC;Sp7gmMQY!?#L+yZy7>%ip)3ER0QCBEDqebv;%k%WJRyhxe`g)?v;pa z56r7Yrd;{#Rb*nJxTB&DAhKv{XOYE{0^xNU9-MFn6qa5!9)Drb5thzPwvZBW+vn~t zUk7L7cwaA$gF{qHh=f{n6+`1=L9%haPRh!fAnG+0QqMxdB1lB->}Pd zo?Bu=ib4&N6t+Rcj4Z=#f{@oiTjLyR1->?bTGRz#!MA-hc{*s ze03K58qTWk$DE_^Ygd*{RErZ!*o~?vV zN`vjPL*cY--lYxLL4gmTqNcyGBVR~g&O+=LIdG7m#&h7xfIlblHM%r_35vV)VCNPb z&t&#-PDb5@hT|6+fQ@L#k;HN}hL3v_k}?Dama8#ZR#Sbzd&{tXpSzm8$GG zBg%lu(<;NZ<%h>(M?43ay@3V0f0&$jmZu(73Ta@ww6+hg+@Ud%hMCJl+-s|85+lm1 zO9OoH&n0>7LVIT~<-~RoUp9@{rKwDKeB#lq4@s76nw7l^*+LHz3&3>jT=woxok&{* z=~E^;e9QPu=k_USI~gNgQugf=FR_rBdp{? 
zFxk)3)K0@d8=N)=bxjO=`^<5dXz!5mUZ+8e!9qIKP?D*S#-n0rXzvWE(!OQB6siRB zMV1ny>B13YJnvORyK^K%WXu$!81Pcbizh4?-4kO!`bj?+9NwP-qXT?lcNk%kl-Q4m z4IZk1vk03iIxPzG5xa}F_u&w-E=5+XY=1oZW=ige>|J;`R38x0J!2vl?Nvrr>iHz; z{tq``X}PPzsaKX{^y0sXWVnyQvNA=F9k7U9{3n16`z^?Nk3FViC@xYdrj2J*K7jP~ zg5MR9i#gaC@}FRF2gpU076}wr?HPghA%z?WWa=oQ7FTW=>d+m^bzG4^abU^_JX4m2 jsUl)AxMYY&tDNaBp=M?22rhZye=)8h(-4?Ez5xFh^{1hF literal 0 HcmV?d00001