feat: Add Sheet Protection for XLS (BIFF8) #3202

Merged
sheetjs merged 1 commits from lpicchi/sheetjs:biff8-protection into master 2024-10-26 20:37:21 +00:00
Contributor

Fixes #3201

Fixes #3201
Owner

There are a few issues but good first start :)

At a high level, it is better to reference the primary source MS-XLS since the provenance of OpenOffice's documentation is murky. In case you are wondering why provenance is important, CONTRIBUTING.md is a good starting point.

The record you are trying to write in write_ws_biff8_protection is [MS-XLS] 2.4.112 FeatHdr

BIFF5 uses a different record (0x12) so the write should be guarded with a b8 test (look at how write_ws_cols_biff8 is guarded)


Try to make the write operations aligned with the spec. For example, since the record starts with a future record header (12 bytes), write the FrtHeader first:

	out.write_shift(2, 0x0867);
	out.write_shift(2, 0x0000);
	out.write_shift(4, 0x00000000);
	out.write_shift(4, 0x00000000);

After that is SharedFeatureType (2.5.237):

	out.write_shift(2, 0x0002);

After that is one byte that much be 1:

	out.write_shift(1, 0x01);

After that is cbHdrData which should be 0xFFFFFFFF:

	out.write_shift(4, 0xffffffff);

Finally we have the rgbHdrData bit flag of type EnhancedProtection 2.5.104:

	out.write_shift(2, flags);
	out.write_shift(2, 0x0000);
    // note: this could have been out.write_shift(4, flags); but splitting is fine.
There are a few issues but good first start :) At a high level, it is better to reference the primary source `MS-XLS` since the provenance of OpenOffice's documentation is murky. In case you are wondering why provenance is important, [`CONTRIBUTING.md`](https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/CONTRIBUTING.md) is a good starting point. The record you are trying to write in `write_ws_biff8_protection` is `[MS-XLS] 2.4.112 FeatHdr` BIFF5 uses a different record (0x12) so the write should be guarded with a `b8` test (look at how `write_ws_cols_biff8` is guarded) --- Try to make the write operations aligned with the spec. For example, since the record starts with a future record header (12 bytes), write the `FrtHeader` first: ```js out.write_shift(2, 0x0867); out.write_shift(2, 0x0000); out.write_shift(4, 0x00000000); out.write_shift(4, 0x00000000); ``` After that is SharedFeatureType (2.5.237): ```js out.write_shift(2, 0x0002); ``` After that is one byte that much be 1: ```js out.write_shift(1, 0x01); ``` After that is `cbHdrData` which should be `0xFFFFFFFF`: ```js out.write_shift(4, 0xffffffff); ``` Finally we have the `rgbHdrData` bit flag of type EnhancedProtection 2.5.104: ```js out.write_shift(2, flags); out.write_shift(2, 0x0000); // note: this could have been out.write_shift(4, flags); but splitting is fine. ```
lpicchi force-pushed biff8-protection from a1a14fe069 to 1cc81cd477 2024-09-10 14:27:58 +00:00 Compare
Author
Contributor

Hi @sheetjs thank you for your feedback. I made a quick reasearch (two days) into the documentation I could find in the web and how sheetjs handled CFB files creation to address an urgent requirement we had in a project (I had no previous knowledge on CFB files or BIFF8 format 😅). After implementing the solution at work I have the intention to contribute my findings to this project (I've been using it for a long time to handle Excel files an had no clue about what was happening under the hood 🙃). Given the urgency of my situation I ended up in the OpenOffice's documentation that helped me to solve the issue. Thank you for pointing me in the direction of future headers docs, now I understand each byte been written by the code.

Hi @sheetjs thank you for your feedback. I made a quick reasearch (two days) into the documentation I could find in the web and how sheetjs handled CFB files creation to address an urgent requirement we had in a project (I had no previous knowledge on CFB files or BIFF8 format 😅). After implementing the solution at work I have the intention to contribute my findings to this project (I've been using it for a long time to handle Excel files an had no clue about what was happening under the hood 🙃). Given the urgency of my situation I ended up in the OpenOffice's documentation that helped me to solve the issue. Thank you for pointing me in the direction of future headers docs, now I understand each byte been written by the code.
Author
Contributor

Updated the branch with the fixes you mentioned. I didn't realize the function write_ws_biff8 was being used to write biff5 files too, that's the reason I didn't use the b8 test, added it.

Updated the branch with the fixes you mentioned. I didn't realize the function `write_ws_biff8` was being used to write biff5 files too, that's the reason I didn't use the `b8` test, added it.
Owner

Thanks for following up! In the future, please raise an issue first and we can give some pointers.

Are the generated files valid? We'll have to take a closer look and test against older versions of Excel, but a cursory glance at the BNF suggests some of the records are in the wrong place. Look for the WORKSHEETCONTENT production in 2.1.7.20.5 Worksheet Substream. The relevant parts and line numbers against the current HEAD version are listed below:

WORKSHEETCONTENT =
  ...
  PAGESETUP {ends on line 539}
  ...
  [PROTECTION]  <-- Protect and Password records
  COLUMNS {line 541}
  ...
  Dimensions {line 543}
  [CELLTABLE] {nested loop in lines 551-562}
  ...
  *MergeCells {line 569}
  ...
  [CodeName] {line 573}
  ...
  *FEAT {line 575} <-- FeatHdr records
  ...
  EOF {line 577}

PROTECTION =
  [Protect]
  ...
  [Password]

(The PAGESETUP, PROTECTION, and CELLTABLE productions are defined in 2.1.7.20.6 Common Productions)

Based on the production rules, it would seem that the Protect and Password records must be written just before the column metadata and the FeatHdr record must be written in the FEAT block. It's probably better to put the record in the write_FEAT function since it is only invoked when exporting to BIFF8. The spec does not say anything about relative order of features, but it makes sense to add the new FeatHdr record at the start of the function (given that the existing shared feature has a larger SharedFeatureType value).


Some more background about the BIFF formats:

BIFF2/BIFF3/BIFF4 sheets are raw byte streams.

BIFF5/BIFF8 use the "OLE container format" (the code refers to it as CFB -- container file blobs -- to align with the MS-CFB terminology). The Worksheet substreams are structurally similar to the BIFF2/3/4 raw byte streams.

The XLS writer has a single entry point (write_biff_buf https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/bits/78_writebiff.js#L695). BIFF2/3/4 writers share a common code path.

Thanks for following up! In the future, please raise an issue first and we can give some pointers. Are the generated files valid? We'll have to take a closer look and test against older versions of Excel, but a cursory glance at the BNF suggests some of the records are in the wrong place. Look for the `WORKSHEETCONTENT` production in `2.1.7.20.5 Worksheet Substream`. The relevant parts and line numbers against [the current HEAD version](https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/bits/78_writebiff.js) are listed below: ``` WORKSHEETCONTENT = ... PAGESETUP {ends on line 539} ... [PROTECTION] <-- Protect and Password records COLUMNS {line 541} ... Dimensions {line 543} [CELLTABLE] {nested loop in lines 551-562} ... *MergeCells {line 569} ... [CodeName] {line 573} ... *FEAT {line 575} <-- FeatHdr records ... EOF {line 577} PROTECTION = [Protect] ... [Password] ``` (The `PAGESETUP`, `PROTECTION`, and `CELLTABLE` productions are defined in `2.1.7.20.6 Common Productions`) Based on the production rules, it would seem that the `Protect` and `Password` records must be written just before the column metadata and the `FeatHdr` record must be written in the `FEAT` block. It's probably better to put the record in the [`write_FEAT` function](https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/bits/78_writebiff.js#L434) since it is [only invoked when exporting to BIFF8](https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/bits/78_writebiff.js#L575). The spec does not say anything about relative order of features, but it makes sense to add the new `FeatHdr` record at the start of the function (given that the existing shared feature has a larger SharedFeatureType value). --- Some more background about the BIFF formats: BIFF2/BIFF3/BIFF4 sheets are raw byte streams. BIFF5/BIFF8 use the "OLE container format" (the code refers to it as CFB -- container file blobs -- to align with the `MS-CFB` terminology). The Worksheet substreams are structurally similar to the BIFF2/3/4 raw byte streams. The XLS writer has a single entry point (`write_biff_buf` https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/bits/78_writebiff.js#L695). BIFF2/3/4 writers share a common code path.
lpicchi force-pushed biff8-protection from 1cc81cd477 to 9c406b1dec 2024-09-10 17:03:26 +00:00 Compare
Author
Contributor

After reading 2.1.7.20.5 Worksheet Substream and your clarification about FEAT block I think now the Records are written in the correct places

After reading `2.1.7.20.5 Worksheet Substream` and your clarification about `FEAT` block I think now the Records are written in the correct places
sheetjs force-pushed biff8-protection from 9c406b1dec to 58fe461d34 2024-10-26 20:37:00 +00:00 Compare
sheetjs merged commit 6c0f950f83 into master 2024-10-26 20:37:21 +00:00
Sign in to join this conversation.
No description provided.