Some text in string cell broken Exel #1341

Closed
opened 2018-11-06 08:42:21 +00:00 by Lebed71 · 6 comments
Lebed71 commented 2018-11-06 08:42:21 +00:00 (Migrated from github.com)

Hello! I have a generated file. In Numbers or LibreOffice he open correct. But if I open he by Excel all broken. Excel error log don't give anything info. Can you please check it? Archive with file in attach
big6.xls.zip

Hello! I have a generated file. In Numbers or LibreOffice he open correct. But if I open he by Excel all broken. Excel error log don't give anything info. Can you please check it? Archive with file in attach [big6.xls.zip](https://github.com/SheetJS/js-xlsx/files/2552232/big6.xls.zip)
SheetJSDev commented 2018-11-06 09:00:56 +00:00 (Migrated from github.com)

There's a neat BIFF format validator at https://www.microsoft.com/en-us/download/details.aspx?id=26794 -- the output gives you a sense for where to look in the binary stream.

One cell has an extremely long string, whose byte representation is about 10K bytes. Excel enforces this 8224 byte rule (so writers would have to create "Continue" records and split up the original string into chunks) but other tools don't. Since we aren't generating the continue structure, Excel is choking.

To generate a valid file, try taking a slice of the first 1000 characters of the string.

There's a neat BIFF format validator at https://www.microsoft.com/en-us/download/details.aspx?id=26794 -- the output gives you a sense for where to look in the binary stream. One cell has an extremely long string, whose byte representation is about 10K bytes. Excel enforces this 8224 byte rule (so writers would have to create "Continue" records and split up the original string into chunks) but other tools don't. Since we aren't generating the continue structure, Excel is choking. To generate a valid file, try taking a slice of the first 1000 characters of the string.
Lebed71 commented 2018-11-06 10:41:12 +00:00 (Migrated from github.com)

Thanks! Could you please give me link on some information about "Continue records"?

Thanks! Could you please give me link on some information about "Continue records"?
SheetJSDev commented 2018-11-06 10:52:45 +00:00 (Migrated from github.com)

The conceptual overview is covered in [MS-XLS] 2.1.4 Record.

On the parsing side, the slurp function takes a peek at the next record to decide if there is a continuation.

If you are interested in contributing a fix, write_biff_rec takes a payload and generates the appropriate header and record. That part should inspect the payload length and, if it is greater than 8224, split it up into blocks of 8224 bytes with Continue records (record type 0x3c)

The conceptual overview is covered in [`[MS-XLS] 2.1.4 Record`](https://msdn.microsoft.com/en-us/library/dd943823.aspx). On the parsing side, the [`slurp` function](https://github.com/SheetJS/js-xlsx/blob/master/bits/76_xls.js#L37) takes a peek at the next record to decide if there is a continuation. If you are interested in contributing a fix, [`write_biff_rec`](https://github.com/SheetJS/js-xlsx/blob/master/bits/78_writebiff.js#L1) takes a payload and generates the appropriate header and record. That part should inspect the payload length and, if it is greater than 8224, split it up into blocks of 8224 bytes with Continue records (record type 0x3c)
Lebed71 commented 2018-11-07 15:57:55 +00:00 (Migrated from github.com)

Thanks! But I have a problem on some first step.
I understend, what continue record algorithm has some parts:

  1. write first record with this record type and with max record length (8224)
  2. other parts write how continue-records with type 0x3c
    But when I tried slice/split input buffer and get first part with length 8224 and write only it the output file was broken.

Example:

Existing code:

function write_biff_rec(ba/*:BufArray*/, type/*:number|string*/, payload, length/*:?number*/)/*:void*/ {
	var t/*:number*/ = +type || +XLSRE[/*::String(*/type/*::)*/];
	if(isNaN(t)) return;
	var len = length || (payload||[]).length || 0;
	var o = ba.next(4);
	o.write_shift(2, t);
	o.write_shift(2, len);
	if(/*:: len != null &&*/len > 0 && is_buf(payload)) ba.push(payload);
}

My example code:

function write_biff_rec(ba/*:BufArray*/, type/*:number|string*/, payload, length/*:?number*/)/*:void*/ {
	var t/*:number*/ = +type || +XLSRE[/*::String(*/type/*::)*/];
	if(isNaN(t)) return;
	var len = length || (payload||[]).length || 0;
	var o = ba.next(4);
        if (len > 8224) {
            var first_part_record = payload.slice(0, 8224);
            o.write_shift(2, t);
	    o.write_shift(2, 8224);
	    if(/*:: len != null &&*/len > 0 && is_buf(first_part_record)) ba.push(first_part_record);
        } else {
           o.write_shift(2, t);
           o.write_shift(2, len);
           if(/*:: len != null &&*/len > 0 && is_buf(payload)) ba.push(payload);
        }
}

Maybe I don't no how correct slice a part of buffer?

Thanks! But I have a problem on some first step. I understend, what continue record algorithm has some parts: 1) write first record with this record type and with max record length (8224) 2) other parts write how continue-records with type 0x3c But when I tried slice/split input buffer and get first part with length 8224 and write only it the output file was broken. Example: Existing code: ``` function write_biff_rec(ba/*:BufArray*/, type/*:number|string*/, payload, length/*:?number*/)/*:void*/ { var t/*:number*/ = +type || +XLSRE[/*::String(*/type/*::)*/]; if(isNaN(t)) return; var len = length || (payload||[]).length || 0; var o = ba.next(4); o.write_shift(2, t); o.write_shift(2, len); if(/*:: len != null &&*/len > 0 && is_buf(payload)) ba.push(payload); } ``` My example code: ``` function write_biff_rec(ba/*:BufArray*/, type/*:number|string*/, payload, length/*:?number*/)/*:void*/ { var t/*:number*/ = +type || +XLSRE[/*::String(*/type/*::)*/]; if(isNaN(t)) return; var len = length || (payload||[]).length || 0; var o = ba.next(4); if (len > 8224) { var first_part_record = payload.slice(0, 8224); o.write_shift(2, t); o.write_shift(2, 8224); if(/*:: len != null &&*/len > 0 && is_buf(first_part_record)) ba.push(first_part_record); } else { o.write_shift(2, t); o.write_shift(2, len); if(/*:: len != null &&*/len > 0 && is_buf(payload)) ba.push(payload); } } ``` Maybe I don't no how correct slice a part of buffer?
Lebed71 commented 2018-12-17 07:46:52 +00:00 (Migrated from github.com)

@SheetJSDev can you give me some answer please?

@SheetJSDev can you give me some answer please?
SheetJSDev commented 2021-09-12 20:15:41 +00:00 (Migrated from github.com)

When writing long strings using the shared string table (bookSST: true), Continue records are generated.

The inline strings (Label records) are capped at 255 characters. To wrap up the issue, we'd accept a PR that enforces the length by slicing and optionally warning. The places to slice are

https://github.com/SheetJS/sheetjs/blob/master/bits/78_writebiff.js#L72

			write_biff_rec(ba, 0x0004, write_BIFF2LABEL(R, C, (cell.v||"").slice(0,255)));

https://github.com/SheetJS/sheetjs/blob/master/bits/78_writebiff.js#L198

			} else write_biff_rec(ba, "Label", write_Label(R, C, (cell.v||"").slice(0,255), os, opts));
When writing long strings using the shared string table (`bookSST: true`), Continue records are generated. The inline strings (Label records) are capped at 255 characters. To wrap up the issue, we'd accept a PR that enforces the length by slicing and optionally warning. The places to slice are https://github.com/SheetJS/sheetjs/blob/master/bits/78_writebiff.js#L72 ```js write_biff_rec(ba, 0x0004, write_BIFF2LABEL(R, C, (cell.v||"").slice(0,255))); ``` https://github.com/SheetJS/sheetjs/blob/master/bits/78_writebiff.js#L198 ```js } else write_biff_rec(ba, "Label", write_Label(R, C, (cell.v||"").slice(0,255), os, opts)); ```
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#1341
No description provided.