html-based excel file - failing to recognize merged columns due to case sensitive properties. #1204

Closed
opened 2018-08-06 23:53:17 +00:00 by jjsquillante · 0 comments
jjsquillante commented 2018-08-06 23:53:17 +00:00 (Migrated from github.com)

Thanks for creating this open source library, it's helped save a good bit of my time!

Recently, I noticed when converting an html-based excel file:

// html-based excel ie.
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

There was a minor bug (seems similar but not related to issue #1187 ) while parsing out a merged column or colspan.

// ie.
<TD COLSPAN=2 class=xl145>some text value here...</TD>

Within the html_to_sheet (~line 18,081) function, we call parsexmltag (defined ~line 2131, invoked ~18105) function and for each <TD> look for a tag, slice at the equal sign, trim to get the attribute, and set the attribute as a property to an object.

// simplified example within `parsexmltag`.
var z = {};
q = "COLSPAN=2".slice(0,7).trim()
z[q] = 'some value';
return z;

Once the object is returned to the tag variable, we resume the html_to_sheet function flow and set CS = tag.colspan ? +tag.colspan : 1. This is where we possibly run into a case-sensitive issue with colspan. Unless the colspan property is lowercase, this will be falsy and return 1. In my case, the default attributes of this file are set and read as uppercase - COLSPAN.

I'm requesting a simple fix to uniformly set the attributes as lowercase properties before returning the object in parsexmltag. I think the simplest solution is to chain .toLowerCase():

q = cc.slice(0,c).trim(); // before
q = cc.slice(0,c).trim().toLowerCase(); // after

Please let me know if you have any questions or need more information. I'm submitting a PR and sending test files/example in an email to sheetjs@gmail.com

Thanks!
James

Thanks for creating this open source library, it's helped save a good bit of my time! Recently, I noticed when converting an html-based excel file: ``` // html-based excel ie. <html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40"> ``` There was a minor bug (seems similar but not related to issue #1187 ) while parsing out a merged column or `colspan`. ``` // ie. <TD COLSPAN=2 class=xl145>some text value here...</TD> ``` Within the `html_to_sheet` (~line 18,081) function, we call `parsexmltag` (defined ~line 2131, invoked ~18105) function and for each `<TD>` look for a tag, slice at the equal sign, trim to get the attribute, and set the attribute as a property to an object. ``` // simplified example within `parsexmltag`. var z = {}; q = "COLSPAN=2".slice(0,7).trim() z[q] = 'some value'; return z; ``` Once the `object` is returned to the `tag` variable, we resume the `html_to_sheet` function flow and set `CS = tag.colspan ? +tag.colspan : 1`. This is where we possibly run into a case-sensitive issue with `colspan`. Unless the `colspan` property is lowercase, this will be `falsy` and `return 1`. In my case, the default attributes of this file are set and read as uppercase - `COLSPAN`. I'm requesting a simple fix to uniformly set the attributes as lowercase properties before returning the `object` in `parsexmltag`. I think the simplest solution is to chain `.toLowerCase()`: ``` q = cc.slice(0,c).trim(); // before q = cc.slice(0,c).trim().toLowerCase(); // after ``` Please let me know if you have any questions or need more information. I'm submitting a PR and sending test files/example in an email to sheetjs@gmail.com Thanks! James
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#1204
No description provided.