tag regex doesn't handle attributes containing > #768

Closed
opened 2017-08-08 15:57:05 +00:00 by MonochromeChameleon · 2 comments
MonochromeChameleon commented 2017-08-08 15:57:05 +00:00 (Migrated from github.com)

I've recently had a number of Excel files fail while parsing the styles, with a fairly unclear error message:

TypeError: First argument must be a string, Buffer, ArrayBuffer, Array, or array-like object.
at fromObject (buffer.js:262:9)
at Function.Buffer.from (buffer.js:101:10)
at Buffer (buffer.js:80:17)
at utf8readc (.xlsx/xlsx.js:1717:52)
at parse_numFmts (.xlsx/xlsx.js:7046:23)
at parse_sty_xml (./xlsx/xlsx.js:7147:34)
at parse_sty (./xlsx/xlsx.js:12836:9)
at parse_zip (./xlsx/xlsx.js:17104:26)
at read_zip (./xlsx/xlsx.js:17393:9)
at Object.readSync [as read] (./xlsx/xlsx.js:17418:68)

I've tracked the issue down to the regex currently used for detecting tags, `/<[^>]*>/g`. It can't handle `<numFmt numFmtId="164" formatCode="[>0]General" />`: it matches the closing angle bracket inside the `formatCode` attribute and treats that as the end of the tag. The regex should be updated to allow any number of attributes, disregarding their content, before the end of the tag.
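
To illustrate the failure described above, here is a minimal sketch that runs the reported regex against the sample tag and compares it with a quote-aware pattern. The `quoteAwareTag` pattern and the surrounding `<numFmts>` wrapper are assumptions made for this example, not the library's actual fix or real file content, and the pattern does not try to handle comments or CDATA sections.

```javascript
// The tag regex quoted in the report.
const naiveTag = /<[^>]*>/g;

// Hypothetical alternative (an assumption, not the library's code):
// quoted attribute values are consumed as opaque units, so a '>'
// inside quotes no longer ends the tag.
const quoteAwareTag = /<(?:[^>"']|"[^"]*"|'[^']*')*>/g;

// Sample input built around the tag from the report; the wrapper
// element is added here only to show where matching resumes.
const xml =
  '<numFmts count="1">' +
  '<numFmt numFmtId="164" formatCode="[>0]General" />' +
  '</numFmts>';

const naiveMatches = xml.match(naiveTag);
const quoteAwareMatches = xml.match(quoteAwareTag);

// The naive pattern cuts the numFmt tag off at the '>' in formatCode.
console.log(naiveMatches[1]);      // <numFmt numFmtId="164" formatCode="[>
// The quote-aware pattern returns the whole tag.
console.log(quoteAwareMatches[1]); // <numFmt numFmtId="164" formatCode="[>0]General" />
```

The three alternatives inside the group each start on a different character class, so the pattern should not backtrack badly on long inputs.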

SheetJSDev commented 2017-08-08 16:14:05 +00:00 (Migrated from github.com)

@MonochromeChameleon thanks for the report! Do you know how these files were produced? In your example, if you enter the format code in Excel and save, the generated XML contains the encoded `&gt;`
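
For comparison, a small sketch of the encoded case: when the attribute value contains `&gt;` rather than a literal `>`, the regex from the report matches the whole tag, and the format code can be decoded afterwards. The `unescapeXml` helper is hypothetical and written only for this example; it is not a function from the library.

```javascript
// The tag regex quoted in the report.
const naiveTag = /<[^>]*>/g;

// The same tag with the '>' in formatCode XML-encoded.
const encoded = '<numFmt numFmtId="164" formatCode="[&gt;0]General" />';

// With no literal '>' inside the attribute, the whole tag is one match.
const tags = encoded.match(naiveTag);

// Hypothetical helper (an assumption for illustration): decode the five
// predefined XML entities in an attribute value.
function unescapeXml(text) {
  const entities = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };
  return text.replace(/&(lt|gt|quot|apos|amp);/g, (_, name) => entities[name]);
}

// Pull out the raw attribute value, then decode it.
const rawFormatCode = /formatCode="([^"]*)"/.exec(tags[0])[1];
const formatCode = unescapeXml(rawFormatCode);

console.log(tags.length); // 1
console.log(formatCode);  // [>0]General
```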

MonochromeChameleon commented 2017-08-08 16:25:42 +00:00 (Migrated from github.com)

I'm not sure exactly where the format code came from; I wasn't adding it myself! I'm using Excel on a Mac rather than on Windows, which may have made a difference?

Reference: sheetjs/sheetjs#768