Weird character conversion when input XLSX file contains _X2069_ #3177
Reference: sheetjs/sheetjs#3177
(This was originally posted on StackOverflow as a question regarding node-xlsx.)
I'm having a weird problem with reading XLSX files when the input contains the character sequence `_X2069_` (underscore X 2 0 6 9 underscore). Upon reading, the string `_X2069_` is converted to the Unicode character U+2069 (Pop Directional Isolate).
Here's a minimal code sample that exhibits the problem (I tried different values for the "type" option, but it seems to have no effect):
Here's my input (see also attached file input.xlsx):
Here's a screenshot of the output:
OS: Windows 11 (German language settings)
Excel: 2016 (used to generate the input file)
xlsx: v0.20.2
This looks like it might be related to Issue 72: full Unicode support, but I'm not 100% sure.
How can I ensure my input is parsed properly?
Thanks for raising an issue!
In XLSX, certain Unicode characters are encoded using a special representation `_x####_` based on the hexadecimal code. For example, the string `_x2069_` is actually stored as `_x005F_x2069_` (where the first underscore is encoded as `_x005F_`, the character code for `_`).
The current code treats the `x` as case insensitive, so `_X2069_` is treated as an encoded character and decoded. Fortunately it is a one-character fix:

So the root cause for the issue is that the code thinks that `_X2069_` is actually `_x2069_` and therefore part of a special representation? OK, that explains it. Do you have a rough estimate when the fixed version will become available?
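
For anyone following along, the escape convention discussed in this thread can be sketched roughly as below. This is only an illustration, not the SheetJS source: `decodeLoose` and `decodeStrict` are hypothetical helper names, and the regular expressions are assumptions chosen to mirror the behavior described above (case-insensitive versus case-sensitive matching of the `x`).

```javascript
// Hypothetical helpers, not the SheetJS implementation. They only illustrate
// the _x####_ escape convention described in this thread.

// Case-insensitive matching (the behavior described as the bug): the `i`
// flag lets an uppercase `X` introduce an escape as well.
function decodeLoose(text) {
  return text.replace(/_x([0-9a-fA-F]{4})_/gi, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16)));
}

// Case-sensitive matching: only a lowercase `x` introduces an escape.
function decodeStrict(text) {
  return text.replace(/_x([0-9a-fA-F]{4})_/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16)));
}

const literal = "_X2069_";      // text typed into the cell, uppercase X
const stored = "_x005F_x2069_"; // how the literal text _x2069_ is stored

console.log(decodeLoose(literal) === "\u2069");   // true: wrongly decoded
console.log(decodeStrict(literal) === "_X2069_"); // true: left untouched
console.log(decodeStrict(stored) === "_x2069_");  // true: escape undone once
```

The only difference between the two helpers is the `i` flag on the regular expression, which is consistent with the fix being described as a single character. Note that the replacement output is not rescanned, so the `_` produced from `_x005F_` does not combine with the following `x2069_` into a new escape.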