Problem with the encoding of Cyrillic characters #912
Reference: sheetjs/sheetjs#912
There are problems with the encoding of Cyrillic characters in some files. I uploaded a sample file to Google Drive: file xls. This file opens correctly in Excel 16.0.
The result of the conversion to csv from the page http://oss.sheetjs.com/js-xlsx/
Can it be fixed by some manipulation:
@makcbrain thanks for sharing! This is a BIFF5 XLS (Excel 5.0/95) file with no CodePage record, so there's no way to inspect the file and determine the correct encoding. To see that this is a file ambiguity, try opening this in Excel 2016 for Mac and you'll see different content corresponding to the default Mac Roman codepage 10000:
That string does correspond to the original set of bytes, as you can verify manually:
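To illustrate the ambiguity in general terms (this is not the maintainer's original snippet, and the bytes below are a made-up example rather than data from the uploaded file), here is a minimal sketch that decodes one byte sequence under two single-byte codepages. It assumes a runtime whose TextDecoder supports the "windows-1251" and "macintosh" labels, which is the case for browsers and for Node.js builds with full ICU data.

```javascript
// Bytes of the word "Привет" as a Windows-1251 writer would store them.
// (Illustrative sample, not taken from the file discussed in this thread.)
const bytes = Uint8Array.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

// Decode the same bytes under two different single-byte codepages.
const asCyrillic = new TextDecoder("windows-1251").decode(bytes);
const asMacRoman = new TextDecoder("macintosh").decode(bytes);

console.log(asCyrillic); // Привет
console.log(asMacRoman); // same bytes, shown as Latin-looking mojibake
```

Both results are valid readings of the same bytes, which is why a file without a CodePage record cannot be decoded reliably without an outside hint.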
Just as discussed in #907, the final solution will involve adding a default codepage option to the read functions (e.g. XLSX.readFile("file.xls", {codepage:1251}))
This option ({codepage: 1251}) does not change anything. What could be wrong?
Please tell me how to set the default encoding when loading a file. I would be very grateful for an answer.
The option codepage:1251 should work in version 0.11.13. It's also exposed as the --codepage flag in the xlsx bin script:
Hi, man. Did you find the solution?
Pass the codepage option to read or readFile: https://github.com/SheetJS/sheetjs/#parsing-options
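For anyone landing here later, a rough sketch of what that looks like end to end. The helper name xlsToCsv is hypothetical, and the SheetJS module is passed in as a parameter so the snippet does not assume how the library is loaded; readFile, SheetNames, Sheets, and utils.sheet_to_csv are the library calls being relied on. The value 1251 is only right if the file really was written with Windows-1251.

```javascript
// Hypothetical helper: read a legacy XLS file with an explicit fallback
// codepage and return the first worksheet as CSV text.
// `XLSX` is the SheetJS module supplied by the caller,
// e.g. const XLSX = require("xlsx");
function xlsToCsv(XLSX, path, codepage) {
  // The codepage option is the hint used for files that carry no
  // CodePage record of their own (assumption based on this thread).
  const workbook = XLSX.readFile(path, { codepage });
  const firstSheet = workbook.Sheets[workbook.SheetNames[0]];
  return XLSX.utils.sheet_to_csv(firstSheet);
}
```

In Node.js this would be called as xlsToCsv(require("xlsx"), "file.xls", 1251). If the output is still garbled, it is worth checking that the installed version is at least 0.11.13, as mentioned above.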