Data Encoding #3262
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#3262
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
I am using the XLSX.Read to read in an Excel, parse it into strings and then send it to a back end where I insert the data into a SQl Server. However, I currently only have SQL Server 2008 and it doesn't like the encoding on some of the characters e.g. degree symbol and plus/minus symbol and puts weird characters in front them when inserting into the database.
I am trying to force the XLSX read to store as ANSI using the command below but this makes no difference.
const workbook = XLSX.read(data, { cellStyles: true, codepage: 1252 });
Data will just be text, numbers and some mathematical symbols. Any advice on how I can resolve this issue.
Can you share more details.
Thanks for the response. For example in my Excel file I have the following: Heat Sealer Seal Temp (ºF) and 290 ± 10
When stored in the database that becomes Heat Sealer Seal Temp (ºF) and 290 ± 10
If you're seeing garbled characters, it's likely due to SQL Server 2008's default encoding. Back then SQL Server used Windows-1252 encoding for non-unicode strings since UTF-8 wasn't widely adopted yet. When your JS sends UTF-8 encoded text, this mismatch causes the corruption.
Either convert it in your backend or use the codepage library to convert your strings first before sending them.
https://git.sheetjs.com/sheetjs/js-codepage
I tried encoding my string before sending it back using the 1252 Codepage (Windows Latin) but it ends up worse in the database
In my JavaScript code I added:
sheetfields = cptable.utils.encode(1252, sheetfields, 'str');
Then in the database I end up with:
HeatTSealerTSealTTempT(¼F)
290T˜T10
Think about the whole data flow. There are 3 steps:
The
codepage
option inXLSX.read
governs how legacy data is interpreted. It applies to CSV, legacy XLS, SLK and a few other plaintext formats. The end result in all cases is a JS string (irrespective of encoding).If you are sending data to a backend using
fetch
, it's almost certainly encoded as UTF8 on the wire. That means the backend is receiving data in a UTF8 string. If data is being sent in JSON format, your decoder will generate strings.In your backend, you need to convert from UTF8 to whatever encoding is required for the SQL Server connector library.
To be absolutely sure the SheetJS library is generating the correct strings, try logging the data from the frontend:
That should show the data as your frontend sees it. If you are seeing
Heat Sealer Seal Temp (ºF) and 290 ± 10
, then the library is doing the right thing and you should debug the rest of the data flow. If you are seeingHeat Sealer Seal Temp (ºF) and 290 ± 10
, please share a file and we can take a closer look. If you can't share it publicly, send an email or a message in the SheetJS Chat