Data Encoding #3262

Closed
opened 2024-11-29 13:27:33 +00:00 by nelliott · 5 comments

I am using the XLSX.Read to read in an Excel, parse it into strings and then send it to a back end where I insert the data into a SQl Server. However, I currently only have SQL Server 2008 and it doesn't like the encoding on some of the characters e.g. degree symbol and plus/minus symbol and puts weird characters in front them when inserting into the database.

I am trying to force the XLSX read to store as ANSI using the command below but this makes no difference.

const workbook = XLSX.read(data, { cellStyles: true, codepage: 1252 });

Data will just be text, numbers and some mathematical symbols. Any advice on how I can resolve this issue.

I am using the XLSX.Read to read in an Excel, parse it into strings and then send it to a back end where I insert the data into a SQl Server. However, I currently only have SQL Server 2008 and it doesn't like the encoding on some of the characters e.g. degree symbol and plus/minus symbol and puts weird characters in front them when inserting into the database. I am trying to force the XLSX read to store as ANSI using the command below but this makes no difference. const workbook = XLSX.read(data, { cellStyles: true, codepage: 1252 }); Data will just be text, numbers and some mathematical symbols. Any advice on how I can resolve this issue.

Can you share more details.

Can you share more details.
Author

Thanks for the response. For example in my Excel file I have the following: Heat Sealer Seal Temp (ºF) and 290 ± 10

When stored in the database that becomes Heat Sealer Seal Temp (ºF) and 290 ± 10

Thanks for the response. For example in my Excel file I have the following: Heat Sealer Seal Temp (ºF) and 290 ± 10 When stored in the database that becomes Heat Sealer Seal Temp (ºF) and 290 ± 10

If you're seeing garbled characters, it's likely due to SQL Server 2008's default encoding. Back then SQL Server used Windows-1252 encoding for non-unicode strings since UTF-8 wasn't widely adopted yet. When your JS sends UTF-8 encoded text, this mismatch causes the corruption.

Either convert it in your backend or use the codepage library to convert your strings first before sending them.

https://git.sheetjs.com/sheetjs/js-codepage

If you're seeing garbled characters, it's likely due to SQL Server 2008's default encoding. Back then SQL Server used Windows-1252 encoding for non-unicode strings since UTF-8 wasn't widely adopted yet. When your JS sends UTF-8 encoded text, this mismatch causes the corruption. Either convert it in your backend or use the codepage library to convert your strings first before sending them. https://git.sheetjs.com/sheetjs/js-codepage
Author

I tried encoding my string before sending it back using the 1252 Codepage (Windows Latin) but it ends up worse in the database

In my JavaScript code I added:
sheetfields = cptable.utils.encode(1252, sheetfields, 'str');

Then in the database I end up with:

HeatTSealerTSealTTempT(¼F)
290T˜T10

I tried encoding my string before sending it back using the 1252 Codepage (Windows Latin) but it ends up worse in the database In my JavaScript code I added: sheetfields = cptable.utils.encode(1252, sheetfields, 'str'); Then in the database I end up with: HeatTSealerTSealTTempT(¼F) 290T˜T10
Owner

Think about the whole data flow. There are 3 steps:

  1. Data is pulled from files in the front-end

The codepage option in XLSX.read governs how legacy data is interpreted. It applies to CSV, legacy XLS, SLK and a few other plaintext formats. The end result in all cases is a JS string (irrespective of encoding).

  1. Data is sent from the front-end to the back-end

If you are sending data to a backend using fetch, it's almost certainly encoded as UTF8 on the wire. That means the backend is receiving data in a UTF8 string. If data is being sent in JSON format, your decoder will generate strings.

  1. Data is stored in the database

In your backend, you need to convert from UTF8 to whatever encoding is required for the SQL Server connector library.


To be absolutely sure the SheetJS library is generating the correct strings, try logging the data from the frontend:

const workbook = XLSX.read(data);
workbook.SheetNames.map(name => workbook.Sheets[name]).forEach(ws => {
  console.log(XLSX.utils.sheet_to_json(ws));
});

That should show the data as your frontend sees it. If you are seeing Heat Sealer Seal Temp (ºF) and 290 ± 10, then the library is doing the right thing and you should debug the rest of the data flow. If you are seeing Heat Sealer Seal Temp (ºF) and 290 ± 10, please share a file and we can take a closer look. If you can't share it publicly, send an email or a message in the SheetJS Chat

Think about the whole data flow. There are 3 steps: 1) Data is pulled from files in the front-end The `codepage` option in `XLSX.read` governs how legacy data is interpreted. It applies to CSV, legacy XLS, SLK and a few other plaintext formats. The end result in all cases is a JS string (irrespective of encoding). 2) Data is sent from the front-end to the back-end If you are sending data to a backend using `fetch`, it's almost certainly encoded as UTF8 on the wire. That means the backend is receiving data in a UTF8 string. If data is being sent in JSON format, your decoder will generate strings. 3) Data is stored in the database In your backend, you need to convert from UTF8 to whatever encoding is required for the SQL Server connector library. --- To be absolutely sure the SheetJS library is generating the correct strings, try logging the data from the frontend: ```js const workbook = XLSX.read(data); workbook.SheetNames.map(name => workbook.Sheets[name]).forEach(ws => { console.log(XLSX.utils.sheet_to_json(ws)); }); ``` That should show the data as your frontend sees it. If you are seeing `Heat Sealer Seal Temp (ºF) and 290 ± 10`, then the library is doing the right thing and you should debug the rest of the data flow. If you are seeing `Heat Sealer Seal Temp (ºF) and 290 ± 10`, please share a file and we can take a closer look. If you can't share it publicly, send an email or a message in the [SheetJS Chat](https://sheetjs.com/chat)
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#3262
No description provided.