XLSX.utils.table_to_book convert #673

Closed
opened 2017-06-05 07:03:56 +00:00 by huhm · 10 comments
huhm commented 2017-06-05 07:03:56 +00:00 (Migrated from github.com)

I use XLSX.utils.table_to_book to download the excel file from a table in the html page.
There's something wrong in the auto converting.
Some values like 11% will be converted to date.
How can I get it to convert correctly?
Here is the example:
https://jsfiddle.net/f8rxfrj5/2/

I use XLSX.utils.table_to_book to download the excel file from a table in the html page. There's something wrong in the auto converting. Some values like 11% will be converted to date. How can I get it to convert correctly? Here is the example: https://jsfiddle.net/f8rxfrj5/2/
SheetJSDev commented 2017-06-05 15:27:50 +00:00 (Migrated from github.com)

@huhm good catch! the JS native date mechanism recognizes that as a date.

$ TZ="Asia/Shanghai" node -pe 'new Date("11.01%")'
2001-10-31T16:00:00.000Z
$ TZ="America/Los_Angeles" node -pe 'new Date("11.01%")'
2001-11-01T08:00:00.000Z

We can check for common numeric formats before falling back to the date.

@huhm good catch! the JS native date mechanism recognizes that as a date. ``` $ TZ="Asia/Shanghai" node -pe 'new Date("11.01%")' 2001-10-31T16:00:00.000Z $ TZ="America/Los_Angeles" node -pe 'new Date("11.01%")' 2001-11-01T08:00:00.000Z ``` We can check for common numeric formats before falling back to the date.
cristhiank commented 2017-07-10 09:36:13 +00:00 (Migrated from github.com)

+1 Having the same issue

+1 Having the same issue
samighawi commented 2017-07-12 09:29:44 +00:00 (Migrated from github.com)

Any ETA to fix this bug please? If it requires time, is there a way to disable automatic date formatting entirely as I don't need it as a temporary solution?

Any ETA to fix this bug please? If it requires time, is there a way to disable automatic date formatting entirely as I don't need it as a temporary solution?
huhm commented 2017-07-12 11:55:26 +00:00 (Migrated from github.com)

I finally use the method: wb.addSheetFromArray in which way I can format by myself;

I finally use the method: wb.addSheetFromArray in which way I can format by myself;
nknapp commented 2017-07-12 14:48:45 +00:00 (Migrated from github.com)

Is the problem here? https://github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L1511
I think raw numbers in the CSV should not be dates.

The funny thing is, that when parsing CSV, in

"a","b"
"1","x"

"1" is interpreted as date, but in

"a","b"
1,"x"

(without quotes) it isn't

Is the problem here? https://github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L1511 I think raw numbers in the CSV should not be dates. The funny thing is, that when parsing CSV, in ```csv "a","b" "1","x" ``` `"1"` is interpreted as date, but in ```csv "a","b" 1,"x" ``` (without quotes) it isn't
SheetJSDev commented 2017-07-12 20:06:12 +00:00 (Migrated from github.com)

@huhm @nknapp tl;dr: it's a complex problem, you can sidestep the whole thing by passing raw:true option to XLSX.read and XLSX.readFile

CSV, like the Lotus formatted PRN and other simple plaintext formats, leaves value interpretation up to the application. Even the "spec" RFC4180 is ambiguous :(

So what does Excel do? It appears that they interpret values as if you typed them into the formula bar. Formulas are processed -- try the following CSV data in Excel:

1,2,3
=A1,=A1+B1,=A1+B1+C1

In the percentage case case, the problem is that the number matching is too strict (https://github.com/SheetJS/js-xlsx/blob/master/bits/40_harb.js#L534 in the bits, https://github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L5678 in the final script).

In general, the problem is compounded by localization (try changing your computer regional settings!)

Short term solution: as part of https://github.com/SheetJS/js-xlsx/issues/719 we added a raw option to XLSX.read and XLSX.readFile -- this skips all of the guess work and just returns strings. See https://runkit.com/sheetjs/5964eadbea82990011acd7f7 for a demonstration.

Long term solution: eventually the libraries will need to process and understand locale information.

@huhm @nknapp tl;dr: it's a complex problem, you can sidestep the whole thing by passing `raw:true` option to `XLSX.read` and `XLSX.readFile` CSV, like the Lotus formatted PRN and other simple plaintext formats, leaves value interpretation up to the application. Even the "spec" RFC4180 is ambiguous :( So what does Excel do? It *appears* that they interpret values as if you typed them into the formula bar. Formulas are processed -- try the following CSV data in Excel: ```csv 1,2,3 =A1,=A1+B1,=A1+B1+C1 ``` In the percentage case case, the problem is that the number matching is too strict (https://github.com/SheetJS/js-xlsx/blob/master/bits/40_harb.js#L534 in the bits, https://github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L5678 in the final script). In general, the problem is compounded by localization (try changing your computer regional settings!) Short term solution: as part of https://github.com/SheetJS/js-xlsx/issues/719 we added a `raw` option to `XLSX.read` and `XLSX.readFile` -- this skips all of the guess work and just returns strings. See https://runkit.com/sheetjs/5964eadbea82990011acd7f7 for a demonstration. Long term solution: eventually the libraries will need to process and understand locale information.
nknapp commented 2017-07-13 07:26:11 +00:00 (Migrated from github.com)

With xlsx.read(data, { raw: true}) the column with the value "1" still appears as

{
  "z": "m/d/yy",
  "t": "n",
  "v": 36892.041666666664,
  "w": "1/1/01"
}

My short term solution now is to create a fork that disables date-parsing altogether (we don't need it).
But once this is fixed, I would like to go back to the original library again.

With `xlsx.read(data, { raw: true})` the column with the value `"1"` still appears as ```json { "z": "m/d/yy", "t": "n", "v": 36892.041666666664, "w": "1/1/01" } ``` My short term solution now is to create a fork that disables date-parsing altogether (we don't need it). But once this is fixed, I would like to go back to the original library again.
SheetJSDev commented 2017-07-13 08:00:28 +00:00 (Migrated from github.com)

@nknapp Are you testing against xlsx 0.10.8? What does console.log(xlsx.version) show?

Against 0.10.8 it appears to give the correct value for your first example:

$ cat t.csv 
"a","b"
"1","x"
$ node -pe 'require("xlsx").readFile("t.csv", {raw:false}).Sheets.Sheet1.A2'
{ z: 'm/d/yy', t: 'n', v: 36892.041666666664, w: '1/1/01' }
$ node -pe 'require("xlsx").readFile("t.csv", {raw:true}).Sheets.Sheet1.A2'
{ t: 's', v: '"1"' }

If you just tested against "1" it's because the parser isn't recognizing it as CSV:

$ node -pe 'require("xlsx").read("\"1\"", {type:"binary", raw:true}).Sheets.Sheet1'
{ A1: { v: 36892.041666666664, z: 'm/d/yy', t: 'n', w: '1/1/01' }, '!ref': 'A1' }

The same fix has to be applied to the PRN parser. Can you check if the following diff resolves the issue:

 var PRN = (function() {
-       function set_text_arr(data, arr, R, C) {
-               if(data === 'TRUE') arr[R][C] = true;
+       function set_text_arr(data, arr, R, C, o) {
+               if(o.raw) arr[R][C] = data;
+               else if(data === 'TRUE') arr[R][C] = true;
                else if(data === 'FALSE') arr[R][C] = false;
@@ ... @@
        function prn_to_aoa_str(f, opts) {
+               var o = opts || {};
                var arr = ([]);
@@ ... @@
                        /* TODO: confirm that widths are always 10 */
                        var C = 0;
-                       set_text_arr(lines[R].slice(0, start).trim(), arr, R, C);
+                       set_text_arr(lines[R].slice(0, start).trim(), arr, R, C, o);
                        for(C = 1; C <= (lines[R].length - start)/10 + 1; ++C)
-                               set_text_arr(lines[R].slice(start+(C-1)*10,start+C*10).trim(),arr,R,C);
+                               set_text_arr(lines[R].slice(start+(C-1)*10,start+C*10).trim(),arr,R,C,o);
                }

You can either apply it directly to xlsx.js or apply to bits/40_harb.js and run make.

(Note: for plaintext files, if there is no comma or tab character, the parser assumes the output is Lotus Formatted Text PRN)

@nknapp Are you testing against xlsx 0.10.8? What does `console.log(xlsx.version)` show? Against 0.10.8 it appears to give the correct value for your first example: ```bash $ cat t.csv "a","b" "1","x" $ node -pe 'require("xlsx").readFile("t.csv", {raw:false}).Sheets.Sheet1.A2' { z: 'm/d/yy', t: 'n', v: 36892.041666666664, w: '1/1/01' } $ node -pe 'require("xlsx").readFile("t.csv", {raw:true}).Sheets.Sheet1.A2' { t: 's', v: '"1"' } ``` If you just tested against `"1"` it's because the parser isn't recognizing it as CSV: ```bash $ node -pe 'require("xlsx").read("\"1\"", {type:"binary", raw:true}).Sheets.Sheet1' { A1: { v: 36892.041666666664, z: 'm/d/yy', t: 'n', w: '1/1/01' }, '!ref': 'A1' } ``` The same fix has to be applied to the PRN parser. Can you check if the following diff resolves the issue: ```diff var PRN = (function() { - function set_text_arr(data, arr, R, C) { - if(data === 'TRUE') arr[R][C] = true; + function set_text_arr(data, arr, R, C, o) { + if(o.raw) arr[R][C] = data; + else if(data === 'TRUE') arr[R][C] = true; else if(data === 'FALSE') arr[R][C] = false; @@ ... @@ function prn_to_aoa_str(f, opts) { + var o = opts || {}; var arr = ([]); @@ ... @@ /* TODO: confirm that widths are always 10 */ var C = 0; - set_text_arr(lines[R].slice(0, start).trim(), arr, R, C); + set_text_arr(lines[R].slice(0, start).trim(), arr, R, C, o); for(C = 1; C <= (lines[R].length - start)/10 + 1; ++C) - set_text_arr(lines[R].slice(start+(C-1)*10,start+C*10).trim(),arr,R,C); + set_text_arr(lines[R].slice(start+(C-1)*10,start+C*10).trim(),arr,R,C,o); } ``` You can either apply it directly to xlsx.js or apply to bits/40_harb.js and run make. (Note: for plaintext files, if there is no comma or tab character, the parser assumes the output is Lotus Formatted Text PRN)
nknapp commented 2017-07-13 08:36:02 +00:00 (Migrated from github.com)

I had version "0.10.5", sorry about that. "raw:true" works for my CSV (which I sadly can't share) with "0.10.8".

But it keeps the surrounding quotes in the value and removing those may probably only be done in CSV and not when reading xlsx-files.

I think it is easier and safer for me to stick to my fork at the moment, but I would like to thank you for your quick responses.

I had version "0.10.5", sorry about that. "raw:true" works for my CSV (which I sadly can't share) with "0.10.8". But it keeps the surrounding quotes in the value and removing those may probably only be done in CSV and not when reading xlsx-files. I think it is easier and safer for me to stick to my fork at the moment, but I would like to thank you for your quick responses.
thiagopetherson commented 2021-08-21 20:22:49 +00:00 (Migrated from github.com)

@huhm @nknapp tl; dr: é um problema complexo, você pode contornar a coisa toda passando a raw:trueopção para XLSX.readeXLSX.readFile

O CSV, como o PRN formatado pela Lotus e outros formatos de texto simples simples, deixa a interpretação do valor para o aplicativo. Mesmo a "especificação" RFC4180 é ambígua :(

Então, o que o Excel faz? Ele parece que eles interpretam valores como se você digitou-los na barra de fórmulas. As fórmulas são processadas - tente os seguintes dados CSV no Excel:

1,2,3
=A1,=A1+B1,=A1+B1+C1

No caso da porcentagem, o problema é que a correspondência do número é muito restrita ( https://github.com/SheetJS/js-xlsx/blob/master/bits/40_harb.js#L534 nos bits, https: // github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L5678 no script final).

Em geral, o problema é agravado pela localização (tente alterar as configurações regionais do seu computador!)

Solução de curto prazo: como parte do # 719 , adicionamos uma rawopção para XLSX.reade XLSX.readFile- isso ignora todo o trabalho de adivinhação e retorna apenas strings. Consulte https://runkit.com/sheetjs/5964eadbea82990011acd7f7 para uma demonstração.

Solução de longo prazo: eventualmente, as bibliotecas precisarão processar e entender as informações de local.

it worked for me. I love you ♥

> @huhm @nknapp tl; dr: é um problema complexo, você pode contornar a coisa toda passando a `raw:true`opção para `XLSX.read`e`XLSX.readFile` > > O CSV, como o PRN formatado pela Lotus e outros formatos de texto simples simples, deixa a interpretação do valor para o aplicativo. Mesmo a "especificação" RFC4180 é ambígua :( > > Então, o que o Excel faz? Ele _parece_ que eles interpretam valores como se você digitou-los na barra de fórmulas. As fórmulas são processadas - tente os seguintes dados CSV no Excel: > > ``` > 1,2,3 > =A1,=A1+B1,=A1+B1+C1 > ``` > > No caso da porcentagem, o problema é que a correspondência do número é muito restrita ( https://github.com/SheetJS/js-xlsx/blob/master/bits/40_harb.js#L534 nos bits, [https: // github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L5678](https://github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L5678) no script final). > > Em geral, o problema é agravado pela localização (tente alterar as configurações regionais do seu computador!) > > Solução de curto prazo: como parte do [# 719](https://github.com/SheetJS/sheetjs/issues/719) , adicionamos uma `raw`opção para `XLSX.read`e `XLSX.readFile`- isso ignora todo o trabalho de adivinhação e retorna apenas strings. Consulte https://runkit.com/sheetjs/5964eadbea82990011acd7f7 para uma demonstração. > > Solução de longo prazo: eventualmente, as bibliotecas precisarão processar e entender as informações de local. it worked for me. I love you ♥
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#673
No description provided.