Discrepancy between dates parsed from CVS and XLSX #712

Open
opened 6 years ago by chrishenx · 5 comments
chrishenx commented 6 years ago (Migrated from github.com)

Hi everyone,

I have loaded a CSV file which has a cell with this value "07/02/11 19:08", and the resulting cell stored in the parsed sheet is this:

CSV Wihout XLSX.parse Options

Apparently, the default format is "m/d/yy" and new Date().getTimezoneOffset() gives me 300 minutes. The timezone offset is something I can manually deal with, so there is no problem with it. What I want is to parse my file using the format "d/m/yy h:mm" for dates, but when I set the options for XLSX.read to { cellNF: true, dateNF: 'd/m/yy h:mm'} I get the following parsed cell:

CSV With XLSX.parse Options

So, as you can see, the day and month are still swapped.

Then I saved my CSV file in MS Excel as XLSX, without touching the date cells.

Without { cellNF: true, dateNF: 'd/m/yy h:mm'} I get this:

con xlsx - sin formato

That is actually what I needed and no timezone offset is added, lets see what happens when I include { cellNF: true, dateNF: 'd/m/yy h:mm'} in the XLSX.parse function:

con xslx - con formato

Well, it is not following the format I specified, but still the month and day are correct and no offset is added again.

So, can someone please explain to me what is happening here?

It would be awesome if I could just read the raw string values of the cells, and reject all cells that do not match the "d/m/yy h:mm" format, but I have read the docs and I could not find any hint of an option like that.

I don't know if these are the expected outputs, but I will really appreciate some hint of what can I do, since I was considering using another parser just for CSVs.

Thanks a lot for your time!

Hi everyone, I have loaded a CSV file which has a cell with this value **"07/02/11 19:08"**, and the resulting cell stored in the parsed sheet is this: ![CSV Wihout XLSX.parse Options](https://user-images.githubusercontent.com/6645155/27743457-abfba054-5d81-11e7-98da-623e7f3174b7.png) Apparently, the default format is **"m/d/yy"** and ```new Date().getTimezoneOffset()``` gives me 300 minutes. The timezone offset is something I can manually deal with, so there is no problem with it. What I want is to parse my file using the format **"d/m/yy h:mm"** for dates, but when I set the options for ```XLSX.read``` to ```{ cellNF: true, dateNF: 'd/m/yy h:mm'}``` I get the following parsed cell: ![CSV With XLSX.parse Options](https://user-images.githubusercontent.com/6645155/27743816-dc735488-5d82-11e7-9877-523487fc6511.png) So, as you can see, the day and month are still swapped. Then I saved my CSV file in MS Excel as XLSX, without touching the date cells. Without ```{ cellNF: true, dateNF: 'd/m/yy h:mm'}``` I get this: ![con xlsx - sin formato](https://user-images.githubusercontent.com/6645155/27744146-0d45ea16-5d84-11e7-813c-b4d6923e6e92.png) That is actually what I needed and no timezone offset is added, lets see what happens when I include ```{ cellNF: true, dateNF: 'd/m/yy h:mm'}``` in the ```XLSX.parse``` function: ![con xslx - con formato](https://user-images.githubusercontent.com/6645155/27744332-b3e6eb9a-5d84-11e7-938b-60243269293b.png) Well, it is not following the format I specified, but still the month and day are correct and no offset is added again. So, can someone please explain to me what is happening here? It would be awesome if I could just read the raw string values of the cells, and reject all cells that do not match the **"d/m/yy h:mm"** format, but I have read the docs and I could not find any hint of an option like that. I don't know if these are the expected outputs, but I will really appreciate some hint of what can I do, since I was considering using another parser just for CSVs. Thanks a lot for your time!
reviewher commented 6 years ago (Migrated from github.com)

Excel generally stores those numbers you see in the v field, and the library has to jump through some hoops to deduce the original date. The numbers themselves are relative to the timezone of the author (the 0 does not correspond to a specific absolute date!) but the files don't actually store the author's timezone! So there's a bit of dark magic involved and it is currently not locale aware.

The proper solution is a proper localization mechanism for the formatter library https://github.com/sheetjs/ssf -- in quite a few places it assumes en_US locale.

So to first order: if you set your machine regional settings to US English, do you see consistent behavior?

Excel generally stores those numbers you see in the `v` field, and the library has to jump through some hoops to deduce the original date. The numbers themselves are relative to the timezone of the author (the `0` does not correspond to a specific absolute date!) but the files don't actually store the author's timezone! So there's a bit of dark magic involved and it is currently not locale aware. The proper solution is a proper localization mechanism for the formatter library https://github.com/sheetjs/ssf -- in quite a few places it assumes en_US locale. So to first order: if you set your machine regional settings to US English, do you see consistent behavior?
chrishenx commented 6 years ago (Migrated from github.com)

Hi @reviewher,

I have set my regional settings to US English: window.navigator.languages gives me: ["en", "es-ES", "es"] ; and window.navigator.language: "en-US".

But the outputs remains the same. I have to remark that I only have trouble with CSV files, since those are plan text files, is there an option to turn off the parsing of values and just get the raw text?

Thanks in advance!

Hi @reviewher, I have set my regional settings to US English: ```window.navigator.languages``` gives me: **["en", "es-ES", "es"]** ; and ```window.navigator.language```: **"en-US"**. But the outputs remains the same. I have to remark that I only have trouble with CSV files, since those are plan text files, is there an option to turn off the parsing of values and just get the raw text? Thanks in advance!
chrishenx commented 6 years ago (Migrated from github.com)

I mean, how is this even possible?

captura de pantalla 2017-08-22 a la s 18 44 32
I mean, how is this even possible? <img width="206" alt="captura de pantalla 2017-08-22 a la s 18 44 32" src="https://user-images.githubusercontent.com/6645155/29592503-2a653650-876a-11e7-985c-42c765e7a6d1.png">
SheetJSDev commented 6 years ago (Migrated from github.com)

At some level, the date mechanism is determined by the browser support. If you really want to see something wild, try this in chrome:

new Date("How is this a date 1") // Mon Jan 01 2001
new Date("12 Marks") // Mon March 12 2001

What was the original string that gave you the result of "13/1/16 0:00:00" ?

At some level, the date mechanism is determined by the browser support. If you really want to see something wild, try this in chrome: ```js new Date("How is this a date 1") // Mon Jan 01 2001 new Date("12 Marks") // Mon March 12 2001 ``` What was the original string that gave you the result of `"13/1/16 0:00:00"` ?
chrishenx commented 6 years ago (Migrated from github.com)

@SheetJSDev This one: "13/01/16" That is January 13th, 2016

@SheetJSDev This one: "13/01/16" That is January 13th, 2016
Sign in to join this conversation.
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#712
Loading…
There is no content yet.