What is the best practice to read date type value? #1565
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#1565
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
With
cellDates
read option, xlsx library tries to convert date-type cell to js Date object.However, it does not seem to respect
date1904
property of sheet when constructing js Date object. https://github.com/SheetJS/js-xlsx/issues/126excel_date.xlsx
The above code with the attached excel file gives the following result:
I expected that the generated js Date objects are of '2019-01-01', but they are skewed due to
date1904
problem.I converted all js Date values in my program.
But I think It would be better that the library do this magical conversion so that users do not need to consider
date1904
anymore.Am I missing useful option?
I am also experiencing this, the dates are wrong already in the workbook returned by
readFile
. This must be a bug since it breaks thecellDates
functionality, the dates in the workbook can't be relied upon due to this issue.I am having trouble finding a good workaround for this. Could you explain your workaround @jngbng? Did you manually add 4 years to dates in the workbook before converting it to JSON when
date1904
is true?@zoeesilcock There are two more problems. SSF module output, instead of JS native Date, is preferable to represent date when importing excel file.
TL;DR. I am using following workaround code.
Run above code with tz_test_dates.xlsx and will get following result:
tz_test_dates.xlsx preview:
Detail
sheet_to_json
uses following code to convert Excel date code to JS Date object.issue 1: precision bug. refer to #1470
On some countries, you may lose some time (in Korea, -52 sec when parsing).
The problem is that
Date.getTimezoneOffset()
is not precise enough. (SheetJS/ssf#38)Run above code after setting computer's time zone to Asia/Seoul (UTC+09:00) then will get:
Notice that 52 seconds error has gone, but '1960-01-01' and '1908-01-01' are not correctly parsed.
It is due to following issue.
issue2: timezone offset is not constant within one time zone.
Noice that
dnthresh
depends on the timezone offset of CURRENT TIME.But on some countries, timezone offset changes (or have changed) over time.
In Korea, it is GMT+09:00 now, but it was GMT+08:30 in 1960 and GMT+08:27 in 1908.
In Los Angeles in US, it is GMT-08:00 in January and GMT-07:00 in October due to summer time.
For these countries,
dnthresh
should not be constant and we should consider time zone change.SSF module which is timezone-agnostic rescues us.
Above code gives following result:
On LosAngeles timezone:
On Asia/Seoul timezone:
@jngbng thanks for going over this! Was not aware of the minute differences in KST.
The date handling is a bit of a mess and we likely have to revisit both SSF and this library. In the next push, we're going to deprecate the SSF repo and merge the contents as a package in this repo to make it easier to update both at the same time (they are separate because originally the XLS parsing was a separate library and both used this component)
Since you looked into this a bit, maybe you can share your opinion about the best representation for the library:
Excel dates
Under the hood, Excel dates are numbers representing the number of days (+ fraction of a day) from a given epoch.
The default epoch is December 31 1899 (rendered as "1/0/00"). When the 1904 Date System is used, the default epoch is January 1 1904.
As is the case with JS, Excel days are exactly 86400 seconds long, even during DST adjustments.
How Excel deviates from JS
!(year % 4) && ((year % 100) || !(year % 400))
Excel has a fake day with date code 60 ("2/29/00"). Other things like "day of week" were changed for dates before 2/29/00 in order to be consistent.This is why SSF has a special date struct -- to support the date that JS cannot.
"relative time": The Excel epochs are dates are relative to the computer timezone. In the 1900 date system, if you store a cell with value 0 and format "yyyy-mm-dd hh:mm:ss", the formatted text will be "1900-01-00 00:00:00" no matter what timezone your computer uses! JS has a concept of Universal Time.
"timezone": The previous point wouldn't normally be a huge issue, but Excel doesn't store any information about the timezone of the computer that saved the file. There's no way to know the universal date corresponding to the value. XLS has a "Country" record, but that's useless for countries like USA which have multiple timezones.
File Dates
Excel has three ways of storing dates in files:
Every Excel format can store a number with a format code that represents a field from a date interpretation (e.g. "yyyy" for 4-digit date)
XLSX has a special date type "d" which can understand ISO8601 strings (
2019-10-31T00:00:00.000Z
is interpreted as midnight of 2019-10-31)Plaintext formats like CSV have values that are interpreted by the core Excel engine.
What SheetJS currently attempts to do
The internal representation of date cells is the date as understood in the JS engine timezone.
For example, in ET (currently UTC-400), a date cell like
or a number cell using a date format like
will be parsed and stored as
43769
ifcellDates
is not specified OR2019-10-31T04:00:00.000Z
ifcellDates
is set to true.The main reason is to make the easier to supply dates from JS. For example:
will create a file that stores the date code
43769.625
even thoughnew Date("October 31 2019 3:00 PM")
changes depending on the timezone: in ET,new Date("October 31 2019 3:00 PM").valueOf()
is1572548400000
, while it is1572534000000
in UTC.Design goals and considerations
"Date representation": Converting between XLSX type "d" and Excel Date codes should work irrespective of the computer timezone setting.
"Round-trip": Reading a file and writing it back should always store the same date.
"Cross-Timezone": Server-side processing is tricky because the timezone of the user and server oftentimes differ.
"Limited complexity": Limited date shifting (adding or subtracting the timezone offset) in the end-user code
Since you mentioned that the timezone offset changed by a little bit, we'll rethink that part.
I have not looked into this as deeply yet, but it is also a problem for me - ideally I want to interpret dates as UTC, perhaps this could be an option? I realise that you may then run into round-tripping problems. It also seems in new versions as opposed to old ones - say
0.10.0
you can't manually tell whether an XLS cell is a date after reading a workbook?Naively, as a first solution to the fractional timezone issues, the timezone offset could be worked out using
.getTime()
and a constant value?This thread has ballooned to include a general discussion about handling dates. Could we return to the original issue, namely that
readFile
doesn't return correct data? As was described in the original issue, the library correctly identifies that the file is based on 1904 rather than 1900 but the dates in the result don't reflect that. Surely this isn't meant to work this way?I've just been bitten by this—shouldn't we autodetect which date format we have, instead of returning an off-by-4-years date?
@Fil "off by 4 years" sounds like a problem with the date system (1900 vs 1904) which is not related to the issue at hand. Please raise a new issue and include an example
I don't understand your comment. The issue as described in the OP gives the example of a file with four rows that contain a date of 2019-01-01. When it's opened with xlsx, three of the dates are transformed (on my computer in European timezone) into 2014-12-30T23:59:39, which is off by 4 years, 1 day and 21 seconds. For the OP (evaluated in Korean TZ), the offset is 4 years, 9 hours and 42 seconds.
How about changing the basedate to UTC?
So I have a file that's created in the Norway timezone and I'm in India and when I convert the dates using
cellDates: true
there's this one day difference in some dates. I'm using node.js.Dates in the sheet:
24.09.2020
10.09.2020
10.09.2020
10.09.2020
15.09.2020
25.09.2020
Dates after doing this:
2020-09-24T17:29:49.999Z
2020-09-09T19:29:50.000Z
2020-09-09T19:29:50.000Z
2020-09-09T19:29:50.000Z
2020-09-14T19:29:50.000Z
2020-09-24T19:29:50.000Z
There's a one day difference. The date 10/09/2020 is coming as 09/09/2020. 15th as 14 and 25th as 24.
I'm not sure if the issue is because of the timezone difference or because of the above mentioned problems that I don't understand much.. What should I do to get the date as in the sheet? Is the date I got after reading file in IST ? Should I convert it to Norway time? Also how would you know which time zone the excel date is in and the converted date is in ?
@SheetJSDev
For my project i do not need any dates before year 2000 so i convert all the values of 1900 to 2000 and 1899 to 1999 in sheetjs code, that has one side affect that this problem has gone away but all my excel dates became 100 years ago, so i add 100 years to my javascript date objects like:
new Date(dt.setFullYear(dt.getFullYear() + 100))
, and that is my temporary solution to this problem until the bug is fixed by chromium(v8 engine) or the code is changed by sheetjs developers.@SheetJSDev I am having the same issue my excel has
21-Dec-2021
but the returning json shows44551
.Is there any option that we can pass to
sheet_to_json
function so that it returns as21-Dec-2021
without any formating?Seems like Date parsing isnt good right now, only with different workarounds for post-update the data
i also meet the same issue, here is my solution: when xlsx.read is called, when there is a date need to handle, i just add a '#' to the start and end of the date, to make it a regular string, when i call sheet_to_json, then i just remove the '#' to restore the original date. ex: date = '2022-01-01 12:40:33', when i call xlsx.read i trun it into '#2022-01-01 12:40:33#', when i call sheet_to_json i remove '#' and change it back to '2022-01-01 12:40:33'. just turn date to a regular string as 'abcd...' and change it back then.
the advantage is that it can store any date format you want, the shortage is that it looks ugly.
how to add ’#‘
export function SerializeDateTime(dt){
return '#' + dt + '#'
}
export function DeserializeDateTime(dt) {
if (dt.length > 2) {
var len = dt.length
return dt.slice(1, len - 1)
}
return dt
}
but when i use xlsx.read(data) from file,date is already changed.