Read excel when contains date format data #3107

Closed
opened 2024-04-09 13:40:54 +00:00 by winfa · 1 comment

I was trying to read excel data as json by calling XLSX.read. And there are columns using Date format. (which may using 1900 date system stored the date as number).

I'm calling read function by setting cellDates: true. Getting this kind of bug:
TimeZone: China Time

Excel File:
https://docs.sheetjs.com/DateTime.xlsx

Parsed Result:

[{
  "sheetName": "Sheet1",
  "data": [
    {
      "Type": "Date",
      "Value": "2048-10-05T15:59:17.000Z",
    },
    {
      "Type": "Time",
      "Value": "1899-12-30T06:54:17.000z"
    },
    {
      "Type": "DateTime",
      "Value": "2048-10-06T06:59:17.000Z"
    }
  ]
}]
  • "2048-10-06" should be "2048-10-05T16:00:00.000Z" not "2048-10-05T15:59:17.000Z".
  • "2048-10-06 15:00:00" should be "2048-10-06T07:00:00.000Z" not "2048-10-06706:59:17.000Z".

There is '43s' missing here. Looks like this is because excel is using 1900 date system, and there is a time zone change from 805 to 8 for China. And looks like cellDates: true does not handle it in a right way.

I was trying to read excel data as json by calling XLSX.read. And there are columns using Date format. (which may using 1900 date system stored the date as number). I'm calling `read` function by setting `cellDates: true`. Getting this kind of bug: TimeZone: China Time Excel File: https://docs.sheetjs.com/DateTime.xlsx Parsed Result: ```json [{ "sheetName": "Sheet1", "data": [ { "Type": "Date", "Value": "2048-10-05T15:59:17.000Z", }, { "Type": "Time", "Value": "1899-12-30T06:54:17.000z" }, { "Type": "DateTime", "Value": "2048-10-06T06:59:17.000Z" } ] }] ``` * "2048-10-06" should be "2048-10-05T16:00:00.000Z" not "2048-10-05T15:59:17.000Z". * "2048-10-06 15:00:00" should be "2048-10-06T07:00:00.000Z" not "2048-10-06706:59:17.000Z". There is '43s' missing here. Looks like this is because excel is using 1900 date system, and there is a time zone change from 805 to 8 for China. And looks like `cellDates: true` does not handle it in a right way.
winfa changed title from excel files error to Read excel when contains date format data 2024-04-09 13:42:12 +00:00
Owner

Using version 0.20.2, in the Asia/Shanghai timezone, you should see

[
  { Type: 'Date', Value: 2048-10-05T16:00:00.000Z },
  { Type: 'Time', Value: 1899-12-31T06:54:17.000Z },
  { Type: 'DateTime', Value: 2048-10-06T07:00:00.000Z }
]

Date and DateTime will be correct since they are after the timezone shift.

The Time is incorrect since that happened before the shift.


The issue here is a mismatch between how JavaScript and Excel dates work. https://docs.sheetjs.com/docs/csf/features/dates

The fractional part of the date code serves as the time marker. Excel assumes each day has exactly 86400 seconds

Any sort of mapping between Excel and JavaScript concepts will have unexpected conversions (breaking UTC interpretation and/or local interpretation and/or differences between times).


If you want to force UTC interpretation, pass the option UTC: true to sheet_to_json (https://docs.sheetjs.com/docs/api/utilities/array/#array-output)

To verify this, in Linux or macOS, you can set the TZ environment variable to force timezone:

env TZ="Asia/Shanghai" node -pe 'var XLSX = require("xlsx"); var ws = XLSX.readFile("DateTime (2).xlsx", {cellDates: true}).Sheets.Sheet1; XLSX.utils.sheet_to_json(ws, {UTC: true})'

The result will be

[
  { Type: 'Date', Value: 2048-10-06T00:00:00.000Z },
  { Type: 'Time', Value: 1899-12-31T15:00:00.000Z },
  { Type: 'DateTime', Value: 2048-10-06T15:00:00.000Z }
]
Using version 0.20.2, in the `Asia/Shanghai` timezone, you should see ```js [ { Type: 'Date', Value: 2048-10-05T16:00:00.000Z }, { Type: 'Time', Value: 1899-12-31T06:54:17.000Z }, { Type: 'DateTime', Value: 2048-10-06T07:00:00.000Z } ] ``` Date and DateTime will be correct since they are after the timezone shift. The Time is incorrect since that happened before the shift. --- The issue here is a mismatch between how JavaScript and Excel dates work. https://docs.sheetjs.com/docs/csf/features/dates > The fractional part of the date code serves as the time marker. Excel assumes each day has exactly 86400 seconds Any sort of mapping between Excel and JavaScript concepts will have unexpected conversions (breaking UTC interpretation and/or local interpretation and/or differences between times). --- If you want to force UTC interpretation, pass the option `UTC: true` to `sheet_to_json` (https://docs.sheetjs.com/docs/api/utilities/array/#array-output) To verify this, in Linux or macOS, you can set the `TZ` environment variable to force timezone: ```bash env TZ="Asia/Shanghai" node -pe 'var XLSX = require("xlsx"); var ws = XLSX.readFile("DateTime (2).xlsx", {cellDates: true}).Sheets.Sheet1; XLSX.utils.sheet_to_json(ws, {UTC: true})' ``` The result will be ```js [ { Type: 'Date', Value: 2048-10-06T00:00:00.000Z }, { Type: 'Time', Value: 1899-12-31T15:00:00.000Z }, { Type: 'DateTime', Value: 2048-10-06T15:00:00.000Z } ] ```
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#3107
No description provided.