Date cells #126

Closed
opened 2014-10-04 00:34:34 +00:00 by elad · 40 comments
elad commented 2014-10-04 00:34:34 +00:00 (Migrated from github.com)

Excel supports cells that display formatted dates. For testing purposes I created a file with only one cell (A1) and entered the value 2014-09-28. The value was immediately formatted as a date and displayed as 9/28/14.

I wanted to see how I can reliably extract back the original date. I used the following program:

var xlsx = require('xlsx');

var workbook = xlsx.readFile(process.argv[2]);
for (var i = 0; i < workbook.SheetNames.length; i++){
    var worksheet = workbook.Sheets[workbook.SheetNames[i]];
    console.log(worksheet);
}

And got the following output (after running it with node myfile.xlsx);

{ A1: { t: 'n', v: 41910, w: '9/28/14' }, '!ref': 'A1' }
{}
{}

I see several problems here:

  • The number 41910 is the number of days from some epoch, either 1900 or 1904, depending on where the file was created (Windows or Mac). There should be a way to tell how to treat this value.
  • The t field (for type) indicates this is a number. Can it instead indicate it's a date? or should there be a different property?
  • There's no way to retrieve the actual value that was entered - 2014-09-28 - so the original date, along with a reliable way of retrieving it, seems to be lost.

All of the above also applies for time fields.

Am I missing something? Is there a way to reliably get 2014-09-28 from the cell data?

Excel supports cells that display formatted dates. For testing purposes I created a file with only one cell (A1) and entered the value `2014-09-28`. The value was immediately formatted as a date and displayed as `9/28/14`. I wanted to see how I can reliably extract back the original date. I used the following program: ``` var xlsx = require('xlsx'); var workbook = xlsx.readFile(process.argv[2]); for (var i = 0; i < workbook.SheetNames.length; i++){ var worksheet = workbook.Sheets[workbook.SheetNames[i]]; console.log(worksheet); } ``` And got the following output (after running it with `node myfile.xlsx`); ``` { A1: { t: 'n', v: 41910, w: '9/28/14' }, '!ref': 'A1' } {} {} ``` I see several problems here: - The number 41910 is the number of days from some epoch, either 1900 or 1904, depending on where the file was created (Windows or Mac). There should be a way to tell how to treat this value. - The `t` field (for type) indicates this is a number. Can it instead indicate it's a date? or should there be a different property? - There's no way to retrieve the actual value that was entered - `2014-09-28` - so the original date, along with a reliable way of retrieving it, seems to be lost. All of the above also applies for time fields. Am I missing something? Is there a way to reliably get 2014-09-28 from the cell data?
SheetJSDev commented 2014-10-04 01:34:27 +00:00 (Migrated from github.com)

My comment should not be construed to imply that I think any of this should be the "right" way to do things, and I'm sure we'll come up with a better solution.

The t field (for type) indicates this is a number. Can it instead indicate it's a date? or should there be a different property?

Quick review: the real thorn here is the bad date 2/29/1900 (1900 was not a leap year yet Excel recognizes it as a valid date). This is a bug from Lotus 1-2-3 that Excel replicated (decades later, it still haunts us). There is another issue regarding how to handle the day of week.

We could decide to throw away the corner case entirely and just store dates. In fact, the XLSX format allows for a cell type d for a date string. There would be problems at date code 60, but otherwise would be fine.

There should be a way to tell how to treat this value

The date1904 mode is stored in the (admittedly awkward) location Workbook.WBProps.date1904. Using the roo_190*_base files:

> require('xlsx').readFile('test_files/roo_1900_base.xlsx').Workbook.WBProps.date1904
'0'
> require('xlsx').readFile('test_files/roo_1904_base.xlsx').Workbook.WBProps.date1904
'1'

the original date, along with a reliable way of retrieving it, seems to be lost.

To convert the date, the SSF library has a handy parse_date_code function:

> require('xlsx').SSF.parse_date_code(41910, {date1904:false})
{ D: 41910,
  T: 0,
  u: 0,
  y: 2014,
  m: 9,
  d: 28,
  H: 0,
  M: 0,
  S: 0,
  q: 0 }
> require('xlsx').SSF.parse_date_code(41910, {date1904:true})
{ D: 41910,
  T: 0,
  u: 0,
  y: 2018,
  m: 9,
  d: 29,
  H: 0,
  M: 0,
  S: 0,
  q: 6 }

Those fields are described here: https://github.com/SheetJS/ssf/#usage

To recover a JS rendering of the date, take a look at the js-harb helper for this: https://github.com/SheetJS/js-harb/blob/master/bits/22_helpers.js#L7-L17

It probably makes sense for parse_date_code to return the original date object as well, as that would obviate the need for an additional function.

@hmalphettes @notatestuser @vratiu @wbrandongeorge thoughts?

My comment should not be construed to imply that I think any of this should be the "right" way to do things, and I'm sure we'll come up with a better solution. > The t field (for type) indicates this is a number. Can it instead indicate it's a date? or should there be a different property? Quick review: the real thorn here is the bad date 2/29/1900 (1900 was not a leap year yet Excel recognizes it as a valid date). This is a bug from Lotus 1-2-3 that Excel replicated (decades later, it still haunts us). There is another issue regarding how to handle the day of week. We could decide to throw away the corner case entirely and just store dates. In fact, the XLSX format allows for a cell type `d` for a date string. There would be problems at date code 60, but otherwise would be fine. > There should be a way to tell how to treat this value The date1904 mode is stored in the (admittedly awkward) location Workbook.WBProps.date1904. Using the roo_190*_base files: ``` > require('xlsx').readFile('test_files/roo_1900_base.xlsx').Workbook.WBProps.date1904 '0' > require('xlsx').readFile('test_files/roo_1904_base.xlsx').Workbook.WBProps.date1904 '1' ``` > the original date, along with a reliable way of retrieving it, seems to be lost. To convert the date, the SSF library has a handy parse_date_code function: ``` > require('xlsx').SSF.parse_date_code(41910, {date1904:false}) { D: 41910, T: 0, u: 0, y: 2014, m: 9, d: 28, H: 0, M: 0, S: 0, q: 0 } > require('xlsx').SSF.parse_date_code(41910, {date1904:true}) { D: 41910, T: 0, u: 0, y: 2018, m: 9, d: 29, H: 0, M: 0, S: 0, q: 6 } ``` Those fields are described here: https://github.com/SheetJS/ssf/#usage To recover a JS rendering of the date, take a look at the js-harb helper for this: https://github.com/SheetJS/js-harb/blob/master/bits/22_helpers.js#L7-L17 It probably makes sense for parse_date_code to return the original date object as well, as that would obviate the need for an additional function. @hmalphettes @notatestuser @vratiu @wbrandongeorge thoughts?
elad commented 2014-10-04 01:44:11 +00:00 (Migrated from github.com)

You've answered most of my concerns, thanks! :)

Quick review: the real thorn here is the bad date 2/29/1900 (1900 was not a leap year yet Excel recognizes it as a valid date). This is a bug from Lotus 1-2-3 that Excel replicated (decades later, it still haunts us). There is another issue regarding how to handle the day of week.

We could decide to throw away the corner case entirely and just store dates. In fact, the XLSX format allows for a cell type d for a date string. There would be problems at date code 60, but otherwise would be fine.

I think it's very reasonable, at least for now, to document the problem with 2/29/1900, why it exists, and just throw away the corner case.

What about time fields?

It probably makes sense for parse_date_code to return the original date object as well, as that would obviate the need for an additional function.

Yes please!

You've answered most of my concerns, thanks! :) > Quick review: the real thorn here is the bad date 2/29/1900 (1900 was not a leap year yet Excel recognizes it as a valid date). This is a bug from Lotus 1-2-3 that Excel replicated (decades later, it still haunts us). There is another issue regarding how to handle the day of week. > > We could decide to throw away the corner case entirely and just store dates. In fact, the XLSX format allows for a cell type `d` for a date string. There would be problems at date code 60, but otherwise would be fine. I think it's very reasonable, at least for now, to document the problem with 2/29/1900, why it exists, and just throw away the corner case. What about time fields? > It probably makes sense for parse_date_code to return the original date object as well, as that would obviate the need for an additional function. Yes please!
SheetJSDev commented 2014-10-04 01:53:11 +00:00 (Migrated from github.com)

Time codes are basically the fraction part of date codes. This allows for easy date arithmetic: you can subtract dates to find the number of days in between.

Times are unaffected by the date1904 issue. The time fields from parse_date_code are:

  • T: number of seconds since midnight.
  • H: hours in the 24-hour system
  • M: minutes
  • S: seconds
  • u: fraction of a second (unfortunately needed for some odd formatting issues)

e.g. for the time 21:36 (9:36 PM), the code is 0.9

> require('xlsx').SSF.parse_date_code(0.9)
{ D: 0,
  T: 77760,
  u: 0,
  y: 1900,
  m: 1,
  d: 0,
  H: 21,
  M: 36,
  S: 0,
  q: 6 }

Assuming we do decide to shift to JS Date, times would also nicely map to JS Date objects

Time codes are basically the fraction part of date codes. This allows for easy date arithmetic: you can subtract dates to find the number of days in between. Times are unaffected by the date1904 issue. The time fields from parse_date_code are: - `T`: number of seconds since midnight. - `H`: hours in the 24-hour system - `M`: minutes - `S`: seconds - `u`: fraction of a second (unfortunately needed for some odd formatting issues) e.g. for the time 21:36 (9:36 PM), the code is 0.9 ``` > require('xlsx').SSF.parse_date_code(0.9) { D: 0, T: 77760, u: 0, y: 1900, m: 1, d: 0, H: 21, M: 36, S: 0, q: 6 } ``` Assuming we do decide to shift to JS Date, times would also nicely map to JS Date objects
elad commented 2014-10-04 02:00:20 +00:00 (Migrated from github.com)

Thanks - makes sense, sorry for asking an obvious question.

Trying to sum it up then, we're discussing the following changes:

  • Using field type d for date(/time) fields and documenting the bug with the 1904 issue
  • Documenting how to get the date1904 mode flag
  • Changing SSF.parse_date_code to return a date object

(Please point out anything I've forgotten or got wrong.)

I think the first change would be very useful because it makes it easier to parse and handle data in exchange for documenting an esoteric bug (that we might address in the future).

The second change is straightforward - if you're interested, I'll submit a pull request.

The third change is useful, but since it's not difficult to get a Date object using the existing return value, it's more of a convenience.

Thanks - makes sense, sorry for asking an obvious question. Trying to sum it up then, we're discussing the following changes: - Using field type `d` for date(/time) fields and documenting the bug with the 1904 issue - Documenting how to get the date1904 mode flag - Changing `SSF.parse_date_code` to return a date object (Please point out anything I've forgotten or got wrong.) I think the first change would be very useful because it makes it easier to parse and handle data in exchange for documenting an esoteric bug (that we might address in the future). The second change is straightforward - if you're interested, I'll submit a pull request. The third change is useful, but since it's not difficult to get a Date object using the existing return value, it's more of a convenience.
SheetJSDev commented 2014-10-04 02:13:05 +00:00 (Migrated from github.com)

Since we are opening up the conversation, I wonder whether it makes sense to store date1904 in a better location. The original form was based on a lazy translation of the XLSX format (and replicated in the XLS and other parsers), but I think this probably deserves a better home.

Since we are opening up the conversation, I wonder whether it makes sense to store date1904 in a better location. The original form was based on a lazy translation of the XLSX format (and replicated in the XLS and other parsers), but I think this probably deserves a better home.
elad commented 2014-10-04 02:24:08 +00:00 (Migrated from github.com)

Sure, it should probably be a boolean too. Assuming var workbook = xlsx.readFile(file) I see that there's workbook.Workbook.Props as well as workbook.Props. Was there ever an intention to merge them? (Should they be merged?)

Sure, it should probably be a boolean too. Assuming `var workbook = xlsx.readFile(file)` I see that there's `workbook.Workbook.Props` as well as `workbook.Props`. Was there ever an intention to merge them? (Should they be merged?)
SheetJSDev commented 2014-10-04 02:37:53 +00:00 (Migrated from github.com)

They come from different locations in the original XLSX. It is a historical artifact, since XLS and SpreadsheetML actually store different properties with different names in different locations (sadly, not even the "Author" property is stored as "Author" in all of the Excel formats, and some formats treat that author field as a custom property) Issues like property inconsistencies remind us all of why no one has succeeded in making a universal representation of a workbook

They come from different locations in the original XLSX. It is a historical artifact, since XLS and SpreadsheetML actually store different properties with different names in different locations (sadly, not even the "Author" property is stored as "Author" in all of the Excel formats, and some formats treat that author field as a custom property) Issues like property inconsistencies remind us all of why no one has succeeded in making a universal representation of a workbook
elad commented 2014-10-04 02:52:21 +00:00 (Migrated from github.com)

I see. Well, if there's no obvious solution right now we could always document it and state it's subject to change in the future...

I see. Well, if there's no obvious solution right now we could always document it and state it's subject to change in the future...
elad commented 2014-10-11 23:58:17 +00:00 (Migrated from github.com)

Reviving this issue...

Turns out that the corner case has actually become part of the standard. The Leap year bug Wikipedia article quotes the specification as saying that value 60 is expected to be treated as 2/29/1900 in 1900-based date systems. You can find it in section 3.17.4.1, pages 2522-2523.

Doesn't this mean that we could do the following:

  • Provide a parseDates option that when true sets .d to a date object
  • Optionally set .t to 'd' to indicate it's a date (for convenience only; redundant since cell.t === 'd' can be written as cell.hasOwnProperty('d'))
  • A date cell with value 60 in files with 1900-based dates gets .d set to null (but all other fields stay the same)
  • We document this corner case very well indicating there simply cannot be a valid date object for an invalid date and if this is something the developer needs they should check for it and handle it manually.

(I'm asking because if this corner case is part of the standard then there should be an expected way of handling it... I'd assume. :)

Reviving this issue... Turns out that the corner case has actually become part of the standard. The [Leap year bug](http://en.wikipedia.org/wiki/Leap_year_bug) Wikipedia article quotes [the specification](http://www.ecma-international.org/news/TC45_current_work/Office%20Open%20XML%20Part%204%20-%20Markup%20Language%20Reference.pdf) as saying that value 60 is expected to be treated as 2/29/1900 in 1900-based date systems. You can find it in section 3.17.4.1, pages 2522-2523. Doesn't this mean that we could do the following: - Provide a `parseDates` option that when `true` sets `.d` to a date object - Optionally set `.t` to `'d'` to indicate it's a date (for convenience only; redundant since `cell.t === 'd'` can be written as `cell.hasOwnProperty('d')`) - A date cell with value 60 in files with 1900-based dates gets `.d` set to `null` (but all other fields stay the same) - We document this corner case very well indicating there simply cannot be a valid date object for an invalid date and if this is something the developer needs they should check for it and handle it manually. (I'm asking because if this corner case is part of the standard then there should be an expected way of handling it... I'd assume. :)
SheetJSDev commented 2014-10-13 02:54:49 +00:00 (Migrated from github.com)

if this corner case is part of the standard

That language is from a technical draft but does not appear in the current version (Check part 1 Section 18.7.4 on page 2056 from http://www.ecma-international.org/publications/standards/Ecma-376.htm). Somewhere down the line, the committee must have decided that it was not appropriate to codify that logic.

> if this corner case is part of the standard That language is from a technical draft but does not appear in the current version (Check part 1 Section 18.7.4 on page 2056 from http://www.ecma-international.org/publications/standards/Ecma-376.htm). Somewhere down the line, the committee must have decided that it was not appropriate to codify that logic.
elad commented 2014-10-13 10:35:11 +00:00 (Migrated from github.com)

You're right. I was looking at an older version, sorry about that.

Putting aside date value 60 handling, are we in agreement about everything else? Specifically, the .d field (for date object), the 'd' type (for date), and the parseDates option, for putting the date object in the .d field?

You're right. I was looking at an older version, sorry about that. Putting aside date value 60 handling, are we in agreement about everything else? Specifically, the `.d` field (for date object), the `'d'` type (for date), and the `parseDates` option, for putting the date object in the `.d` field?
SheetJSDev commented 2014-10-15 15:31:07 +00:00 (Migrated from github.com)

@elad I have a different thought: lets just allow for a new cell type d. The default behavior should stay the same. The option for this feature should be called cellDates.

Currently, the XLSX explicitly generates the date number and changes the cell type to n: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L223-L226 Instead of doing that, lets just keep the date as-is (p.t = 'd'; p.v = Date.parse(p.v);). There are a few other places that need to be adjusted:

  • the formatting library needs to handle date objects. As a stopgap measure, in safe_format just compute the datenum if the value is a date and pass it to SSF (there is a similar fallback for int/double resolution in that function).
  • the XLSX writer needs to handle dates. It should default to writing the datenum with type n (if the z cell format is not set, default to 14) and should write the ISO string if the write option cellDates is set.

So long as the w formatted text is preserved, the other exporters (e.g. CSV) should work as expected.

Note: this is half the puzzle. The other half (proactively converting number cells to date cells) should wait until the new formatting library version is pushed

PRs are welcome and deeply appreciated :)

@elad I have a different thought: lets just allow for a new cell type `d`. The default behavior should stay the same. The option for this feature should be called `cellDates`. Currently, the XLSX explicitly generates the date number and changes the cell type to `n`: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L223-L226 Instead of doing that, lets just keep the date as-is (`p.t = 'd'; p.v = Date.parse(p.v);`). There are a few other places that need to be adjusted: - the formatting library needs to handle date objects. As a stopgap measure, in `safe_format` just compute the datenum if the value is a date and pass it to SSF (there is a similar fallback for int/double resolution in that function). - the XLSX writer needs to handle dates. It should default to writing the datenum with type `n` (if the `z` cell format is not set, default to `14`) and should write the ISO string if the write option `cellDates` is set. So long as the `w` formatted text is preserved, the other exporters (e.g. CSV) should work as expected. Note: this is half the puzzle. The other half (proactively converting number cells to date cells) should wait until the new formatting library version is pushed PRs are welcome and deeply appreciated :)
SheetJSDev commented 2014-10-26 06:08:48 +00:00 (Migrated from github.com)

@elad version 0.7.12 is the first step. It supports cell dates via the cellDates option. There are numerous tests involving consistency of writing and reading files using the cell dates options. To ensure JSON consistency (since JSON.parse(JSON.stringify(new Date())) is a string and not a date), the parser will store datestrings that can easily be parsed in JS with new Date(cell.v).

There is a massive and annoying bug in Excel regarding ISO 8601 datetimes. The string 2014-10-26T06:03:14.331Z references a well-defined millisecond (thanks to the Z timezone modifier) but Excel actually interprets the date as 2014-10-26T06:03:14.331 in your local timezone (so in ET this is interpreted as 6:03 AM when the actual time is 2:03 AM ET). Correcting it requires knowledge of the origin timezone.

@elad version 0.7.12 is the first step. It supports cell dates via the `cellDates` option. There are numerous tests involving consistency of writing and reading files using the cell dates options. To ensure JSON consistency (since `JSON.parse(JSON.stringify(new Date()))` is a string and not a date), the parser will store datestrings that can easily be parsed in JS with `new Date(cell.v)`. There is a massive and annoying bug in Excel regarding ISO 8601 datetimes. The string `2014-10-26T06:03:14.331Z` references a well-defined millisecond (thanks to the `Z` timezone modifier) but Excel actually interprets the date as `2014-10-26T06:03:14.331` in your local timezone (so in ET this is interpreted as 6:03 AM when the actual time is 2:03 AM ET). Correcting it requires knowledge of the origin timezone.
elad commented 2014-10-26 10:52:28 +00:00 (Migrated from github.com)

Why do we need JSON consistency? The parsing is done from XLSX to a javascript object. I think we should keep the original value (fraction) in cell.v and have an additional cell.d for the date object. A lot of logic assumes you can access this value (for example, I get this as the first hit for "excel timezone") and I don't see why we need it as a string. A date object both sidesteps this problem and is manipulable for developers. Why not just provide them with that instead? :)

Why do we need JSON consistency? The parsing is done from XLSX to a javascript object. I think we should keep the original value (fraction) in `cell.v` and have an additional `cell.d` for the date object. A lot of logic assumes you can access this value (for example, I get [this](http://excelribbon.tips.net/T009309_Adjusting_Times_for_Time_Zones.html) as the first hit for "excel timezone") and I don't see why we need it as a string. A date object both sidesteps this problem and is manipulable for developers. Why not just provide them with that instead? :)
SheetJSDev commented 2014-10-26 17:52:49 +00:00 (Migrated from github.com)

@elad XLSB and XLS only store dates with the date code style, and for those cases just adding a separate cell field is best. XLSX can store dates with the date code, but it can also store using a special type 'd' (18.17.4 "Dates and Times" explains this in further detail). As a result, for those cases "original value" technically would be a date string, not the fraction.

With that in mind, there are four key questions:

  1. How should bona fide dates from the XLSX be represented in the object?

If the cellDates option is false, they are converted to Excel date codes and the date code is stored in .v. If the cellDates option is true, the ISO8601 date string is stored.

  1. How should numbers that correspond to dates be represented in the object?

Currently, the raw date code is stored to .v. A few pending changes have to be pushed to SSF (ripping the tokenizer out of the eval function), but the next iteration will convert back from the date code to the raw Date if cellDates is specified.

  1. How should the other piece of information (the date code if we are storing dates in the .v field) be represented?

As you are proposing, we should save both. I hesitate because of possible confusion with JS date numbers (the type you get from Date.parse). IMHO we should expose both functions (Date -> num and num -> Date) as utility functions. I expect the raw dates to be more valuable than the number codes in terms of manipulation, so I'm toying with the idea of flipping the default (make dates stored as Date by default, require a function call to get the other form). Thoughts?

  1. Under what circumstances should the writer emit ISO8601 dates?

The cellDates option on the writer controls this behavior. There are new tests verifying consistency throughout the process (file with d/n -> representation with cellDates true/false -> writing a file with cellDates true/false -> parsing the written file) for all combinations of arguments.

Why do we need JSON consistency?

WebWorkers. When we first looked into them, passing the whole object was problematic so we sidestepped the issue by calling JSON.stringify from the worker and JSON.parse from the browser. Mandating that we keep the field as JS Dates breaks consistency, since JSON.parse(JSON.stringify(date)) is a string even if date is a date

@elad XLSB and XLS only store dates with the date code style, and for those cases just adding a separate cell field is best. XLSX can store dates with the date code, but it can also store using a special type 'd' (18.17.4 "Dates and Times" explains this in further detail). As a result, for those cases "original value" technically would be a date string, not the fraction. With that in mind, there are four key questions: 1) How should bona fide dates from the XLSX be represented in the object? If the `cellDates` option is false, they are converted to Excel date codes and the date code is stored in `.v`. If the `cellDates` option is true, the ISO8601 date string is stored. 2) How should numbers that correspond to dates be represented in the object? Currently, the raw date code is stored to `.v`. A few pending changes have to be pushed to SSF (ripping the tokenizer out of the eval function), but the next iteration will convert back from the date code to the raw Date if `cellDates` is specified. 3) How should the other piece of information (the date code if we are storing dates in the `.v` field) be represented? As you are proposing, we should save both. I hesitate because of possible confusion with JS date numbers (the type you get from `Date.parse`). IMHO we should expose both functions (Date -> num and num -> Date) as utility functions. I expect the raw dates to be more valuable than the number codes in terms of manipulation, so I'm toying with the idea of flipping the default (make dates stored as Date by default, require a function call to get the other form). Thoughts? 4) Under what circumstances should the writer emit ISO8601 dates? The cellDates option on the writer controls this behavior. There are [new tests](https://github.com/SheetJS/js-xlsx/blob/master/test.js#L730-L758) verifying consistency throughout the process (file with d/n -> representation with cellDates true/false -> writing a file with cellDates true/false -> parsing the written file) for all combinations of arguments. > Why do we need JSON consistency? WebWorkers. When we first looked into them, passing the whole object was problematic so we sidestepped the issue by calling JSON.stringify from the worker and JSON.parse from the browser. Mandating that we keep the field as JS Dates breaks consistency, since `JSON.parse(JSON.stringify(date))` is a string even if `date` is a date
elad commented 2014-10-26 18:55:28 +00:00 (Migrated from github.com)

@elad XLSB and XLS only store dates with the date code style, and for those cases just adding a separate cell field is best. XLSX can store dates with the date code, but it can also store using a special type 'd' (18.17.4 "Dates and Times" explains this in further detail). As a result, for those cases "original value" technically would be a date string, not the fraction.

While this describes the (on-disk) format, I think we have the freedom to make better decisions for the javascript library. In some cases it makes sense to reflect the specification, but in others - for example, with the .d field - I think it doesn't. From a developer's perspective, it makes much more sense that the type of the value in a date field (as opposed to a number or a string) will be a Date. So to answer your questions:

  1. How should bona fide dates from the XLSX be represented in the object?

Same as now if cellDates is false. If it's true, as Date objects.

  1. How should numbers that correspond to dates be represented in the object?

Same as now, in .v.

  1. How should the other piece of information (the date code if we are storing dates in the .v field) be represented?

I think the date as a Date object should be in .d and the number corresponding to it be in .v.

  1. Under what circumstances should the writer emit ISO8601 dates?

Why should it? Excel doesn't care about ISO8601, it uses its own numerical representation. I'd say the behavior should be to write .v if cellDates is false and convert the Date object in .d to a numerical representation that will be stored in .v before writing if cellDates is true. Pseudo code:

function write_cell(cell, options) {
    if (options.cellDates && cell.t === 'd') {
        cell.v = convert_date_to_number_using_ssf(cell.d);
    }

    write(cell.v);
}
> @elad XLSB and XLS only store dates with the date code style, and for those cases just adding a separate cell field is best. XLSX can store dates with the date code, but it can also store using a special type 'd' (18.17.4 "Dates and Times" explains this in further detail). As a result, for those cases "original value" technically would be a date string, not the fraction. While this describes the (on-disk) format, I think we have the freedom to make better decisions for the javascript library. In some cases it makes sense to reflect the specification, but in others - for example, with the `.d` field - I think it doesn't. From a developer's perspective, it makes much more sense that the type of the value in a date field (as opposed to a number or a string) will be a `Date`. So to answer your questions: > 1) How should bona fide dates from the XLSX be represented in the object? Same as now if `cellDates` is `false`. If it's `true`, as `Date` objects. > 2) How should numbers that correspond to dates be represented in the object? Same as now, in `.v`. > 3) How should the other piece of information (the date code if we are storing dates in the .v field) be represented? I think the date as a `Date` object should be in `.d` and the number corresponding to it be in `.v`. > 4) Under what circumstances should the writer emit ISO8601 dates? Why should it? Excel doesn't care about ISO8601, it uses its own numerical representation. I'd say the behavior should be to write `.v` if `cellDates` is `false` and convert the `Date` object in `.d` to a numerical representation that will be stored in `.v` before writing if `cellDates` is `true`. Pseudo code: ``` function write_cell(cell, options) { if (options.cellDates && cell.t === 'd') { cell.v = convert_date_to_number_using_ssf(cell.d); } write(cell.v); } ```
SheetJSDev commented 2014-10-26 19:52:42 +00:00 (Migrated from github.com)

@elad If you have a copy of Excel 2013, try saving a file under "Strict OpenXML" format. It does not save the number code. http://sheetjs.com/artifacts/date.xlsx was saved as standard Excel document and http://sheetjs.com/artifacts/date_strict.xlsx was saved as Strict OpenXML.

In the former case, A1 is stored as a date number:

      <c r="A1" s="1">
        <f>DATE(2012,11,10)</f>
        <v>41223</v>
      </c>

In the latter case, the date is stored as an actual date:

      <c r="A1" s="1" t="d">
        <f>DATE(2012,11,10)</f>
        <v>2012-11-10</v>
      </c>

There are other issues preventing this file from being read in older versions of excel (different XML schemas), but they will accept the latter form (see the file https://github.com/SheetJS/test_files/blob/master/xlsx-stream-d-date-cell.xlsx)

@elad If you have a copy of Excel 2013, try saving a file under "Strict OpenXML" format. It does not save the number code. http://sheetjs.com/artifacts/date.xlsx was saved as standard Excel document and http://sheetjs.com/artifacts/date_strict.xlsx was saved as Strict OpenXML. In the former case, A1 is stored as a date number: ``` <c r="A1" s="1"> <f>DATE(2012,11,10)</f> <v>41223</v> </c> ``` In the latter case, the date is stored as an actual date: ``` <c r="A1" s="1" t="d"> <f>DATE(2012,11,10)</f> <v>2012-11-10</v> </c> ``` There are other issues preventing this file from being read in older versions of excel (different XML schemas), but they will accept the latter form (see the file https://github.com/SheetJS/test_files/blob/master/xlsx-stream-d-date-cell.xlsx)
elad commented 2014-10-26 20:06:26 +00:00 (Migrated from github.com)

That's fine, the logic can be accommodated to parse/store/write .v appropriately. The point I was trying to get across is that I think the javascript object itself should keep Date objects for fields we know are date and/or time if cellDates is true. :)

As for number vs. date string, what does the date string look like for the cases where both date and time are stored? (I don't have Excel 2013...)

That's fine, the logic can be accommodated to parse/store/write `.v` appropriately. The point I was trying to get across is that I think the javascript object itself should keep `Date` objects for fields we know are date and/or time if `cellDates` is `true`. :) As for number vs. date string, what does the date string look like for the cases where both date and time are stored? (I don't have Excel 2013...)
SheetJSDev commented 2014-10-26 20:48:02 +00:00 (Migrated from github.com)

Updated the artifacts to include a time example:

      <c r="A2" s="2" t="d">
        <f>TIME(14,15,16)</f>
        <v>14:15:15.999999999999303600</v>
      </c>

The bug I was referencing refers to the fact that excel will accept an ISO8601 date string -- see the example file.

      <c r="B5" t="d" s="2">
        <v>2014-02-14T08:27:48.765Z</v>
      </c>

Here is the bug: That date-time is being interpreted as 2014-02-14 8:27:48 in every timezone (that Z specifically means UTC), although the spec suggests it should be interpreted as if its UTC.

Updated the artifacts to include a time example: ``` <c r="A2" s="2" t="d"> <f>TIME(14,15,16)</f> <v>14:15:15.999999999999303600</v> </c> ``` The bug I was referencing refers to the fact that excel will accept an ISO8601 date string -- see the example file. ``` <c r="B5" t="d" s="2"> <v>2014-02-14T08:27:48.765Z</v> </c> ``` Here is the bug: That date-time is being interpreted as 2014-02-14 8:27:48 in every timezone (that Z specifically means UTC), although the spec suggests it should be interpreted as if its UTC.
elad commented 2014-10-26 20:52:56 +00:00 (Migrated from github.com)

Here is the bug: That date-time is being interpreted as 2014-02-14 8:27:48 in every timezone (that Z specifically means UTC), although the spec suggests it should be interpreted as if its UTC.

That's what I missed. That the specification says one thing and Excel does another. (Drops the Z and treats it as local time.)

Is there bug compatibility across spreadsheets? Will Numbers parse it correctly, for example?

I think your decision to ignore it and just document the behavior is correct. I'm willing to bet most users of the library are feeding it Excel files, and expect to see the same values as they do in the GUI. So, at most, a flag that says "disable ISO8601 date parsing bug compatibility." :)

> Here is the bug: That date-time is being interpreted as 2014-02-14 8:27:48 in every timezone (that Z specifically means UTC), although the spec suggests it should be interpreted as if its UTC. That's what I missed. That the specification says one thing and Excel does another. (Drops the `Z` and treats it as local time.) Is there bug compatibility across spreadsheets? Will Numbers parse it correctly, for example? I think your decision to ignore it and just document the behavior is correct. I'm willing to bet most users of the library are feeding it Excel files, and expect to see the same values as they do in the GUI. So, at most, a flag that says "disable ISO8601 date parsing bug compatibility." :)
Aymkdn commented 2014-12-24 10:16:59 +00:00 (Migrated from github.com)

Hi guys,

I have a date in my Excel file. I tried to read the file with:
XLSX.read(e.target.result, {type: 'binary', cellDates:true, cellStyles:true})

But my cell still returns n for the type whatever the value of cellDates:
{ t="n", v=41996, w="12/23/14"}

If I understood correctly the cellDates is supposed to change the way it reads the file to change the type to d.

I investigated and I found my cell looks like that:
<c r="A3" s="4"> <v>41997</v> </c>
There is no t="d". It's a regular Excel 2010 file (no strict mode or whatever). I guess the s="4" should indicate it's a date?

Any help?

Hi guys, I have a date in my Excel file. I tried to read the file with: `XLSX.read(e.target.result, {type: 'binary', cellDates:true, cellStyles:true})` But my cell still returns `n` for the type whatever the value of cellDates: `{ t="n", v=41996, w="12/23/14"}` If I understood correctly the `cellDates` is supposed to change the way it reads the file to change the type to `d`. I investigated and I found my cell looks like that: `<c r="A3" s="4"> <v>41997</v> </c>` There is no `t="d"`. It's a regular Excel 2010 file (no strict mode or whatever). I guess the `s="4"` should indicate it's a date? Any help?
SheetJSDev commented 2014-12-27 20:44:22 +00:00 (Migrated from github.com)

@Aymkdn there are two types of dates in XLSX: a number that is formatted like a date, and an ISO8601 date. In the current version, the cellDates option generates date cells for the latter but doesn't proactively change the former. To implement cellDates for the numbers, a pending change in the formatting logic has to be pushed (basically, determining whether a cell format is actually a date format, which is simply extracting the tokenizer from SSF https://github.com/SheetJS/ssf/blob/master/ssf.js#L526-L596)

@Aymkdn there are two types of dates in XLSX: a number that is formatted like a date, and an ISO8601 date. In the current version, the cellDates option generates date cells for the latter but doesn't proactively change the former. To implement cellDates for the numbers, a pending change in the formatting logic has to be pushed (basically, determining whether a cell format is actually a date format, which is simply extracting the tokenizer from SSF https://github.com/SheetJS/ssf/blob/master/ssf.js#L526-L596)
Aymkdn commented 2014-12-29 10:17:40 +00:00 (Migrated from github.com)

Thanks @SheetJSDev -- I looked at what you said, but it's not clear for me how to change eval_fmt to have a d type... So I tried to reverse the code until I found which changes I had to do. Here is the result: 9c05476339 (diff-41772ca6056f87e686455e101ffe79b9)

I changed the datenum function to make sure the v parameter is correctly parsed.
Also I changed safe_format() to look at the fmtid... if the fmtid is one of the date format, and if cellDates is true, then we should have a d type. It works but the way I do it doesn't look good to me... Any advice?

Thanks

Thanks @SheetJSDev -- I looked at what you said, but it's not clear for me how to change `eval_fmt` to have a `d` type... So I tried to reverse the code until I found which changes I had to do. Here is the result: https://github.com/Aymkdn/js-xlsx/commit/9c05476339fd24ed6d32d1390981763ca62bc653#diff-41772ca6056f87e686455e101ffe79b9 I changed the `datenum` function to make sure the `v` parameter is correctly parsed. Also I changed `safe_format()` to look at the `fmtid`... if the `fmtid` is one of the date format, and if `cellDates` is true, then we should have a `d` type. It works but the way I do it doesn't look good to me... Any advice? Thanks
Aymkdn commented 2015-01-06 11:34:10 +00:00 (Migrated from github.com)

Any clue ? :-)

Any clue ? :-)
SheetJSDev commented 2015-01-08 04:56:27 +00:00 (Migrated from github.com)

@Aymkdn haven't forgotten, muddling through some other uncommitted SSF changes.

@Aymkdn haven't forgotten, muddling through some other uncommitted SSF changes.
satya-dash commented 2015-06-18 09:17:48 +00:00 (Migrated from github.com)

Hi Everyone.Can you please tell me how to put a formula in a cell of my excel like SUM(E2:E26).

I have some numbers in a column and want to add them with a SUM formula at the end of the data.So I tried to put the the formula in that particular cell but i couldn't.I am not able to use the cell object key "f" for the formula.So can anyone tell me how to use the formula in the excel so that my created Excel sheets will work more intelligently.

Hi Everyone.Can you please tell me how to put a formula in a cell of my excel like SUM(E2:E26). I have some numbers in a column and want to add them with a SUM formula at the end of the data.So I tried to put the the formula in that particular cell but i couldn't.I am not able to use the cell object key "f" for the formula.So can anyone tell me how to use the formula in the excel so that my created Excel sheets will work more intelligently.
SystemParadox commented 2015-06-18 09:19:34 +00:00 (Migrated from github.com)

@satya-dash what has that got to do with dates? Please open a new issue.

@satya-dash what has that got to do with dates? Please open a new issue.
satya-dash commented 2015-06-18 09:23:26 +00:00 (Migrated from github.com)

Nothing with date.Am creating a new issue in guthub regarding my query.

The query is all about the use of the cell object key *f *for putting
formula in the excel sheet cell.
From the documentation it is not clear and am not getting anything about it
from google also.So thought of asking directly to you.

Please reply me so that i can create a new thread regarding this issue.

On Thu, Jun 18, 2015 at 2:49 PM, Simon Williams notifications@github.com
wrote:

@satya-dash https://github.com/satya-dash what has that got to do with
dates? Please open a new issue.


Reply to this email directly or view it on GitHub
https://github.com/SheetJS/js-xlsx/issues/126#issuecomment-113085608.

Nothing with date.Am creating a new issue in guthub regarding my query. The query is all about the use of the cell object key *f *for putting formula in the excel sheet cell. From the documentation it is not clear and am not getting anything about it from google also.So thought of asking directly to you. Please reply me so that i can create a new thread regarding this issue. On Thu, Jun 18, 2015 at 2:49 PM, Simon Williams notifications@github.com wrote: > @satya-dash https://github.com/satya-dash what has that got to do with > dates? Please open a new issue. > > — > Reply to this email directly or view it on GitHub > https://github.com/SheetJS/js-xlsx/issues/126#issuecomment-113085608.
Aymkdn commented 2016-01-13 15:02:20 +00:00 (Migrated from github.com)

My suggestion is to add fmt and fmtid to the returned cell object, so the developer can then look at these values and decide what to do on his/her hand: fb4e4ef4ea/xlsx.js (L7250-L7252)

For example if it's a date, the cell will have cell.t == "n" but also cell.fmt == "[$-409]d\-mmm\-yy;@" -- the developer can detect this special fmt and then he/she can use XLSX.SSF.parse_date_code(cell.v) to get the related date!

Note also that you'll need to replace o.substr(-1) by o.slice(-1) in function eval_fmt() if you use IE8 (see https://github.com/SheetJS/js-xlsx/pull/351)

My suggestion is to add `fmt` and `fmtid` to the returned cell object, so the developer can then look at these values and decide what to do on his/her hand: https://github.com/Aymkdn/js-xlsx/blob/fb4e4ef4ea5fdbf3efc78174740f4567587b2e59/xlsx.js#L7250-L7252 For example if it's a date, the cell will have `cell.t == "n"` but also `cell.fmt == "[$-409]d\-mmm\-yy;@"` -- the developer can detect this special `fmt` and then he/she can use `XLSX.SSF.parse_date_code(cell.v)` to get the related date! Note also that you'll need to replace `o.substr(-1)` by `o.slice(-1)` in function `eval_fmt()` if you use IE8 (see https://github.com/SheetJS/js-xlsx/pull/351)
despairblue commented 2016-03-09 18:32:54 +00:00 (Migrated from github.com)

@Aymkdn that would be awesome. When parsing the sheet I'm not sure if I got a date or a number, the mapping to my application model happens at a later point. It would be useful to differentiate between number and date when reading the file.

@Aymkdn that would be awesome. When parsing the sheet I'm not sure if I got a date or a number, the mapping to my application model happens at a later point. It would be useful to differentiate between number and date when reading the file.
Aymkdn commented 2016-03-09 18:42:07 +00:00 (Migrated from github.com)

@despairblue you can look at my library here http://aymkdn.github.io/ExcelPlus/ that does what I stated here :-)

@despairblue you can look at my library here http://aymkdn.github.io/ExcelPlus/ that does what I stated here :-)
despairblue commented 2016-03-10 10:16:51 +00:00 (Migrated from github.com)

@Aymkdn sadly I need a nodejs package 😔

@Aymkdn sadly I need a nodejs package :pensive:
despairblue commented 2016-03-14 10:20:18 +00:00 (Migrated from github.com)

I settled for a simple, albeit dirty, solution:

const cellToType = (cell) => {
  switch (cell.t) {
    // ...
    case 'n':
      // see https://github.com/SheetJS/js-xlsx/issues/126
      if (cell.w.includes('/')) {
        return new Date(cell.w)
      }
      return Number.parseFloat(cell.w)
    // ...
    default:
      throw new Error(`${cell} is no valid Cell.`)
  }
}
I settled for a simple, albeit dirty, solution: ``` js const cellToType = (cell) => { switch (cell.t) { // ... case 'n': // see https://github.com/SheetJS/js-xlsx/issues/126 if (cell.w.includes('/')) { return new Date(cell.w) } return Number.parseFloat(cell.w) // ... default: throw new Error(`${cell} is no valid Cell.`) } } ```
lucianoybanez commented 2016-06-07 19:09:56 +00:00 (Migrated from github.com)

be carefull with this solution if the date is 10/10/3000 it takes like 1/1/00 and when convert to date it converts to 10 october " 2000"

be carefull with this solution if the date is 10/10/3000 it takes like 1/1/00 and when convert to date it converts to 10 october " 2000"
lucianoybanez commented 2016-06-07 20:22:59 +00:00 (Migrated from github.com)
if((new Date(val.w) !== "Invalid Date" && !isNaN(new Date(val.w)))){
                var dateSplited = val.w.split('/');
                if(dateSplited[2].length==4){                    
                    ret = new Date(val.w);
                }else{
                    throw 'XLSX library known issue. Select format date on excel m/d/yyyy';
                }           
``` if((new Date(val.w) !== "Invalid Date" && !isNaN(new Date(val.w)))){ var dateSplited = val.w.split('/'); if(dateSplited[2].length==4){ ret = new Date(val.w); }else{ throw 'XLSX library known issue. Select format date on excel m/d/yyyy'; } ```
despairblue commented 2016-06-07 22:07:42 +00:00 (Migrated from github.com)

@lucianoybanez thanks, that looks more robust. I'm going to use this, although we don't have to handle any dates later than about year 2100.

@lucianoybanez thanks, that looks more robust. I'm going to use this, although we don't have to handle any dates later than about year 2100.
SheetJSDev commented 2017-03-21 16:57:15 +00:00 (Migrated from github.com)

We're making the following changes to address this:

  • SSF.is_date(fmt) uses the tokenizer from the formatter to determine if a format string is a date
  • cellDates:true option for every format will look at the number format and generate dates (type d) if the format is a date format
We're making the following changes to address this: - `SSF.is_date(fmt)` uses the tokenizer from the formatter to determine if a format string is a date - `cellDates:true` option for every format will look at the number format and generate dates (type `d`) if the format is a date format
rajeshprkpp commented 2021-08-16 08:20:13 +00:00 (Migrated from github.com)

Recently I found this link https://lifesaver.codes/answer/format-date-question, it has provided me the solution to converting date to yyyy-mm-dd. If we have date format yyyy-mm-dd in excel, while uploading it, the date is converting to format yyyy/mm/dd.

It says
var wb = XLSX.read(binary, {type:'binary', cellDates:true, cellNF: false, cellText:false});
XLSX.utils.sheet_to_json(ws, {dateNF:"YYYY-MM-DD"})

I applied this code and it worked for me.

So this is applicable even if you have date format as dd-mm-yyyyy in your excel and also if you have date columns as Text formatted in your excel.

Recently I found this link https://lifesaver.codes/answer/format-date-question, it has provided me the solution to converting date to yyyy-mm-dd. If we have date format yyyy-mm-dd in excel, while uploading it, the date is converting to format yyyy/mm/dd. It says var wb = XLSX.read(binary, {type:'binary', cellDates:true, cellNF: false, cellText:false}); XLSX.utils.sheet_to_json(ws, {dateNF:"YYYY-MM-DD"}) I applied this code and it worked for me. So this is applicable even if you have date format as dd-mm-yyyyy in your excel and also if you have date columns as Text formatted in your excel.
ligang-code2020 commented 2021-08-16 12:49:36 +00:00 (Migrated from github.com)

Now I want to turn off the date conversion. For example, I want to export the string '1.9.3', and sheet.js is automatically installed and replaced with 2003/1/9. How to turn it off

Now I want to turn off the date conversion. For example, I want to export the string '1.9.3', and sheet.js is automatically installed and replaced with 2003/1/9. How to turn it off
ligang-code2020 commented 2021-08-17 11:37:57 +00:00 (Migrated from github.com)

Now I want to turn off the date conversion. For example, I want to export the string '1.9.3', and sheet.js is automatically installed and replaced with 2003/1/9. How to turn it off

well. i find way to resolve my question.just to set up {type:"array",raw:"true"}

> Now I want to turn off the date conversion. For example, I want to export the string '1.9.3', and sheet.js is automatically installed and replaced with 2003/1/9. How to turn it off well. i find way to resolve my question.just to set up {type:"array",raw:"true"}
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#126
No description provided.