Date cells #126
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#126
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Excel supports cells that display formatted dates. For testing purposes I created a file with only one cell (A1) and entered the value
2014-09-28
. The value was immediately formatted as a date and displayed as9/28/14
.I wanted to see how I can reliably extract back the original date. I used the following program:
And got the following output (after running it with
node myfile.xlsx
);I see several problems here:
t
field (for type) indicates this is a number. Can it instead indicate it's a date? or should there be a different property?2014-09-28
- so the original date, along with a reliable way of retrieving it, seems to be lost.All of the above also applies for time fields.
Am I missing something? Is there a way to reliably get 2014-09-28 from the cell data?
My comment should not be construed to imply that I think any of this should be the "right" way to do things, and I'm sure we'll come up with a better solution.
Quick review: the real thorn here is the bad date 2/29/1900 (1900 was not a leap year yet Excel recognizes it as a valid date). This is a bug from Lotus 1-2-3 that Excel replicated (decades later, it still haunts us). There is another issue regarding how to handle the day of week.
We could decide to throw away the corner case entirely and just store dates. In fact, the XLSX format allows for a cell type
d
for a date string. There would be problems at date code 60, but otherwise would be fine.The date1904 mode is stored in the (admittedly awkward) location Workbook.WBProps.date1904. Using the roo_190*_base files:
To convert the date, the SSF library has a handy parse_date_code function:
Those fields are described here: https://github.com/SheetJS/ssf/#usage
To recover a JS rendering of the date, take a look at the js-harb helper for this: https://github.com/SheetJS/js-harb/blob/master/bits/22_helpers.js#L7-L17
It probably makes sense for parse_date_code to return the original date object as well, as that would obviate the need for an additional function.
@hmalphettes @notatestuser @vratiu @wbrandongeorge thoughts?
You've answered most of my concerns, thanks! :)
I think it's very reasonable, at least for now, to document the problem with 2/29/1900, why it exists, and just throw away the corner case.
What about time fields?
Yes please!
Time codes are basically the fraction part of date codes. This allows for easy date arithmetic: you can subtract dates to find the number of days in between.
Times are unaffected by the date1904 issue. The time fields from parse_date_code are:
T
: number of seconds since midnight.H
: hours in the 24-hour systemM
: minutesS
: secondsu
: fraction of a second (unfortunately needed for some odd formatting issues)e.g. for the time 21:36 (9:36 PM), the code is 0.9
Assuming we do decide to shift to JS Date, times would also nicely map to JS Date objects
Thanks - makes sense, sorry for asking an obvious question.
Trying to sum it up then, we're discussing the following changes:
d
for date(/time) fields and documenting the bug with the 1904 issueSSF.parse_date_code
to return a date object(Please point out anything I've forgotten or got wrong.)
I think the first change would be very useful because it makes it easier to parse and handle data in exchange for documenting an esoteric bug (that we might address in the future).
The second change is straightforward - if you're interested, I'll submit a pull request.
The third change is useful, but since it's not difficult to get a Date object using the existing return value, it's more of a convenience.
Since we are opening up the conversation, I wonder whether it makes sense to store date1904 in a better location. The original form was based on a lazy translation of the XLSX format (and replicated in the XLS and other parsers), but I think this probably deserves a better home.
Sure, it should probably be a boolean too. Assuming
var workbook = xlsx.readFile(file)
I see that there'sworkbook.Workbook.Props
as well asworkbook.Props
. Was there ever an intention to merge them? (Should they be merged?)They come from different locations in the original XLSX. It is a historical artifact, since XLS and SpreadsheetML actually store different properties with different names in different locations (sadly, not even the "Author" property is stored as "Author" in all of the Excel formats, and some formats treat that author field as a custom property) Issues like property inconsistencies remind us all of why no one has succeeded in making a universal representation of a workbook
I see. Well, if there's no obvious solution right now we could always document it and state it's subject to change in the future...
Reviving this issue...
Turns out that the corner case has actually become part of the standard. The Leap year bug Wikipedia article quotes the specification as saying that value 60 is expected to be treated as 2/29/1900 in 1900-based date systems. You can find it in section 3.17.4.1, pages 2522-2523.
Doesn't this mean that we could do the following:
parseDates
option that whentrue
sets.d
to a date object.t
to'd'
to indicate it's a date (for convenience only; redundant sincecell.t === 'd'
can be written ascell.hasOwnProperty('d')
).d
set tonull
(but all other fields stay the same)(I'm asking because if this corner case is part of the standard then there should be an expected way of handling it... I'd assume. :)
That language is from a technical draft but does not appear in the current version (Check part 1 Section 18.7.4 on page 2056 from http://www.ecma-international.org/publications/standards/Ecma-376.htm). Somewhere down the line, the committee must have decided that it was not appropriate to codify that logic.
You're right. I was looking at an older version, sorry about that.
Putting aside date value 60 handling, are we in agreement about everything else? Specifically, the
.d
field (for date object), the'd'
type (for date), and theparseDates
option, for putting the date object in the.d
field?@elad I have a different thought: lets just allow for a new cell type
d
. The default behavior should stay the same. The option for this feature should be calledcellDates
.Currently, the XLSX explicitly generates the date number and changes the cell type to
n
: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L223-L226 Instead of doing that, lets just keep the date as-is (p.t = 'd'; p.v = Date.parse(p.v);
). There are a few other places that need to be adjusted:safe_format
just compute the datenum if the value is a date and pass it to SSF (there is a similar fallback for int/double resolution in that function).n
(if thez
cell format is not set, default to14
) and should write the ISO string if the write optioncellDates
is set.So long as the
w
formatted text is preserved, the other exporters (e.g. CSV) should work as expected.Note: this is half the puzzle. The other half (proactively converting number cells to date cells) should wait until the new formatting library version is pushed
PRs are welcome and deeply appreciated :)
@elad version 0.7.12 is the first step. It supports cell dates via the
cellDates
option. There are numerous tests involving consistency of writing and reading files using the cell dates options. To ensure JSON consistency (sinceJSON.parse(JSON.stringify(new Date()))
is a string and not a date), the parser will store datestrings that can easily be parsed in JS withnew Date(cell.v)
.There is a massive and annoying bug in Excel regarding ISO 8601 datetimes. The string
2014-10-26T06:03:14.331Z
references a well-defined millisecond (thanks to theZ
timezone modifier) but Excel actually interprets the date as2014-10-26T06:03:14.331
in your local timezone (so in ET this is interpreted as 6:03 AM when the actual time is 2:03 AM ET). Correcting it requires knowledge of the origin timezone.Why do we need JSON consistency? The parsing is done from XLSX to a javascript object. I think we should keep the original value (fraction) in
cell.v
and have an additionalcell.d
for the date object. A lot of logic assumes you can access this value (for example, I get this as the first hit for "excel timezone") and I don't see why we need it as a string. A date object both sidesteps this problem and is manipulable for developers. Why not just provide them with that instead? :)@elad XLSB and XLS only store dates with the date code style, and for those cases just adding a separate cell field is best. XLSX can store dates with the date code, but it can also store using a special type 'd' (18.17.4 "Dates and Times" explains this in further detail). As a result, for those cases "original value" technically would be a date string, not the fraction.
With that in mind, there are four key questions:
If the
cellDates
option is false, they are converted to Excel date codes and the date code is stored in.v
. If thecellDates
option is true, the ISO8601 date string is stored.Currently, the raw date code is stored to
.v
. A few pending changes have to be pushed to SSF (ripping the tokenizer out of the eval function), but the next iteration will convert back from the date code to the raw Date ifcellDates
is specified..v
field) be represented?As you are proposing, we should save both. I hesitate because of possible confusion with JS date numbers (the type you get from
Date.parse
). IMHO we should expose both functions (Date -> num and num -> Date) as utility functions. I expect the raw dates to be more valuable than the number codes in terms of manipulation, so I'm toying with the idea of flipping the default (make dates stored as Date by default, require a function call to get the other form). Thoughts?The cellDates option on the writer controls this behavior. There are new tests verifying consistency throughout the process (file with d/n -> representation with cellDates true/false -> writing a file with cellDates true/false -> parsing the written file) for all combinations of arguments.
WebWorkers. When we first looked into them, passing the whole object was problematic so we sidestepped the issue by calling JSON.stringify from the worker and JSON.parse from the browser. Mandating that we keep the field as JS Dates breaks consistency, since
JSON.parse(JSON.stringify(date))
is a string even ifdate
is a dateWhile this describes the (on-disk) format, I think we have the freedom to make better decisions for the javascript library. In some cases it makes sense to reflect the specification, but in others - for example, with the
.d
field - I think it doesn't. From a developer's perspective, it makes much more sense that the type of the value in a date field (as opposed to a number or a string) will be aDate
. So to answer your questions:Same as now if
cellDates
isfalse
. If it'strue
, asDate
objects.Same as now, in
.v
.I think the date as a
Date
object should be in.d
and the number corresponding to it be in.v
.Why should it? Excel doesn't care about ISO8601, it uses its own numerical representation. I'd say the behavior should be to write
.v
ifcellDates
isfalse
and convert theDate
object in.d
to a numerical representation that will be stored in.v
before writing ifcellDates
istrue
. Pseudo code:@elad If you have a copy of Excel 2013, try saving a file under "Strict OpenXML" format. It does not save the number code. http://sheetjs.com/artifacts/date.xlsx was saved as standard Excel document and http://sheetjs.com/artifacts/date_strict.xlsx was saved as Strict OpenXML.
In the former case, A1 is stored as a date number:
In the latter case, the date is stored as an actual date:
There are other issues preventing this file from being read in older versions of excel (different XML schemas), but they will accept the latter form (see the file https://github.com/SheetJS/test_files/blob/master/xlsx-stream-d-date-cell.xlsx)
That's fine, the logic can be accommodated to parse/store/write
.v
appropriately. The point I was trying to get across is that I think the javascript object itself should keepDate
objects for fields we know are date and/or time ifcellDates
istrue
. :)As for number vs. date string, what does the date string look like for the cases where both date and time are stored? (I don't have Excel 2013...)
Updated the artifacts to include a time example:
The bug I was referencing refers to the fact that excel will accept an ISO8601 date string -- see the example file.
Here is the bug: That date-time is being interpreted as 2014-02-14 8:27:48 in every timezone (that Z specifically means UTC), although the spec suggests it should be interpreted as if its UTC.
That's what I missed. That the specification says one thing and Excel does another. (Drops the
Z
and treats it as local time.)Is there bug compatibility across spreadsheets? Will Numbers parse it correctly, for example?
I think your decision to ignore it and just document the behavior is correct. I'm willing to bet most users of the library are feeding it Excel files, and expect to see the same values as they do in the GUI. So, at most, a flag that says "disable ISO8601 date parsing bug compatibility." :)
Hi guys,
I have a date in my Excel file. I tried to read the file with:
XLSX.read(e.target.result, {type: 'binary', cellDates:true, cellStyles:true})
But my cell still returns
n
for the type whatever the value of cellDates:{ t="n", v=41996, w="12/23/14"}
If I understood correctly the
cellDates
is supposed to change the way it reads the file to change the type tod
.I investigated and I found my cell looks like that:
<c r="A3" s="4"> <v>41997</v> </c>
There is no
t="d"
. It's a regular Excel 2010 file (no strict mode or whatever). I guess thes="4"
should indicate it's a date?Any help?
@Aymkdn there are two types of dates in XLSX: a number that is formatted like a date, and an ISO8601 date. In the current version, the cellDates option generates date cells for the latter but doesn't proactively change the former. To implement cellDates for the numbers, a pending change in the formatting logic has to be pushed (basically, determining whether a cell format is actually a date format, which is simply extracting the tokenizer from SSF https://github.com/SheetJS/ssf/blob/master/ssf.js#L526-L596)
Thanks @SheetJSDev -- I looked at what you said, but it's not clear for me how to change
eval_fmt
to have ad
type... So I tried to reverse the code until I found which changes I had to do. Here is the result:9c05476339 (diff-41772ca6056f87e686455e101ffe79b9)
I changed the
datenum
function to make sure thev
parameter is correctly parsed.Also I changed
safe_format()
to look at thefmtid
... if thefmtid
is one of the date format, and ifcellDates
is true, then we should have ad
type. It works but the way I do it doesn't look good to me... Any advice?Thanks
Any clue ? :-)
@Aymkdn haven't forgotten, muddling through some other uncommitted SSF changes.
Hi Everyone.Can you please tell me how to put a formula in a cell of my excel like SUM(E2:E26).
I have some numbers in a column and want to add them with a SUM formula at the end of the data.So I tried to put the the formula in that particular cell but i couldn't.I am not able to use the cell object key "f" for the formula.So can anyone tell me how to use the formula in the excel so that my created Excel sheets will work more intelligently.
@satya-dash what has that got to do with dates? Please open a new issue.
Nothing with date.Am creating a new issue in guthub regarding my query.
The query is all about the use of the cell object key *f *for putting
formula in the excel sheet cell.
From the documentation it is not clear and am not getting anything about it
from google also.So thought of asking directly to you.
Please reply me so that i can create a new thread regarding this issue.
On Thu, Jun 18, 2015 at 2:49 PM, Simon Williams notifications@github.com
wrote:
My suggestion is to add
fmt
andfmtid
to the returned cell object, so the developer can then look at these values and decide what to do on his/her hand:fb4e4ef4ea/xlsx.js (L7250-L7252)
For example if it's a date, the cell will have
cell.t == "n"
but alsocell.fmt == "[$-409]d\-mmm\-yy;@"
-- the developer can detect this specialfmt
and then he/she can useXLSX.SSF.parse_date_code(cell.v)
to get the related date!Note also that you'll need to replace
o.substr(-1)
byo.slice(-1)
in functioneval_fmt()
if you use IE8 (see https://github.com/SheetJS/js-xlsx/pull/351)@Aymkdn that would be awesome. When parsing the sheet I'm not sure if I got a date or a number, the mapping to my application model happens at a later point. It would be useful to differentiate between number and date when reading the file.
@despairblue you can look at my library here http://aymkdn.github.io/ExcelPlus/ that does what I stated here :-)
@Aymkdn sadly I need a nodejs package 😔
I settled for a simple, albeit dirty, solution:
be carefull with this solution if the date is 10/10/3000 it takes like 1/1/00 and when convert to date it converts to 10 october " 2000"
@lucianoybanez thanks, that looks more robust. I'm going to use this, although we don't have to handle any dates later than about year 2100.
We're making the following changes to address this:
SSF.is_date(fmt)
uses the tokenizer from the formatter to determine if a format string is a datecellDates:true
option for every format will look at the number format and generate dates (typed
) if the format is a date formatRecently I found this link https://lifesaver.codes/answer/format-date-question, it has provided me the solution to converting date to yyyy-mm-dd. If we have date format yyyy-mm-dd in excel, while uploading it, the date is converting to format yyyy/mm/dd.
It says
var wb = XLSX.read(binary, {type:'binary', cellDates:true, cellNF: false, cellText:false});
XLSX.utils.sheet_to_json(ws, {dateNF:"YYYY-MM-DD"})
I applied this code and it worked for me.
So this is applicable even if you have date format as dd-mm-yyyyy in your excel and also if you have date columns as Text formatted in your excel.
Now I want to turn off the date conversion. For example, I want to export the string '1.9.3', and sheet.js is automatically installed and replaced with 2003/1/9. How to turn it off
well. i find way to resolve my question.just to set up {type:"array",raw:"true"}