sheet_to_json results in two-digit dates #560
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#560
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
In xlsx.js the initialization of the table_fmt array defaults several date formats to two-digit years for dates. Invoking sheet_to_json on an excel file with 4-digit dates makes use of this array and drops the first two digits from the original value (e.g. 12/31/1989 in Excel becomes 12/31/89 in JSON) causing a lot of ambiguity. Either the format array should have four-digit years for all date formats by default or one of the opts attributes passed in as a parameter to sheet_to_json should allow logic to specify 4-digit year values as required.
@dpackage as a workaround try changing
table_fmt[14]
to a code with 4 y's instead of 2. You stumbled upon a defect in the actual XLSX file format. There's a sample with screenshots in an older issue: https://github.com/SheetJS/js-xlsx/issues/181Excel "localizes" that specific date format based on the system locale but doesn't actually save the locale of the last user in the file itself. So you could be in a locale that defaults to 4-digit years, save the file, send it to someone in the US and they would see 2-digit years.
The right implementation would be a locale option, as there are other formats that are available in some locales but not others.
Thanks, I've been operating with that workaround already, but was wondering what else that might break in other parts of the workflow. If the answer is "nothing would break" under any other boundary conditions (local XLSX formats, previous office versions like 2007, etc.) why not change here on GitHub?
@dpackage I don't think I explained the issue clearly enough. For example, running Excel 95 on Windows 95, the same exact file shows different date strings based on the computer settings:
Russian:
English:
Portugese:
This is arguably a defect in Excel, since you can't guarantee that the displayed value will be the same across countries. Changing the default would make it work for your case but would break US users.
As for the general localization problem, there are two main issues:
What locale do you use? Can you detect it automatically? Excel has the luxury of querying the client system locale settings, but there is no cross-browser way to do the same. Recent versions of Chrome expose the locale via
navigator.language
but Safari and IE are a lot trickier.What data is actually locale-specific? My current understanding is that there are some specific number formats (like the format code 14 as well as some SpreadsheetML formats like "Short Date"), month names and localized formula names.
@SheetJSDev sorry I'm still confused. You say it would " ... work for your case but would break US users ...." In the original xlsx.js (before I used the workaround) that array element had the m/d/yy string and I'm using US versions of windows (10) and excel (2013) and sheet_to_json produces 2-digit dates. Before calling that method, the code invoking sheetjs libraries uses XLSX.read(f.raw, { type: 'binary' }) to open the uploaded excel file. Would using different parameters to call read() change the results? As I said, your workaround is something I'd already worked out for myself, but I'm reluctant to go changing the guts of sheetjs that many layers down not knowing how the code fits together.
@dpackage maybe I misunderstood. are you saying that the original Excel file shows a 4-digit date in a cell and js-xlsx is giving a 2-digit date? If that is the case, it is definitely a bug and I'd like to take a look at the file.
Note that the value shown in the formula bar is not necessarily the same format as the cell. Dates in the US locale are shown as 4 digit years in the formula year but the default cell format shows 2 digits:
A2 is today's date typed out, A3 is the formula
=TODAY()
, both are using the default number format for dates.@SheetJSDev Attached the sample input. Without the workaround changing the format array as discussed above, the JSON coming back from both sheet_to_row_object_array and sheet_to_json has two-digit years. Confirmed in debug mode and writing JSON to javascript console output. With the workaround in place, the JSON comes back as four-digit years. Tested in firefox, chrome and IE. The app uses an angular drop box input to have the user upload a file and XLSX.read() to load the file into a variable before invoking either of the two util functions.
sample_input.xlsx
@dpackage This is what I see in Excel 2016 for Mac
Try dragging into http://oss.sheetjs.com/js-xlsx/ with the JSON output. I see:
It looks to me like the original file was using a locale-specific number format:
The comment at the bottom of the format says:
As I was explaining earlier, Excel doesn't actually store the original number format. It expects the client to localize based on the settings of the computer in question. That part has not been implemented yet.
That bottom screenshot on your latest comment is an excel settings view, yes? The local test that I'm running (created a small file and used the live demo, thank you for that link I missed it earlier) was created using my local Excel where that same settings window shows 4-digit year as default, but the test ends with a 2-digit year in JSON.
So far it seems like you've been trying to establish that the fault lies with the local Excel or local environment, but I've seen 4-digit going in and 2-digit coming out. I think maybe what you were trying to say is that the table_fmt array is actually creating a "virtual" local settings condition so that the code knows how to render it "just like excel would" (my phrasing). All that said, I come back around to my original suggestion that the defaults should be 4-digit years (not as a workaround but as more correct date representation for mixed dates of birth and "current" dates to track events e.g.). Alternatively (and I'm willing to spend some time on this as a mental exercise for myself) create pass-through workflow so that one of the member attributes of the opts param (https://github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L12013) can specify number of year digits for elements in the table_fmt array.
I'll probably submit the latter as a code block to this thread in a few days and let you and any other stakeholders decide whether to integrate it.
For now, I'm comfortable just using the workaround by changing the internal logic of sheet_to_json.
@dpackage that bottom screen shows up when you right click a cell and choose "format cells". That window shows you the exact number format that Excel will use on your computer.
The overarching point is that for some date formats Excel doesn't tell you whether to use 2 or 4 digits for the year. The file just stores it as
m/dd/yy
but Excel will display the date based on your current computer settings. We chose to go with theEnglish (United States)
settings.The best bet in the very short term is to directly modify that entry in the script (there's an object called
table_fmt
in the code, change entry14
to the desired format).Note: Just specifying the number of year digits isn't enough, you also need to pass the default delimiter to use as well as the default order (mdy or ydm or ymd), so at that point you are basically overriding the entire format.
Note: There are other weird formats. The SpreadsheetML Excel 2003 file type actually stores
"Short Date"
"Medium Date"
and"Long Date"
and forces readers to figure out which format to use. https://github.com/SheetJS/js-xlsx/blob/master/bits/11_ssfutils.js is the relevant conversion table.I am also facing the same problem. I am reading a csv file. In csv file it is showing 09/08/1988, but after reading it the json, it shows 9/8/88. Also if it reads 25/12/1988 from csv file, it is showing 25/12/1988 in json format. After trying a lot of cases, I came to the conclusion that the bug is appearing only when you are using a date with single digit day.
Thanks
Harshal