sheet_to_json results in two-digit dates #560

Closed
opened 2017-02-16 13:35:25 +00:00 by dpackage · 10 comments
dpackage commented 2017-02-16 13:35:25 +00:00 (Migrated from github.com)

In xlsx.js the initialization of the table_fmt array defaults several date formats to two-digit years for dates. Invoking sheet_to_json on an excel file with 4-digit dates makes use of this array and drops the first two digits from the original value (e.g. 12/31/1989 in Excel becomes 12/31/89 in JSON) causing a lot of ambiguity. Either the format array should have four-digit years for all date formats by default or one of the opts attributes passed in as a parameter to sheet_to_json should allow logic to specify 4-digit year values as required.

In xlsx.js the initialization of the table_fmt array defaults several date formats to two-digit years for dates. Invoking sheet_to_json on an excel file with 4-digit dates makes use of this array and ***drops*** the first two digits from the original value (e.g. 12/31/1989 in Excel becomes 12/31/89 in JSON) causing a lot of ambiguity. Either the format array should have four-digit years for all date formats by default or one of the opts attributes passed in as a parameter to sheet_to_json should allow logic to specify 4-digit year values as required.
SheetJSDev commented 2017-02-16 15:45:24 +00:00 (Migrated from github.com)

@dpackage as a workaround try changing table_fmt[14] to a code with 4 y's instead of 2. You stumbled upon a defect in the actual XLSX file format. There's a sample with screenshots in an older issue: https://github.com/SheetJS/js-xlsx/issues/181

Excel "localizes" that specific date format based on the system locale but doesn't actually save the locale of the last user in the file itself. So you could be in a locale that defaults to 4-digit years, save the file, send it to someone in the US and they would see 2-digit years.

The right implementation would be a locale option, as there are other formats that are available in some locales but not others.

@dpackage as a workaround try changing `table_fmt[14]` to a code with 4 y's instead of 2. You stumbled upon a defect in the actual XLSX file format. There's a sample with screenshots in an older issue: https://github.com/SheetJS/js-xlsx/issues/181 Excel "localizes" that specific date format based on the system locale but doesn't actually save the locale of the last user in the file itself. So you could be in a locale that defaults to 4-digit years, save the file, send it to someone in the US and they would see 2-digit years. The right implementation would be a locale option, as there are other formats that are available in some locales but not others.
dpackage commented 2017-02-16 17:06:44 +00:00 (Migrated from github.com)

Thanks, I've been operating with that workaround already, but was wondering what else that might break in other parts of the workflow. If the answer is "nothing would break" under any other boundary conditions (local XLSX formats, previous office versions like 2007, etc.) why not change here on GitHub?

Thanks, I've been operating with that workaround already, but was wondering what else that might break in other parts of the workflow. If the answer is "nothing would break" under any other boundary conditions (local XLSX formats, previous office versions like 2007, etc.) why not change here on GitHub?
SheetJSDev commented 2017-02-16 17:43:48 +00:00 (Migrated from github.com)

@dpackage I don't think I explained the issue clearly enough. For example, running Excel 95 on Windows 95, the same exact file shows different date strings based on the computer settings:

Russian:
russian

English:
english

Portugese:
portugese

This is arguably a defect in Excel, since you can't guarantee that the displayed value will be the same across countries. Changing the default would make it work for your case but would break US users.

As for the general localization problem, there are two main issues:

  • What locale do you use? Can you detect it automatically? Excel has the luxury of querying the client system locale settings, but there is no cross-browser way to do the same. Recent versions of Chrome expose the locale via navigator.language but Safari and IE are a lot trickier.

  • What data is actually locale-specific? My current understanding is that there are some specific number formats (like the format code 14 as well as some SpreadsheetML formats like "Short Date"), month names and localized formula names.

@dpackage I don't think I explained the issue clearly enough. For example, running Excel 95 on Windows 95, the *same exact file* shows different date strings based on the computer settings: Russian: <img width="511" alt="russian" src="https://cloud.githubusercontent.com/assets/6070939/23032501/6209652e-f429-11e6-8e8a-b781289a6552.png"> English: <img width="502" alt="english" src="https://cloud.githubusercontent.com/assets/6070939/23032502/620ad5e4-f429-11e6-96d9-d257bc0baf73.png"> Portugese: <img width="499" alt="portugese" src="https://cloud.githubusercontent.com/assets/6070939/23032500/6206d674-f429-11e6-8509-c804606a1a55.png"> This is arguably *a defect in Excel*, since you can't guarantee that the displayed value will be the same across countries. Changing the default would make it work for your case but would break US users. As for the general localization problem, there are two main issues: - What locale do you use? Can you detect it automatically? Excel has the luxury of querying the client system locale settings, but there is no cross-browser way to do the same. Recent versions of Chrome expose the locale via `navigator.language` but Safari and IE are a lot trickier. - What data is actually locale-specific? My current understanding is that there are some specific number formats (like the format code 14 as well as some SpreadsheetML formats like "Short Date"), month names and localized formula names.
dpackage commented 2017-02-16 18:21:53 +00:00 (Migrated from github.com)

@SheetJSDev sorry I'm still confused. You say it would " ... work for your case but would break US users ...." In the original xlsx.js (before I used the workaround) that array element had the m/d/yy string and I'm using US versions of windows (10) and excel (2013) and sheet_to_json produces 2-digit dates. Before calling that method, the code invoking sheetjs libraries uses XLSX.read(f.raw, { type: 'binary' }) to open the uploaded excel file. Would using different parameters to call read() change the results? As I said, your workaround is something I'd already worked out for myself, but I'm reluctant to go changing the guts of sheetjs that many layers down not knowing how the code fits together.

@SheetJSDev sorry I'm still confused. You say it would " ... work for your case but would break US users ...." In the original xlsx.js (before I used the workaround) that array element had the m/d/yy string and I'm using US versions of windows (10) and excel (2013) and sheet_to_json produces 2-digit dates. Before calling that method, the code invoking sheetjs libraries uses XLSX.read(f.raw, { type: 'binary' }) to open the uploaded excel file. Would using different parameters to call read() change the results? As I said, your workaround is something I'd already worked out for myself, but I'm reluctant to go changing the guts of sheetjs that many layers down not knowing how the code fits together.
SheetJSDev commented 2017-02-16 18:37:19 +00:00 (Migrated from github.com)

@dpackage maybe I misunderstood. are you saying that the original Excel file shows a 4-digit date in a cell and js-xlsx is giving a 2-digit date? If that is the case, it is definitely a bug and I'd like to take a look at the file.

Note that the value shown in the formula bar is not necessarily the same format as the cell. Dates in the US locale are shown as 4 digit years in the formula year but the default cell format shows 2 digits:

dates

A2 is today's date typed out, A3 is the formula =TODAY(), both are using the default number format for dates.

@dpackage maybe I misunderstood. are you saying that the original Excel file shows a 4-digit date in a cell and js-xlsx is giving a 2-digit date? If that is the case, it is definitely a bug and I'd like to take a look at the file. Note that the value shown in the formula bar is not necessarily the same format as the cell. Dates in the US locale are shown as 4 digit years in the formula year but the default cell format shows 2 digits: <img width="252" alt="dates" src="https://cloud.githubusercontent.com/assets/6070939/23035371/b344e6e8-f433-11e6-9088-97a41258250d.png"> A2 is today's date typed out, A3 is the formula `=TODAY()`, both are using the default number format for dates.
dpackage commented 2017-02-17 15:38:20 +00:00 (Migrated from github.com)

@SheetJSDev Attached the sample input. Without the workaround changing the format array as discussed above, the JSON coming back from both sheet_to_row_object_array and sheet_to_json has two-digit years. Confirmed in debug mode and writing JSON to javascript console output. With the workaround in place, the JSON comes back as four-digit years. Tested in firefox, chrome and IE. The app uses an angular drop box input to have the user upload a file and XLSX.read() to load the file into a variable before invoking either of the two util functions.
sample_input.xlsx

@SheetJSDev Attached the sample input. Without the workaround changing the format array as discussed above, the JSON coming back from both sheet_to_row_object_array and sheet_to_json has two-digit years. Confirmed in debug mode and writing JSON to javascript console output. With the workaround in place, the JSON comes back as four-digit years. Tested in firefox, chrome and IE. The app uses an angular drop box input to have the user upload a file and XLSX.read() to load the file into a variable before invoking either of the two util functions. [sample_input.xlsx](https://github.com/SheetJS/js-xlsx/files/783598/sample_input.xlsx)
SheetJSDev commented 2017-02-18 02:55:12 +00:00 (Migrated from github.com)

@dpackage This is what I see in Excel 2016 for Mac

screen shot 2017-02-17 at 18 50 00

Try dragging into http://oss.sheetjs.com/js-xlsx/ with the JSON output. I see:

screen shot 2017-02-17 at 18 50 52

It looks to me like the original file was using a locale-specific number format:

screen shot 2017-02-17 at 18 51 26

The comment at the bottom of the format says:

Date formats with an asterisk (*) are based on regional date and time settings in System Preferences>Language & Text.

As I was explaining earlier, Excel doesn't actually store the original number format. It expects the client to localize based on the settings of the computer in question. That part has not been implemented yet.

@dpackage This is what I see in Excel 2016 for Mac <img width="578" alt="screen shot 2017-02-17 at 18 50 00" src="https://cloud.githubusercontent.com/assets/6070939/23089665/ec527f5e-f541-11e6-8fd3-9b62e83f869c.png"> Try dragging into http://oss.sheetjs.com/js-xlsx/ with the JSON output. I see: <img width="374" alt="screen shot 2017-02-17 at 18 50 52" src="https://cloud.githubusercontent.com/assets/6070939/23089670/07b27416-f542-11e6-9de4-97d7475ef04c.png"> It looks to me like the original file was using a locale-specific number format: <img width="543" alt="screen shot 2017-02-17 at 18 51 26" src="https://cloud.githubusercontent.com/assets/6070939/23089677/22833f6e-f542-11e6-8476-0508ec8afa54.png"> The comment at the bottom of the format says: > Date formats with an asterisk (*) are based on regional date and time settings in System Preferences>Language & Text. As I was explaining earlier, Excel doesn't actually store the original number format. It expects the client to localize based on the settings of the computer in question. That part has not been implemented yet.
dpackage commented 2017-02-18 15:47:08 +00:00 (Migrated from github.com)

That bottom screenshot on your latest comment is an excel settings view, yes? The local test that I'm running (created a small file and used the live demo, thank you for that link I missed it earlier) was created using my local Excel where that same settings window shows 4-digit year as default, but the test ends with a 2-digit year in JSON.

So far it seems like you've been trying to establish that the fault lies with the local Excel or local environment, but I've seen 4-digit going in and 2-digit coming out. I think maybe what you were trying to say is that the table_fmt array is actually creating a "virtual" local settings condition so that the code knows how to render it "just like excel would" (my phrasing). All that said, I come back around to my original suggestion that the defaults should be 4-digit years (not as a workaround but as more correct date representation for mixed dates of birth and "current" dates to track events e.g.). Alternatively (and I'm willing to spend some time on this as a mental exercise for myself) create pass-through workflow so that one of the member attributes of the opts param (https://github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L12013) can specify number of year digits for elements in the table_fmt array.

I'll probably submit the latter as a code block to this thread in a few days and let you and any other stakeholders decide whether to integrate it.

For now, I'm comfortable just using the workaround by changing the internal logic of sheet_to_json.

That bottom screenshot on your latest comment is an excel settings view, yes? The local test that I'm running (created a small file and used the live demo, thank you for that link I missed it earlier) was created using my local Excel where that same settings window shows 4-digit year as default, but the test ends with a 2-digit year in JSON. So far it seems like you've been trying to establish that the fault lies with the local Excel or local environment, but I've seen 4-digit going in and 2-digit coming out. I think maybe what you were trying to say is that the table_fmt array is actually creating a "virtual" local settings condition so that the code knows how to render it "just like excel would" (my phrasing). All that said, I come back around to my original suggestion that the defaults should be 4-digit years (not as a workaround but as more correct date representation for mixed dates of birth and "current" dates to track events e.g.). Alternatively (and I'm willing to spend some time on this as a mental exercise for myself) create pass-through workflow so that one of the member attributes of the opts param (https://github.com/SheetJS/js-xlsx/blob/master/xlsx.js#L12013) can specify number of year digits for elements in the table_fmt array. I'll probably submit the latter as a code block to this thread in a few days and let you and any other stakeholders decide whether to integrate it. For now, I'm comfortable just using the workaround by changing the internal logic of sheet_to_json.
SheetJSDev commented 2017-02-18 19:28:51 +00:00 (Migrated from github.com)

@dpackage that bottom screen shows up when you right click a cell and choose "format cells". That window shows you the exact number format that Excel will use on your computer.

The overarching point is that for some date formats Excel doesn't tell you whether to use 2 or 4 digits for the year. The file just stores it as m/dd/yy but Excel will display the date based on your current computer settings. We chose to go with the English (United States) settings.

The best bet in the very short term is to directly modify that entry in the script (there's an object called table_fmt in the code, change entry 14 to the desired format).

Note: Just specifying the number of year digits isn't enough, you also need to pass the default delimiter to use as well as the default order (mdy or ydm or ymd), so at that point you are basically overriding the entire format.

Note: There are other weird formats. The SpreadsheetML Excel 2003 file type actually stores "Short Date" "Medium Date" and "Long Date" and forces readers to figure out which format to use. https://github.com/SheetJS/js-xlsx/blob/master/bits/11_ssfutils.js is the relevant conversion table.

@dpackage that bottom screen shows up when you right click a cell and choose "format cells". That window shows you the exact number format that Excel will use on your computer. The overarching point is that for some date formats Excel doesn't tell you whether to use 2 or 4 digits for the year. The file just stores it as `m/dd/yy` but Excel will display the date based on your current computer settings. We chose to go with the `English (United States)` settings. The best bet in the very short term is to directly modify that entry in the script (there's an object called `table_fmt` in the code, change entry `14` to the desired format). Note: Just specifying the number of year digits isn't enough, you also need to pass the default delimiter to use as well as the default order (mdy or ydm or ymd), so at that point you are basically overriding the entire format. Note: There are other weird formats. The SpreadsheetML Excel 2003 file type actually stores `"Short Date"` `"Medium Date"` and `"Long Date"` and forces readers to figure out which format to use. https://github.com/SheetJS/js-xlsx/blob/master/bits/11_ssfutils.js is the relevant conversion table.
harshalagrawal03 commented 2018-06-10 08:18:21 +00:00 (Migrated from github.com)

I am also facing the same problem. I am reading a csv file. In csv file it is showing 09/08/1988, but after reading it the json, it shows 9/8/88. Also if it reads 25/12/1988 from csv file, it is showing 25/12/1988 in json format. After trying a lot of cases, I came to the conclusion that the bug is appearing only when you are using a date with single digit day.

Thanks
Harshal

I am also facing the same problem. I am reading a csv file. In csv file it is showing 09/08/1988, but after reading it the json, it shows 9/8/88. Also if it reads 25/12/1988 from csv file, it is showing 25/12/1988 in json format. After trying a lot of cases, I came to the conclusion that the bug is appearing only when you are using a date with single digit day. Thanks Harshal
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#560
No description provided.