Parsing string values and treating as dates is incorrect #1300
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#1300
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Hi!
There is a CSV file, which I try to read and it contains field with value
"Aprobil P 0.1%"
the short example of CSV:
conversion to workbook is the following:
after conversion I save the XLS, where value "Aprobil P 0.1%" is converted to a date 01.04.00
looking into the worksheet model and getting the certain cell, it contains:
is there any way to cover this case?
Could you pls assist, what to do.
Thanks in advance!
any update, any comments? is anybody here? ;)
I am having a similar issue with certain strings being converted to date values. Any update on this?
guys, @SheetJSDev, pls, just pay a little attention to it, tell us, is it possible to fix, or should we rely on our workaround?
guys, one year has been passed, no one even commented.
Do I need to switch to another project?
Hello @pzhelnov ! First, understand that this is an open source project and all work done on the project is charity. If you are concerned about response time or interested in supporting development, we offer paid builds and support as discussed in https://sheetjs.com/pro
That said, unfortunately there's no magic solution here. The Date parsing currently falls back on the JS Date builtin features, which are super flexible in chrome. As an example:
is the date January 1 2001, despite it clearly indicating that it isn't a date.
Setting the option
raw: true
will give you the raw strings, suppressing any sort of value interpretation.If you are interested in helping, the relevant function is
fuzzydate
. Google Docs / Office Online likely use a series of regular expressions based on some known formats like/\d{4}-\d{2}-\d{2}/
foryyyy-mm-dd
and that might be a more sensible approach than using the generic Date constructorThank you very much for an answer!
Sorry for raising my hand in such a rude manner, we've just stuck on this problem for a long time and my workaround approach was not so strict.
First of all I have to say this is great project, one of the best, flexible and configurable, which currently existing in the community.
So, keep going!
We'd accept a PR that does the following:
add an option to the reader, call it something like
strictDates
(use your imagination) that would use this alternative parserchange the line https://github.com/SheetJS/js-xlsx/blob/master/bits/40_harb.js#L834 to
fuzzydate
and load up the regexes in https://github.com/SheetJS/js-xlsx/blob/master/bits/20_jsutils.js#L126:This obviously isn't 100% correct but is roughly shaped like how we'd expect the ultimate code to be. The new code doesn't the existing behavior (so we could get away with a patch version bump) but it does give you a backdoor to solve your immediate problem and gives us room to grow as we grind further
Hey, is this issue still open to a PR? I'd like to give it a shot if I can
As is usually the case with these problems, the code is the easy part. The hard part is making a decision :(
If this is of interest, start by enumerating the common date formats that the parser should recognize (for example,
YYYY-MM-DD
) and constructing the corresponding regular expressions. Then we'd accept a PR that follows the approach outlined in the previous comment.Hmm, I see - let me start working on this in a bit and I'll see if I can add something of value that works. It's also my first open-source contrib, so please lmk if there's something more I can do!
Hey! Just created a PR for this here - let me know if I need to make any more changes, etc.
Based on some testing, in the
en-US
locale, there appear to be no valid Excel dates that contain letters outside of the month short or long names (even full day names are not accepted). Looking at a number of the issues, a guard like(where
lower_months
is an array of the long names of months) would address many of the problems. The proposed "strict" mode is still needed for determining field values and matching formats that Chrome does not accept (for example,1:23.04:56
)Fixed in 0.18.3: https://jsfiddle.net/9u84Lwja/
The relevant cell is