Invalid date recognition #1646

Closed
opened 2019-09-30 05:49:49 +00:00 by krojew · 4 comments
krojew commented 2019-09-30 05:49:49 +00:00 (Migrated from github.com)

fuzzydate() function incorrectly guesses date due to invalid regex pattern. For example, let's look at "Trollahaugen 64" text. First, it tries to convert it to date and it ends up being 01.01.1964 due to 64 at the end. Then it proceeds with looking for a month using /jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec/ regex. This is the source of error, since the text contains "aug" inside but it's not a separate word, hence should not be considered a month. Yet, the function assumes it is and returns 01.01.1964.

fuzzydate() function incorrectly guesses date due to invalid regex pattern. For example, let's look at "Trollahaugen 64" text. First, it tries to convert it to date and it ends up being 01.01.1964 due to 64 at the end. Then it proceeds with looking for a month using /jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec/ regex. This is the source of error, since the text contains "aug" inside but it's not a separate word, hence should not be considered a month. Yet, the function assumes it is and returns 01.01.1964.
SheetJSDev commented 2019-09-30 08:11:37 +00:00 (Migrated from github.com)

Just so we're on the same page: the root problem is the incredible flexibility of the Date constructor in V8 and other JS engines. In chrome, new Date("Trollahaugen 64") is interpreted as the beginning of the year 1964 (Jan 1 0:00 local time).

There are a handful of workarounds for common cases we've seen but it's not pretty. The "right" answer is to skip the date constructor entirely and just regex against a series of culture-specific formats

Just so we're on the same page: the root problem is the incredible flexibility of the Date constructor in V8 and other JS engines. In chrome, `new Date("Trollahaugen 64")` is interpreted as the beginning of the year 1964 (Jan 1 0:00 local time). There are a handful of workarounds for common cases we've seen but it's not pretty. The "right" answer is to skip the date constructor entirely and just regex against a series of culture-specific formats
krojew commented 2019-09-30 08:17:22 +00:00 (Migrated from github.com)

I understand the inherent problem with solving such issue - there always will be some edge cases. Working around this particular one seems to be trivial - simply expect the month name to be a whole word, surrounded with non-alphabetic characters, if we assume (which might not be true) that the string is a ECMA-262 Date.toString() representation.

I understand the inherent problem with solving such issue - there always will be some edge cases. Working around this particular one seems to be trivial - simply expect the month name to be a whole word, surrounded with non-alphabetic characters, if we assume (which might not be true) that the string is a ECMA-262 Date.toString() representation.
tophsic commented 2020-10-19 16:05:31 +00:00 (Migrated from github.com)

Hi, I had a similar issue with "SEPEM GRENOBLE 2018" text which is transformed in 9/1/18 date.

Hi, I had a similar issue with "SEPEM GRENOBLE 2018" text which is transformed in 9/1/18 date.
SheetJSDev commented 2021-09-12 07:03:15 +00:00 (Migrated from github.com)

Moving the discussion to https://github.com/SheetJS/sheetjs/issues/1300

@krojew The spec uses the phrase "implementation-dependent":

in general, the value produced by Date.parse is implementation-dependent when given any String value that does not conform to the Date Time String Format

That Date Time String Format is related to ISO 8601 and does not cover the set of date expressions understood by Excel

Moving the discussion to https://github.com/SheetJS/sheetjs/issues/1300 @krojew The spec uses the phrase "implementation-dependent": > in general, the value produced by Date.parse is implementation-dependent when given any String value that does not conform to the Date Time String Format That Date Time String Format is related to ISO 8601 and does not cover the set of date expressions understood by Excel
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#1646
No description provided.