Unexpected Automatic Date Conversion #2196
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#2196
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Issue Overview
One of our systems which uses
sheetjs
for parsing csv/xlsx data produced an unexpected result this week. The data which caused the issue was"Mayslanding, NJ 08234"
, which the library coerced to adate
as{ t: 'n', v: 2313568, w: '5/1/34' }
. My hunch is that the code which attempts to guess if a cell might be a date is too lax given some inputs.This happens when
raw: false
, which is the default. We have worked around it by simply settingraw: true
and doing our own type conversions. Still, this unexpected result seems likely to trip up others using the default setting.I am willing to volunteer to submit a PR to fix this issue but would like some feedback from a maintainer before writing any code. My guess is that any changes to this date conversion code could very easily cause other unexpected side effects, so advice would be appreciated.
Steps to Reproduce:
Given the (somewhat fabricated) csv data below, the following code will produce the result:
CSV input:
Unexpected result:
To be clear, this ultimately boils down to how V8 (Chrome/Node) handles dates:
We try to correct for it in
fuzzydate
by looking for one of the month labels. I suspect the month name regex can be cleaned up a bit to explicitly match the short or long form for names, likeI'll take a closer look at those and put together a PR when I get a chance.
I am also seeing this when creating an excel file in my project. In our case, the data is a string "1/$1.99" It gets turned into a date with some of the numbers but some randomness as well. 1/2/1999
@jbull328, just to confirm, you're seeing that behavior on creation/output of a document with that data? I am not seeing it on input in
xlsx
(with several column formats) orcsv
.Correct @matthew-macgregor we did see that on output. I ended up abandoning using the module.
@jbull328 If this is showing up when you write a CSV, that's Excel's automatic conversion. For example, consider the CSV
If you write that as plaintext to a file and open in Excel, it will interpret those as dates:
In any case that's an issue unrelated to this one.
I want to confirm that the date parsing strategy of sheetjs is a bit too aggressive for my taste.
Effectively I can only use
raw
reading mode for csv files, as random decimal numbers are interpreted as date, which is not foreseeable so not reliable.There are dozens of issues on this topic.
If I switch on
raw
, this problem disappears, however everything is a string then. I have to parse numbers then.Is there something in between? So a config that will parse numbers if possible, and leaves the rest as string?
@flaushi it's aggressive because V8 (chrome/node) is aggressive.
fuzzydate
attempts some light validation. Unfortunately this story probably ends with a hard-coded list of acceptable date formats, as discussed in https://github.com/SheetJS/sheetjs/issues/1300#issuecomment-530266048Moving this to #1300
0.18.1 fixes this issue. To verify in NodeJS: