Reading from TSV - date (dd-MON-yy) comes through as integer #647

Closed
opened 2017-05-04 00:55:31 +00:00 by psalmody · 8 comments
psalmody commented 2017-05-04 00:55:31 +00:00 (Migrated from github.com)

Lines in TSV look like (see e-mail for full file) (and yes, this is fake data):

1	Emily	Fisher	efisher0@google.de	Female	161.31.81.163	89.64	15-NOV-2015	29-JAN-2016	89.62
2	Justin	Shaw	jshaw1@eepurl.com	Male	20.170.71.46	90.5	21-JUL-2016	28-MAR-2016	41.77

Doing a simple:

  let wb = XLSX.readFile('./local/input/tsv/SHORT_MOCK.txt', {
    FS: '\t',
    cellDates: true
  })
  XLSX.writeFile(wb, 'tsv.SHORT_MOCK.xlsx')

And my excel sheet dates look like:
image

Lines in TSV look like (see e-mail for full file) (and yes, this is fake data): ``` 1 Emily Fisher efisher0@google.de Female 161.31.81.163 89.64 15-NOV-2015 29-JAN-2016 89.62 2 Justin Shaw jshaw1@eepurl.com Male 20.170.71.46 90.5 21-JUL-2016 28-MAR-2016 41.77 ``` Doing a simple: ```js let wb = XLSX.readFile('./local/input/tsv/SHORT_MOCK.txt', { FS: '\t', cellDates: true }) XLSX.writeFile(wb, 'tsv.SHORT_MOCK.xlsx') ``` And my excel sheet dates look like: ![image](https://cloud.githubusercontent.com/assets/12420157/25687242/4084556a-3021-11e7-9de6-2b97f86cc2da.png)
SheetJSDev commented 2017-05-04 03:09:02 +00:00 (Migrated from github.com)

@psalmody good catch, and thanks for sending the file! There are actually 2 related errors (both in dates and in the IP addresses), boiling down to inappropriate use of parseFloat:

> var ip = "161.31.81.163", date = "15-NOV-2015";
> Number(ip)
NaN
> parseFloat(ip)
161.31 // <-- this is the value you see in the ip_address column
> Number(date)
NaN
> parseFloat(date)
15 // <-- this is the value you see rather than the date

The isNaN checks don't work as expected with parseFloat, which is easy to fix: the DSV check should be replaced with the equivalent check for the DIF format.

There are also two other items to address:

  • the dateNF / cellDates option should be applied for the DSV formats
  • we should investigate how to deduce the date format (e.g. "dd-mmm-yyyy" from "15-NOV-2015")

We're going to push a change sometime in the next day or two with this fix as well as a few other CSV-related issues

@psalmody good catch, and thanks for sending the file! There are actually 2 related errors (both in dates and in the IP addresses), boiling down to inappropriate use of `parseFloat`: ```js > var ip = "161.31.81.163", date = "15-NOV-2015"; > Number(ip) NaN > parseFloat(ip) 161.31 // <-- this is the value you see in the ip_address column > Number(date) NaN > parseFloat(date) 15 // <-- this is the value you see rather than the date ``` The `isNaN` checks don't work as expected with `parseFloat`, which is easy to fix: [the DSV check](https://github.com/SheetJS/js-xlsx/blob/master/bits/40_harb.js#L501) should be replaced with the [equivalent check for the DIF format](https://github.com/SheetJS/js-xlsx/blob/master/bits/40_harb.js#L366-L367). There are also two other items to address: - the `dateNF` / `cellDates` option should be applied for the DSV formats - we should investigate how to deduce the date format (e.g. `"dd-mmm-yyyy"` from `"15-NOV-2015"`) We're going to push a change sometime in the next day or two with this fix as well as a few other CSV-related issues
psalmody commented 2017-05-09 17:17:58 +00:00 (Migrated from github.com)

Any status on this?

Any status on this?
SheetJSDev commented 2017-05-09 17:32:56 +00:00 (Migrated from github.com)

We're testing 0.10.0 right now :)

We're testing 0.10.0 right now :)
SheetJSDev commented 2017-05-09 18:34:46 +00:00 (Migrated from github.com)

We just pushed 0.10.0. This version addresses most of the points here and you should see reasonable results. However, we are not done here.

The missing feature is preserving the original date format when reading the CSV (detecting that "15-NOV-2015" is actually of the form dd-mmm-yyyy) and we need an eversion of the SSF formatting process. Keep open for now

We just pushed 0.10.0. This version addresses most of the points here and you should see reasonable results. However, we are not done here. The missing feature is preserving the original date format when reading the CSV (detecting that "15-NOV-2015" is actually of the form `dd-mmm-yyyy`) and we need an eversion of the SSF formatting process. Keep open for now
psalmody commented 2017-05-09 22:15:10 +00:00 (Migrated from github.com)

So it's somewhat better - now running data outputs a date value, but the cells are formatted as "General" still. Tried this with both DD-MON-YYYY format and YYYY-MM-DD
image

So it's somewhat better - now running data outputs a date value, but the cells are formatted as "General" still. Tried this with both DD-MON-YYYY format and YYYY-MM-DD ![image](https://cloud.githubusercontent.com/assets/12420157/25875019/dc1f769a-34c1-11e7-9f00-e555653e0fb5.png)
SheetJSDev commented 2017-05-09 22:28:15 +00:00 (Migrated from github.com)

I can't reproduce the issue. Using the original text file I tried:

var XLSX = require('xlsx');
var wb = XLSX.readFile('SHORT_MOCK.txt', {cellDates:true});
XLSX.writeFile(wb, 'SHORT_MOCK.txt.xlsx');

Since you said it was fake data, hopefully posting the output isn't an issue: SHORT_MOCK.txt.xlsx

screen shot 2017-05-09 at 18 23 25
I can't reproduce the issue. Using the original text file I tried: ```js var XLSX = require('xlsx'); var wb = XLSX.readFile('SHORT_MOCK.txt', {cellDates:true}); XLSX.writeFile(wb, 'SHORT_MOCK.txt.xlsx'); ``` Since you said it was fake data, hopefully posting the output isn't an issue: [SHORT_MOCK.txt.xlsx](https://github.com/SheetJS/js-xlsx/files/988184/SHORT_MOCK.txt.xlsx) <img width="562" alt="screen shot 2017-05-09 at 18 23 25" src="https://cloud.githubusercontent.com/assets/6070939/25875389/25f6971c-34e5-11e7-9ce9-46add634f038.png">
psalmody commented 2017-05-09 22:39:40 +00:00 (Migrated from github.com)

OK - my only difference was the last line:

XLSX.writeFile(wb, 'SHORT_MOCK.txt.xlsx', {cellDates: true});
OK - my only difference was the last line: ```js XLSX.writeFile(wb, 'SHORT_MOCK.txt.xlsx', {cellDates: true}); ```
SheetJSDev commented 2017-05-09 22:43:17 +00:00 (Migrated from github.com)

Hmm My reading of the spec suggests that writing a date in XLSX would automatically imply the general date format. I guess not. That will be fixed in the next release

Hmm My reading of the spec suggests that writing a date in XLSX would automatically imply the general date format. I guess not. That will be fixed in the next release
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#647
No description provided.