Date is treated as as string #2873

Open
opened 2023-02-09 11:40:30 +00:00 by rohit-nx · 10 comments

if date is like 2023-01-01 00:00:00 it is treated as string not as date.
same if date is coming like same above format but other than 1st January it is treated as date.

The problem is that only 1st january date is treated as string.

example:-
"t": "d",
"v": "2023-01-02T18:30:00.000Z",
"w": "23/01/03"

"t": "s",
"v": "2023-01-01 00:00:00"

I am using function XLSX.read(resultString, {
cellDates: true,
});

Please resolve and let me know the solution

if date is like 2023-01-01 00:00:00 it is treated as string not as date. same if date is coming like same above format but other than 1st January it is treated as date. The problem is that only 1st january date is treated as string. example:- "t": "d", "v": "2023-01-02T18:30:00.000Z", "w": "23/01/03" "t": "s", "v": "2023-01-01 00:00:00" I am using function XLSX.read(resultString, { cellDates: true, }); Please resolve and let me know the solution
Owner

Can you share the original file?

Can you share the original file?
Author

This is not only for 2023 year, but also for 2021, 2022, etc main it can't convert it into dates if it is 1st january of any year

What do you mean by original file?

This is not only for 2023 year, but also for 2021, 2022, etc main it can't convert it into dates if it is 1st january of any year What do you mean by original file?
Author

you can try with this code also const resultString = "Date\r\n2021-01-01 00:00:00\r\n2022-01-02 00:00:00"; const wb = XLSX.read(resultString, { type: "string", cellDates: true }); XLSX.writeFile(wb,'export.xlsx);

as a result we get date format for 2nd january and string for 1 January

you can try with this code also ` const resultString = "Date\r\n2021-01-01 00:00:00\r\n2022-01-02 00:00:00"; const wb = XLSX.read(resultString, { type: "string", cellDates: true }); XLSX.writeFile(wb,'export.xlsx);` as a result we get date format for 2nd january and string for 1 January

Hi @sheetjs The issue appears to be with the logic at the end of function fuzzydate

function fuzzydate(s/*:string*/)/*:Date*/ {
	var o = new Date(s), n = new Date(NaN);
	var y = o.getYear(), m = o.getMonth(), d = o.getDate();
	if(isNaN(d)) return n;
	var lower = s.toLowerCase();
	if(lower.match(/jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec/)) {
		lower = lower.replace(/[^a-z]/g,"").replace(/([^a-z]|^)[ap]m?([^a-z]|$)/,"");
		if(lower.length > 3 && lower_months.indexOf(lower) == -1) return n;
	} else if(lower.match(/[a-z]/)) return n;
	if(y < 0 || y > 8099) return n;
	if((m > 0 || d > 1) && y != 101) return o;
	if(s.match(/[^-0-9:,\/\\]/)) return n;
	return o;
}

Specifically, If your datetime string in your CSV is "2023-01-01 00:00:00" for instance then this code will return n (Invalid Date) but it's the result of some logic that @sheetjs might need to explain a bit.

given that

var o = new Date(s)
var y = o.getYear(), m = o.getMonth(), d = o.getDate()

then

if((m > 0 || d > 1) && y != 101) return o;

will not return because the if condition is false - this may be accidental / a bug.

Date.prototype.getMonth returns a zero index value (0-11) and not (1-12) as you might expect. So January results in m == 0.

Date.prototype.getDate gets the date of the month so (1-28, 1-30, 1-31) depending on the month. So first of the month results in d == 1.

For the first of January for any year the if condition (m > 0 || d > 1) && y != 101 will be false. If you change the greater than > to greater than or equal to >= the 1st Jan will then pass.

Date.prototype.getYear will return the number of years after 1900 (this method is deprecated fyi) something not clear is why the if includes an expression for excluding the year 2001 (y != 101). It follows though that any data that is in 2001 this condition will also be false.

So this condition means the next line will execute

if(s.match(/[^-0-9:,\/\\]/)) return n;

This is looking for anything that is not 0-9, -, :, /, \, or , however the string if you recall looks like
"2023-01-01 00:00:00"
which contains a whitespace " ".

This means match will return a match and thus true and the fuzzydate function will return n, and Invalid Date object.

It seems to me that this could be fixed in a number of ways. Probably the >= is the right way to do it. Although adding whitespace to the RegEx would also fix this for this kind of string it's more likely to have side effects, like what would happen if you started with s = " "?

Any guidance? I could put in a PR for this.

One other question - is this the right place for this or the GitHub?

Hi @sheetjs The issue appears to be with the logic at the end of function fuzzydate ```javascript function fuzzydate(s/*:string*/)/*:Date*/ { var o = new Date(s), n = new Date(NaN); var y = o.getYear(), m = o.getMonth(), d = o.getDate(); if(isNaN(d)) return n; var lower = s.toLowerCase(); if(lower.match(/jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec/)) { lower = lower.replace(/[^a-z]/g,"").replace(/([^a-z]|^)[ap]m?([^a-z]|$)/,""); if(lower.length > 3 && lower_months.indexOf(lower) == -1) return n; } else if(lower.match(/[a-z]/)) return n; if(y < 0 || y > 8099) return n; if((m > 0 || d > 1) && y != 101) return o; if(s.match(/[^-0-9:,\/\\]/)) return n; return o; } ``` Specifically, If your datetime string in your CSV is "2023-01-01 00:00:00" for instance then this code will return n (Invalid Date) but it's the result of some logic that @sheetjs might need to explain a bit. given that ```javascript var o = new Date(s) var y = o.getYear(), m = o.getMonth(), d = o.getDate() ``` then ```javascript if((m > 0 || d > 1) && y != 101) return o; ``` will not return because the if condition is false - this may be accidental / a bug. Date.prototype.getMonth returns a zero index value (0-11) and not (1-12) as you might expect. So January results in m == 0. Date.prototype.getDate gets the date of the month so (1-28, 1-30, 1-31) depending on the month. So first of the month results in d == 1. For the first of January for any year the if condition `(m > 0 || d > 1) && y != 101` will be false. If you change the greater than `>` to greater than or equal to `>=` the 1st Jan will then pass. Date.prototype.getYear will return the number of years after 1900 (this method is deprecated fyi) something not clear is why the if includes an expression for excluding the year 2001 (`y != 101`). It follows though that any data that is in 2001 this condition will also be false. So this condition means the next line will execute ```javascript if(s.match(/[^-0-9:,\/\\]/)) return n; ``` This is looking for anything that is not 0-9, -, :, /, \\, or , however the string if you recall looks like `"2023-01-01 00:00:00"` which contains a whitespace " ". This means match will return a match and thus true and the fuzzydate function will return n, and Invalid Date object. It seems to me that this could be fixed in a number of ways. Probably the `>=` is the right way to do it. Although adding whitespace to the RegEx would also fix this for this kind of string it's more likely to have side effects, like what would happen if you started with s = " "? Any guidance? I could put in a PR for this. One other question - is this the right place for this or the GitHub?
Owner

The logic is a huge wokaround for V8 (the JS engine behind the Chrome and Node runtimes) and its extremely robust date parser. For example:

new Date("This is not a date 1") // 2001-01-01

https://github.com/SheetJS/sheetjs/issues/2196 has some other examples including addresses:

new Date("Mayslanding, NJ 08330"); // 8330-05-01

You'll notice these examples have day 1, hence the special casing around it.

.

The first iteration of that logic merely tested if the date was valid. After many refinements, the solution is still not great.

It might be possible to craft a regular expression to exactly match acceptable date strings. Would that be a better approach?

The logic is a huge wokaround for V8 (the JS engine behind the Chrome and Node runtimes) and its extremely robust date parser. For example: ```js new Date("This is not a date 1") // 2001-01-01 ``` https://github.com/SheetJS/sheetjs/issues/2196 has some other examples including addresses: ```js new Date("Mayslanding, NJ 08330"); // 8330-05-01 ``` You'll notice these examples have day `1`, hence the special casing around it. . The first iteration of that logic merely tested if the date was valid. After many refinements, the solution is still not great. It might be possible to craft a regular expression to exactly match acceptable date strings. Would that be a better approach?

Right, that's certainly robust and expected. Apparently it's down to "implementation-specific heuristics" https://bugs.chromium.org/p/v8/issues/detail?id=2602#c2

So your logic is:

  • if it's not Jan 1st and not 2001 then it's probably a date.
  • if it is 1st jan or 2001 and s only contains 0-9, -, :, /, , or , then it is still probably a date
  • otherwise it's not a date.

The issue is that there are a number of standard ways of expressing a datetime in a string for which the 1st Jan will be incorrectly treated in this logic.

There is the example above
"2023-01-01 00:00:00"
but also ISO styles
"2023-01-01T00:00:00Z"
"2023-01-01T00:00:00+01:00"

To let these pass you would need to allow "T", "Z", "+" and " " through the regex but we do know that they would be bracketed by a 0-9 digit on either side so adding that to the regex might be sufficiently fuzzy - could come up with a suggestion - what do you think to that approach?

Right, that's certainly robust and expected. Apparently it's down to "implementation-specific heuristics" https://bugs.chromium.org/p/v8/issues/detail?id=2602#c2 So your logic is: * if it's not Jan 1st and not 2001 then it's probably a date. * if it is 1st jan or 2001 and s only contains 0-9, -, :, /, \, or , then it is still probably a date * otherwise it's not a date. The issue is that there are a number of standard ways of expressing a datetime in a string for which the 1st Jan will be incorrectly treated in this logic. There is the example above `"2023-01-01 00:00:00"` but also ISO styles `"2023-01-01T00:00:00Z"` `"2023-01-01T00:00:00+01:00"` To let these pass you would need to allow "T", "Z", "+" and " " through the regex but we do know that they would be bracketed by a 0-9 digit on either side so adding that to the regex might be sufficiently fuzzy - could come up with a suggestion - what do you think to that approach?
Owner

ISO styles are not supported in Excel.

consider the CSV

ISO,2023-01-01T00:00:00Z,=TYPE(B1)
DATE,2023-01-01 00:00:00,=TYPE(B2)

The attached screenshot shows the view when you open in Excel.

The third column shows the cell type according to Excel. The first row is 2 since Excel treats that value as text. The second row is 1 for number (since Excel stores dates as numbers under the hood).

ISO styles are not supported in Excel. consider the CSV ``` ISO,2023-01-01T00:00:00Z,=TYPE(B1) DATE,2023-01-01 00:00:00,=TYPE(B2) ``` The attached screenshot shows the view when you open in Excel. The third column shows the cell type according to Excel. The first row is 2 since Excel treats that value as text. The second row is 1 for number (since Excel stores dates as numbers under the hood).

How about a check to see if the string is actually in ISO? Can pass multiple ISO styles. Piggybacking off the regular expression suggestion:

function isValidISODate(string) {
  const isoRegex = /^(\d{4})-([01]\d)-([0-3]\d)T([0-2]\d):([0-5]\d):([0-5]\d)(\.\d+)?(Z|[+-]([01]\d|2[0-3]):([0-5]\d))?$/;

  return isoRegex.test(string);
}

console.log(isValidISODate('2023-01-01T00:00:00Z')); // true
console.log(isValidISODate('2023-03-03T14:35:00-05:00')); // true
console.log(isValidISODate('2023-01-01T00:00:00+01:00')); // true
console.log(isValidISODate('2023-01-01 00:00:00')); // false
console.log(isValidISODate('Mayslanding, NJ 08234')); // false 
console.log(isValidISODate('This is not a date 1')); // false

How about a check to see if the string is actually in ISO? Can pass multiple ISO styles. Piggybacking off the regular expression suggestion: ``` function isValidISODate(string) { const isoRegex = /^(\d{4})-([01]\d)-([0-3]\d)T([0-2]\d):([0-5]\d):([0-5]\d)(\.\d+)?(Z|[+-]([01]\d|2[0-3]):([0-5]\d))?$/; return isoRegex.test(string); } ``` ``` console.log(isValidISODate('2023-01-01T00:00:00Z')); // true console.log(isValidISODate('2023-03-03T14:35:00-05:00')); // true console.log(isValidISODate('2023-01-01T00:00:00+01:00')); // true console.log(isValidISODate('2023-01-01 00:00:00')); // false console.log(isValidISODate('Mayslanding, NJ 08234')); // false console.log(isValidISODate('This is not a date 1')); // false ```

I guess it comes down to what the objective is here. My presumption that ISO dates should be converted was probably wrong @juctaposed.

Going by @sheetjs example of Excel parsing datetimes from CSV I presume that the objective is for SheetJS to produce the same automatic conversions as Excel would when opening a CSV.

If that is the case, then starting by converting string to date using the interpreter then trying to decide if the outcome matches what Excel would do seems like it might be more difficult than starting by checking for formats which are known to convert to datetimes when reading CSVs with Excel. For instance, it results in this bit of code to catch "implementation-specific heuristics" in V8. Is that list of formats provided by Microsoft anywhere?

It would be simple enough to add the check for a space separated date and time with the other characters

/[^-0-9:,\/\\(\d\s\d)]/

Where (\d\s\d) matches a sequence digit space digit.

So question then, are there any strings which include only -, digits, :, ,, /, \, and digit space digit which parse as dates in V8 but should be treated as string in Excel?

I guess it comes down to what the objective is here. My presumption that ISO dates should be converted was probably wrong @juctaposed. Going by @sheetjs example of Excel parsing datetimes from CSV I presume that the objective is for SheetJS to produce the same automatic conversions as Excel would when opening a CSV. If that is the case, then starting by converting string to date using the interpreter then trying to decide if the outcome matches what Excel would do seems like it might be more difficult than starting by checking for formats which are known to convert to datetimes when reading CSVs with Excel. For instance, it results in this bit of code to catch "implementation-specific heuristics" in V8. Is that list of formats provided by Microsoft anywhere? It would be simple enough to add the check for a space separated date and time with the other characters `/[^-0-9:,\/\\(\d\s\d)]/` Where `(\d\s\d)` matches a sequence digit space digit. So question then, are there any strings which include only -, digits, :, ,, /, \\, and digit space digit which parse as dates in V8 but should be treated as string in Excel?

Dates and times are usually a logistical nightmare, that's probably why the the chromium team pulled out the "implementation-specific heuristics". I imagine that's their terminology for "we used an entire roll of duct tape and it works most of the time, so...". What Excel will read as date/time is based on the version of Excel and the users device settings. There's the standard list of examples that Excel will read fine:

  • YYYY-MM-DD
  • MM/DD/YYYY
  • DD/MM/YYYY
  • YYYY/MM/DD
  • YYYY-MM-DD HH:MM:SS
  • MM/DD/YYYY HH:MM:SS
  • DD/MM/YYYY HH:MM:SS
  • YYYY/MM/DD HH:MM:SS
  • YYYY-MM-DD HH:MM
  • MM/DD/YYYY HH:MM
  • DD/MM/YYYY HH:MM
  • YYYY/MM/DD HH:MM

There might be niche cases where people reserve values in a similar format. One thing that comes to mind that could be cutting it close is Parcel IDs/Block and Lots for US properties (e.g. 0009-P-00150-0000-00). Though there's usually an appendix letter and longer numbers (atleast for parcels that I'm familiar with). I can't really think of other cases where strings would be in these formats if they weren't dates/time.

So the check seems good, worst case if the string passes and it appears in string format in the worksheet, the cell(s) should still be formattable by date/time in Excel. I did not realize the issue from Excel treating them as text comes from the formats including timezone offsets.

Depending on what route the devs want to go when handling that format, there are JS libraries like luxon that can pull the heavy lifting, parsing date formats with built in support for timezone logic and daylight savings time, plus conversion of strings to most of the readable ISOs from the list above. Though I'm not certain on any impending licensing issues.

I imagine an in house solution with time zones and daylight savings adjustments would be painful.

There is also the duct tape method of documenting for the user to apply DATEVALUE + TIMEVALUE +/- HOUR manually to any cells with offset values. That wouldn't be an automatic conversion of course, but this should do the trick in a pinch.

Dates and times are usually a logistical nightmare, that's probably why the the chromium team pulled out the "implementation-specific heuristics". I imagine that's their terminology for "we used an entire roll of duct tape and it works most of the time, so...". What Excel will read as date/time is based on the version of Excel and the users device settings. There's the standard list of examples that Excel will read fine: - YYYY-MM-DD - MM/DD/YYYY - DD/MM/YYYY - YYYY/MM/DD - YYYY-MM-DD HH:MM:SS - MM/DD/YYYY HH:MM:SS - DD/MM/YYYY HH:MM:SS - YYYY/MM/DD HH:MM:SS - YYYY-MM-DD HH:MM - MM/DD/YYYY HH:MM - DD/MM/YYYY HH:MM - YYYY/MM/DD HH:MM There might be niche cases where people reserve values in a similar format. One thing that comes to mind that could be cutting it close is Parcel IDs/Block and Lots for US properties (e.g. 0009-P-00150-0000-00). Though there's usually an appendix letter and longer numbers (atleast for parcels that I'm familiar with). I can't really think of other cases where strings would be in these formats if they weren't dates/time. So the check seems good, worst case if the string passes and it appears in string format in the worksheet, the cell(s) should still be formattable by date/time in Excel. I did not realize the issue from Excel treating them as text comes from the formats including timezone offsets. Depending on what route the devs want to go when handling that format, there are JS libraries like [luxon](https://www.npmjs.com/package/luxon) that can pull the heavy lifting, parsing date formats with built in support for timezone logic and daylight savings time, plus conversion of strings to most of the readable ISOs from the list above. Though I'm not certain on any impending licensing issues. I imagine an in house solution with time zones and daylight savings adjustments would be painful. There is also the duct tape method of documenting for the user to apply DATEVALUE + TIMEVALUE +/- HOUR manually to any cells with offset values. That wouldn't be an automatic conversion of course, but this should do the trick in a pinch.
Sign in to join this conversation.
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#2873
No description provided.