Cell values with date format showing as "1/0/1900" despite raw:true when cellStyles:true is enabled #3268

Open
opened 2025-01-07 12:25:18 +00:00 by deepak-negi-web · 7 comments
Contributor

Description

I'm experiencing an issue where cell values (both headers and data) that have date formatting in Excel are being converted to "1/0/1900" or "Invalid Date" in the output, even when using raw:true. This occurs specifically when cellStyles:true is enabled, which I need for the skipHidden functionality.

Current Behavior

  • When cellStyles:true is set, text values that have date formatting in Excel are converted to "1/0/1900" or "Invalid Date"
  • This happens even when raw:true is set both in the read options and sheet_to_json options
  • The issue affects both column headers and data cells
  • The conversion seems to override the raw:true setting

Expected Behavior

  • Text values should remain as text, regardless of their cell formatting in Excel
  • raw:true should take precedence over cell style formatting
  • Original text values should be preserved in the output

Reproduction Steps

  1. Create an Excel file with some columns having date formatting (but containing text values)
  2. Use the following code:
const sheetName = "Sheet2";
const filePath = "./sample-file.xlsx";
const workbook = XLSX.readFile(filePath, {
    raw: true,
    cellText: false,
    cellDates: false,
    cellFormula: false,
    cellHTML: false,
    cellStyles: true,  // Required for skipHidden
    sheets: sheetName,
    dense: true
});

const worksheet = workbook.Sheets[sheetName];

const jsonStream = XLSX.utils.sheet_to_json(worksheet, {
    blankrows: false,
    header: 1,
    raw: true,
    dateNF: 'yyyy-mm-dd',
    defval: '',
    skipHidden: true
});

console.log(jsonStream[0]); // Shows "1/0/1900" for date-formatted cells

Current Workaround

Currently, I can fix this by manually processing the cells:

if (worksheet['!data']) {
    worksheet['!data'].forEach(row=>{
        row.forEach((cell, index) => {
            if (cell) {
                // Store the formatted text value instead of the date value
                cell.v = cell.w || cell.v;
                // Remove the date type if it exists
                delete cell.z;
            }
        });
    })
}

However, this is not ideal because:

  1. For data cells (not just headers), this would require iterating through millions of cells
  2. Processing large datasets this way can cause performance issues
  3. It feels like raw:true should handle this automatically

Environment

  • SheetJS Version: 0.20.3
  • Node.js Version: 18.13.0
  • Operating System: windows

Additional Context

  • I need to keep cellStyles:true because it's required for the skipHidden property to work
  • The Excel files can be quite large (2M+ cells)
  • This affects client files where we cannot modify the original Excel formatting

Question

Is there a way to get the raw text values while keeping cellStyles:true enabled? Or is there an alternative approach to using skipHidden without requiring cellStyles:true?

## Description I'm experiencing an issue where cell values (both headers and data) that have date formatting in Excel are being converted to "1/0/1900" or "Invalid Date" in the output, even when using `raw:true`. This occurs specifically when `cellStyles:true` is enabled, which I need for the `skipHidden` functionality. ### Current Behavior - When `cellStyles:true` is set, text values that have date formatting in Excel are converted to "1/0/1900" or "Invalid Date" - This happens even when `raw:true` is set both in the read options and sheet_to_json options - The issue affects both column headers and data cells - The conversion seems to override the `raw:true` setting ### Expected Behavior - Text values should remain as text, regardless of their cell formatting in Excel - `raw:true` should take precedence over cell style formatting - Original text values should be preserved in the output ### Reproduction Steps 1. Create an Excel file with some columns having date formatting (but containing text values) 2. Use the following code: ```javascript const sheetName = "Sheet2"; const filePath = "./sample-file.xlsx"; const workbook = XLSX.readFile(filePath, { raw: true, cellText: false, cellDates: false, cellFormula: false, cellHTML: false, cellStyles: true, // Required for skipHidden sheets: sheetName, dense: true }); const worksheet = workbook.Sheets[sheetName]; const jsonStream = XLSX.utils.sheet_to_json(worksheet, { blankrows: false, header: 1, raw: true, dateNF: 'yyyy-mm-dd', defval: '', skipHidden: true }); console.log(jsonStream[0]); // Shows "1/0/1900" for date-formatted cells ``` ### Current Workaround Currently, I can fix this by manually processing the cells: ```javascript if (worksheet['!data']) { worksheet['!data'].forEach(row=>{ row.forEach((cell, index) => { if (cell) { // Store the formatted text value instead of the date value cell.v = cell.w || cell.v; // Remove the date type if it exists delete cell.z; } }); }) } ``` However, this is not ideal because: 1. For data cells (not just headers), this would require iterating through millions of cells 2. Processing large datasets this way can cause performance issues 3. It feels like `raw:true` should handle this automatically ### Environment - SheetJS Version: 0.20.3 - Node.js Version: 18.13.0 - Operating System: windows ### Additional Context - I need to keep `cellStyles:true` because it's required for the `skipHidden` property to work - The Excel files can be quite large (2M+ cells) - This affects client files where we cannot modify the original Excel formatting ### Question Is there a way to get the raw text values while keeping `cellStyles:true` enabled? Or is there an alternative approach to using `skipHidden` without requiring `cellStyles:true`?
Owner

Thanks for reporting! In this case, you have a text cell with a date format, which incorrectly elicited a date conversion. There is a missing break in the make_json_row function:

--- a/bits/90_utils.js
+++ b/bits/90_utils.js
@@ -24,7 +24,7 @@ function make_json_row(sheet/*:Worksheet*/, r/*:Range*/, R/*:number*/, cols/*:Ar
                switch(val.t){
                        case 'z': if(v == null) break; continue;
                        case 'e': v = (v == 0 ? null : void 0); break;
-                       case 's': case 'b':
+                       case 's': case 'b': break;
                        case 'n': if(!val.z || !fmt_is_date(val.z)) break;
Thanks for reporting! In this case, you have a text cell with a date format, which incorrectly elicited a date conversion. There is a missing `break` in the `make_json_row` function: ```diff --- a/bits/90_utils.js +++ b/bits/90_utils.js @@ -24,7 +24,7 @@ function make_json_row(sheet/*:Worksheet*/, r/*:Range*/, R/*:number*/, cols/*:Ar switch(val.t){ case 'z': if(v == null) break; continue; case 'e': v = (v == 0 ? null : void 0); break; - case 's': case 'b': + case 's': case 'b': break; case 'n': if(!val.z || !fmt_is_date(val.z)) break; ```
Author
Contributor

Thank you so much for the quick response and fix! The missing break statement explains the date conversion issue perfectly.

While we're on the topic, I also noticed that the raw: true option in sheet_to_json() doesn't seem to work as expected for dates. Even with raw: true, instead of getting the original Excel epoch timestamp, it returns a JavaScript timestamp. For example:

const jsonStream = XLSX.utils.sheet_to_json(worksheet, {
    raw: true,
    // other options...
});

Shouldn't raw: true return the original Excel epoch number rather than converting it to a JavaScript date/timestamp? This seems inconsistent with the documentation which suggests raw: true should return the underlying value without type conversion.

Thank you so much for the quick response and fix! The missing `break` statement explains the date conversion issue perfectly. While we're on the topic, I also noticed that the `raw: true` option in `sheet_to_json()` doesn't seem to work as expected for dates. Even with `raw: true`, instead of getting the original Excel epoch timestamp, it returns a JavaScript timestamp. For example: ```javascript const jsonStream = XLSX.utils.sheet_to_json(worksheet, { raw: true, // other options... }); ``` Shouldn't `raw: true` return the original Excel **epoch number** rather than converting it to a JavaScript date/timestamp? This seems inconsistent with the documentation which suggests `raw: true` should return the underlying value without type conversion.
Owner

There are two steps here:

flowchart LR
    file[("File")]
    wb((("SheetJS<br>Workbook")))
    data[("JSON<br>Rows")]

    file --readFile<br/><br/>--> wb --sheet_to_json<br/><br/>--> data

The raw parameter to read and readFile affect formats where plaintext values are parsed (including CSV and HTML). cellDates enables the value coercion for XLSX, XLSB, XLS and other formats that use date codes.

The raw parameter to sheet_to_json controls which values are emitted (underlying cell value or formatted text from the worksheet).

The only way to get the date codes is to pass the option cellDates: false when reading the file and raw: true when generating rows, as it ensures the date codes are preserved when parsing the file.

There are two steps here: ```mermaid flowchart LR file[("File")] wb((("SheetJS<br>Workbook"))) data[("JSON<br>Rows")] file --readFile<br/><br/>--> wb --sheet_to_json<br/><br/>--> data ``` The `raw` parameter to `read` and `readFile` affect formats where plaintext values are parsed (including CSV and HTML). `cellDates` enables the value coercion for XLSX, XLSB, XLS and other formats that use date codes. The `raw` parameter to `sheet_to_json` controls which values are emitted (underlying cell value or formatted text from the worksheet). The only way to get the date codes is to pass the option `cellDates: false` when reading the file and `raw: true` when generating rows, as it ensures the date codes are preserved when parsing the file.
Author
Contributor

Thanks for the explanation about the raw and cellDates parameters. However, I'm still seeing an issue with these settings. I am already using the combination you suggested:

// Reading the file
const workbook = XLSX.readFile(filePath, {
    raw: true,
    dense: true,
    cellDates: false,
    cellStyles: true,  // This seems to affect the behavior
    // other options...
});

// Converting to JSON
const jsonStream = XLSX.utils.sheet_to_json(worksheet, {
    raw: true,
    // other options...
});

Even with cellDates: false and raw: true in both places, I'm not getting the date codes. I've noticed this only happens when cellStyles: true is enabled. When I set cellStyles: false, the date codes are returned as expected.
Since I need cellStyles: true for the skipHidden functionality to work, is there a way to preserve the date codes while keeping cell styles enabled?

Thanks for the explanation about the `raw` and `cellDates` parameters. However, I'm still seeing an issue with these settings. I am already using the combination you suggested: ```javascript // Reading the file const workbook = XLSX.readFile(filePath, { raw: true, dense: true, cellDates: false, cellStyles: true, // This seems to affect the behavior // other options... }); // Converting to JSON const jsonStream = XLSX.utils.sheet_to_json(worksheet, { raw: true, // other options... }); ``` Even with `cellDates: false` and `raw: true` in both places, I'm not getting the date codes. I've noticed this only happens when `cellStyles: true` is enabled. When I set `cellStyles: false`, the date codes are returned as expected. Since I need `cellStyles: true` for the skipHidden functionality to work, is there a way to preserve the date codes while keeping cell styles enabled?
Owner

Currently, cellStyles: true forces cellNF: true (save number formats), which forces the date conversion in line 28:

			case 'n': if(!val.z || !fmt_is_date(val.z)) break;

In general, Dates are more useful in JS contexts than the raw date codes. We can add a cellDates option to sheet_to_json that would suppress the conversion.

Currently, `cellStyles: true` forces `cellNF: true` (save number formats), which forces the date conversion in line 28: ```js case 'n': if(!val.z || !fmt_is_date(val.z)) break; ``` In general, Dates are more useful in JS contexts than the raw date codes. We can add a `cellDates` option to `sheet_to_json` that would suppress the conversion.
Author
Contributor

Thanks for explaining about the cellStyles behavior. So if I understand correctly: if I add a cellDates option to sheet_to_json, it would allow us to control whether date codes get converted to JS dates, even when cellStyles is enabled?

I've added the change below - could you confirm if this is what you had in mind?

case 'n': if(!val.z || !fmt_is_date(val.z)) break;
           if(o.cellDates) {
               v = numdate(v); // TODO: date1904 setting should also be stored in worksheet object
           }

This would only perform the date conversion when cellDates: true is explicitly passed to sheet_to_json.

Thanks for explaining about the cellStyles behavior. So if I understand correctly: if I add a `cellDates` option to `sheet_to_json`, it would allow us to control whether date codes get converted to JS dates, even when `cellStyles` is enabled? I've added the change below - could you confirm if this is what you had in mind? ```js case 'n': if(!val.z || !fmt_is_date(val.z)) break; if(o.cellDates) { v = numdate(v); // TODO: date1904 setting should also be stored in worksheet object } ``` This would only perform the date conversion when `cellDates: true` is explicitly passed to `sheet_to_json`.
Author
Contributor

Also, I noticed something else regarding date formatting: When using cellStyles: true while reading the file,In the sheet_to_json method even though I specify dateNF: 'yyyy-mm-dd', the dates are being returned with the Excel cell's format (e.g., 'mm/dd/yyyy') instead of the requested format. Looking at the safe_format_cell function, it seems to use cell.z format instead of the provided dateNF.
Should dateNF take precedence over the cell's format when explicitly provided in the options? Or am I misunderstanding how these options should interact?

Also, I noticed something else regarding date formatting: When using `cellStyles: true` while reading the file,In the `sheet_to_json` method even though I specify `dateNF: 'yyyy-mm-dd'`, the dates are being returned with the Excel cell's format (e.g., 'mm/dd/yyyy') instead of the requested format. Looking at the `safe_format_cell` function, it seems to use `cell.z` format instead of the provided `dateNF`. Should `dateNF` take precedence over the cell's format when explicitly provided in the options? Or am I misunderstanding how these options should interact?
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#3268
No description provided.