Issue in XLSX.read with CSV data having only one column #1976

Closed
opened 2020-06-03 19:47:01 +00:00 by manthanbaranwal · 3 comments
manthanbaranwal commented 2020-06-03 19:47:01 +00:00 (Migrated from github.com)

I came accross this issue while working with CSVs. When a CSV file has only one column (which means it wouldn't have any delimeter, e.g, ","), the workbook created by XLSX.read() method gets corrupt in a way that it breaks down cells into strings of lenght 10 when spaces are there in data.

E.g: If my CSV data is :

Column_1
ABCDE
ABCDE FGH
ABCDE FGHIJK
ABCDE FGHIJ KLMN

Then my workbook would look like :

{
  "SheetNames": [
    "Sheet1"
  ],
  "Sheets": {
    "Sheet1": {
      "A1": {
        "v": "Column_1",
        "t": "s"
      },
      "A2": {
        "v": "ABCDE",
        "t": "s"
      },
      "A3": {
        "v": "ABCDE FGH",
        "t": "s"
      },
      "A4": {
        "v": "ABCDE FGHI",
        "t": "s"
      },
      "B4": {
        "v": "JK",
        "t": "s"
      },
      "A5": {
        "v": "ABCDE FGHI",
        "t": "s"
      },
      "B5": {
        "v": "J KLMN",
        "t": "s"
      },
      "!ref": "A1:B5"
    }
  }
}

and eventually my JSON obejct after sheet_to_csv will look something like this :

[
  {
    "Column_1": "ABCDE"
  },
  {
    "Column_1": "ABCDE FGH"
  },
  {
    "Column_1": "ABCDE FGHI",
    "__EMPTY": "JK"
  },
  {
    "Column_1": "ABCDE FGHI",
    "__EMPTY": "J KLMN"
  }
]

By looking into code I found that this function is causing it :

	function prn_to_sheet_str(str, opts) {
		if(str.slice(0,4) == "sep=") return dsv_to_sheet_str(str, opts);
		if(str.indexOf("\t") >= 0 || str.indexOf(",") >= 0 || str.indexOf(";") >= 0) return dsv_to_sheet_str(str, opts);
		return aoa_to_sheet(prn_to_aoa_str(str, opts), opts);
	}

As the single column CSV file will not have actually any delimeter so it would not be considered CSV and instead dsv_to_sheet_str() method, aoa_to_sheet(prn_to_aoa_str()) would return the workbook data.

Possible Solution:

	function prn_to_sheet_str(str, opts) {
		if(str.slice(0,4) == "sep=") return dsv_to_sheet_str(str, opts);
		if(str.indexOf("\t") >= 0 || str.indexOf(",") >= 0 || str.indexOf(";") >= 0) return dsv_to_sheet_str(str, opts);
               // Cheking if row is not an array
               if(!Array.isArray(str.split("\n")[0])) return dsv_to_sheet_str(str, opts)
		return aoa_to_sheet(prn_to_aoa_str(str, opts), opts);
	}
I came accross this issue while working with CSVs. When a CSV file has only one column (which means it wouldn't have any delimeter, e.g, ","), the workbook created by `XLSX.read()` method gets corrupt in a way that it breaks down cells into strings of lenght 10 when spaces are there in data. E.g: If my CSV data is : ``` Column_1 ABCDE ABCDE FGH ABCDE FGHIJK ABCDE FGHIJ KLMN ``` Then my workbook would look like : ``` { "SheetNames": [ "Sheet1" ], "Sheets": { "Sheet1": { "A1": { "v": "Column_1", "t": "s" }, "A2": { "v": "ABCDE", "t": "s" }, "A3": { "v": "ABCDE FGH", "t": "s" }, "A4": { "v": "ABCDE FGHI", "t": "s" }, "B4": { "v": "JK", "t": "s" }, "A5": { "v": "ABCDE FGHI", "t": "s" }, "B5": { "v": "J KLMN", "t": "s" }, "!ref": "A1:B5" } } } ``` and eventually my JSON obejct after ```sheet_to_csv``` will look something like this : ``` [ { "Column_1": "ABCDE" }, { "Column_1": "ABCDE FGH" }, { "Column_1": "ABCDE FGHI", "__EMPTY": "JK" }, { "Column_1": "ABCDE FGHI", "__EMPTY": "J KLMN" } ] ``` By looking into code I found that this function is causing it : ``` function prn_to_sheet_str(str, opts) { if(str.slice(0,4) == "sep=") return dsv_to_sheet_str(str, opts); if(str.indexOf("\t") >= 0 || str.indexOf(",") >= 0 || str.indexOf(";") >= 0) return dsv_to_sheet_str(str, opts); return aoa_to_sheet(prn_to_aoa_str(str, opts), opts); } ``` As the single column CSV file will not have actually any delimeter so it would not be considered CSV and instead `dsv_to_sheet_str()` method, `aoa_to_sheet(prn_to_aoa_str())` would return the workbook data. Possible Solution: ``` function prn_to_sheet_str(str, opts) { if(str.slice(0,4) == "sep=") return dsv_to_sheet_str(str, opts); if(str.indexOf("\t") >= 0 || str.indexOf(",") >= 0 || str.indexOf(";") >= 0) return dsv_to_sheet_str(str, opts); // Cheking if row is not an array if(!Array.isArray(str.split("\n")[0])) return dsv_to_sheet_str(str, opts) return aoa_to_sheet(prn_to_aoa_str(str, opts), opts); } ```
SheetJSDev commented 2020-06-03 19:58:05 +00:00 (Migrated from github.com)

Thanks for sharing!

You correctly identified the issue -- when there are no common delimiters, the file is assumed to be lotus PRN. I wonder if the better fix is to just disable PRN parsing by default. It would look like

	function prn_to_sheet_str(str, opts) {
		if(!(opts && opts.PRN)) return dsv_to_sheet_str(str, opts);
		if(str.slice(0,4) == "sep=") return dsv_to_sheet_str(str, opts);
		if(str.indexOf("\t") >= 0 || str.indexOf(",") >= 0 || str.indexOf(";") >= 0) return dsv_to_sheet_str(str, opts);
		return aoa_to_sheet(prn_to_aoa_str(str, opts), opts);
	}

That would make a new parse option PRN which would control whether to enable Lotus PRN parsing. It's only really commonly used in very old files, so it probably doesn't make sense as a default.

Feel free to send a PR

Thanks for sharing! You correctly identified the issue -- when there are no common delimiters, the file is assumed to be lotus PRN. I wonder if the better fix is to just disable PRN parsing by default. It would look like ```js function prn_to_sheet_str(str, opts) { if(!(opts && opts.PRN)) return dsv_to_sheet_str(str, opts); if(str.slice(0,4) == "sep=") return dsv_to_sheet_str(str, opts); if(str.indexOf("\t") >= 0 || str.indexOf(",") >= 0 || str.indexOf(";") >= 0) return dsv_to_sheet_str(str, opts); return aoa_to_sheet(prn_to_aoa_str(str, opts), opts); } ``` That would make a new parse option `PRN` which would control whether to enable Lotus PRN parsing. It's only really commonly used in very old files, so it probably doesn't make sense as a default. Feel free to send a PR
manthanbaranwal commented 2020-06-03 21:14:33 +00:00 (Migrated from github.com)

Hey, Thanks for quick response. I would agree disabling PRN pasring by default would make more sense. How do I push to your repo?

Hey, Thanks for quick response. I would agree disabling PRN pasring by default would make more sense. How do I push to your repo?
SheetJSDev commented 2020-06-03 22:05:42 +00:00 (Migrated from github.com)

You have to make your own fork (press the fork button), push to that fork (github will let you push to your own fork), then make a pull request

You have to make your own fork (press the fork button), push to that fork (github will let you push to your own fork), then make a pull request
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#1976
No description provided.