Custom delimiters/separators not working for csv files #1943

Closed
opened 2020-05-13 09:28:05 +00:00 by mighty98 · 3 comments
mighty98 commented 2020-05-13 09:28:05 +00:00 (Migrated from github.com)

I have a csv file with below entries
image
However parsed result doesn't consider this and gives incorrect result. Can someone please help
image

I have a csv file with below entries ![image](https://user-images.githubusercontent.com/32213921/81795642-f67ff480-9529-11ea-8032-8a73df1cba61.png) However parsed result doesn't consider this and gives incorrect result. Can someone please help ![image](https://user-images.githubusercontent.com/32213921/81795790-229b7580-952a-11ea-9508-1a83bbd80be5.png)
SheetJSDev commented 2020-05-13 16:16:21 +00:00 (Migrated from github.com)

Thanks for reporting! Can you share the file?

If I had to guess, the offending line is https://github.com/SheetJS/sheetjs/blob/master/bits/40_harb.js#L816

		if(str.slice(0,4) == "sep=" && str.charCodeAt(5) == 10) { sep = str.charAt(4); str = str.slice(6); }

Currently it looks for a sequence of the form "sep=" followed by a single character followed by "\n". Normalizing the newlines happens afterwards, so if your file started with "sep=&\r\n" that sequence wouldn't be recognized

Thanks for reporting! Can you share the file? If I had to guess, the offending line is https://github.com/SheetJS/sheetjs/blob/master/bits/40_harb.js#L816 ```js if(str.slice(0,4) == "sep=" && str.charCodeAt(5) == 10) { sep = str.charAt(4); str = str.slice(6); } ``` Currently it looks for a sequence of the form "sep=" followed by a single character followed by "\n". Normalizing the newlines happens afterwards, so if your file started with "sep=&\r\n" that sequence wouldn't be recognized
mighty98 commented 2020-05-13 17:13:40 +00:00 (Migrated from github.com)

Thanks for the quick response @SheetJSDev

I have attached plain text file. The same file was saved as .csv for my testing
FYI.. i tried by changing first line to below and still not working
delimiter.txt

&\n

Thanks for the quick response @SheetJSDev I have attached plain text file. The same file was saved as .csv for my testing FYI.. i tried by changing first line to below and still not working [delimiter.txt](https://github.com/SheetJS/sheetjs/files/4623520/delimiter.txt) &\n
SheetJSDev commented 2020-05-13 17:23:48 +00:00 (Migrated from github.com)

Yeah this is the problem I stated earlier. To verify, in nodejs:

> require("xlsx").readFile("delimiter.txt").Sheets.Sheet1
{
  A1: { t: 's', v: 'sep=&' },
  A2: { t: 's', v: 'string&comment' },
  A3: { t: 's', v: 'test_string&test_comment' },
  '!ref': 'A1:A3'
}

Transforming by proactively replacing the newlines:

> var file = fs.readFileSync("delimiter.txt").toString()
> var fixed = file.replace(/\r\n/gm, "\n")
> require("xlsx").read(fixed, {type: "string"}).Sheets.Sheet1
{
  A1: { t: 's', v: 'string' },
  B1: { t: 's', v: 'comment' },
  A2: { t: 's', v: 'test_string' },
  B2: { t: 's', v: 'test_comment' },
  '!ref': 'A1:B2'
}

If you'd like to send a fix, that line https://github.com/SheetJS/sheetjs/blob/master/bits/40_harb.js#L816 should be replaced with something like:

		if(str.slice(0,4) == "sep=") {
			if(str.charCodeAt(5) == 13 && str.charCodeAt(6) == 10 ) { sep = str.charAt(4); str = str.slice(7); }
			else if(str.charCodeAt(5) == 13 || str.charCodeAt(5) == 10 ) { sep = str.charAt(4); str = str.slice(6); }
		}
Yeah this is the problem I stated earlier. To verify, in nodejs: ``` > require("xlsx").readFile("delimiter.txt").Sheets.Sheet1 { A1: { t: 's', v: 'sep=&' }, A2: { t: 's', v: 'string&comment' }, A3: { t: 's', v: 'test_string&test_comment' }, '!ref': 'A1:A3' } ``` Transforming by proactively replacing the newlines: ``` > var file = fs.readFileSync("delimiter.txt").toString() > var fixed = file.replace(/\r\n/gm, "\n") > require("xlsx").read(fixed, {type: "string"}).Sheets.Sheet1 { A1: { t: 's', v: 'string' }, B1: { t: 's', v: 'comment' }, A2: { t: 's', v: 'test_string' }, B2: { t: 's', v: 'test_comment' }, '!ref': 'A1:B2' } ``` If you'd like to send a fix, that line https://github.com/SheetJS/sheetjs/blob/master/bits/40_harb.js#L816 should be replaced with something like: ```js if(str.slice(0,4) == "sep=") { if(str.charCodeAt(5) == 13 && str.charCodeAt(6) == 10 ) { sep = str.charAt(4); str = str.slice(7); } else if(str.charCodeAt(5) == 13 || str.charCodeAt(5) == 10 ) { sep = str.charAt(4); str = str.slice(6); } } ```
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#1943
No description provided.