importing CSV doesn't handle string delimiters
#646
Closed
opened 6 years ago by jabbermarky
·
3 comments
No Branch/Tag Specified
master
sankhavaramsaitulasiram/feat-fix-2752
maybeswapnil/issue2737
gh-pages
scottysseus/2560_2
ivan-trusov/fix-basedate
nandanv2702/issue_1300
protobi/master
ThomasChan/master
grantfayvor/master
tom-groves/bug-1105/rounding-error
mgreter/master
v0.87
v0.9.9
v0.9.8
v0.9.7
v0.9.6
v0.9.5
v0.9.4
v0.9.3
v0.9.2
v0.9.13
v0.9.12
v0.9.11
v0.9.10
v0.9.1
v0.9.0
v0.8.8
v0.8.7
v0.8.6
v0.8.5
v0.8.4
v0.8.3
v0.8.2
v0.8.1
v0.8.0
v0.7.9
v0.7.7
v0.7.6-i
v0.7.6-h
v0.7.6-a
v0.7.6
v0.7.5
v0.7.4
v0.7.3
v0.7.2
v0.7.11
v0.7.10
v0.7.1
v0.5.9
v0.5.8
v0.5.7
v0.5.10
v0.5.0
v0.4.3
v0.18.6
v0.18.5
v0.18.4+deno
v0.18.4
v0.18.3
v0.18.2
v0.18.1
v0.18.0+deno
v0.17.5
v0.17.0
v0.16.8
v0.16.7
v0.16.6
v0.16.5
v0.16.3
v0.16.2
v0.16.1
v0.16.0
v0.15.6
v0.15.5
v0.15.2
v0.14.0
v0.13.5
v0.13.4
v0.13.3
v0.13.1
v0.13.0
v0.12.9
v0.12.8
v0.12.7
v0.12.6
v0.12.5
v0.12.4
v0.12.3
v0.12.2
v0.12.13
v0.12.12
v0.12.11
v0.12.10
v0.12.1
v0.12.0
v0.11.9
v0.11.8
v0.11.7
v0.11.6
v0.11.5
v0.11.4
v0.11.3
v0.11.2
v0.11.19
v0.11.18
v0.11.17
v0.11.16
v0.11.15
v0.11.14
v0.11.13
v0.11.12
v0.11.11
v0.11.10
v0.11.1
v0.11.0
v0.10.9
v0.10.8
v0.10.7
v0.10.6
v0.10.5
v0.10.4
v0.10.3
v0.10.2
v0.10.1
v0.10.0
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
Milestone
Set milestone
Clear milestone
No items
No Milestone
Projects
Set Project
Clear projects
No project
Assignees
Assign users
Clear assignees
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#646
Reference in New Issue
There is no content yet.
Delete Branch '%!s(<nil>)'
Deleting a branch is permanent. It CANNOT be undone. Continue?
No
Yes
I am trying to import CSV files.
I am using commas to separate fields.
when fields have embedded commas, I either quote or double quote the string
when strings are double-quoted, the library strips the trailing double-quote, but does not strip the leading double-quote.
example in:
"hello, this is a test"
yields:
"hello, this is a test
I expected it to drop both double-quotes.
also when importing a CSV, how do I specify other field separators and string delimiters?
for instance, single quotes are not recognized at all.
how would I use pipe '|' as the field separator?
If I understood correctly, you tried a file like quotes.txt
I checked against the web demo at http://oss.sheetjs.com/js-xlsx/ (drag and drop ) and found the correct output:
The node module appears to do the right thing:
And to confirm that the object has the correct value:
Can you share the exact file you tested with?
I think it might have something to do with white space before/after the field separators.
for example, I always put a space after a comma. that shows up in the JSON.
so my hypothesis is that for this input ’Some Value, “hello, this is a test”'
2 columns are extracted
first column value is ‘Some Value’
and second column value is ‘ “hello, this is a test’
Mark Lummus
http://www.marklummus.com
On May 1, 2017, at 3:14 PM, SheetJSDev notifications@github.com wrote:
If I understood correctly, you tried a file like quotes.txt https://github.com/SheetJS/js-xlsx/files/968561/quotes.txt
I checked against the web demo at http://oss.sheetjs.com/js-xlsx/ http://oss.sheetjs.com/js-xlsx/ (drag and drop ) and found the correct output:
https://cloud.githubusercontent.com/assets/6070939/25590809/eb6a8a7e-2e7f-11e7-9cf3-6168d7aed0ec.png
The node module appears to do the right thing:
$ xlsx quotes.txt
Sheet1
"hello, this is a test"
And to confirm that the object has the correct value:
$ node -pe 'require("xlsx").readFile("quotes.txt").Sheets.Sheet1.A1'
{ t: 's', v: 'hello, this is a test' }
Can you share the exact file you tested with?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub https://github.com/SheetJS/js-xlsx/issues/646#issuecomment-298406573, or mute the thread https://github.com/notifications/unsubscribe-auth/AHccd6jAcdzonyk6fI3mGVnmDSmVcAckks5r1i8XgaJpZM4NNSFT.
@jabbermarky Thanks for reporting! You actually revealed two bugs.
If you don't put the whitespace Excel treats it as two cells.
The fix here is to treat any double quote as literal if the cell doesn't start with a double quote
sep=|
as the first line" (try it in Excel!) but there's a missingelse
in the code.So I believe the following diff addresses both issues as well as a related issue when the CSV does not end in a newline:
We'll add some more corner cases for the CSV