importing CSV doesn't handle string delimiters #646
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#646
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
I am trying to import CSV files.
I am using commas to separate fields.
when fields have embedded commas, I either quote or double quote the string
when strings are double-quoted, the library strips the trailing double-quote, but does not strip the leading double-quote.
example in:
"hello, this is a test"
yields:
"hello, this is a test
I expected it to drop both double-quotes.
also when importing a CSV, how do I specify other field separators and string delimiters?
for instance, single quotes are not recognized at all.
how would I use pipe '|' as the field separator?
If I understood correctly, you tried a file like quotes.txt
I checked against the web demo at http://oss.sheetjs.com/js-xlsx/ (drag and drop ) and found the correct output:
The node module appears to do the right thing:
And to confirm that the object has the correct value:
Can you share the exact file you tested with?
I think it might have something to do with white space before/after the field separators.
for example, I always put a space after a comma. that shows up in the JSON.
so my hypothesis is that for this input ’Some Value, “hello, this is a test”'
2 columns are extracted
first column value is ‘Some Value’
and second column value is ‘ “hello, this is a test’
Mark Lummus
http://www.marklummus.com
On May 1, 2017, at 3:14 PM, SheetJSDev notifications@github.com wrote:
If I understood correctly, you tried a file like quotes.txt https://github.com/SheetJS/js-xlsx/files/968561/quotes.txt
I checked against the web demo at http://oss.sheetjs.com/js-xlsx/ http://oss.sheetjs.com/js-xlsx/ (drag and drop ) and found the correct output:
https://cloud.githubusercontent.com/assets/6070939/25590809/eb6a8a7e-2e7f-11e7-9cf3-6168d7aed0ec.png
The node module appears to do the right thing:
$ xlsx quotes.txt
Sheet1
"hello, this is a test"
And to confirm that the object has the correct value:
$ node -pe 'require("xlsx").readFile("quotes.txt").Sheets.Sheet1.A1'
{ t: 's', v: 'hello, this is a test' }
Can you share the exact file you tested with?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub https://github.com/SheetJS/js-xlsx/issues/646#issuecomment-298406573, or mute the thread https://github.com/notifications/unsubscribe-auth/AHccd6jAcdzonyk6fI3mGVnmDSmVcAckks5r1i8XgaJpZM4NNSFT.
@jabbermarky Thanks for reporting! You actually revealed two bugs.
If you don't put the whitespace Excel treats it as two cells.
The fix here is to treat any double quote as literal if the cell doesn't start with a double quote
sep=|
as the first line" (try it in Excel!) but there's a missingelse
in the code.So I believe the following diff addresses both issues as well as a related issue when the CSV does not end in a newline:
We'll add some more corner cases for the CSV