How do I read a CSV file? #489

Closed
opened 2016-10-29 16:28:13 +00:00 by vijayst · 20 comments
vijayst commented 2016-10-29 16:28:13 +00:00 (Migrated from github.com)

I want to read a CSV file and convert it to JSON. How do I do it with XSLX?

I want to read a CSV file and convert it to JSON. How do I do it with XSLX?
saidmoya12 commented 2016-11-08 21:42:49 +00:00 (Migrated from github.com)

+1

+1
azl397985856 commented 2016-12-09 03:33:20 +00:00 (Migrated from github.com)

+1

+1
pshbot commented 2016-12-14 10:55:17 +00:00 (Migrated from github.com)

This would be really handy. XLSX.utils.csv_to_json

This would be really handy. `XLSX.utils.csv_to_json`
azl397985856 commented 2016-12-15 09:10:27 +00:00 (Migrated from github.com)

now I am using csv.js (called comma-separated-values in npm). it works

now I am using csv.js (called comma-separated-values in npm). it works
vijayst commented 2016-12-15 14:02:55 +00:00 (Migrated from github.com)

I am using papaparse for now.

On Thu, Dec 15, 2016 at 2:40 PM, arida notifications@github.com wrote:

now I am using csv.js (called comma-separated-values in npm). it works


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/SheetJS/js-xlsx/issues/489#issuecomment-267275503,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADTCHdSzxsRn8MltqvZKyGIuh3_wL-9Eks5rIQQHgaJpZM4KkJ4C
.

I am using papaparse for now. On Thu, Dec 15, 2016 at 2:40 PM, arida <notifications@github.com> wrote: > now I am using csv.js (called comma-separated-values in npm). it works > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub > <https://github.com/SheetJS/js-xlsx/issues/489#issuecomment-267275503>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/ADTCHdSzxsRn8MltqvZKyGIuh3_wL-9Eks5rIQQHgaJpZM4KkJ4C> > . >
sheam commented 2017-01-29 21:09:51 +00:00 (Migrated from github.com)

+1, annoying to have to use two libs.

+1, annoying to have to use two libs.
reviewher commented 2017-01-31 05:15:26 +00:00 (Migrated from github.com)

CSV is a funny topic. RFC 4180 covers CSV but disagrees with Excel in a few important ways. Given that Excel predates the RFC by nearly 2 decades, I'm surprised the authors didn't make a more concerted effort to agree with Excel. For example, consider the following file:

sep=|
=1+1|2|3
=A1+B1+C1|4|5
=A2+B2+C2|6|7
s,h|e,e|t,js
="1"|="2"|="3"

Most JS CSV parsers follow the spec, so you will get strange results. This example highlights a few issues:

  • The sequence sep=<char> as the first line lets you specify the field separator. Excel does not include this line in the content, whereas an RFC compliant parser would treat that as a one-field row in the file
  • Excel permits formulae in the CSV stream. The =1+1 is calculated and the result is displayed. Excel further allows for formulae to reference other cells.

There are other nonobvious issues like codepage encoding (the mac default is codepage 10000 whereas windows default is 1252) and BOM handling.

I agree that it would be nice to also have support for CSV in this library, but the first step would be to understand exactly how Excel handles CSV files. Stay tuned.

CSV is a funny topic. [RFC 4180](https://tools.ietf.org/html/rfc4180) covers CSV but disagrees with Excel in a few important ways. Given that Excel predates the RFC by nearly 2 decades, I'm surprised the authors didn't make a more concerted effort to agree with Excel. For example, consider the following file: ``` sep=| =1+1|2|3 =A1+B1+C1|4|5 =A2+B2+C2|6|7 s,h|e,e|t,js ="1"|="2"|="3" ``` Most JS CSV parsers follow the spec, so you will get strange results. This example highlights a few issues: - The sequence `sep=<char>` as the first line lets you specify the field separator. Excel does not include this line in the content, whereas an RFC compliant parser would treat that as a one-field row in the file - Excel permits formulae in the CSV stream. The `=1+1` is calculated and the result is displayed. Excel further allows for formulae to reference other cells. There are other nonobvious issues like codepage encoding (the mac default is codepage 10000 whereas windows default is 1252) and BOM handling. I agree that it would be nice to also have support for CSV in this library, but the first step would be to understand exactly how Excel handles CSV files. Stay tuned.
SheetJSDev commented 2017-04-03 06:13:10 +00:00 (Migrated from github.com)

We added a first cut in version 0.9.9 and will slowly improve it in future versions

We added a first cut in version 0.9.9 and will slowly improve it in future versions
jnystrom commented 2017-12-04 14:47:17 +00:00 (Migrated from github.com)

I see that it looks like I can parse a CSV, but is there a way that I can tell XLSX the delimiter? The file I am using is ; delimited instead of , .

I see that it looks like I can parse a CSV, but is there a way that I can tell XLSX the delimiter? The file I am using is `;` delimited instead of `,` .
reviewher commented 2017-12-04 14:57:45 +00:00 (Migrated from github.com)

@jnystrom The same as you would in Excel: prepend the data with sep=;\n. In absence of the header, the parser counts the number of exposed semicolons, commas, and tab characters in the beginning of the file to guess the likely delimiter.

@jnystrom The same as you would in Excel: prepend the data with `sep=;\n`. In absence of the header, the parser counts the number of exposed semicolons, commas, and tab characters in the beginning of the file to guess the likely delimiter.
jnystrom commented 2017-12-04 15:02:00 +00:00 (Migrated from github.com)

I dont have control over the data. This is a file that just a CSV, but separated by semi colons instead of commas. I tried running the file as is, and used sheet_to_json, and it is putting the first row (column names) as the name of the json property, and the entire row of data (with semi colons) as the value to that property. I tried changing to delimit with commas and it seems to work correctly. I have no control of the file, any ideas?

I dont have control over the data. This is a file that just a CSV, but separated by semi colons instead of commas. I tried running the file as is, and used `sheet_to_json`, and it is putting the first row (column names) as the name of the json property, and the entire row of data (with semi colons) as the value to that property. I tried changing to delimit with commas and it seems to work correctly. I have no control of the file, any ideas?
reviewher commented 2017-12-04 15:06:55 +00:00 (Migrated from github.com)
demo fiddle: https://jsfiddle.net/vd0z8g2u/
jnystrom commented 2017-12-04 15:23:11 +00:00 (Migrated from github.com)

Thanks, that worked. I am using this in Node, and using buffers. My columns do have " and so do the values. Again, this all works if I change it all to be , instead of ; delimited.

Thanks, that worked. I am using this in Node, and using buffers. My columns do have `"` and so do the values. Again, this all works if I change it all to be `,` instead of `;` delimited.
reviewher commented 2017-12-04 15:26:51 +00:00 (Migrated from github.com)

Hmm https://runkit.com/embed/3efd6o91zh2j same works in node. Can you share the offending file?

Hmm https://runkit.com/embed/3efd6o91zh2j same works in node. Can you share the offending file?
jnystrom commented 2017-12-04 15:47:44 +00:00 (Migrated from github.com)

I cannot share the file, but i will try to create a similar one. I did see yours worked. One difference with your code. I am initially unzipping a file, and that file is being processed by streams. I used the sample in this repo to convert stream into buffers:

const buffers = [];
stream.on("data", (data) => {
                buffers.push(data);
            });
            stream.on("end", async () => {
                console.log("got data at end", buffers.length);
                const buffer = Buffer.concat(buffers);
                
                const workbook = XLSX.read(buffer, { type: "buffer" });
                const sheet = workbook.Sheets[workbook.SheetNames[0]];
                const parsedData = XLSX.utils.sheet_to_json(sheet, {raw: true});
                console.log("parsed Data:", parsedData);
                console.log(`Time it took to parse: ${(Date.now() - t0)}`);
            });
I cannot share the file, but i will try to create a similar one. I did see yours worked. One difference with your code. I am initially unzipping a file, and that file is being processed by streams. I used the sample in this repo to convert stream into buffers: ``` const buffers = []; stream.on("data", (data) => { buffers.push(data); }); stream.on("end", async () => { console.log("got data at end", buffers.length); const buffer = Buffer.concat(buffers); const workbook = XLSX.read(buffer, { type: "buffer" }); const sheet = workbook.Sheets[workbook.SheetNames[0]]; const parsedData = XLSX.utils.sheet_to_json(sheet, {raw: true}); console.log("parsed Data:", parsedData); console.log(`Time it took to parse: ${(Date.now() - t0)}`); }); ```
jnystrom commented 2017-12-04 16:11:29 +00:00 (Migrated from github.com)

@reviewher here is an example file:
https://1drv.ms/u/s!AgFOdN8fF2TMhqYscT1etb1kshMYpg

@reviewher here is an example file: https://1drv.ms/u/s!AgFOdN8fF2TMhqYscT1etb1kshMYpg
reviewher commented 2017-12-04 16:21:15 +00:00 (Migrated from github.com)

Hmm checking against http://oss.sheetjs.com/js-xlsx/ (use the input form to submit the file or drag and drop) it seems to generate the correct output:

Using the xlsx command that ships with a global install of the library, the generated XLSX file looks correct:

$ npm install -g xlsx
$ xlsx -X forPublic.csv

forPublic.csv.xlsx

Hmm checking against http://oss.sheetjs.com/js-xlsx/ (use the input form to submit the file or drag and drop) it seems to generate the correct output: <img width="288" alt="" src="https://user-images.githubusercontent.com/24845478/33563016-c3a9c3a0-d8e4-11e7-9018-e9a63879046a.png"> Using the `xlsx` command that ships with a global install of the library, the generated XLSX file looks correct: ``` $ npm install -g xlsx $ xlsx -X forPublic.csv ``` [forPublic.csv.xlsx](https://github.com/SheetJS/js-xlsx/files/1527878/forPublic.csv.xlsx) <img width="371" alt="" src="https://user-images.githubusercontent.com/24845478/33563170-35d526d6-d8e5-11e7-97a8-9ac734e5618b.png">
jnystrom commented 2017-12-04 16:27:03 +00:00 (Migrated from github.com)

I am very confused now. I am getting this, when I run it from my program:

pasted image at 2017_12_04 11_26 am

But when I change everything to , it works as expected.

I am very confused now. I am getting this, when I run it from my program: ![pasted image at 2017_12_04 11_26 am](https://user-images.githubusercontent.com/1293142/33563453-02195ae6-d8e6-11e7-92e8-597d4a794ce2.png) But when I change everything to `,` it works as expected.
reviewher commented 2017-12-04 16:29:14 +00:00 (Migrated from github.com)

You are probably running an older version -- the semicolon detection was contributed by @duzun and landed in 0.11.6. Can you update and check?

You are probably running an older version -- the semicolon detection was contributed by @duzun and landed in 0.11.6. Can you update and check?
jnystrom commented 2017-12-04 17:30:06 +00:00 (Migrated from github.com)

DAMN...you are right. That did seem to fix the issue with that file. THANK YOU!

DAMN...you are right. That did seem to fix the issue with that file. THANK YOU!
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#489
No description provided.