Examples and documentation improvements #64

Closed
opened 2014-05-21 02:18:51 +00:00 by nukulb · 24 comments
nukulb commented 2014-05-21 02:18:51 +00:00 (Migrated from github.com)

I am looking for an example on writing a JSON into xlxs file.

The read works great but I unable to find a working example. Can someone help me with a link or simply point me in the right direction.

I am looking for an example on writing a JSON into xlxs file. The read works great but I unable to find a working example. Can someone help me with a link or simply point me in the right direction.
SheetJSDev commented 2014-05-21 03:16:39 +00:00 (Migrated from github.com)

@nukulb the write process takes a JSON object as described in the Cell Object Description section, which isn't particularly convenient ATM. I am working on a way to take a CSV or row object array and convert to the internal format.

As a simple example using the intermediate format, suppose your data was in a 2d grid:

[1,2,3]
[true, false, <nothing>, "sheetjs"]
["foo","bar","0.3"]
["baz", <nothing>, "qux"]]

http://i.imgur.com/wfijIov.png

Suppose you wanted to write that to a sheet called "SheetJS". Then the code would be:

/* original data */
var data = [[1,2,3],[true, false, null, "sheetjs"],["foo","bar","0.3"], ["baz", null, "qux"]]
var ws_name = "SheetJS";

/* require XLSX */
var XLSX = require('xlsx')

/* set up workbook objects -- some of these will not be required in the future */
var wb = {}
wb.Sheets = {};
wb.Props = {};
wb.SSF = {};
wb.SheetNames = [];

/* create worksheet: */
var ws = {}

/* the range object is used to keep track of the range of the sheet */
var range = {s: {c:0, r:0}, e: {c:0, r:0 }};

/* Iterate through each element in the structure */
for(var R = 0; R != data.length; ++R) {
  if(range.e.r < R) range.e.r = R;
  for(var C = 0; C != data[R].length; ++C) {
    if(range.e.c < C) range.e.c = C;

    /* create cell object: .v is the actual data */
    var cell = { v: data[R][C] };
    if(cell.v == null) continue;

    /* create the correct cell reference */
    var cell_ref = XLSX.utils.encode_cell({c:C,r:R});

    /* determine the cell type */
    if(typeof cell.v === 'number') cell.t = 'n';
    else if(typeof cell.v === 'boolean') cell.t = 'b';
    else cell.t = 's';

    /* add to structure */
    ws[cell_ref] = cell;
  }
}
ws['!ref'] = XLSX.utils.encode_range(range);

/* add worksheet to workbook */
wb.SheetNames.push(ws_name);
wb.Sheets[ws_name] = ws;

/* write file */
XLSX.writeFile(wb, 'test.xlsx');

In the future, there will be utility functions to handle most of the logistics for you.

@nukulb the write process takes a JSON object as described in the [Cell Object Description](https://github.com/SheetJS/js-xlsx#cell-object-description) section, which isn't particularly convenient ATM. I am working on a way to take a CSV or row object array and convert to the internal format. As a simple example using the intermediate format, suppose your data was in a 2d grid: ``` [1,2,3] [true, false, <nothing>, "sheetjs"] ["foo","bar","0.3"] ["baz", <nothing>, "qux"]] ``` http://i.imgur.com/wfijIov.png Suppose you wanted to write that to a sheet called "SheetJS". Then the code would be: ``` /* original data */ var data = [[1,2,3],[true, false, null, "sheetjs"],["foo","bar","0.3"], ["baz", null, "qux"]] var ws_name = "SheetJS"; /* require XLSX */ var XLSX = require('xlsx') /* set up workbook objects -- some of these will not be required in the future */ var wb = {} wb.Sheets = {}; wb.Props = {}; wb.SSF = {}; wb.SheetNames = []; /* create worksheet: */ var ws = {} /* the range object is used to keep track of the range of the sheet */ var range = {s: {c:0, r:0}, e: {c:0, r:0 }}; /* Iterate through each element in the structure */ for(var R = 0; R != data.length; ++R) { if(range.e.r < R) range.e.r = R; for(var C = 0; C != data[R].length; ++C) { if(range.e.c < C) range.e.c = C; /* create cell object: .v is the actual data */ var cell = { v: data[R][C] }; if(cell.v == null) continue; /* create the correct cell reference */ var cell_ref = XLSX.utils.encode_cell({c:C,r:R}); /* determine the cell type */ if(typeof cell.v === 'number') cell.t = 'n'; else if(typeof cell.v === 'boolean') cell.t = 'b'; else cell.t = 's'; /* add to structure */ ws[cell_ref] = cell; } } ws['!ref'] = XLSX.utils.encode_range(range); /* add worksheet to workbook */ wb.SheetNames.push(ws_name); wb.Sheets[ws_name] = ws; /* write file */ XLSX.writeFile(wb, 'test.xlsx'); ``` In the future, there will be utility functions to handle most of the logistics for you.
SheetJSDev commented 2014-05-21 03:38:30 +00:00 (Migrated from github.com)

Preemptively closed this -- will deem complete once the documentation is up to snuff

Preemptively closed this -- will deem complete once the documentation is up to snuff
nukulb commented 2014-05-21 10:12:17 +00:00 (Migrated from github.com)

@SheetJSDev thanks a lot, I will try it out today but looks straightforward.

@SheetJSDev thanks a lot, I will try it out today but looks straightforward.
SheetJSDev commented 2014-05-28 22:23:59 +00:00 (Migrated from github.com)

@nukulb how is it going?

@nukulb how is it going?
nukulb commented 2014-05-28 22:25:28 +00:00 (Migrated from github.com)

I am starting this work tomorrow, I got side tracked with other stuff.

Thanks for checking in.

On Wed, May 28, 2014 at 6:24 PM, SheetJSDev notifications@github.comwrote:

@nukulb https://github.com/nukulb how is it going?


Reply to this email directly or view it on GitHubhttps://github.com/SheetJS/js-xlsx/issues/64#issuecomment-44472600
.

Nukul Bhasin

I am starting this work tomorrow, I got side tracked with other stuff. Thanks for checking in. On Wed, May 28, 2014 at 6:24 PM, SheetJSDev notifications@github.comwrote: > @nukulb https://github.com/nukulb how is it going? > > — > Reply to this email directly or view it on GitHubhttps://github.com/SheetJS/js-xlsx/issues/64#issuecomment-44472600 > . ## _Nukul Bhasin_
nukulb commented 2014-05-29 18:08:49 +00:00 (Migrated from github.com)

that seems to be failing on the last line when I got to write the file.

TypeError: Object #<Object> has no method 'writeFile'
    at exports.exportProducts (/Users/nukulb/src/hubba-node/lib/controllers/exportProducts.js:65:10)
    at callbacks (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:164:37)
    at auth (/Users/nukulb/src/hubba-node/lib/middleware.js:16:20)
    at callbacks (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:164:37)
    at param (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:138:11)
    at pass (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:145:5)
    at Router._dispatch (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:173:5)
    at Object.router (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:33:10)
    at next (/Users/nukulb/src/hubba-node/node_modules/express/node_modules/connect/lib/proto.js:193:15)
    at Object.handle (/Users/nukulb/src/hubba-node/lib/config/express.js:101:13)

Output XLSX

{ read: [Function: readSync],
  readFile: [Function: readFileSync],
  utils:
   { encode_col: [Function: encode_col],
     encode_row: [Function: encode_row],
     encode_cell: [Function: encode_cell],
     encode_range: [Function: encode_range],
     decode_col: [Function: decode_col],
     decode_row: [Function: decode_row],
     split_cell: [Function: split_cell],
     decode_cell: [Function: decode_cell],
     decode_range: [Function: decode_range],
     sheet_to_csv: [Function: sheet_to_csv],
     make_csv: [Function: sheet_to_csv],
     get_formulae: [Function: get_formulae],
     sheet_to_row_object_array: [Function: sheet_to_row_object_array] },
  version: '0.5.17' }

Do I need a different version for the 'writeFile' method?

that seems to be failing on the last line when I got to write the file. ``` TypeError: Object #<Object> has no method 'writeFile' at exports.exportProducts (/Users/nukulb/src/hubba-node/lib/controllers/exportProducts.js:65:10) at callbacks (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:164:37) at auth (/Users/nukulb/src/hubba-node/lib/middleware.js:16:20) at callbacks (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:164:37) at param (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:138:11) at pass (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:145:5) at Router._dispatch (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:173:5) at Object.router (/Users/nukulb/src/hubba-node/node_modules/express/lib/router/index.js:33:10) at next (/Users/nukulb/src/hubba-node/node_modules/express/node_modules/connect/lib/proto.js:193:15) at Object.handle (/Users/nukulb/src/hubba-node/lib/config/express.js:101:13) ``` Output XLSX ``` { read: [Function: readSync], readFile: [Function: readFileSync], utils: { encode_col: [Function: encode_col], encode_row: [Function: encode_row], encode_cell: [Function: encode_cell], encode_range: [Function: encode_range], decode_col: [Function: decode_col], decode_row: [Function: decode_row], split_cell: [Function: split_cell], decode_cell: [Function: decode_cell], decode_range: [Function: decode_range], sheet_to_csv: [Function: sheet_to_csv], make_csv: [Function: sheet_to_csv], get_formulae: [Function: get_formulae], sheet_to_row_object_array: [Function: sheet_to_row_object_array] }, version: '0.5.17' } ``` Do I need a different version for the 'writeFile' method?
nukulb commented 2014-05-29 18:17:55 +00:00 (Migrated from github.com)

seems like npm has a very old version of this.

I ran npm pack myself to get 0.7.3 version and this example works.

@SheetJSDev any chance you publish the latest to npm?

seems like npm has a very old version of this. I ran npm pack myself to get 0.7.3 version and this example works. @SheetJSDev any chance you publish the latest to npm?
SheetJSDev commented 2014-05-29 19:19:09 +00:00 (Migrated from github.com)

@nukulb https://www.npmjs.org/package/xlsx shows that the latest version is 0.7.3 and has been there for the last 4 days. What version of node/npm are you using? Also, can you show the line for xlsx in your package.json?

@nukulb https://www.npmjs.org/package/xlsx shows that the latest version is 0.7.3 and has been there for the last 4 days. What version of node/npm are you using? Also, can you show the line for xlsx in your package.json?
nukulb commented 2014-05-29 19:25:02 +00:00 (Migrated from github.com)

you are correct, npm has the right version problem seems to be on my end. I will figure it out on my end.

Your code seems to be work fine. I will start doing some more complicated stuff tomorrow and I might ask more questions here. I really appreciate your help. Thanks.

you are correct, npm has the right version problem seems to be on my end. I will figure it out on my end. Your code seems to be work fine. I will start doing some more complicated stuff tomorrow and I might ask more questions here. I really appreciate your help. Thanks.
SheetJSDev commented 2014-06-06 16:01:35 +00:00 (Migrated from github.com)

The README has a few browser and nodejs examples and the test suite also includes ad-hoc and round trip tests. Closing for now. If you encounter an issue with the examples please raise a new issue.

The README has a few browser and nodejs examples and the test suite also includes ad-hoc and round trip tests. Closing for now. If you encounter an issue with the examples please raise a new issue.
nukulb commented 2014-06-06 16:09:17 +00:00 (Migrated from github.com)

So far I have only used to write to a new file. And it works great for this use case. Thanks for all the support.

The next thing I am looking at is opening an existing file and then write to that file. Is there an example of this available?

So far I have only used to write to a new file. And it works great for this use case. Thanks for all the support. The next thing I am looking at is opening an existing file and then write to that file. Is there an example of this available?
SheetJSDev commented 2014-06-06 16:19:33 +00:00 (Migrated from github.com)
var wb = XLSX.readFile(file)
// make changes
XLSX.writeFile(wb, 'test.xlsx')

Since they use the same internal format, you can also use https://github.com/SheetJS/js-xls :

var wb = XLS.readFile(file)
// make changes
XLSX.writeFile(wb, 'test.xlsx')    

The test suite has a round trip section just for this: https://github.com/SheetJS/js-xlsx/blob/master/test.js#L630

``` var wb = XLSX.readFile(file) // make changes XLSX.writeFile(wb, 'test.xlsx') ``` Since they use the same internal format, you can also use https://github.com/SheetJS/js-xls : ``` var wb = XLS.readFile(file) // make changes XLSX.writeFile(wb, 'test.xlsx') ``` The test suite has a round trip section just for this: https://github.com/SheetJS/js-xlsx/blob/master/test.js#L630
SheetJSDev commented 2014-06-06 16:26:50 +00:00 (Migrated from github.com)

Reopening since there is no complete example of editing an existing file in the documentation.

Reopening since there is no complete example of editing an existing file in the documentation.
elad commented 2014-06-09 13:02:51 +00:00 (Migrated from github.com)

Hello,

Simplifying the read/write interfaces is something I'd like to see as well. I feel the current forms of traversal are too cumbersome.

Any thoughts on what cleaner interfaces should look like?

Are there "common structures" to Excel files the library should support? for example, simple tables?

Hello, Simplifying the read/write interfaces is something I'd like to see as well. I feel the current forms of traversal are too cumbersome. Any thoughts on what cleaner interfaces should look like? Are there "common structures" to Excel files the library should support? for example, simple tables?
SheetJSDev commented 2014-06-09 15:24:55 +00:00 (Migrated from github.com)

@eladxxx I agree that there should be a layer between the raw representation and applications. The interface is a tricky problem for a few reasons:

  1. you've read a file with three cells. A1=1, A2=2, A3=A1+A2. If you make a change to A1 (let's say A1=2 now) do you update the value of A3? You could also just mark the value as dirty.

  2. you delete a cell. How do you update the other cells to be consistent? It's possible (via cross-sheet references) that you would have to look at every sheet. Also, do you shift left or up?

There should be a function interface so that we could easily change the model later (that probably should have been built at the beginning :/ c'est la vie), including:

  • worksheet.get_cell(address) -> single cell object
  • worksheet.get_range(range) -> range object
  • worksheet.set_cell(address, value)
  • worksheet.set_formula(range, formula)
  • worksheet.insert_row(...) and insert_col
  • delete_XXX (cell, range, row, col)
  • set_format(...)

... and a separate function for each fundamental operation.

I'm inclined to keep this in a separate library and make the parsers expect a Workbook object -- there would be a dumb workbook implementation in this and in js-xls and they would use something more intelligent if you included the other script in the browser.

NB: To support opening and editing files, we would probably need a real engine that can parse excel formulae (and, if we are ambitious, evaluate). Since that formula engine is also needed for shared formulae (see https://github.com/SheetJS/js-xlsx/pull/67) that might be done sooner rather than later.

@eladxxx I agree that there should be a layer between the raw representation and applications. The interface is a tricky problem for a few reasons: 1) you've read a file with three cells. A1=1, A2=2, A3=A1+A2. If you make a change to A1 (let's say A1=2 now) do you update the value of A3? You could also just mark the value as dirty. 2) you delete a cell. How do you update the other cells to be consistent? It's possible (via cross-sheet references) that you would have to look at every sheet. Also, do you shift left or up? There should be a function interface so that we could easily change the model later (that probably should have been built at the beginning :/ c'est la vie), including: - worksheet.get_cell(address) -> single cell object - worksheet.get_range(range) -> range object - worksheet.set_cell(address, value) - worksheet.set_formula(range, formula) - worksheet.insert_row(...) and insert_col - delete_XXX (cell, range, row, col) - set_format(...) ... and a separate function for each fundamental operation. I'm inclined to keep this in a separate library and make the parsers expect a Workbook object -- there would be a dumb workbook implementation in this and in js-xls and they would use something more intelligent if you included the other script in the browser. NB: To support opening and editing files, we would probably need a real engine that can parse excel formulae (and, if we are ambitious, evaluate). Since that formula engine is also needed for shared formulae (see https://github.com/SheetJS/js-xlsx/pull/67) that might be done sooner rather than later.
elad commented 2014-06-09 15:38:32 +00:00 (Migrated from github.com)

You raise a lot of good points. However, what I meant by simple - and it could very well be that I'm looking at this too subjectively - is that we agree on some "common" input format(s) and work with that.

For example, I deal with a lot of files that are simple tables. Some of them have the first row as headers, some don't. Think CSV data imported into Excel. It would be nice if the library had a built-in function to read this type of data so that I could do:

// This is a function I write that receives an array of row data and returns whatever I want.
function parse_row(arr) {
   ...
}

// After this call, rows is an array of whatever I return from parse_row
var rows = xlsx.read_rows(wb, parse_row);

The idea of course isn't to provide read_rows as suggested above but to give something super simple to use in just a couple of lines of code to achieve common goals.

Of course if we can't agree on common goals and/or "common" means formulae-heavy documents and such, then it might be that more thought should be given to how this simplification happens.

That said, the interface functions you suggest seem useful, though I'm not sure they help in iterating over a document in order to parse it for some internal representation.

You raise a lot of good points. However, what I meant by simple - and it could very well be that I'm looking at this _too_ subjectively - is that we agree on some "common" input format(s) and work with that. For example, I deal with a lot of files that are simple tables. Some of them have the first row as headers, some don't. Think CSV data imported into Excel. It would be nice if the library had a built-in function to read this type of data so that I could do: ``` // This is a function I write that receives an array of row data and returns whatever I want. function parse_row(arr) { ... } // After this call, rows is an array of whatever I return from parse_row var rows = xlsx.read_rows(wb, parse_row); ``` The idea of course isn't to provide `read_rows` as suggested above but to give something super simple to use in just a couple of lines of code to achieve common goals. Of course if we can't agree on common goals and/or "common" means formulae-heavy documents and such, then it might be that more thought should be given to how this simplification happens. That said, the interface functions you suggest seem useful, though I'm not sure they help in iterating over a document in order to parse it for some internal representation.
SheetJSDev commented 2014-06-09 15:58:43 +00:00 (Migrated from github.com)

@eladxxx As part of this issue from js-xls the JSON output function was overhauled. If you are specifically interested in the data, this should work:

function read_rows(wb, cb) {
  XLSX.utils.sheet_to_json(wb, {header:1}).forEach(cb);
}
var rows = xlsx.read_rows(wb, parse_row);

If need be, sheet_to_json can be modified to optionally return the full object as well as row-level metadata.

@eladxxx As part of [this issue from js-xls](https://github.com/SheetJS/js-xls/issues/42#issuecomment-44340666) the JSON output function was overhauled. If you are specifically interested in the data, this should work: ``` function read_rows(wb, cb) { XLSX.utils.sheet_to_json(wb, {header:1}).forEach(cb); } var rows = xlsx.read_rows(wb, parse_row); ``` If need be, `sheet_to_json` can be modified to optionally return the full object as well as row-level metadata.
elad commented 2014-06-10 01:54:42 +00:00 (Migrated from github.com)

Is there any documentation for sheet_to_json? Specifically, a way to tell it "there are no headers in the data"? passing no options, or explicitly passing { header: 0 } (or -1) doesn't do the trick...

Is there any documentation for `sheet_to_json`? Specifically, a way to tell it "there are no headers in the data"? passing no options, or explicitly passing `{ header: 0 }` (or -1) doesn't do the trick...
SheetJSDev commented 2014-06-10 02:01:14 +00:00 (Migrated from github.com)

@elad this is how i described the header option in the other thread:

  • opts.header = 1 generates an array of arrays (data starts in first row)
  • opts.header = 'A' generates a structure using the column headers as keys (data starts in first row)
  • opts.header = ['header1', 'header2', ...] will use the specified labels (data starts in first row)

(default behavior remains: use the first row as headers)

this clearly needs to be documented properly

@elad this is how i described the header option in the other thread: - opts.header = `1` generates an array of arrays (data starts in first row) - opts.header = `'A'` generates a structure using the column headers as keys (data starts in first row) - opts.header = ['header1', 'header2', ...] will use the specified labels (data starts in first row) (default behavior remains: use the first row as headers) this clearly needs to be documented properly
hiwanz commented 2014-09-25 07:29:46 +00:00 (Migrated from github.com)

Is there an API to initiate a workbook now?

Is there an API to initiate a workbook now?
comerc commented 2016-06-20 09:13:24 +00:00 (Migrated from github.com)
https://github.com/SheetJS/js-xlsx/issues/365
blakek commented 2016-12-07 15:58:24 +00:00 (Migrated from github.com)

It would be great if the documentation listed accepted "type" values for the read() function (and what those options mean). For example: buffer, base64, file, binary, etc.

Are pull requests being looked at? If so, I'd be up for helping out on this.

It would be great if the documentation listed accepted "type" values for the read() function (and what those options mean). For example: buffer, base64, file, binary, etc. Are pull requests being looked at? If so, I'd be up for helping out on this.
SheetJSDev commented 2017-03-23 06:23:47 +00:00 (Migrated from github.com)

We're slowly improving the documentation. The underlying structure is stabilizing, and for most supported features all of the parsers should generate the same fields consistent with the README.

At the moment we're trying to go feature by feature and hammer out documentation while we ensure the main supported parsers (XLS/XLSX/XLSB/SpreadsheetML) produce similar parsed objects. If there is anything unclear in the README please let us know where we can improve.

@blakek if you are interested in contributing documentation, we'd gladly accept a PR! Documentation is the one area that doesn't involve hours of testing :D If you have any demos or projects, please add a link to the wiki page so others can find it!

We're slowly improving the documentation. The underlying structure is stabilizing, and for most supported features all of the parsers should generate the same fields consistent with the README. At the moment we're trying to go feature by feature and hammer out documentation while we ensure the main supported parsers (XLS/XLSX/XLSB/SpreadsheetML) produce similar parsed objects. If there is anything unclear in the README please let us know where we can improve. @blakek if you are interested in contributing documentation, we'd gladly accept a PR! Documentation is the one area that doesn't involve hours of testing :D If you have any demos or projects, please add a link to the wiki page so others can find it!
SheetJSDev commented 2017-03-29 20:00:15 +00:00 (Migrated from github.com)

We set up a gitbook integration: https://sheetjs.gitbooks.io/docs/

We set up a gitbook integration: https://sheetjs.gitbooks.io/docs/
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#64
No description provided.