sheet_to_json() returns a sparse array #602

New Issue

2017-03-23T18:33:37Z

EEaglehouse commented

2017-03-23 18:33:37 +00:00

(Migrated from github.com)

This is a request to add an option to indicate that the worksheet should be completely translated into an array, creating all rows instead of ignoring some.

If the original worksheet file has empty rows, it is stored with no references to the missing data. sheet_to_json() will faithfully convert the sheet to an array, omitting any rows that are empty. The resulting array cannot be used to build a table accurately because only the populated rows are available. There currently is no option to create array elements for the empty rows.

For example, if a work sheet's cells are represented by the object:
{ "A1": "Row 0 is populated", "J4": "Row 4 is populated" }
then, sheet_to_json(_ws, { header: "A" }) returns:
[ { "A": "Row 0 is populated", __rowNum__: 0 }, { "J": "Row 4 is populated", __rowNum__: 3 } ]

I noticed there is a pseudo-member in each array element named __rowNum__ that seems to specify the physical index of the row in the spreadsheet that the array element was derived from, but that's just my guess. If this were documented, I could use it so I could fill in the missing rows in a multi-step process, but I don't want to create a maintenance nightmare by relying on something undocumented.

Would you consider adding a way so all physical rows are represented in the output array? Or document the __rowNum__ reference to be official?

By the way, thank you all very much for the work you've done. This is a useful project.

This is a request to add an option to indicate that the worksheet should be completely translated into an array, creating all rows instead of ignoring some. If the original worksheet file has empty rows, it is stored with no references to the missing data. `sheet_to_json()` will faithfully convert the sheet to an array, omitting any rows that are empty. The resulting array cannot be used to build a table accurately because only the populated rows are available. There currently is no option to create array elements for the empty rows. For example, if a work sheet's cells are represented by the object: ` { "A1": "Row 0 is populated", "J4": "Row 4 is populated" } ` then, `sheet_to_json(_ws, { header: "A" })` returns: ` [ { "A": "Row 0 is populated", __rowNum__: 0 }, { "J": "Row 4 is populated", __rowNum__: 3 } ] ` I noticed there is a pseudo-member in each array element named `__rowNum__` that seems to specify the physical index of the row in the spreadsheet that the array element was derived from, but that's just my guess. If this were documented, I could use it so I could fill in the missing rows in a multi-step process, but I don't want to create a maintenance nightmare by relying on something undocumented. Would you consider adding a way so all physical rows are represented in the output array? Or document the `__rowNum__` reference to be official? By the way, thank you all very much for the work you've done. This is a useful project.

SheetJSDev commented

2017-03-23 18:53:37 +00:00

(Migrated from github.com)

We will document __rowNum__ behavior (it is official) and will add tests. In addition, we will add a new option blankrows to both sheet_to_json and sheet_to_csv. The behavior will be as follows:

sheet_to_csv will default to generating every row. If blankrows is explicitly false, rows with no content will not be generated.
sheet_to_json when header is not set to 1 will default to skipping blank rows. If blankrows is true, the blank row objects will be added. Irrespective of the blank rows setting, __rowNum__ will be available.
sheet_to_json with header:1 will default to generating every row. If blankrows is explicitly false, rows with no content will not be generated.

We're aiming to release 0.9.6 before the end of the month.

PS: __rowNum__ and the whole sheet_to_json function started from an older utility sheet_to_row_object_array (the alias is still there) contributed by @nathanathan -- incidentally it was our first issue/PR: https://github.com/SheetJS/js-xlsx/pull/1

We will document `__rowNum__` behavior (it is official) and will add tests. In addition, we will add a new option `blankrows` to both `sheet_to_json` and `sheet_to_csv`. The behavior will be as follows: - `sheet_to_csv` will default to generating every row. If `blankrows` is explicitly false, rows with no content will not be generated. - `sheet_to_json` when `header` is not set to 1 will default to skipping blank rows. If `blankrows` is true, the blank row objects will be added. Irrespective of the blank rows setting, `__rowNum__` will be available. - `sheet_to_json` with `header:1` will default to generating every row. If `blankrows` is explicitly false, rows with no content will not be generated. We're aiming to release `0.9.6` before the end of the month. PS: `__rowNum__` and the whole `sheet_to_json` function started from an older utility `sheet_to_row_object_array` (the alias is still there) contributed by @nathanathan -- incidentally it was our first issue/PR: https://github.com/SheetJS/js-xlsx/pull/1

EEaglehouse commented

2017-03-24 15:12:44 +00:00

(Migrated from github.com)

That's super! Thank you so much for the quick response! I'm very pleased with the news.

Sign in to join this conversation.