ODS parser bugs: same data and empty lines #328

Closed
opened 2015-12-01 14:40:29 +00:00 by think01 · 3 comments
think01 commented 2015-12-01 14:40:29 +00:00 (Migrated from github.com)

I'm using js-xlsx on node.js and found a couple of bugs when parsing ODS files.

  1. Same data on consecutive cells
    In any part of the spreadsheet, if there are two or more consecutive cells in a row with same data (and formatting, more on this below), only the first one gets correctly parsed: the following cells are treated as empty.
    Example: if I have in the sheet:

image

I get:

worksheet["A1"].v === 1
worksheet["B1"].v === 2
worksheet["C1"] === undefined
worksheet["D1"] === undefined
worksheet["E1"].v === 3
worksheet["F1"] === undefined

Note that for the bug to occur the cells have to be also formatted in the same way. It can happen that cells with same data appear to be exactly the same but the formatting is someway different, for example a number with no formatting at all and another cell with same number but with number format.

  1. Multiple empty lines
    After a first empty line, every following empty line is totally ignored (as if it didn't exist in the sheet file) and other following data lines have the row coordinate translated (subtracted) by an amount equal to the ignored empty lines.
    Example:

image

I get:

worksheet["B5"] === undefined
worksheet["B3"].v === "X"

The exact same .ods file exported as .xlsx (using Libre Office) gets correctly parsed and none of the above occurs.

Could someone please direct me on the right source file(s) in order to try to correct this issue? It's very important for the project I'm working on (where having the spreadsheet files in xlsx format is not an option).

Thanks,
Fabio

I'm using js-xlsx on node.js and found a couple of bugs when parsing ODS files. 1) **Same data on consecutive cells** In any part of the spreadsheet, if there are two or more consecutive cells in a row with same data (and formatting, more on this below), only the first one gets correctly parsed: the following cells are treated as empty. Example: if I have in the sheet: ![image](https://cloud.githubusercontent.com/assets/475965/11503233/bfc2e95c-983f-11e5-8576-626bc8c6c36a.png) I get: ``` javascript worksheet["A1"].v === 1 worksheet["B1"].v === 2 worksheet["C1"] === undefined worksheet["D1"] === undefined worksheet["E1"].v === 3 worksheet["F1"] === undefined ``` Note that for the bug to occur the cells have to be also formatted in the same way. It can happen that cells with same data appear to be exactly the same but the formatting is someway different, for example a number with no formatting at all and another cell with same number but with number format. 2) **Multiple empty lines** After a first empty line, every following empty line is totally ignored (as if it didn't exist in the sheet file) and other following data lines have the row coordinate translated (subtracted) by an amount equal to the ignored empty lines. Example: ![image](https://cloud.githubusercontent.com/assets/475965/11503410/bc4830d8-9840-11e5-81a4-e6b9f1016817.png) I get: ``` javascript worksheet["B5"] === undefined worksheet["B3"].v === "X" ``` The exact same .ods file exported as .xlsx (using Libre Office) gets correctly parsed and none of the above occurs. Could someone please direct me on the right source file(s) in order to try to correct this issue? It's very important for the project I'm working on (where having the spreadsheet files in xlsx format is not an option). Thanks, Fabio
kutay commented 2015-12-19 02:16:13 +00:00 (Migrated from github.com)

Hello, I've noticed this exact problem today too.

I noticed that the parser does not handle table:number-columns-repeated attributes in the content.xml.
It seems there is already a PR which solves this particular problem (see https://github.com/SheetJS/js-xlsx/pull/179), but I wonder if there is still someone managing this project...

Maybe we should try to find a fork...

Hello, I've noticed this exact problem today too. I noticed that the parser does not handle table:number-columns-repeated attributes in the content.xml. It seems there is already a PR which solves this particular problem (see https://github.com/SheetJS/js-xlsx/pull/179), but I wonder if there is still someone managing this project... Maybe we should try to find a fork...
think01 commented 2016-01-07 09:13:43 +00:00 (Migrated from github.com)

Yes, sorry for late replay, maybe You're right, I'll look as soon as possible - and if you find a good fork please link it here ;-)

Cheers,
Fabio

Yes, sorry for late replay, maybe You're right, I'll look as soon as possible - and if you find a good fork please link it here ;-) Cheers, Fabio
BhuvneshGupta commented 2016-12-16 12:24:07 +00:00 (Migrated from github.com)

its not fixed?

its not fixed?
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#328
No description provided.