Charts in file breaks roundtrip #111

Closed
opened 2014-09-12 03:48:21 +00:00 by johnyesberg · 19 comments
johnyesberg commented 2014-09-12 03:48:21 +00:00 (Migrated from github.com)

Roundtrip test program

    var xlsx = require('xlsx');
    var book = xlsx.readFile('test/reports/Book1.xlsx',{cellStyles:true,bookDeps:true});
    xlsx.writeFile(book, 'test/reports/Book1-out.xlsx');

Roundtrip works with no chart

  • Create a brand new spreadsheet.
  • In Sheet1!A1 type "Sheet 1"
  • In Sheet2!A1 type "Sheet 2"
  • In Sheet3!A1 type "Sheet 3"
  • Save as Book1.xlsx.
  • Output from roundtrip shows appropriate entries in cells.

Roundtrip fails with chart present

  • On Sheet1, select cell A1, and insert some kind of chart (e.g. Column chart).
  • Right click on chart and "Move Chart..." to New Sheet Chart1 (which appears to left of Sheet1)
  • Run roundtrip
  • Resulting output has three sheets (and no chart).
    • Sheet1 is blank
    • Sheet2 has 'Sheet1' in cell A1
    • Sheet3 has 'Sheet2' in cell A1
  • During parsing, book.Sheets.Sheet1['!ref]===undefined

It seems that the presence of the Chart is confusing the parser when it determines the names of the sheets.

## Roundtrip test program ``` javascript var xlsx = require('xlsx'); var book = xlsx.readFile('test/reports/Book1.xlsx',{cellStyles:true,bookDeps:true}); xlsx.writeFile(book, 'test/reports/Book1-out.xlsx'); ``` ## Roundtrip works with no chart - Create a brand new spreadsheet. - In Sheet1!A1 type "Sheet 1" - In Sheet2!A1 type "Sheet 2" - In Sheet3!A1 type "Sheet 3" - Save as Book1.xlsx. - Output from roundtrip shows appropriate entries in cells. ## Roundtrip fails with chart present - On Sheet1, select cell A1, and insert some kind of chart (e.g. Column chart). - Right click on chart and "Move Chart..." to New Sheet Chart1 (which appears to left of Sheet1) - Run roundtrip - Resulting output has three sheets (and no chart). - Sheet1 is blank - Sheet2 has 'Sheet1' in cell A1 - Sheet3 has 'Sheet2' in cell A1 - During parsing, book.Sheets.Sheet1['!ref]===undefined It seems that the presence of the Chart is confusing the parser when it determines the names of the sheets.
johnyesberg commented 2014-09-12 04:19:40 +00:00 (Migrated from github.com)

Looking into the code, it seems that the loop at line 4944 is looping the right number of times (once for each sheet (excluding charts), but perhaps the path lookup on 4945 is not checking to see if the item in wbrels is a sheet or a chart.

    for(i = 0; i != props.Worksheets; ++i) {
        if(wbrels) path = 'xl/' + (wbrels[i][1]).replace(/[\/]?xl\//, "");

Looking into the code, it seems that the loop at line 4944 is looping the right number of times (once for each _sheet_ (excluding charts), but perhaps the path lookup on 4945 is not checking to see if the item in wbrels is a sheet or a chart. ``` javascript for(i = 0; i != props.Worksheets; ++i) { if(wbrels) path = 'xl/' + (wbrels[i][1]).replace(/[\/]?xl\//, ""); ```
johnyesberg commented 2014-09-12 04:28:03 +00:00 (Migrated from github.com)

It seems that the code needs to determine whether wbrels[i] refers to a worksheet or something else (such as a chart). A simple hack to do this would be to check whether wbrels[i][1] started with 'worksheets' or 'chartsheets', but that doesn't seem robust.

It seems that the code needs to determine whether `wbrels[i]` refers to a worksheet or something else (such as a chart). A simple hack to do this would be to check whether `wbrels[i][1]` started with `'worksheets'` or `'chartsheets'`, but that doesn't seem robust.
SheetJSDev commented 2014-09-12 05:45:45 +00:00 (Migrated from github.com)

The right way to fix the wbrels portion is to check the Type field (line 4846 only preserves the .Target field, the .Type field should also be added to the array). It will be a schema url:

  • http://schemas.openxmlformats.org/officeDocument/2006/relationships/chartsheet for chartsheets
  • http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet for worksheets

The latter is held in a constant RELS.WS (line 2741), but the corresponding chartsheet constant has not been added yet.

...

Short-term patches aside, this is probably the best time to tackle the hard problem of representing different sheet types.

Excel actually understand 4 types of sheets: worksheets, chartsheets (what we are discussing), dialog sheets and macro sheets. The last two are artifacts of older Excel versions

Questions:

  1. Should SheetNames hold the names of the worksheets or of all types of sheets? How should the type information be represented?

  2. Should Sheets only hold worksheets or hold all types of sheets?

  3. What is the best internal representation for the Chart? Excel stores series as independent data arrays. We could preserve that type of structure or we could create a fake sheet with the same structure as worksheets.

@elad @sysarchitect @notatestuser any thoughts?

The right way to fix the wbrels portion is to check the Type field (line 4846 only preserves the .Target field, the .Type field should also be added to the array). It will be a schema url: - `http://schemas.openxmlformats.org/officeDocument/2006/relationships/chartsheet` for chartsheets - `http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet` for worksheets The latter is held in a constant `RELS.WS` (line 2741), but the corresponding chartsheet constant has not been added yet. ... Short-term patches aside, this is probably the best time to tackle the hard problem of representing different sheet types. Excel actually understand 4 types of sheets: worksheets, chartsheets (what we are discussing), dialog sheets and macro sheets. The last two are artifacts of older Excel versions Questions: 1) Should `SheetNames` hold the names of the worksheets or of all types of sheets? How should the type information be represented? 2) Should `Sheets` only hold worksheets or hold all types of sheets? 3) What is the best internal representation for the Chart? Excel stores series as independent data arrays. We could preserve that type of structure or we could create a fake sheet with the same structure as worksheets. @elad @sysarchitect @notatestuser any thoughts?
johnyesberg commented 2014-09-12 06:10:44 +00:00 (Migrated from github.com)

Around line 4944, need to distinguish carefully between:

  • props.Worksheets == 3
  • props.SheetNames == [ 'Sheet1', 'Sheet2', 'Sheet3']
  • wb.Sheets == [ [... name:'Chart1'...], [... name:'Sheet1'...], [... name:'Sheet2'...], [...name:'Sheet3'...]
  • wbrels == [ ['Chart1', 'chartsheets/sheet1.xml'], ['Sheet1', 'worksheets/sheet1.xml'], ['Sheet2', 'worksheets/sheet2.xml'], ['Sheet3', 'worksheets/sheet3.xml'] ]

The code is looping props.Worksheets times, but using this as an index to wbrels.
Perhaps a short term fix might be to filter wbrels so that it only had entries corresponding to props.SheetNames?

Adding this line as 4942 is a short term patch:

wbrels = wbrels.filter(function(wbrel_name){return props.SheetNames.some(function(prop_name){return prop_name==wbrel_name[0]})});
Around line 4944, need to distinguish carefully between: - `props.Worksheets == 3` - `props.SheetNames == [ 'Sheet1', 'Sheet2', 'Sheet3']` - `wb.Sheets == [ [... name:'Chart1'...], [... name:'Sheet1'...], [... name:'Sheet2'...], [...name:'Sheet3'...]` - `wbrels == [ ['Chart1', 'chartsheets/sheet1.xml'], ['Sheet1', 'worksheets/sheet1.xml'], ['Sheet2', 'worksheets/sheet2.xml'], ['Sheet3', 'worksheets/sheet3.xml'] ]` The code is looping `props.Worksheets` times, but using this as an index to wbrels. Perhaps a short term fix might be to filter wbrels so that it only had entries corresponding to props.SheetNames? Adding this line as 4942 is a short term patch: ``` javascript wbrels = wbrels.filter(function(wbrel_name){return props.SheetNames.some(function(prop_name){return prop_name==wbrel_name[0]})}); ```
elad commented 2014-09-12 14:04:02 +00:00 (Migrated from github.com)

If I understand what @johnyesberg described above as the following, then it makes sense to me:

  • Sheets should have sheets of all types, including charts. If it makes sense, then a type field could indicate the type of sheet. I think this is what you meant by preserving the .Type field.
  • SheetNames should only contain names of worksheets, or the sheets the user can actually click on when opening the file in Excel.

As for chart data, I'd opt for keeping it in the same format as Excel does for now.

If I understand what @johnyesberg described above as the following, then it makes sense to me: - `Sheets` should have sheets of all types, including charts. If it makes sense, then a `type` field could indicate the type of sheet. I think this is what you meant by preserving the `.Type` field. - `SheetNames` should only contain names of worksheets, or the sheets the user can actually click on when opening the file in Excel. As for chart data, I'd opt for keeping it in the same format as Excel does for now.
SheetJSDev commented 2014-09-12 14:34:13 +00:00 (Migrated from github.com)

SheetNames should only contain names of worksheets, or the sheets the user can actually click on when opening the file in Excel.

There are sheets that are not worksheets that the user can click on (such as charts).

@johnyesberg @elad there is a test file in the suite: https://github.com/SheetJS/test_files/blob/master/BlankSheetTypes.xlsm This file has all four types, and you can see they show up in the sheet tab list. There are 4 tabs: Sheet1, Chart1, Macro1, Dialog1

In this example, should SheetNames be ["Sheet1"] or [Sheet1, Chart1, Macro1, "Dialog1"] ?

> SheetNames should only contain names of worksheets, or the sheets the user can actually click on when opening the file in Excel. There are sheets that are not worksheets that the user can click on (such as charts). @johnyesberg @elad there is a test file in the suite: https://github.com/SheetJS/test_files/blob/master/BlankSheetTypes.xlsm This file has all four types, and you can see they show up in the sheet tab list. There are 4 tabs: Sheet1, Chart1, Macro1, Dialog1 In this example, should SheetNames be `["Sheet1"]` or `[Sheet1, Chart1, Macro1, "Dialog1"]` ?
elad commented 2014-09-12 14:37:41 +00:00 (Migrated from github.com)

Oh, in that case I misunderstood - I thought you meant Excel kept charts as "internal" sheets. In my opinion, if it's clickable by the user, it should appear in SheetNames. So ['Sheet1', 'Chart1', 'Macro1', 'Dialog1']. :)

Oh, in that case I misunderstood - I thought you meant Excel kept charts as "internal" sheets. In my opinion, if it's clickable by the user, it should appear in `SheetNames`. So `['Sheet1', 'Chart1', 'Macro1', 'Dialog1']`. :)
johnyesberg commented 2014-09-15 11:51:36 +00:00 (Migrated from github.com)

It's hard to predict what users of the API would like most, but probably best to stick with the clearest domain labels as elad suggested above. I think that if the name is Sheets, then it has to have all types of sheets.

It might be handy if there were a Worksheets property that contained the subset of sheets that are worksheets. But then it gets more complex if someone wants to add a new worksheet. And the overall API probably becomes more difficult to maintain. I think I would vote against such a property for now.

Do you think it makes sense to have a ws['!type'] property that could be set to RELS.WS or associated types (that would need to be added)?

It's hard to predict what users of the API would like most, but probably best to stick with the clearest domain labels as elad suggested above. I think that if the name is `Sheets`, then it has to have all types of sheets. It might be handy if there were a `Worksheets` property that contained the subset of sheets that are worksheets. But then it gets more complex if someone wants to add a new worksheet. And the overall API probably becomes more difficult to maintain. I think I would vote against such a property for now. Do you think it makes sense to have a `ws['!type']` property that could be set to RELS.WS or associated types (that would need to be added)?
SheetJSDev commented 2014-09-15 21:55:50 +00:00 (Migrated from github.com)

@johnyesberg the !type key in the Sheets objects is important. The Worksheets property unnecessarily duplicates data with no clear benefit. Instead of storing the relationship name (that's an Excel 2007+ism), I'd propose a short name for each type:

  • Worksheet: w or ws or worksheet
  • Chartsheet: c or cs or chart or chartsheet
  • Dialogsheet: d or ds or dialog or dialogsheet
  • Macrosheet: m or ms or vba or macrosheet
@johnyesberg the `!type` key in the Sheets objects is important. The `Worksheets` property unnecessarily duplicates data with no clear benefit. Instead of storing the relationship name (that's an Excel 2007+ism), I'd propose a short name for each type: - Worksheet: `w` or `ws` or `worksheet` - Chartsheet: `c` or `cs` or `chart` or `chartsheet` - Dialogsheet: `d` or `ds` or `dialog` or `dialogsheet` - Macrosheet: `m` or `ms` or `vba` or `macrosheet`
johnyesberg commented 2014-09-15 22:13:41 +00:00 (Migrated from github.com)

Sounds good to me. I don't see any reason for long names. We're already used to .v and .f properties, so I think w|c|d|m is fine.

On 16 September 2014 07:55, SheetJSDev notifications@github.com wrote:

@johnyesberg https://github.com/johnyesberg the !type key in the Sheets
objects is important. The Worksheets property unnecessarily duplicates
data with no clear benefit. Instead of storing the relationship name
(that's an Excel 2007+ism), I'd propose a short name for each type:

  • Worksheet: w or ws or worksheet
  • Chartsheet: c or cs or chart or chartsheet
  • Dialogsheet: d or ds or dialog or dialogsheet
  • Macrosheet: m or ms or vba or macrosheet


Reply to this email directly or view it on GitHub
https://github.com/SheetJS/js-xlsx/issues/111#issuecomment-55665254.

Sounds good to me. I don't see any reason for long names. We're already used to .v and .f properties, so I think w|c|d|m is fine. On 16 September 2014 07:55, SheetJSDev notifications@github.com wrote: > @johnyesberg https://github.com/johnyesberg the !type key in the Sheets > objects is important. The Worksheets property unnecessarily duplicates > data with no clear benefit. Instead of storing the relationship name > (that's an Excel 2007+ism), I'd propose a short name for each type: > - Worksheet: w or ws or worksheet > - Chartsheet: c or cs or chart or chartsheet > - Dialogsheet: d or ds or dialog or dialogsheet > - Macrosheet: m or ms or vba or macrosheet > > — > Reply to this email directly or view it on GitHub > https://github.com/SheetJS/js-xlsx/issues/111#issuecomment-55665254.
elad commented 2014-09-16 04:06:23 +00:00 (Migrated from github.com)

Sounds good to me, and my vote is for full names ('worksheet' etc.) because it makes the code clearer and I don't (yet) see any reason to keep them short. :)

Edit: I just noticed @johnyesberg's response with respect to short names. It was my understanding that the short vs. long names was in reference to values, not keys, so that for example foo.type would be equal to 'worksheet' or 'dialog'. If that's the case, then I still prefer long names since code that looks like this:

if (foo.type === 'w') {
    // stuff
}

Isn't immediately obvious.

If, however, the names are for keys, then I can see consistency as case in favor of short names, but would still opt for longer names for clarity.

Sounds good to me, and my vote is for full names ('worksheet' etc.) because it makes the code clearer and I don't (yet) see any reason to keep them short. :) Edit: I just noticed @johnyesberg's response with respect to short names. It was my understanding that the short vs. long names was in reference to values, not keys, so that for example `foo.type` would be equal to `'worksheet'` or `'dialog'`. If that's the case, then I still prefer long names since code that looks like this: ``` if (foo.type === 'w') { // stuff } ``` Isn't immediately obvious. If, however, the names are for keys, then I can see consistency as case in favor of short names, but would still opt for longer names for clarity.
SheetJSDev commented 2014-09-16 04:43:15 +00:00 (Migrated from github.com)

@elad @johnyesberg in the case of cells, since we'd have many cells per workbook, one-letter keys makes the resultant JSON significantly smaller. However, since the type metadata appears once per sheet, using the full name is fine.

@elad @johnyesberg in the case of cells, since we'd have many cells per workbook, one-letter keys makes the resultant JSON significantly smaller. However, since the type metadata appears once per sheet, using the full name is fine.
johnyesberg commented 2014-09-16 05:52:26 +00:00 (Migrated from github.com)

I'm happy with the longer values, and agree that they won't contribute
significantly to json size.

On 16 September 2014 14:43, SheetJSDev notifications@github.com wrote:

@elad https://github.com/elad @johnyesberg
https://github.com/johnyesberg in the case of cells, since we'd have
many cells per workbook, one-letter keys makes the resultant JSON
significantly smaller. However, since the type metadata appears once per
sheet, using the full name is fine.


Reply to this email directly or view it on GitHub
https://github.com/SheetJS/js-xlsx/issues/111#issuecomment-55694226.

I'm happy with the longer values, and agree that they won't contribute significantly to json size. On 16 September 2014 14:43, SheetJSDev notifications@github.com wrote: > @elad https://github.com/elad @johnyesberg > https://github.com/johnyesberg in the case of cells, since we'd have > many cells per workbook, one-letter keys makes the resultant JSON > significantly smaller. However, since the type metadata appears once per > sheet, using the full name is fine. > > — > Reply to this email directly or view it on GitHub > https://github.com/SheetJS/js-xlsx/issues/111#issuecomment-55694226.
jbergknoff commented 2016-03-02 15:58:40 +00:00 (Migrated from github.com)

I'm running into this issue, also. Any progress on it?

I'm running into this issue, also. Any progress on it?
SheetJSDev commented 2017-03-31 02:54:28 +00:00 (Migrated from github.com)

We now save charts as worksheets with a !type field set to "chart", but they are saved as regular worksheets. We are working out the logistics of actually saving chartsheets as chartsheets

We now save charts as worksheets with a `!type` field set to `"chart"`, but they are saved as regular worksheets. We are working out the logistics of actually saving chartsheets as chartsheets
jacobq commented 2018-12-21 15:49:28 +00:00 (Migrated from github.com)

We are working out the logistics of actually saving chartsheets as chartsheets

@SheetJSDev Is there a place where we can check the progress of this work? (It's been 18+ months, and the lack of ability to preserve existing "special content" like charts is the only limitation blocking use of this library in my application.)

> We are working out the logistics of actually saving chartsheets as chartsheets @SheetJSDev Is there a place where we can check the progress of this work? (It's been 18+ months, and the lack of ability to preserve existing "special content" like charts is the only limitation blocking use of this library in my application.)
Nisgrak commented 2020-05-22 21:00:29 +00:00 (Migrated from github.com)

Something new?

Something new?
beingke commented 2021-04-16 20:45:05 +00:00 (Migrated from github.com)

Bump

Bump
SheetJSDev commented 2021-09-16 06:08:20 +00:00 (Migrated from github.com)

For roundtripping, we offer a special Pro Edit build which, instead of trying to read and write the charts properly, surgically edits the underlying data blocks to enable preservation.

Based on our experience offering a component to read and write the basic chart types, a proper roundtrip is certainly possible. Unfortunately Excel is adding new chart types with each major release (new charts were added in Excel 2016 and Excel 2019), and keeping up with the changes is beyond the scope of what a small number of unpaid volunteers can support.

For roundtripping, we offer a special [Pro Edit build](https://sheetjs.com/pro) which, instead of trying to read and write the charts properly, surgically edits the underlying data blocks to enable preservation. Based on our experience offering a component to read and write the basic chart types, a proper roundtrip is certainly possible. Unfortunately Excel is adding new chart types with each major release (new charts were added in Excel 2016 and Excel 2019), and keeping up with the changes is beyond the scope of what a small number of unpaid volunteers can support.
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#111
No description provided.