Charts in file breaks roundtrip #111
Labels
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#111
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Roundtrip test program
Roundtrip works with no chart
Roundtrip fails with chart present
It seems that the presence of the Chart is confusing the parser when it determines the names of the sheets.
Looking into the code, it seems that the loop at line 4944 is looping the right number of times (once for each sheet (excluding charts), but perhaps the path lookup on 4945 is not checking to see if the item in wbrels is a sheet or a chart.
It seems that the code needs to determine whether
wbrels[i]
refers to a worksheet or something else (such as a chart). A simple hack to do this would be to check whetherwbrels[i][1]
started with'worksheets'
or'chartsheets'
, but that doesn't seem robust.The right way to fix the wbrels portion is to check the Type field (line 4846 only preserves the .Target field, the .Type field should also be added to the array). It will be a schema url:
http://schemas.openxmlformats.org/officeDocument/2006/relationships/chartsheet
for chartsheetshttp://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet
for worksheetsThe latter is held in a constant
RELS.WS
(line 2741), but the corresponding chartsheet constant has not been added yet....
Short-term patches aside, this is probably the best time to tackle the hard problem of representing different sheet types.
Excel actually understand 4 types of sheets: worksheets, chartsheets (what we are discussing), dialog sheets and macro sheets. The last two are artifacts of older Excel versions
Questions:
Should
SheetNames
hold the names of the worksheets or of all types of sheets? How should the type information be represented?Should
Sheets
only hold worksheets or hold all types of sheets?What is the best internal representation for the Chart? Excel stores series as independent data arrays. We could preserve that type of structure or we could create a fake sheet with the same structure as worksheets.
@elad @sysarchitect @notatestuser any thoughts?
Around line 4944, need to distinguish carefully between:
props.Worksheets == 3
props.SheetNames == [ 'Sheet1', 'Sheet2', 'Sheet3']
wb.Sheets == [ [... name:'Chart1'...], [... name:'Sheet1'...], [... name:'Sheet2'...], [...name:'Sheet3'...]
wbrels == [ ['Chart1', 'chartsheets/sheet1.xml'], ['Sheet1', 'worksheets/sheet1.xml'], ['Sheet2', 'worksheets/sheet2.xml'], ['Sheet3', 'worksheets/sheet3.xml'] ]
The code is looping
props.Worksheets
times, but using this as an index to wbrels.Perhaps a short term fix might be to filter wbrels so that it only had entries corresponding to props.SheetNames?
Adding this line as 4942 is a short term patch:
If I understand what @johnyesberg described above as the following, then it makes sense to me:
Sheets
should have sheets of all types, including charts. If it makes sense, then atype
field could indicate the type of sheet. I think this is what you meant by preserving the.Type
field.SheetNames
should only contain names of worksheets, or the sheets the user can actually click on when opening the file in Excel.As for chart data, I'd opt for keeping it in the same format as Excel does for now.
There are sheets that are not worksheets that the user can click on (such as charts).
@johnyesberg @elad there is a test file in the suite: https://github.com/SheetJS/test_files/blob/master/BlankSheetTypes.xlsm This file has all four types, and you can see they show up in the sheet tab list. There are 4 tabs: Sheet1, Chart1, Macro1, Dialog1
In this example, should SheetNames be
["Sheet1"]
or[Sheet1, Chart1, Macro1, "Dialog1"]
?Oh, in that case I misunderstood - I thought you meant Excel kept charts as "internal" sheets. In my opinion, if it's clickable by the user, it should appear in
SheetNames
. So['Sheet1', 'Chart1', 'Macro1', 'Dialog1']
. :)It's hard to predict what users of the API would like most, but probably best to stick with the clearest domain labels as elad suggested above. I think that if the name is
Sheets
, then it has to have all types of sheets.It might be handy if there were a
Worksheets
property that contained the subset of sheets that are worksheets. But then it gets more complex if someone wants to add a new worksheet. And the overall API probably becomes more difficult to maintain. I think I would vote against such a property for now.Do you think it makes sense to have a
ws['!type']
property that could be set to RELS.WS or associated types (that would need to be added)?@johnyesberg the
!type
key in the Sheets objects is important. TheWorksheets
property unnecessarily duplicates data with no clear benefit. Instead of storing the relationship name (that's an Excel 2007+ism), I'd propose a short name for each type:w
orws
orworksheet
c
orcs
orchart
orchartsheet
d
ords
ordialog
ordialogsheet
m
orms
orvba
ormacrosheet
Sounds good to me. I don't see any reason for long names. We're already used to .v and .f properties, so I think w|c|d|m is fine.
On 16 September 2014 07:55, SheetJSDev notifications@github.com wrote:
Sounds good to me, and my vote is for full names ('worksheet' etc.) because it makes the code clearer and I don't (yet) see any reason to keep them short. :)
Edit: I just noticed @johnyesberg's response with respect to short names. It was my understanding that the short vs. long names was in reference to values, not keys, so that for example
foo.type
would be equal to'worksheet'
or'dialog'
. If that's the case, then I still prefer long names since code that looks like this:Isn't immediately obvious.
If, however, the names are for keys, then I can see consistency as case in favor of short names, but would still opt for longer names for clarity.
@elad @johnyesberg in the case of cells, since we'd have many cells per workbook, one-letter keys makes the resultant JSON significantly smaller. However, since the type metadata appears once per sheet, using the full name is fine.
I'm happy with the longer values, and agree that they won't contribute
significantly to json size.
On 16 September 2014 14:43, SheetJSDev notifications@github.com wrote:
I'm running into this issue, also. Any progress on it?
We now save charts as worksheets with a
!type
field set to"chart"
, but they are saved as regular worksheets. We are working out the logistics of actually saving chartsheets as chartsheets@SheetJSDev Is there a place where we can check the progress of this work? (It's been 18+ months, and the lack of ability to preserve existing "special content" like charts is the only limitation blocking use of this library in my application.)
Something new?
Bump
For roundtripping, we offer a special Pro Edit build which, instead of trying to read and write the charts properly, surgically edits the underlying data blocks to enable preservation.
Based on our experience offering a component to read and write the basic chart types, a proper roundtrip is certainly possible. Unfortunately Excel is adding new chart types with each major release (new charts were added in Excel 2016 and Excel 2019), and keeping up with the changes is beyond the scope of what a small number of unpaid volunteers can support.