Formulae not retained on export #152

Closed
opened 2014-12-10 20:28:58 +00:00 by ghost · 17 comments
ghost commented 2014-12-10 20:28:58 +00:00 (Migrated from github.com)

When a worksheet has cells with valid .f properties (formula), the value is not retained during exporting to XLSX format via:

var workbook = workbook = Xlsx.readFile('OnDemand.xlsx', { 
    cellFormula: true, 
    cellStyles: true, 
    sheetStubs: true, 
    cellNF: true, 
    bookDeps: true 
});
var out = Xlsx.write( workbook, { bookType: 'xlsx', /*bookSST: true,*/ type: 'binary' });

I need to support such functionality for the modification of templated workbooks. For example, the template has a cell, Q16 with the formula =IF(ISERROR((N16*M16*R16)/101.8),0,((N16*M16*R16)/101.8)). When output via the above code, the value "0" is produced instead.

No formula information is retained, and the cell is a "General" cell of value "0". Is there a workaround or solution to this?

When a worksheet has cells with valid `.f` properties (formula), the value is not retained during exporting to XLSX format via: ``` javascript var workbook = workbook = Xlsx.readFile('OnDemand.xlsx', { cellFormula: true, cellStyles: true, sheetStubs: true, cellNF: true, bookDeps: true }); var out = Xlsx.write( workbook, { bookType: 'xlsx', /*bookSST: true,*/ type: 'binary' }); ``` I need to support such functionality for the modification of templated workbooks. For example, the template has a cell, **Q16** with the formula `=IF(ISERROR((N16*M16*R16)/101.8),0,((N16*M16*R16)/101.8))`. When output via the above code, the value "0" is produced instead. No formula information is retained, and the cell is a "General" cell of value "0". Is there a workaround or solution to this?
aravindkoneru commented 2015-02-24 22:44:28 +00:00 (Migrated from github.com)

@florf
Sorry for the bump, but I just wanted to know if you ever found a solution to this problem. I have cells with valid .f properties that won't write, rather they just show up as empty boxes. If you found a work-around or solution, I'd really appreciate it if you could share.

Thanks!

@florf Sorry for the bump, but I just wanted to know if you ever found a solution to this problem. I have cells with valid .f properties that won't write, rather they just show up as empty boxes. If you found a work-around or solution, I'd really appreciate it if you could share. Thanks!
ryangallen commented 2015-03-30 15:17:38 +00:00 (Migrated from github.com)

Bumping this issue.

According to the docs: "The raw data is the only thing guaranteed to be saved. Formulae, formatting, and other niceties may not be serialized (pending CSF standardization)"

Is there any word on progress for this topic? Has anyone found a workaround?

Bumping this issue. According to the docs: "The raw data is the only thing guaranteed to be saved. Formulae, formatting, and other niceties may not be serialized (pending CSF standardization)" Is there any word on progress for this topic? Has anyone found a workaround?
SheetJSDev commented 2015-03-30 17:29:30 +00:00 (Migrated from github.com)

The formula situation is not as simple as it seems. In general, when considering any feature, we should always try to understand the full scope of the problem first.

A1-style versus RC-style references

Most people are familiar with the A-1 style, but the RC style has a few cool benefits. Let's say you want to keep a running total in a separate column. One common approach is to write formulae like B2=A2+B1, B3=A3+B2, .... Now you are forced to write different things in different cells. In the RC form, they are all the same formula: =RC[-1]+R[-1]C. This gives rise to the so-called "shared formula" storage (see [MS-XLS] 2.4.260 ShrFmla)

Array formula

Unlike shared formulae, array formulae change the formula semantics and oftentimes apply to ranges of cells. In a context where a scalar is expected (e.g. COS), non-array formulae will implicitly slice a column to find the corresponding cell.

For example, consider a worksheet with A1=1, A2=2, A3=3, A4=4, A5=5. Thanks to the intersection rule, setting C2 to the formula =SUM(A1:A5*A1:A5) will give you 4 (A2*A2=4) and setting B4 to that formula will give you 16 (A4*A4 = 16). If you enter the formula and hit control-shift-enter, you will see the formula as {=SUM(A1:A5*A1:A5)} and the value is 55.

File representation

This process would be much easier if there were one representation shared by the various Excel formats. That is not the case:

  • XLSX stores raw A1-reference formula strings
  • XLSB stores a parsed representation (a set of tokens which roughly matches how Excel internally parses formulae)
  • XLS stores a slightly different parsed representation
  • SpreadsheetML stores the formulas using RC-reference notation -- there's a small regex operation involved in converting from RC to A1
  • CSV has some special rules for determining whether a cell encodes an A1-reference formula or if it should be treated as literal data.

Fortunately, the test suite has files in various formats so we can compare the representations.

Requirements

A "standardized solution" consists of:

A) A standard formula representation for simple single-cell formula

This boils down to deciding between A1-style and RC-style formula references.

B) A standard formula representation for multi-cell array formulae

This is not as straightforward since it requires some sort of worksheet-level storage. Slicing the formula is problematic as it loses information that you will need to perform certain operations.

C) A set of utility functions for converting back and forth between various representations

@tinotruppel and others have looked into this problem. The formula stress test has some fun corner cases to consider. Note that once this is done, actually writing the formulae in XLSX is super simple.

D) A set of utility functions for linting worksheets (for example, verifying that a simple formula does not intersect an array formula)

This is pretty straightforward once we decide on the representation.

The formula situation is not as simple as it seems. In general, when considering any feature, we should always try to understand the full scope of the problem first. ### A1-style versus RC-style references Most people are familiar with the A-1 style, but the RC style has a few cool benefits. Let's say you want to keep a running total in a separate column. One common approach is to write formulae like `B2=A2+B1, B3=A3+B2, ...`. Now you are forced to write different things in different cells. In the RC form, they are all the same formula: `=RC[-1]+R[-1]C`. This gives rise to the so-called "shared formula" storage (see [MS-XLS] 2.4.260 ShrFmla) ### Array formula Unlike shared formulae, array formulae change the formula semantics and oftentimes apply to ranges of cells. In a context where a scalar is expected (e.g. COS), non-array formulae will implicitly slice a column to find the corresponding cell. For example, consider a worksheet with `A1=1, A2=2, A3=3, A4=4, A5=5`. Thanks to the intersection rule, setting C2 to the formula `=SUM(A1:A5*A1:A5)` will give you 4 (`A2*A2=4`) and setting B4 to that formula will give you 16 (`A4*A4 = 16`). If you enter the formula and hit control-shift-enter, you will see the formula as `{=SUM(A1:A5*A1:A5)}` and the value is 55. ### File representation This process would be much easier if there were one representation shared by the various Excel formats. That is not the case: - XLSX stores raw A1-reference formula strings - XLSB stores a parsed representation (a set of tokens which roughly matches how Excel internally parses formulae) - XLS stores a slightly different parsed representation - SpreadsheetML stores the formulas using RC-reference notation -- there's a small [regex operation](https://github.com/SheetJS/js-xls/blob/master/bits/62_formutils.js) involved in converting from RC to A1 - CSV has some special rules for determining whether a cell encodes an A1-reference formula or if it should be treated as literal data. Fortunately, the [test suite](https://github.com/SheetJS/test_files) has files in various formats so we can compare the representations. ### Requirements A "standardized solution" consists of: A) A standard formula representation for simple single-cell formula This boils down to deciding between A1-style and RC-style formula references. B) A standard formula representation for multi-cell array formulae This is not as straightforward since it requires some sort of worksheet-level storage. Slicing the formula is problematic as it loses information that you will need to perform certain operations. C) A set of utility functions for converting back and forth between various representations @tinotruppel and others have looked into this problem. The formula stress test has some fun corner cases to consider. Note that once this is done, actually writing the formulae in XLSX is super simple. D) A set of utility functions for linting worksheets (for example, verifying that a simple formula does not intersect an array formula) This is pretty straightforward once we decide on the representation.
ryangallen commented 2015-03-30 17:56:36 +00:00 (Migrated from github.com)

@SheetJSDev Thank you for the background info. Is there a forum for discussing the representation options/timeline for making the decision?

@SheetJSDev Thank you for the background info. Is there a forum for discussing the representation options/timeline for making the decision?
ryangallen commented 2015-04-01 13:12:58 +00:00 (Migrated from github.com)

@SheetJSDev @tinotruppel Since the .xlsx format seems to be the simplest to implement, I vote adding that functionality first. js-xlsx is the name of the library after all. I think most developers would understand if other formats aren't supported right away.

Can we specify which representation to use in the options object passed to the XLSX.write method? It looks like line 350 of https://github.com/SheetJS/js-xlsx/blob/master/bits/68_wsbin.js is what I'm looking for to do the job but I'm not sure. Please let me know if I can help code for the repo or if there is something I can add in a personal fork to get basic string formulae writing out.

Thank you!

@SheetJSDev @tinotruppel Since the .xlsx format seems to be the simplest to implement, I vote adding that functionality first. js-xlsx is the name of the library after all. I think most developers would understand if other formats aren't supported right away. Can we specify which representation to use in the options object passed to the XLSX.write method? It looks like line 350 of https://github.com/SheetJS/js-xlsx/blob/master/bits/68_wsbin.js is what I'm looking for to do the job but I'm not sure. Please let me know if I can help code for the repo or if there is something I can add in a personal fork to get basic string formulae writing out. Thank you!
cara commented 2015-04-26 22:39:20 +00:00 (Migrated from github.com)

@ryangallen @SheetJSDev A workaround for writing functions to xslx would be great! Should be the big deal with your knowledge, it would be very kind of you, if you could give us some hint.
Thank you!

@ryangallen @SheetJSDev A workaround for writing functions to xslx would be great! Should be the big deal with your knowledge, it would be very kind of you, if you could give us some hint. Thank you!
mcsf commented 2015-05-10 22:45:59 +00:00 (Migrated from github.com)

@florf @aravindkoneru @ryangallen if, like me, you still need this feature, I have a patch that works for me:

d43964564d

@florf @aravindkoneru @ryangallen if, like me, you still need this feature, I have a patch that works for me: https://github.com/mcsf/js-xlsx/commit/d43964564d2d455271a9348f904fe2c34e5b1a59
aravindkoneru commented 2015-05-11 20:57:31 +00:00 (Migrated from github.com)

@mcsf Thanks for the patch! It works great for me.

@mcsf Thanks for the patch! It works great for me.
alekseyl1992 commented 2015-06-04 12:01:47 +00:00 (Migrated from github.com)

@mcsf thanks, it works!

@mcsf thanks, it works!
mdumouchel commented 2015-06-10 22:22:36 +00:00 (Migrated from github.com)

@mcsf would the below formula work with the changes you made?

{ t: 'n', f: '=SUMIF(B2:B2, ">="&B2,D2:D2)' }

I was looking through the code it formats the above like so

<f>=SUMIF(B2:B2, &quot;&gt;=&quot;&amp;B2,D2:D2)</f>

Not sure what the format should be for the xml or where the documentation on that might be.

When I create excel document and save it as xml I get
7820864.7891291287

Again not certain what format it should be.

@mcsf would the below formula work with the changes you made? ``` { t: 'n', f: '=SUMIF(B2:B2, ">="&B2,D2:D2)' } ``` I was looking through the code it formats the above like so ``` <f>=SUMIF(B2:B2, &quot;&gt;=&quot;&amp;B2,D2:D2)</f> ``` Not sure what the format should be for the xml or where the documentation on that might be. When I create excel document and save it as xml I get <Cell ss:Formula="=SUMIF(R[-69]C[1]:R[-3]C[1], &quot;&gt;=&quot;&amp;R[-1]C, R[-69]C[4]:R[-3]C[4])"><Data ss:Type="Number">7820864.7891291287</Data></Cell> Again not certain what format it should be.
mcsf commented 2015-06-10 23:52:31 +00:00 (Migrated from github.com)

Hey @mdumouchel — I'm not sure what the question is. Special characters have to be escaped with predefined XML entities whenever any bits are added to the document.

As for the output discrepancies between xlsx and Excel, you'll notice that Excel is preferring RC-style references over A1-style. As to why Excel is seemingly not escaping things like ampersands, I have no answer.

Again not certain what format it should be.

The important question is really whether <f>=SUMIF(B2:B2, &quot;&gt;=&quot;&amp;B2,D2:D2)</f> actually works for you once you open your document with Libre/Excel.

Hey @mdumouchel — I'm not sure what the question is. Special characters have to be escaped with predefined XML entities whenever any bits are added to the document. As for the output discrepancies between xlsx and Excel, you'll notice that Excel is preferring [RC-style references over A1-style](https://github.com/SheetJS/js-xlsx/issues/152#issuecomment-87761940). As to why Excel is seemingly not escaping things like ampersands, I have no answer. > Again not certain what format it should be. The important question is really whether `<f>=SUMIF(B2:B2, &quot;&gt;=&quot;&amp;B2,D2:D2)</f>` actually works for you once you open your document with Libre/Excel.
mdumouchel commented 2015-06-10 23:58:10 +00:00 (Migrated from github.com)

After reviewing it again and playing around with formatting found it was error in !ref and the cell was out of range. Since that was only thing I wasn't seeing I thought had something to do with the formula.

Thanks for the fast reply and the patch!

After reviewing it again and playing around with formatting found it was error in !ref and the cell was out of range. Since that was only thing I wasn't seeing I thought had something to do with the formula. Thanks for the fast reply and the patch!
AndrewFreemantle commented 2015-06-21 08:02:24 +00:00 (Migrated from github.com)

Thanks for the patch @mcsf 👍

Thanks for the patch @mcsf :+1:
mdumouchel commented 2015-10-02 17:17:43 +00:00 (Migrated from github.com)

Has this made it back into the repo? Is still the plan to get this back into the main repo is this forever forked?

Has this made it back into the repo? Is still the plan to get this back into the main repo is this forever forked?
michaelshopsin commented 2016-03-16 19:47:46 +00:00 (Migrated from github.com)

Could someone merge this change back into the main branch since it's required to save formulas?

Could someone merge this change back into the main branch since it's required to save formulas?
ngehlert commented 2016-05-13 16:34:50 +00:00 (Migrated from github.com)

any update on when formulas will be supported or at least the xlsx patch from @mcsf will be merged?

any update on when formulas will be supported or at least the xlsx patch from @mcsf will be merged?
SheetJSDev commented 2017-03-18 00:58:12 +00:00 (Migrated from github.com)

We finally settled on a convention: for non-array formulae just setting the f key will export the formula. For array formulae, set the topleft corner cell's F key to the array formula range. This works for range as well as point array formulae.

We updated the write test with an example

We finally settled on a convention: for non-array formulae just setting the `f` key will export the formula. For array formulae, set the topleft corner cell's `F` key to the array formula range. This works for range as well as point array formulae. We updated the [write test](https://github.com/SheetJS/js-xlsx/blob/master/tests/write.js#L82-L93) with an example
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#152
No description provided.