Multiple issues updating named items in a file #3199

Open
opened 2024-09-09 19:52:22 +00:00 by mikeblanton · 6 comments

Hello! I'm trying to use your library to update some named items in an Excel file and we're running into 3 problems. I'm happy to break these into separate tickets if that's helpful.

We're using version 0.23.0. We're running this in a Lambda on AWS, downloading the XLSX file from S3, making the updates, and then uploading the XLSX file to the same location.

After we download the file from S3, it gets saved to local storage, then we're opening it with

const _workbook = XLSX.readFile(_localPath, {type: 'file', cellNF: true, cellStyles: true, cellDates: true});

Once we make our updates, we write the file with

XLSX.write(_workbook, {bookType: 'xlsx', type: 'buffer', cellDates: true, cellStyles: true});

and then send that output to S3.

For our updates, there are a bunch of named items in the file. There are named items that an excel add-in of ours creates, and we go through and update a named id with a Name of ABCDE to have a Name of 12345 and then save the file. We also do some reading of the data in those named items and save them in our database as well. All of that appears to be working fine.

The issues...

  1. First thing I'm noticing, the output file is REALLY big. The original file is 10MB, the output file is 60MB.
  2. The original document does have 2 formulas that reference cells in other documents. When we open the new document, we get the error "We found a problem with some content in ..." and then are prompted to repair the workbook. The repaired file does not have references to these linked documents (those formulas are replaced with their values in the repaired document). I suspect this is what's causing the prompt to repair the document.
  3. Dates are not displaying properly. The date values are on a hidden worksheet and other cells on other worksheets reference those hidden sheet values. On the hidden worksheet on the new file, I'm seeing a value of 45291 instead of December 31, 2023.

Any thoughts as to what might be going on here? Happy to provide whatever other information might be needed.

Hello! I'm trying to use your library to update some named items in an Excel file and we're running into 3 problems. I'm happy to break these into separate tickets if that's helpful. We're using version 0.23.0. We're running this in a Lambda on AWS, downloading the XLSX file from S3, making the updates, and then uploading the XLSX file to the same location. After we download the file from S3, it gets saved to local storage, then we're opening it with ``` const _workbook = XLSX.readFile(_localPath, {type: 'file', cellNF: true, cellStyles: true, cellDates: true}); ``` Once we make our updates, we write the file with ``` XLSX.write(_workbook, {bookType: 'xlsx', type: 'buffer', cellDates: true, cellStyles: true}); ``` and then send that output to S3. For our updates, there are a bunch of named items in the file. There are named items that an excel add-in of ours creates, and we go through and update a named id with a Name of `ABCDE` to have a Name of `12345` and then save the file. We also do some reading of the data in those named items and save them in our database as well. All of that appears to be working fine. The issues... 1. First thing I'm noticing, the output file is REALLY big. The original file is 10MB, the output file is 60MB. 2. The original document does have 2 formulas that reference cells in other documents. When we open the new document, we get the error "We found a problem with some content in ..." and then are prompted to repair the workbook. The repaired file does not have references to these linked documents (those formulas are replaced with their values in the repaired document). I suspect this is what's causing the prompt to repair the document. 3. Dates are not displaying properly. The date values are on a hidden worksheet and other cells on other worksheets reference those hidden sheet values. On the hidden worksheet on the new file, I'm seeing a value of `45291` instead of `December 31, 2023`. Any thoughts as to what might be going on here? Happy to provide whatever other information might be needed.

I'm not a maintainer but might be able to help with 3. To convert Date values we found a separate package that helps (we could have implemented the logic ourselves too), I'm not sure if that's correct or best way but does work. Also, if it isn't I'd be interested to understand a better way

import { getJsDateFromExcel } from 'excel-date-to-js';

const dateValue = getJsDateFromExcel(row[field] as unknown as number);
I'm not a maintainer but might be able to help with 3. To convert Date values we found a separate package that helps (we could have implemented the logic ourselves too), I'm not sure if that's correct or best way but does work. Also, if it isn't I'd be interested to understand a better way ```ts import { getJsDateFromExcel } from 'excel-date-to-js'; const dateValue = getJsDateFromExcel(row[field] as unknown as number); ```
Author

I've updated my test source workbook and removed all of the linked formulas that were referencing external files. I've also got a simple script that is just opening the file and saving the file (no modifications).

  1. The repair problem is gone, so it does seem to be related to those linked formulas. Will the Pro build fix that?
  2. I've changed my load to be XLSX.readFile('./source.xlsx', {type: 'file', cellNF: true}) and my write to be XLSX.writeFileXLSX(_workbook, './output.xlsx', {bookType: 'xlsx', compression: true, cellDates: true}). The destination file is 20MB (which the source file is only 10MB), but it appears that the date problem is solved with one of those as well.

The remaining questions then come from:

  1. Where is the bloat coming from?
  2. Will the Pro edition also solve my linked formula problem? We're already considering that since we're losing styles on our sheets.
I've updated my test source workbook and removed all of the linked formulas that were referencing external files. I've also got a simple script that is just opening the file and saving the file (no modifications). 1. The repair problem is gone, so it does seem to be related to those linked formulas. Will the Pro build fix that? 2. I've changed my load to be `XLSX.readFile('./source.xlsx', {type: 'file', cellNF: true})` and my write to be `XLSX.writeFileXLSX(_workbook, './output.xlsx', {bookType: 'xlsx', compression: true, cellDates: true})`. The destination file is 20MB (which the source file is only 10MB), but it appears that the date problem is solved with one of those as well. The remaining questions then come from: 1. Where is the bloat coming from? 2. Will the Pro edition also solve my linked formula problem? We're already considering that since we're losing styles on our sheets.
Owner
  1. You can pass compression: true to enable compression (which can be optimized further, as shown in the example). If there are many repeated strings, bookSST: true de-duplicates strings at the expense of memory usage.

  2. External workbook references are not currently supported. They were added in our Pro builds thanks to some customer requests, but we will look into porting support here.

  3. The default options will be changed at some point, probably the 1.0 release, but you should pass cellNF: true to the read function to expose number formats.

1) You can pass `compression: true` to enable compression (which can be optimized further, as shown in the example). If there are many repeated strings, `bookSST: true` de-duplicates strings at the expense of memory usage. 2) External workbook references are not currently supported. They were added in our Pro builds thanks to some customer requests, but we will look into porting support here. 3) The default options will be changed at some point, probably the 1.0 release, but you should pass `cellNF: true` to the `read` function to expose number formats.
Author

Thanks! Can you provide a link to the demo you referenced that outlines the compression options? All I've found is compression: true.

Thanks! Can you provide a link to the demo you referenced that outlines the compression options? All I've found is `compression: true`.
Owner

bookSST is mentioned in passing in "Writing Options": https://docs.sheetjs.com/docs/api/write-options#writing-options


The destination file is 20MB (which the source file is only 10MB)

As discussed in the SheetJS Discord (you will need to join https://discord.gg/sheetjs first in order to see that link), there are a few reasons for the size difference:

  • The DEFLATE implementation is optimized for IE compatibility at the expense of runtime performance and at the expense of file size. If we were to drop IE compatibility and focus on latest Chrome / NodeJS, more performant algorithms could be used.

  • bookSST is disabled by default since the shared string table construction is memory intensive. Excel generates XLSX files with the shared string table by default.

  • There are a few places where SheetJS generates sub-optimal XML to work around some LibreOffice bugs.

We are keeping this issue open until we decide how to proceed.

`bookSST` is mentioned in passing in "Writing Options": https://docs.sheetjs.com/docs/api/write-options#writing-options --- > The destination file is 20MB (which the source file is only 10MB) As discussed in the [SheetJS Discord](https://discord.com/channels/1039771292859109436/1039788802220232744/1282824085650411560) (you will need to join https://discord.gg/sheetjs first in order to see that link), there are a few reasons for the size difference: - The DEFLATE implementation is optimized for IE compatibility at the expense of runtime performance and at the expense of file size. If we were to drop IE compatibility and focus on latest Chrome / NodeJS, more performant algorithms could be used. - `bookSST` is disabled by default since the shared string table construction is memory intensive. Excel generates XLSX files with the shared string table by default. - There are a few places where SheetJS generates sub-optimal XML to work around some LibreOffice bugs. We are keeping this issue open until we decide how to proceed.
Author

We definitely don't need IE compatibility. Is configuring another compression implementation something I can do on my side?

We definitely don't need IE compatibility. Is configuring another compression implementation something I can do on my side?
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#3199
No description provided.