[Improvement] reading excel file from base64 should allow ignoring data uri information #2762

Closed
opened 2022-08-08 19:48:53 +00:00 by AbdelrahmanHafez · 4 comments
AbdelrahmanHafez commented 2022-08-08 19:48:53 +00:00 (Migrated from github.com)

When reading a base64 file, it would be nice if XLSX would ignore the leading data URI parameters.
So it can treat
data:text/csv;base64,TmFtZXMNCkhhZmV6DQpTYW0NCg==
the same way it treats
TmFtZXMNCkhhZmV6DQpTYW0NCg==

const workBook = XLSX.read(sheetInBase64, { type: 'base64' });
// workBook should be the same whether `sheetInBase64` starts with a dataURI scheme or without it.

I would like to create a PR with that fix, but the tests file seem to be automatically generated, and when I run npm test I simply get bits/some_file.js in the log, with no output telling me whether the tests passed or failed.

When reading a base64 file, it would be nice if XLSX would ignore the leading data URI parameters. So it can treat `data:text/csv;base64,TmFtZXMNCkhhZmV6DQpTYW0NCg==` the same way it treats `TmFtZXMNCkhhZmV6DQpTYW0NCg==` ``` const workBook = XLSX.read(sheetInBase64, { type: 'base64' }); // workBook should be the same whether `sheetInBase64` starts with a dataURI scheme or without it. ``` I would like to create a PR with that fix, but the tests file seem to be automatically generated, and when I run `npm test` I simply get `bits/some_file.js` in the log, with no output telling me whether the tests passed or failed.
SheetJSDev commented 2022-08-08 20:12:53 +00:00 (Migrated from github.com)

https://docs.sheetjs.com/docs/miscellany/contributing#os-specific-setup and https://docs.sheetjs.com/docs/miscellany/testing should cover how to get started. If this does not work, please leave a note and we'll try to redo it.

To run the tests manually, use make test. Under the hood it actually runs mocha -R spec -t 30000. The smaller test can be run with env WTF=1 make test_misc

RFC2397 covers the full set of data URI options. Should the parser handle data that is not base64-encoded?

https://docs.sheetjs.com/docs/miscellany/contributing#os-specific-setup and https://docs.sheetjs.com/docs/miscellany/testing should cover how to get started. If this does not work, please leave a note and we'll try to redo it. To run the tests manually, use `make test`. Under the hood it actually runs `mocha -R spec -t 30000`. The smaller test can be run with `env WTF=1 make test_misc` RFC2397 covers the full set of data URI options. Should the parser handle data that is not base64-encoded?
AbdelrahmanHafez commented 2022-08-08 20:23:39 +00:00 (Migrated from github.com)

Thanks. I can see that we have .ts and .js files in the bits directories.
Having both can always bring up the question as to which one we should modify.

Is there a reason to not only have the .ts files and compile the .js ones out of them?

Also, a lot of the code feels to be automatically generated (one letter variables), with unusual styling, which is quite limiting in terms of contribution. Is there a reason behind that?

I can see that there are a lot of compiled files in the git repo which can be quite distracting when developing and makes it difficult to see where the source files are. Perhaps it would help if we can have a single src folder that includes all the source code, and anything else can be .gitignored, and for publishing purposes, the build process can be done on a CI/CD pipeline.

What do you think?

Thanks. ~~I can see that we have `.ts` and `.js` files in the bits directories.~~ ~~Having both can always bring up the question as to which one we should modify.~~ ~~Is there a reason to not only have the `.ts` files and compile the `.js` ones out of them?~~ Also, a lot of the code _feels_ to be automatically generated (one letter variables), with unusual styling, which is quite limiting in terms of contribution. Is there a reason behind that? I can see that there are a lot of compiled files in the git repo which can be quite distracting when developing and makes it difficult to see where the source files are. Perhaps it would help if we can have a single `src` folder that includes all the source code, and anything else can be `.gitignore`d, and for publishing purposes, the build process can be done on a CI/CD pipeline. What do you think?
SheetJSDev commented 2022-08-08 21:02:44 +00:00 (Migrated from github.com)

(answering the question posed in the last comment)

This is a very old project. The first commits were in 2012, around the time that TypeScript was publicly released.

One of the primary goals is preserving compatibility with a wide array of JavaScript engines and runtimes. In 2022, PhotoShop and other Adobe products ship with ExtendScript, a runtime based on ES3 that is incompatible with modern JavaScript. In 2022, NetSuite still uses Rhino, a JS engine based on ES3. We still test against IE10 to ensure compatibility.

This is not a goal shared by the rest of the community. Countless projects that worked 6 months ago with modern tooling have broken in subtle ways because of drastic API changes and deep dependencies not properly pinning versions.

With regards to the short variable names, we've more or less had to abandon tooling for Photoshop. There are a number of tracking issues like https://github.com/mishoo/UglifyJS/issues/1144#issuecomment-227015448 and https://github.com/mishoo/UglifyJS/issues/1930#issuecomment-301225514 where other ecosystem developers decided that PhotoShop is not an important target. We were forced to adapt by opting for short variable names.

.

The main scripts are built with a cat step to ensure that we have complete control over the final scripts. Every line in xlsx.extendscript.js maps to a line in the source that we can directly observe and edit.

So when we explored modernizing the codebase, we wanted to incrementally shift over to TypeScript. esbuild presented a pathway to replacing individual files in the bits folder. For base64 the sequence is

  1. run make 04_base64.js from the modules directory. That creates modules/04_base64.js

  2. run make from the base directory. This copies modules/04_base64.js to bits/04_base64.js, inserting the file in the correct spot of the output.

We hope to eventually port everything over, but there is quite a bit of prototype manipulation and other tricks that need to be re-worked.

.

The actual docs site is built from https://github.com/SheetJS/docs.sheetjs.com/blob/master/docz/docs/09-miscellany/04-testing.md and https://github.com/SheetJS/docs.sheetjs.com/blob/master/docz/docs/09-miscellany/05-contributing.md -- please let us know how to improve those (and you can send a PR)

(answering the question posed in the last comment) This is a very old project. The first commits were in 2012, around the time that TypeScript was publicly released. One of the primary goals is preserving compatibility with a wide array of JavaScript engines and runtimes. In 2022, PhotoShop and other Adobe products ship with ExtendScript, a runtime based on ES3 that is incompatible with modern JavaScript. In 2022, NetSuite still uses Rhino, a JS engine based on ES3. We still test against IE10 to ensure compatibility. This is not a goal shared by the rest of the community. Countless projects that worked 6 months ago with modern tooling have broken in subtle ways because of drastic API changes and deep dependencies not properly pinning versions. With regards to the short variable names, we've more or less had to abandon tooling for Photoshop. There are a number of tracking issues like https://github.com/mishoo/UglifyJS/issues/1144#issuecomment-227015448 and https://github.com/mishoo/UglifyJS/issues/1930#issuecomment-301225514 where other ecosystem developers decided that PhotoShop is not an important target. We were forced to adapt by opting for short variable names. . The main scripts are built with a `cat` step to ensure that we have complete control over the final scripts. Every line in `xlsx.extendscript.js` maps to a line in the source that we can directly observe and edit. So when we explored modernizing the codebase, we wanted to incrementally shift over to TypeScript. `esbuild` presented a pathway to replacing individual files in the `bits` folder. For `base64` the sequence is 1) run `make 04_base64.js` from the `modules` directory. That creates `modules/04_base64.js` 2) run `make` from the base directory. This copies `modules/04_base64.js` to `bits/04_base64.js`, inserting the file in the correct spot of the output. We hope to eventually port everything over, but there is quite a bit of prototype manipulation and other tricks that need to be re-worked. . The actual docs site is built from https://github.com/SheetJS/docs.sheetjs.com/blob/master/docz/docs/09-miscellany/04-testing.md and https://github.com/SheetJS/docs.sheetjs.com/blob/master/docz/docs/09-miscellany/05-contributing.md -- please let us know how to improve those (and you can send a PR)
AbdelrahmanHafez commented 2022-08-08 21:07:20 +00:00 (Migrated from github.com)

I understand that having to support older environments can make the development experience a bit challenging.
I'm hoping I'll find some time to fix the challenges I faced while submitting a simple fix, and incrementally put such a nice project in a nicer state.

Thanks a lot for all the hard work. 👍

I understand that having to support older environments can make the development experience a bit challenging. I'm hoping I'll find some time to fix the challenges I faced while submitting a simple fix, and incrementally put such a nice project in a nicer state. Thanks a lot for all the hard work. 👍
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#2762
No description provided.