Standalone Script Type Checks

This commit is contained in:
SheetJS 2024-08-18 23:56:07 -04:00
parent bbaf012efd
commit c761e870f7
4 changed files with 152 additions and 40 deletions

@ -90,9 +90,8 @@ For broad compatibility with JavaScript engines, the library is written using
ECMAScript 3 language dialect. A "shim" script provides implementations of
functions for older browsers and environments.
Due to SSL compatibility issues, older versions of IE will not be able to
use the CDN scripts directly. They should be downloaded and saved to a public
directory in the site:
Due to SSL compatibility issues, older versions of IE will not be able to use
the CDN scripts directly. They should be downloaded and saved to a public path:
<ul>
<li>Standalone: <a href={"https://cdn.sheetjs.com/xlsx-" + current + "/package/dist/xlsx.mini.min.js"}>{"https://cdn.sheetjs.com/xlsx-" + current + "/package/dist/xlsx.mini.min.js"}</a></li>
@ -118,12 +117,75 @@ importScripts("https://cdn.sheetjs.com/xlsx-${current}/package/dist/shim.min.js"
importScripts("https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js");`}
</CodeBlock>
### Type Checker
:::danger VSCode Telemetry and Data Exfiltration
The official Microsoft builds of Visual Studio Code embed telemetry and send
information to external servers.
**[VSCodium](https://vscodium.com/) is a telemetry-free fork of VSCode.**
When writing code that may process personally identifiable information (PII),
the SheetJS team strongly encourages building VSCode from source or using IDEs
that do not exfiltrate data.
:::
The type checker integrated in VSCodium and VSCode does not currently provide
type hints when using the standalone build. Using the JSDoc `@type` directive
coupled with type imports, VSCodium will recognize the types:
![VSCodium types](pathname:///files/standalone-types.png)
<ol start="1">
<li><p>Download the types (<code parentName="pre">index.d.ts</code>) for
the desired version. The current version is available at <a href={"https://cdn.sheetjs.com/xlsx-" + current + "/package/types/index.d.ts"}>{"https://cdn.sheetjs.com/xlsx-" + current + "/package/types/index.d.ts"}</a></p></li>
</ol>
2) Rename the types file to `xlsx.d.ts`. It does not need to reside in the same
folder as the standalone script.
3) In the browser script referencing the global, prepend the following lines:
```js title="Prepend this fragment in each source file referencing the XLSX global"
/** @type {import("./xlsx")} */
const XLSX = globalThis.XLSX;
```
4) If the `xlsx.d.ts` file is in a different folder, change the argument of the
`import(...)` type expression to reflect the relative path. For example, given the structure:
```text title="Folder Structure"
- /vendor
- /vendor/xlsx.d.ts
- /src
- /src/app.js
```
`/src/app.js` must refer to the types as `../vendor/xlsx`:
```js title="Preamble for /src/app.js when types are at /vendor/xlsx.d.ts"
// highlight-next-line
/** @type {import("../vendor/xlsx")} */
const XLSX = globalThis.XLSX;
```
The `.d.ts` file extension must be omitted.
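Once the `XLSX` global is typed, the same `import("./xlsx")` expression can annotate function parameters. The following sketch assumes `xlsx.d.ts` sits next to the script and uses the `WorkBook` interface exported by the types. `first_sheet_name` is a hypothetical helper used only for illustration:

```js
/** @type {import("./xlsx")} */
const XLSX = globalThis.XLSX;

/**
 * Hypothetical helper: return the name of the first worksheet.
 * @param {import("./xlsx").WorkBook} wb workbook object
 * @returns {string} first sheet name
 */
function first_sheet_name(wb) {
  return wb.SheetNames[0];
}
```

The JSDoc comments have no effect at runtime. They only guide the editor's type checker.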
:::warning pass
JSDoc types using the `@import` directive are not supported in `<script>` tags.
**This is a known bug with VSCode!**
:::
## ECMAScript Module Imports
:::caution pass
This section refers to imports in HTML pages using `script type="module"`.
This section refers to imports in HTML pages using `<script type="module">`.
The ["Frameworks and Bundlers"](/docs/getting-started/installation/frameworks)
section covers imports in projects using bundlers (ViteJS) or frameworks

@ -367,6 +367,9 @@ function SheetJSToTFJSCSV() {
#### NodeJS Demo
<details>
<summary><b>Demo Steps</b> (click to show)</summary>
0) Create a new project:
```bash
@ -393,6 +396,8 @@ npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz @tensor
node SheetJSTF.js
```
</details>
#### Kaioken Demo
:::tip pass
@ -404,6 +409,9 @@ The SheetJS team strongly recommends using Kaioken in projects using TF.js.
:::
<details>
<summary><b>Demo Steps</b> (click to show)</summary>
1) Create a new site.
```bash
@ -470,6 +478,8 @@ The process will display a URL:
Open the displayed URL (`http://localhost:5173/` in this example) with a web
browser. Click the "Click to Run" button to see the results.
</details>
## JS Array Interchange
[The official Linear Regression tutorial](https://www.tensorflow.org/js/tutorials/training/linear_regression)
@ -508,17 +518,19 @@ Differences from the official example are highlighted below:
*/
async function getData() {
// highlight-start
/* fetch file */
/* fetch file and pull data into an ArrayBuffer */
const carsDataResponse = await fetch('https://docs.sheetjs.com/cd.xls');
/* get file data (ArrayBuffer) */
const carsDataAB = await carsDataResponse.arrayBuffer();
/* parse */
const carsDataWB = XLSX.read(carsDataAB);
/* get first worksheet */
const carsDataWS = carsDataWB.Sheets[carsDataWB.SheetNames[0]];
/* generate array of JS objects */
const carsData = XLSX.utils.sheet_to_json(carsDataWS);
// highlight-end
const cleaned = carsData.map(car => ({
mpg: car.Miles_per_Gallon,
horsepower: car.Horsepower,
@ -560,8 +572,8 @@ When a `tensor2d` can be exported, it will look different from the spreadsheet:
```js
const data_set_2d = [
[5.1, 4.9, ...],
[3.5, 3, ...],
[5.1, 4.9, /*...*/],
[3.5, 3, /*...*/],
// ...
];
```
@ -572,49 +584,85 @@ This is the transpose of how people use spreadsheets!
The `aoa_to_sheet` method[^11] can generate a worksheet from an array of arrays.
ML libraries typically provide APIs to pull an array of arrays, but it will be
transposed. To export multiple data sets, the data should be transposed:
transposed. The following function transposes arrays of normal and typed arrays:
```js title="Transpose array of arrays"
/* `data` is an array of (typed or normal) arrays */
function transpose_array_of_arrays(data) {
const aoa = [];
for(let i = 0; i < data.length; ++i) {
for(let j = 0; j < data[i].length; ++j) {
if(!aoa[j]) aoa[j] = [];
aoa[j][i] = data[i][j];
}
}
return aoa;
}
```
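As a quick check of the function above, the following sketch transposes one typed array and one normal array. The function body is repeated so the snippet runs on its own, and the sample values are made up for illustration:

```js
/* repeated from above so the snippet is self-contained */
function transpose_array_of_arrays(data) {
  const aoa = [];
  for(let i = 0; i < data.length; ++i) {
    for(let j = 0; j < data[i].length; ++j) {
      if(!aoa[j]) aoa[j] = [];
      aoa[j][i] = data[i][j];
    }
  }
  return aoa;
}

/* two data sets (columns): one typed array, one normal array */
const columns = [
  new Float32Array([1.5, 2.5, 3.5]),
  [10, 20, 30]
];

/* each row of the result holds one value from each data set */
const rows = transpose_array_of_arrays(columns);
console.log(rows); // [ [ 1.5, 10 ], [ 2.5, 20 ], [ 3.5, 30 ] ]
```

The result is always an array of normal arrays, even when the inputs are typed arrays.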
It is recommended to create a new worksheet from the header row and add the
transposed data using the `sheet_add_aoa` method. The option `origin: 1`[^12]
starts writing at the second row, so the data is written after the headers:
```js
/* assuming data is an array of typed arrays */
const aoa = [];
for(let i = 0; i < data.length; ++i) {
for(let j = 0; j < data[i].length; ++j) {
if(!aoa[j]) aoa[j] = [];
aoa[j][i] = data[i][j];
}
}
/* aoa can be directly converted to a worksheet object */
const ws = XLSX.utils.aoa_to_sheet(aoa);
const headers = [ "sepal length", "sepal width"];
const data_set_2d = [
[5.1, 4.9, /*...*/],
[3.5, 3, /*...*/],
// ...
];
// highlight-start
/* transpose data */
const transposed_data = transpose_array_of_arrays(data_set_2d);
// highlight-end
/* create worksheet from headers */
const ws = XLSX.utils.aoa_to_sheet([ headers ]);
/* add the transposed data starting on row 2 */
XLSX.utils.sheet_add_aoa(ws, transposed_data, { origin: 1 });
```
### Importing Data from a Spreadsheet
`sheet_to_json` with the option `header: 1`[^12] will generate a row-major array
`sheet_to_json` with the option `header: 1`[^13] will generate a row-major array
of arrays that can be transposed. However, it is more efficient to walk the
sheet manually:
sheet manually. The following function accepts a number of header rows to skip:
```js
/* find worksheet range */
const range = XLSX.utils.decode_range(ws['!ref']);
const out = []
/* walk the columns */
for(let C = range.s.c; C <= range.e.c; ++C) {
/* create the typed array */
const ta = new Float32Array(range.e.r - range.s.r + 1);
/* walk the rows */
for(let R = range.s.r; R <= range.e.r; ++R) {
/* find the cell, skip it if the cell isn't numeric or boolean */
const cell = ws["!data"] ? (ws["!data"][R]||[])[C] : ws[XLSX.utils.encode_cell({r:R, c:C})];
if(!cell || cell.t != 'n' && cell.t != 'b') continue;
/* assign to the typed array */
ta[R - range.s.r] = cell.v;
```js title="Worksheet to transposed array of typed arrays"
function sheet_to_array_of_f32(ws, header_row_count) {
const out = [];
/* find worksheet range */
const range = XLSX.utils.decode_range(ws['!ref']);
/* skip specified number of headers */
range.s.r += (header_row_count | 0);
/* walk the columns */
for(let C = range.s.c; C <= range.e.c; ++C) {
/* create the typed array */
const ta = new Float32Array(range.e.r - range.s.r + 1);
/* walk the rows */
for(let R = range.s.r; R <= range.e.r; ++R) {
/* find the cell, skip it if the cell isn't numeric or boolean */
const cell = ws["!data"] ? (ws["!data"][R]||[])[C] : ws[XLSX.utils.encode_cell({r:R, c:C})];
if(!cell || cell.t != 'n' && cell.t != 'b') continue;
/* assign to the typed array */
ta[R - range.s.r] = cell.v;
}
/* add typed array to output */
out.push(ta);
}
out.push(ta);
return out;
}
```
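One consequence of the `continue` in the loop above: typed arrays start zero-filled, so skipped cells (blank or text cells) appear as `0` in the output, and boolean values are stored as `1` or `0`. A minimal sketch of that behavior using a bare `Float32Array`, with made-up cell values:

```js
/* a Float32Array starts zero-filled */
const ta = new Float32Array(4);

/* hypothetical column: number, text cell (skipped), boolean, blank (skipped) */
const cells = [ {t:'n', v:1.5}, {t:'s', v:"N/A"}, {t:'b', v:true}, undefined ];

for(let R = 0; R < cells.length; ++R) {
  const cell = cells[R];
  /* same test as the function above */
  if(!cell || cell.t != 'n' && cell.t != 'b') continue;
  ta[R] = cell.v;
}

console.log(Array.from(ta)); // [ 1.5, 0, 1, 0 ]
```

If zeros are meaningful in the data set, a separate mask or sentinel value may be needed to tell real zeros apart from skipped cells.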
If the data set has a header row, the loop can be adjusted to skip those rows.
### TF.js Tensors
A single `Array#map` can pull individual named fields from the result, which
@ -674,4 +722,5 @@ const worksheet = XLSX.utils.aoa_to_sheet(aoa);
[^9]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)
[^10]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)
[^11]: See [`aoa_to_sheet` in "Utilities"](/docs/api/utilities/array#array-of-arrays-input)
[^12]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)
[^12]: See [the `origin` option of `sheet_add_aoa` in "Utilities"](/docs/api/utilities/array#array-of-arrays-input)
[^13]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)

@ -191,6 +191,7 @@ This demo was tested in the following environments:
| macOS 14.4 | `darwin-x64` | `29.1.4` | 2024-03-15 |
| macOS 14.5 | `darwin-arm` | `30.0.8` | 2024-05-28 |
| Windows 10 | `win10-x64` | `31.2.0` | 2024-07-12 |
| Windows 11 | `win11-x64` | `31.2.0` | 2024-08-18 |
| Windows 11 | `win11-arm` | `30.0.8` | 2024-05-28 |
| Linux (HoloOS) | `linux-x64` | `29.1.4` | 2024-03-21 |
| Linux (Debian) | `linux-arm` | `30.0.8` | 2024-05-28 |

Binary file not shown.
