This commit is contained in:
SheetJS 2024-01-03 01:47:00 -05:00
parent e3d16d8108
commit d6abde0e8e
47 changed files with 1453 additions and 493 deletions

1
.gitignore vendored

@ -3,3 +3,4 @@
package-lock.json
pnpm-lock.yaml
/docs
node_modules

@ -0,0 +1,307 @@
---
title: Sheets in DanfoJS
sidebar_label: DanfoJS
pagination_prev: demos/index
pagination_next: demos/frontend/index
---
<head>
<script src="https://cdn.jsdelivr.net/npm/danfojs@1.1.2/lib/bundle.min.js"></script>
</head>
[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.
[DanfoJS](https://danfo.jsdata.org/) is a library for processing structured
data. It uses SheetJS under the hood for reading and writing spreadsheets.
This demo covers details elided in the official DanfoJS documentation.
:::note Tested Deployments
This example was last tested on 2024 January 03 against DanfoJS 1.1.2.
:::
:::info Browser integration
The live demos on this page include the DanfoJS browser bundle:
```html
<script src="https://cdn.jsdelivr.net/npm/danfojs@1.1.2/lib/bundle.min.js"></script>
```
There are known issues with the documentation generator. If a demo explicitly
prints "RELOAD THIS PAGE", please reload or refresh the page.
:::
## DataFrames and Worksheets
The DanfoJS `DataFrame`[^1] represents two-dimensional tabular data. It is the
starting point for most DanfoJS data processing tasks. A `DataFrame` typically
corresponds to one SheetJS worksheet[^2].
<table><thead><tr><th>Spreadsheet</th><th>DanfoJS DataFrame</th></tr></thead><tbody><tr><td>
![`pres.xlsx` data](pathname:///pres.png)
</td><td>
```
╔════╤═══════════════╤═══════╗
║ │ Name │ Index ║
╟────┼───────────────┼───────╢
║ 0 │ Bill Clinton │ 42 ║
╟────┼───────────────┼───────╢
║ 1 │ GeorgeW Bush │ 43 ║
╟────┼───────────────┼───────╢
║ 2 │ Barack Obama │ 44 ║
╟────┼───────────────┼───────╢
║ 3 │ Donald Trump │ 45 ║
╟────┼───────────────┼───────╢
║ 4 │ Joseph Biden │ 46 ║
╚════╧═══════════════╧═══════╝
```
</td></tr></tbody></table>
## DanfoJS SheetJS Integration
:::note pass
The official documentation inconsistently names the library object `danfo` and
`dfd`. Since `dfd` is the browser global, the demos use the name `dfd`.
:::
Methods to read and write spreadsheets are attached to the main `dfd` object.
### Importing DataFrames
`readExcel`[^3] accepts two arguments: source data and options.
The source data must be a `string` or `File` object. Strings are interpreted as
URLs while `File` objects are treated as data.
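A minimal sketch of both source types (the URL matches the test file used in the
demos below; the `file` variable is assumed to be a `File` object obtained
elsewhere, such as from a file input):

```js
/* string source: interpreted as a URL */
const df_from_url = await dfd.readExcel("https://sheetjs.com/pres.xlsx");

/* File source: e.g. a File object from an <input type="file"> element */
const df_from_file = await dfd.readExcel(file);
```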
_Selecting a Worksheet_
DanfoJS will generate a dataframe from one worksheet. The parser normally uses
the first worksheet. The `sheet` property of the options object controls the
selected worksheet. It is expected to be a zero-indexed number:
```js
const first_sheet = await dfd.readExcel(url, {sheet: 0});
const second_sheet = await dfd.readExcel(url, {sheet: 1});
```
_More Parsing Options_
The `parsingOptions` property of the options argument is passed directly to the
SheetJS `read` method[^4].
For example, the `sheetRows` property controls how many rows are extracted from
larger worksheets. To pull 3 data rows, `sheetRows` must be set to 4 since the
header row counts towards the limit:
```js
const first_three_rows = await dfd.readExcel(url, { parsingOptions: {
// see https://docs.sheetjs.com/docs/api/parse-options for details
sheetRows: 4
} });
```
#### URL source
The following example fetches a [test file](https://sheetjs.com/pres.xlsx),
parses with SheetJS and generates a DanfoJS dataframe.
```jsx live
function DanfoReadExcelURL() {
const [text, setText] = React.useState("");
React.useEffect(() => { (async() => {
if(typeof dfd === "undefined") return setText("RELOAD THIS PAGE!");
const df = await dfd.readExcel("https://sheetjs.com/pres.xlsx");
setText("" + df.head());
})(); }, []);
return (<pre>{text}</pre>);
}
```
#### File source
The following example uses a file input element. The "File API"[^5] section of
the "Local File Access" demo covers the browser API in more detail.
```jsx live
function DanfoReadExcelFile() {
const [text, setText] = React.useState("Select a spreadsheet");
return (<><pre>{text}</pre><input type="file" onChange={async(e) => {
if(typeof dfd === "undefined") return setText("RELOAD THIS PAGE!");
/* get first file */
const file = e.target.files[0];
/* create dataframe and pretty-print the first 10 rows */
const df = await dfd.readExcel(file);
setText("" + df.head());
}}/></>);
}
```
### Exporting DataFrames
`toExcel`[^6] accepts two arguments: dataframe and options. Under the hood, it
uses the SheetJS `writeFile` method[^7].
_Exported File Name_
The relevant property for the file name depends on the platform:
| Platform | Property |
|:---------|:-----------|
| NodeJS | `filePath` |
| Browser | `fileName` |
The exporter will deduce the desired file format from the file extension.
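For example, the following sketches export a dataframe `df` to `export.xlsx` on
each platform (the file names are illustrative):

```js
/* NodeJS: write `export.xlsx` to the filesystem */
dfd.toExcel(df, { filePath: "./export.xlsx" });

/* Browser: trigger a download named `export.xlsx` */
dfd.toExcel(df, { fileName: "export.xlsx" });
```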
_Worksheet Name_
The `sheetName` property specifies the name of the worksheet in the workbook:
```js
dfd.toExcel(df, {
fileName: "test.xlsx", // generate `test.xlsx`
// highlight-next-line
sheetName: "Export" // The name of the worksheet will be "Export"
});
```
:::caution pass
The DanfoJS integration forces the `.xlsx` file extension. Exporting to other
file formats will require [low-level operations](#generating-files).
:::
_More Writing Options_
The `writingOptions` property of the options argument is passed directly to the
SheetJS `writeFile` method[^8].
For example, the `compression` property enables ZIP compression for XLSX and
other formats:
```js
dfd.toExcel(df, {fileName: "export.xlsx", writingOptions: {
// see https://docs.sheetjs.com/docs/api/write-options for details
compression: true
}});
```
#### Export to File
The following example exports a sample dataframe to a XLSX spreadsheet.
```jsx live
function DanfoToExcel() {
if(typeof dfd === "undefined") return (<b>RELOAD THIS PAGE</b>);
/* sample dataframe */
const df = new dfd.DataFrame([{Sheet:1,JS:2},{Sheet:3,JS:4}]);
return ( <><button onClick={async() => {
/* dfd.toExcel calls the SheetJS `writeFile` method */
dfd.toExcel(df, {fileName: "SheetJSDanfoJS.xlsx", writingOptions: {
compression: true
}});
}}>Click to Export</button><pre>{"Data:\n"+df.head()}</pre></> );
}
```
## Low-Level Operations
DanfoJS and SheetJS provide methods for processing arrays of objects.
```mermaid
flowchart LR
ws((SheetJS\nWorksheet))
aoo[[array of\nobjects]]
df[(DanfoJS\nDataFrame)]
ws --> |sheet_to_json\n\n| aoo
aoo --> |\njson_to_sheet| ws
df --> |\ndfd.toJSON| aoo
aoo --> |new DataFrame\n\n| df
```
### Creating DataFrames
The `DataFrame` constructor[^9] creates `DataFrame` objects from arrays of
objects. Given a SheetJS worksheet object, the `sheet_to_json` method[^10]
generates compatible arrays of objects:
```js
function ws_to_df(ws) {
const aoo = XLSX.utils.sheet_to_json(ws);
return new dfd.DataFrame(aoo);
}
```
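For example, in NodeJS (assuming the `danfojs-node` package and a local file
named `pres.xlsx`), the SheetJS `readFile` method can supply the worksheet:

```js
const XLSX = require("xlsx");
const dfd = require("danfojs-node");

/* read the workbook and select the first worksheet */
const wb = XLSX.readFile("pres.xlsx");
const ws = wb.Sheets[wb.SheetNames[0]];

/* generate a DanfoJS DataFrame from the worksheet */
const df = ws_to_df(ws);
```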
### Generating Files
`toJSON`[^11] accepts two arguments: dataframe and options.
The `format` key of the `options` argument dictates the result layout. The
`column` layout generates an array of objects in row-major order. The SheetJS
`json_to_sheet`[^12] method can generate a worksheet object from the result:
```js
function df_to_ws(df) {
const aoo = dfd.toJSON(df, { format: "column" });
return XLSX.utils.json_to_sheet(aoo);
}
```
The SheetJS `book_new` method creates a workbook object from the worksheet[^13]
and the `writeFile` method[^14] will generate the file:
```js
const ws = df_to_ws(df);
const wb = XLSX.utils.book_new(ws, "Export");
XLSX.writeFile(wb, "SheetJSDanfoJS.xlsb", { compression: true });
```
The following demo exports a sample dataframe to XLSB. This operation is not
supported by the DanfoJS `toExcel` method since that method enforces XLSX.
```jsx live
function DanfoToXLSB() {
if(typeof dfd === "undefined") return (<b>RELOAD THIS PAGE</b>);
/* sample dataframe */
const df = new dfd.DataFrame([{Sheet:1,JS:2},{Sheet:3,JS:4}]);
return ( <><button onClick={async() => {
/* generate worksheet */
const aoo = dfd.toJSON(df, { format: "column" });
const ws = XLSX.utils.json_to_sheet(aoo);
/* generate workbook */
const wb = XLSX.utils.book_new(ws, "Export");
/* write to XLSB */
XLSX.writeFile(wb, "SheetJSDanfoJS.xlsb", { compression: true });
}}>Click to Export</button><pre>{"Data:\n"+df.head()}</pre></> );
}
```
[^1]: See ["Dataframe"](https://danfo.jsdata.org/api-reference/dataframe) in the DanfoJS documentation
[^2]: See ["Sheet Objects"](/docs/csf/sheet)
[^3]: See ["danfo.readExcel"](https://danfo.jsdata.org/api-reference/input-output/danfo.read_excel) in the DanfoJS documentation.
[^4]: See ["Reading Files"](/docs/api/parse-options/#parsing-options) for the full list of parsing options.
[^5]: See ["File API" in "Local File Access"](/docs/demos/local/file#file-api) for more details.
[^6]: See ["danfo.toExcel"](https://danfo.jsdata.org/api-reference/input-output/danfo.to_excel) in the DanfoJS documentation.
[^7]: See [`writeFile` in "Writing Files"](/docs/api/write-options)
[^8]: See ["Writing Files"](/docs/api/write-options/#writing-options) for the full list of writing options.
[^9]: See ["Creating a DataFrame"](https://danfo.jsdata.org/api-reference/dataframe/creating-a-dataframe) in the DanfoJS documentation.
[^10]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)
[^11]: See ["danfo.toJSON"](https://danfo.jsdata.org/api-reference/input-output/danfo.to_json) in the DanfoJS documentation.
[^12]: See [`json_to_sheet` in "Utilities"](/docs/api/utilities/array#array-of-objects-input)
[^13]: See [`book_new` in "Utilities"](/docs/api/utilities/wb)
[^14]: See [`writeFile` in "Writing Files"](/docs/api/write-options)

@ -0,0 +1,440 @@
---
title: Sheets in TensorFlow
sidebar_label: TensorFlow.js
pagination_prev: demos/index
pagination_next: demos/frontend/index
---
<head>
<script src="https://docs.sheetjs.com/tfjs/tf.min.js"></script>
</head>
[TensorFlow.js](https://www.tensorflow.org/js) (shortened to TF.js) is a library
for machine learning in JavaScript.
[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.
This demo uses TensorFlow.js and SheetJS to process data in spreadsheets. We'll
explore how to load spreadsheet data into TF.js datasets and how to export
results back to spreadsheets.
- ["CSV Data Interchange"](#csv-data-interchange) uses SheetJS to process sheets
and generate CSV data that TF.js can import.
- ["JSON Data Interchange"](#json-data-interchange) uses SheetJS to process
sheets and generate rows of objects that can be post-processed.
:::info pass
Live code blocks in this page use the TF.js `4.14.0` standalone build.
For use in web frameworks, the `@tensorflow/tfjs` module should be used.
For use in NodeJS, the native bindings module is `@tensorflow/tfjs-node`.
:::
:::note Tested Deployments
Each browser demo was tested in the following environments:
| Browser | TF.js version | Date |
|:------------|:--------------|:-----------|
| Chrome 119 | `4.14.0` | 2023-12-09 |
| Safari 16.6 | `4.14.0` | 2023-12-09 |
:::
## CSV Data Interchange
`tf.data.csv`[^1] generates a Dataset from CSV data. The function expects a URL.
:::note pass
When this demo was last tested, there was no direct method to pass a CSV string
to the underlying parser.
:::
Fortunately blob URLs are supported.
```mermaid
flowchart LR
ws((SheetJS\nWorksheet))
csv(CSV\nstring)
url{{Data\nURL}}
dataset[(TF.js\nDataset)]
ws --> |sheet_to_csv\nSheetJS| csv
csv --> |JavaScript\nAPIs| url
url --> |tf.data.csv\nTensorFlow.js| dataset
```
The SheetJS `sheet_to_csv` method[^2] generates a CSV string from a worksheet
object. Using standard JavaScript techniques, a blob URL can be constructed:
```js
function worksheet_to_csv_url(worksheet) {
/* generate CSV */
const csv = XLSX.utils.sheet_to_csv(worksheet);
/* CSV -> Uint8Array -> Blob */
const u8 = new TextEncoder().encode(csv);
const blob = new Blob([u8], { type: "text/csv" });
/* generate a blob URL */
return URL.createObjectURL(blob);
}
```
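A short usage sketch (column configuration is omitted here; the full demo below
passes explicit `columnConfigs`):

```js
/* generate a blob URL from the worksheet and feed it to TF.js */
const url = worksheet_to_csv_url(worksheet);
const dataset = tf.data.csv(url, { hasHeader: true });
```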
### CSV Demo
This demo shows a simple model fitting using the "cars" dataset from TensorFlow.
The [sample XLS file](https://sheetjs.com/data/cd.xls) contains the data. The
data processing mirrors the official "Making Predictions from 2D Data" demo[^3].
```mermaid
flowchart LR
file[(Remote\nFile)]
subgraph SheetJS Operations
ab[(Data\nBytes)]
wb(((SheetJS\nWorkbook)))
ws((SheetJS\nWorksheet))
csv(CSV\nstring)
end
subgraph TensorFlow.js Operations
url{{Data\nURL}}
dataset[(TF.js\nDataset)]
results((Results))
end
file --> |fetch\n\n| ab
ab --> |read\n\n| wb
wb --> |select\nsheet| ws
ws --> |sheet_to_csv\n\n| csv
csv --> |JS\nAPI| url
url --> |tf.data.csv\nTF.js| dataset
dataset --> |fitDataset\nTF.js| results
```
The demo builds a model for predicting MPG from Horsepower data. It:
- fetches <https://sheetjs.com/data/cd.xls>
- parses the data with the SheetJS `read`[^4] method
- selects the first worksheet[^5] and converts to CSV using `sheet_to_csv`[^6]
- generates a blob URL from the CSV text
- generates a TF.js dataset with `tf.data.csv`[^7] and selects data columns
- builds a model and trains with `fitDataset`[^8]
- predicts MPG from a set of sample inputs and displays results in a table
<details><summary><b>Live Demo</b> (click to show)</summary>
:::caution pass
In some test runs, the results did not make sense given the underlying data.
The dependent and independent variables are expected to be anti-correlated.
**This is a known issue in TF.js and affects the official demos**
:::
:::caution pass
If the live demo shows a message
```
ReferenceError: tf is not defined
```
please refresh the page. This is a known bug in the documentation generator.
:::
```jsx live
function SheetJSToTFJSCSV() {
const [output, setOutput] = React.useState("");
const [results, setResults] = React.useState([]);
const [disabled, setDisabled] = React.useState(false);
function worksheet_to_csv_url(worksheet) {
/* generate CSV */
const csv = XLSX.utils.sheet_to_csv(worksheet);
/* CSV -> Uint8Array -> Blob */
const u8 = new TextEncoder().encode(csv);
const blob = new Blob([u8], { type: "text/csv" });
/* generate a blob URL */
return URL.createObjectURL(blob);
}
const doit = React.useCallback(async () => {
setResults([]); setOutput(""); setDisabled(true);
try {
/* fetch file */
const f = await fetch("https://sheetjs.com/data/cd.xls");
const ab = await f.arrayBuffer();
/* parse file and get first worksheet */
const wb = XLSX.read(ab);
const ws = wb.Sheets[wb.SheetNames[0]];
/* generate blob URL */
const url = worksheet_to_csv_url(ws);
/* feed to tf.js */
const dataset = tf.data.csv(url, {
hasHeader: true,
configuredColumnsOnly: true,
columnConfigs:{
"Horsepower": {required: false, default: 0},
"Miles_per_Gallon":{required: false, default: 0, isLabel:true}
}
});
/* pre-process data */
let flat = dataset
.map(({xs,ys}) =>({xs: Object.values(xs), ys: Object.values(ys)}))
.filter(({xs,ys}) => [...xs,...ys].every(v => v>0));
/* normalize manually :( */
let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
await flat.forEachAsync(({xs, ys}) => {
minX = Math.min(minX, xs[0]); maxX = Math.max(maxX, xs[0]);
minY = Math.min(minY, ys[0]); maxY = Math.max(maxY, ys[0]);
});
flat = flat.map(({xs, ys}) => ({xs:xs.map(v => (v-minX)/(maxX - minX)),ys:ys.map(v => (v-minY)/(maxY-minY))}));
flat = flat.batch(32);
/* build and train model */
const model = tf.sequential();
model.add(tf.layers.dense({inputShape: [1], units: 1}));
model.compile({ optimizer: tf.train.sgd(0.000001), loss: 'meanSquaredError' });
await model.fitDataset(flat, { epochs: 100, callbacks: { onEpochEnd: async (epoch, logs) => {
setOutput(`${epoch}:${logs.loss}`);
}}});
/* predict values */
const inp = tf.linspace(0, 1, 9);
const pred = model.predict(inp);
const xs = await inp.dataSync(), ys = await pred.dataSync();
setResults(Array.from(xs).map((x, i) => [ x * (maxX - minX) + minX, ys[i] * (maxY - minY) + minY ]));
setOutput("");
} catch(e) { setOutput(`ERROR: ${String(e)}`); } finally { setDisabled(false);}
});
return ( <>
<button onClick={doit} disabled={disabled}>Click to run</button><br/>
{output && <pre>{output}</pre> || <></>}
{results.length && <table><thead><tr><th>Horsepower</th><th>MPG</th></tr></thead><tbody>
{results.map((r,i) => <tr key={i}><td>{r[0]}</td><td>{r[1].toFixed(2)}</td></tr>)}
</tbody></table> || <></>}
</> );
}
```
</details>
## JS Array Interchange
[The official Linear Regression tutorial](https://www.tensorflow.org/js/tutorials/training/linear_regression)
loads data from a JSON file:
```json
[
{
"Name": "chevrolet chevelle malibu",
"Miles_per_Gallon": 18,
"Cylinders": 8,
"Displacement": 307,
"Horsepower": 130,
"Weight_in_lbs": 3504,
"Acceleration": 12,
"Year": "1970-01-01",
"Origin": "USA"
},
// ...
]
```
In real use cases, data is stored in [spreadsheets](https://sheetjs.com/data/cd.xls)
![cd.xls screenshot](pathname:///files/cd.png)
Following the tutorial, the data fetching method can be adapted to handle arrays
of objects, such as those generated by the SheetJS `sheet_to_json` method[^9].
Differences from the official example are highlighted below:
```js
/**
* Get the car data reduced to just the variables we are interested
* and cleaned of missing data.
*/
async function getData() {
// highlight-start
/* fetch file */
const carsDataResponse = await fetch('https://sheetjs.com/data/cd.xls');
/* get file data (ArrayBuffer) */
const carsDataAB = await carsDataResponse.arrayBuffer();
/* parse */
const carsDataWB = XLSX.read(carsDataAB);
/* get first worksheet */
const carsDataWS = carsDataWB.Sheets[carsDataWB.SheetNames[0]];
/* generate array of JS objects */
const carsData = XLSX.utils.sheet_to_json(carsDataWS);
// highlight-end
const cleaned = carsData.map(car => ({
mpg: car.Miles_per_Gallon,
horsepower: car.Horsepower,
}))
.filter(car => (car.mpg != null && car.horsepower != null));
return cleaned;
}
```
## Low-Level Operations
### Data Transposition
A typical dataset in a spreadsheet will start with one header row and represent
each data record in its own row. For example, the Iris dataset might look like
![Iris dataset](pathname:///files/iris.png)
The SheetJS `sheet_to_json` method[^10] will translate worksheet objects into an
array of row objects:
```js
var aoo = [
{"sepal length": 5.1, "sepal width": 3.5, ...},
{"sepal length": 4.9, "sepal width": 3, ...},
...
];
```
TF.js and other libraries tend to operate on individual columns, equivalent to:
```js
var sepal_lengths = [5.1, 4.9, ...];
var sepal_widths = [3.5, 3, ...];
```
When a `tensor2d` is exported, the layout will differ from the spreadsheet:
```js
var data_set_2d = [
[5.1, 4.9, ...],
[3.5, 3, ...],
...
]
```
This is the transpose of how people use spreadsheets!
### Exporting Datasets to a Worksheet
The `aoa_to_sheet` method[^11] can generate a worksheet from an array of arrays.
ML libraries typically provide APIs to pull an array of arrays, but each inner
array represents a variable (column) rather than a spreadsheet row. To export
multiple data sets, the data should be transposed back to row-major order:
```js
/* assuming data is an array of typed arrays */
var aoa = [];
for(var i = 0; i < data.length; ++i) {
for(var j = 0; j < data[i].length; ++j) {
if(!aoa[j]) aoa[j] = [];
aoa[j][i] = data[i][j];
}
}
/* aoa can be directly converted to a worksheet object */
var ws = XLSX.utils.aoa_to_sheet(aoa);
```
### Importing Data from a Spreadsheet
`sheet_to_json` with the option `header:1`[^12] will generate a row-major array
of arrays that can be transposed. However, it is more efficient to walk the
sheet manually:
```js
/* find worksheet range */
var range = XLSX.utils.decode_range(ws['!ref']);
var out = []
/* walk the columns */
for(var C = range.s.c; C <= range.e.c; ++C) {
/* create the typed array */
var ta = new Float32Array(range.e.r - range.s.r + 1);
/* walk the rows */
for(var R = range.s.r; R <= range.e.r; ++R) {
/* find the cell, skip it if the cell isn't numeric or boolean */
var cell = ws["!data"] ? (ws["!data"][R]||[])[C] : ws[XLSX.utils.encode_cell({r:R, c:C})];
if(!cell || cell.t != 'n' && cell.t != 'b') continue;
/* assign to the typed array */
ta[R - range.s.r] = cell.v;
}
out.push(ta);
}
```
If the data set has a header row, the loop can be adjusted to skip that row, as
shown below.
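The following sketch assumes a single header row in the first row of the range:

```js
/* find worksheet range */
var range = XLSX.utils.decode_range(ws['!ref']);
var out = [];
/* walk the columns */
for(var C = range.s.c; C <= range.e.c; ++C) {
  /* one fewer entry per column, since the header row is skipped */
  var ta = new Float32Array(range.e.r - range.s.r);
  /* walk the rows, starting after the header row */
  for(var R = range.s.r + 1; R <= range.e.r; ++R) {
    /* find the cell, skip it if the cell isn't numeric or boolean */
    var cell = ws["!data"] ? (ws["!data"][R]||[])[C] : ws[XLSX.utils.encode_cell({r:R, c:C})];
    if(!cell || cell.t != 'n' && cell.t != 'b') continue;
    /* assign to the typed array */
    ta[R - range.s.r - 1] = cell.v;
  }
  out.push(ta);
}
```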
### TF.js Tensors
A single `Array#map` can pull individual named fields from the result, which
can be used to construct TensorFlow.js tensor objects:
```js
const aoo = XLSX.utils.sheet_to_json(worksheet);
const lengths = aoo.map(row => row["sepal length"]);
const tensor = tf.tensor1d(lengths);
```
`tf.Tensor` objects can be directly transposed using `transpose`:
```js
var aoo = XLSX.utils.sheet_to_json(worksheet);
// "x" and "y" are the fields we want to pull from the data
var data = aoo.map(row => ([row["x"], row["y"]]));
// create a tensor representing two column datasets
var tensor = tf.tensor2d(data).transpose();
// individual columns can be accessed
var col1 = tensor.slice([0,0], [1,tensor.shape[1]]).flatten();
var col2 = tensor.slice([1,0], [1,tensor.shape[1]]).flatten();
```
For exporting, `stack` can be used to collapse the columns into a linear array:
```js
/* pull data into a Float32Array */
var result = tf.stack([col1, col2]).transpose();
var shape = result.shape;
var f32 = result.dataSync();
/* construct an array of arrays of the data in spreadsheet order */
var aoa = [];
for(var j = 0; j < shape[0]; ++j) {
aoa[j] = [];
for(var i = 0; i < shape[1]; ++i) aoa[j][i] = f32[j * shape[1] + i];
}
/* add headers to the top */
aoa.unshift(["x", "y"]);
/* generate worksheet */
var worksheet = XLSX.utils.aoa_to_sheet(aoa);
```
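To save the result, the generated worksheet can be added to a new workbook and
written to a file, mirroring the other demos on this page (the file name is
illustrative):

```js
/* create a workbook and write to a XLSX file */
var wb = XLSX.utils.book_new(worksheet, "Export");
XLSX.writeFile(wb, "SheetJSTensorExport.xlsx");
```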
[^1]: See [`tf.data.csv`](https://js.tensorflow.org/api/latest/#data.csv) in the TensorFlow.js documentation
[^2]: See [`sheet_to_csv` in "CSV and Text"](/docs/api/utilities/csv#delimiter-separated-output)
[^3]: The ["Making Predictions from 2D Data" example](https://codelabs.developers.google.com/codelabs/tfjs-training-regression/) uses a hosted JSON file. The [sample XLS file](https://sheetjs.com/data/cd.xls) includes the same data.
[^4]: See [`read` in "Reading Files"](/docs/api/parse-options)
[^5]: See ["Workbook Object"](/docs/csf/book)
[^6]: See [`sheet_to_csv` in "CSV and Text"](/docs/api/utilities/csv#delimiter-separated-output)
[^7]: See [`tf.data.csv`](https://js.tensorflow.org/api/latest/#data.csv) in the TensorFlow.js documentation
[^8]: See [`tf.LayersModel.fitDataset`](https://js.tensorflow.org/api/latest/#tf.LayersModel.fitDataset) in the TensorFlow.js documentation
[^9]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)
[^10]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)
[^11]: See [`aoa_to_sheet` in "Utilities"](/docs/api/utilities/array#array-of-arrays-input)
[^12]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)

@ -0,0 +1,4 @@
{
"label": "Math and Statistics",
"position": 1
}

@ -0,0 +1,412 @@
---
title: Math and Statistics
pagination_prev: demos/index
pagination_next: demos/frontend/index
---
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
With full support for IEEE754 doubles and singles, JavaScript is an excellent
language for mathematics and statistical analysis. It has also proven to be a
viable platform for machine learning.
## Demos
Demos for various libraries are included in separate pages:
<ul>{useCurrentSidebarCategory().items.map((item, index) => {
const listyle = (item.customProps?.icon) ? {
listStyleImage: `url("${item.customProps.icon}")`
} : {};
return (<li style={listyle} {...(item.customProps?.class ? {className: item.customProps.class}: {})}>
<a href={item.href}>{item.label}</a>{item.customProps?.summary && (" - " + item.customProps.summary)}
</li>);
})}</ul>
## Typed Arrays
Modern JavaScript math and statistics libraries typically use `Float64Array` or
`Float32Array` objects to efficiently store data variables.
<details><summary><b>Technical details</b> (click to show)</summary>
Under the hood, `ArrayBuffer` objects represent raw binary data. "Typed arrays"
such as `Float64Array` and `Float32Array` are objects designed for efficient
interpretation and mutation of `ArrayBuffer` data.
:::note pass
`ArrayBuffer` objects are roughly analogous to heap-allocated memory. Typed arrays
behave like typed pointers.
**JavaScript**
```js
const buf = new ArrayBuffer(16);
const dbl = new Float64Array(buf);
dbl[1] = 3.14159;
const u8 = new Uint8Array(buf);
for(let i = 0; i < 8; ++i)
console.log(u8[i+8]);
```
**Equivalent C**
```c
void *const buf = malloc(16);
double *const dbl = (double *)buf;
dbl[1] = 3.14159;
uint8_t *const u8 = (uint8_t *)buf;
for(uint8_t i = 0; i < 8; ++i)
printf("%u\n", u8[i+8]);
```
:::
</details>
### Reading from Sheets
Each typed array class has a `from` static method for converting data into a
typed array. `Float64Array.from` returns a `double` typed array (8 bytes per
value) and `Float32Array.from` generates a `float` typed array (4 bytes).
```js
const column_f32 = Float32Array.from(arr); // 4-byte floats
const column_f64 = Float64Array.from(arr); // 8-byte doubles
```
:::info pass
Values in the array will be coerced to the relevant data type. Unsupported
entries will be converted to quiet `NaN` values.
:::
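For example, a quick illustration of the coercion:

```js
const arr = ["54337.95", "abc", true];
const col = Float64Array.from(arr);
/* col -> Float64Array [ 54337.95, NaN, 1 ] */
```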
#### Extracting Worksheet Data
The SheetJS `sheet_to_json`[^1] method with the option `header: 1`[^2] generates
an array of arrays from a worksheet object. The result is in row-major order:
```js
const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1});
```
#### Categorical Variables
Dichotomous variables are commonly represented as spreadsheet `TRUE` or `FALSE`.
The SheetJS `sheet_to_json` method will translate these values to `true` and
`false`. Typed array methods will interpret values as `1` and `0` respectively.
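A minimal sketch, assuming a hypothetical dichotomous variable stored in column B:

```js
const aoa = XLSX.utils.sheet_to_json(worksheet, { header: 1 });
/* Column B = SheetJS column index 1; slice(1) drops the header row */
const smoker = Float64Array.from(aoa.map(row => row[1]).slice(1));
/* spreadsheet TRUE -> true -> 1 ; FALSE -> false -> 0 */
```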
Polychotomous variables must be manually mapped to numeric values. For example,
using the Iris dataset:
![Iris dataset](pathname:///typedarray/iris.png)
```js
[
["sepal length", "sepal width", "petal length", "petal width", "class"],
[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
[4.9, 3, 1.4, 0.2, "Iris-setosa"],
]
```
Column E (`class`) is a polychotomous variable and must be manually translated:
```js
const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1});
/* index_to_class will be needed to recover the values later */
const index_to_class = [];
/* map from class name to number */
const class_to_index = new Map();
/* loop over the data */
for(let R = 1; R < aoa.length; ++R) {
/* Column E = SheetJS column index 4 */
const category = aoa[R][4];
const val = class_to_index.get(category);
if(val == null) {
/* assign a new index */
class_to_index.set(category, index_to_class.length);
aoa[R][4] = index_to_class.length;
index_to_class.push(category);
} else aoa[R][4] = val;
}
```
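The numeric codes can be mapped back to the original class names later by
indexing into `index_to_class`:

```js
/* recover the original class name from the numeric code in cell E2 */
const class_name = index_to_class[aoa[1][4]]; // "Iris-setosa"
```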
<details><summary><b>Live Demo</b> (click to show)</summary>
This example fetches and parses [`iris.xlsx`](pathname:///typedarray/iris.xlsx).
The first worksheet is processed and the new data and mapping are printed.
```jsx live
function SheetJSPolychotomy() {
const [cat, setCat] = React.useState([]);
const [aoa, setAoA] = React.useState([]);
React.useEffect(() => { (async() => {
const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer();
const wb = XLSX.read(ab);
const aoa = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]], {header:1});
const index_to_class = [];
const class_to_index = new Map();
for(let R = 1; R < aoa.length; ++R) {
const category = aoa[R][4];
const val = class_to_index.get(category);
if(val == null) {
class_to_index.set(category, index_to_class.length);
aoa[R][4] = index_to_class.length;
index_to_class.push(category);
} else aoa[R][4] = val;
}
/* display every 25 rows, skipping the header row */
setAoA(aoa.filter((_, i) => (i % 25) == 1));
setCat(index_to_class);
})(); }, []);
return ( <>
<b>Mapping</b><br/>
<table><thead><tr><th>Index</th><th>Name</th></tr></thead><tbody>
{cat.map((name, i) => (<tr><td>{i}</td><td>{name}</td></tr>))}
</tbody></table>
<b>Sample Data</b><br/>
<table><thead><tr>{"ABCDE".split("").map(c => (<th>{c}</th>))}</tr></thead><tbody>
{aoa.map(row => (<tr>{row.map(col => (<td>{col}</td>))}</tr>))}
</tbody></table>
</>
);
}
```
</details>
#### One Variable per Column
It is common to store datasets where each row represents an observation and each
column represents a variable:
![Iris dataset](pathname:///typedarray/iris.png)
```js
var aoa = [
["sepal length", "sepal width", "petal length", "petal width", "class"],
[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
[4.9, 3, 1.4, 0.2, "Iris-setosa"],
]
```
An array `map` operation can pull data from an individual column. After mapping,
a `slice` can remove the header label. For example, the following snippet pulls
column C ("petal length") into a `Float64Array`:
```js
const C = XLSX.utils.decode_col("C"); // Column "C" = SheetJS index 2
const petal_length = Float64Array.from(aoa.map(row => row[C]).slice(1));
```
#### One Variable per Row
Some datasets are stored in tables where each row represents a variable and each
column represents an observation:
<table><thead><tr><th>JavaScript</th><th>Spreadsheet</th></tr></thead><tbody><tr><td>
```js
var aoa = [
["sepal length", 5.1, 4.9],
["sepal width", 3.5, 3],
["petal length", 1.4, 1.4],
["petal width", 0.2, 0.2],
["class", "setosa", "setosa"]
]
```
</td><td>
![Single column of data](pathname:///typedarray/iristr.png)
</td></tr></tbody></table>
From the row-major array of arrays, each entry of the outer array is a row.
Many sheets include header columns. The `slice` method can remove the header.
After removing the header, `Float64Array.from` can generate a typed array. For
example, this snippet pulls row 3 ("petal length") into a `Float64Array`:
```js
const petal_length = Float64Array.from(aoa[2].slice(1));
```
### Writing to Sheets
The SheetJS `aoa_to_sheet`[^3] method can generate a worksheet from an array of
arrays. Similarly, `sheet_add_aoa`[^4] can add an array of arrays of data into
an existing worksheet object. The `origin` option[^5] controls where data will
be written in the worksheet.
Neither method understands typed arrays, so data columns must be converted to
arrays of arrays.
#### One Variable per Row
A single typed array can be converted to a pure JS array with `Array.from`:
```js
const arr = Array.from(column);
```
An array of arrays can be created from the array:
```js
const aoa = [
arr // this array is the first element of the array literal
];
```
`aoa_to_sheet` and `sheet_add_aoa` treat this as one row. By default, data will
be written to cells in the first row of the worksheet.
Titles can be added to data rows with an `unshift` operation, but it is more
efficient to build up the worksheet with `aoa_to_sheet`:
```js
/* sample data */
const data = new Float64Array([54337.95, 3.14159, 2.718281828]);
const title = "Values";
/* convert sample data to array */
const arr = Array.from(data);
/* create worksheet from title (array of arrays) */
const ws = XLSX.utils.aoa_to_sheet([ [ title ] ]);
/* add data starting at B1 */
XLSX.utils.sheet_add_aoa(ws, [ arr ], { origin: "B1" });
```
![Typed Array to single row with title](pathname:///typedarray/ta-row.png)
<details open><summary><b>Live Demo</b> (click to hide)</summary>
In this example, two typed arrays are exported. `aoa_to_sheet` creates the
worksheet and `sheet_add_aoa` will add the data to the sheet.
```jsx live
function SheetJSeriesToRows() { return (<button onClick={() => {
/* typed arrays */
const ta1 = new Float64Array([54337.95, 3.14159, 2.718281828]);
const ta2 = new Float64Array([281.3308004, 201.8675309, 1900.6492568]);
/* create worksheet from first title */
const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
/* add first typed array starting from cell B1 */
const arr1 = Array.from(ta1);
XLSX.utils.sheet_add_aoa(ws, [ arr1 ], { origin: "B1" });
/* add second title to cell A2 */
XLSX.utils.sheet_add_aoa(ws, [["Value2"]], { origin: "A2" });
/* add second typed array starting from cell B2 */
const arr2 = Array.from(ta2);
XLSX.utils.sheet_add_aoa(ws, [ arr2 ], { origin: "B2" });
/* export to file */
const wb = XLSX.utils.book_new(ws, "Export");
XLSX.writeFile(wb, "SheetJSeriesToRows.xlsx");
}}><b>Click to export</b></button>); }
```
</details>
#### One Variable per Column
A single typed array can be converted to a pure JS array with `Array.from`. For
columns, each value should be individually wrapped in an array:
<table><thead><tr><th>JavaScript</th><th>Spreadsheet</th></tr></thead><tbody><tr><td>
```js
var data = [
[54337.95],
[3.14159],
[2.718281828]
];
```
</td><td>
![Single column of data](pathname:///typedarray/col.png)
</td></tr></tbody></table>
`Array.from` takes a second argument. If it is a function, the function will be
called on each element and the value will be used in place of the original value
(in effect, mapping over the data). To generate a data column, each element must
be wrapped in an array literal:
```js
var arr = Array.from(column, (value) => ([ value ]));
```
`aoa_to_sheet` and `sheet_add_aoa` treat this as rows with one column of data
per row. By default, data will be written to cells in column "A".
Titles can be added to data columns with an `unshift` operation, but it is more
efficient to build up the worksheet with `aoa_to_sheet`:
```js
/* sample data */
const data = new Float64Array([54337.95, 3.14159, 2.718281828]);
const title = "Values";
/* convert sample data to array */
const arr = Array.from(data, (value) => ([value]));
/* create worksheet from title (array of arrays) */
const ws = XLSX.utils.aoa_to_sheet([ [ title ] ]);
/* add data starting at A2 */
XLSX.utils.sheet_add_aoa(ws, arr, { origin: "A2" });
```
![Typed Array to single column with title](pathname:///typedarray/ta-col.png)
<details open><summary><b>Live Demo</b> (click to hide)</summary>
In this example, two typed arrays are exported. `aoa_to_sheet` creates the
worksheet and `sheet_add_aoa` will add the data to the sheet.
```jsx live
function SheetJSeriesToCols() { return (<button onClick={() => {
/* typed arrays */
const ta1 = new Float64Array([54337.95, 3.14159, 2.718281828]);
const ta2 = new Float64Array([281.3308004, 201.8675309, 1900.6492568]);
/* create worksheet from first title */
const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
/* add first typed array starting from cell A2 */
const arr1 = Array.from(ta1, (value) => ([value]));
XLSX.utils.sheet_add_aoa(ws, arr1, { origin: "A2" });
/* add second title to cell B1 */
XLSX.utils.sheet_add_aoa(ws, [["Value2"]], { origin: "B1" });
/* add second typed array starting from cell B2 */
const arr2 = Array.from(ta2, (value) => ([value]));
XLSX.utils.sheet_add_aoa(ws, arr2, { origin: "B2" });
/* export to file */
const wb = XLSX.utils.book_new(ws, "Export");
XLSX.writeFile(wb, "SheetJSeriesToCols.xlsx");
}}><b>Click to export</b></button>); }
```
</details>
[^1]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output)
[^2]: See ["Array Output" in "Utility Functions"](/docs/api/utilities/array#array-output)
[^3]: See [`aoa_to_sheet` in "Utilities"](/docs/api/utilities/array#array-of-arrays-input)
[^4]: See [`sheet_add_aoa` in "Utilities"](/docs/api/utilities/array#array-of-arrays-input)
[^5]: See [the `origin` option of `sheet_add_aoa` in "Utilities"](/docs/api/utilities/array#array-of-arrays-input)

@ -43,6 +43,7 @@ This demo was tested in the following environments:
|:---------|:-----------|
| `5.0.5` | 2023-12-04 |
| `4.5.0` | 2023-12-04 |
| `3.2.7` | 2023-12-05 |
:::

@ -1,6 +1,6 @@
---
title: Web Frameworks
pagination_prev: demos/index
pagination_prev: demos/math/index
pagination_next: demos/grid/index
---

@ -11,6 +11,11 @@ pagination_next: demos/net/upload/index
import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';
`XMLHttpRequest` and `fetch` browser APIs enable binary data transfer between
web browser clients and web servers. Since this library works in web browsers,
server conversion work can be offloaded to the client! This demo shows a few
common scenarios involving browser APIs and popular wrapper libraries.
:::info pass
This demo focuses on downloading files. Other demos cover other HTTP use cases:
@ -20,11 +25,6 @@ This demo focuses on downloading files. Other demos cover other HTTP use cases:
:::
`XMLHttpRequest` and `fetch` browser APIs enable binary data transfer between
web browser clients and web servers. Since this library works in web browsers,
server conversion work can be offloaded to the client! This demo shows a few
common scenarios involving browser APIs and popular wrapper libraries.
:::caution Third-Party Hosts and Binary Data
Third-party cloud platforms such as AWS may corrupt raw binary downloads by
@ -45,7 +45,20 @@ The APIs generally have a way to control the interpretation of the downloaded
data. The `arraybuffer` response type usually forces the data to be presented
as an `ArrayBuffer` which can be parsed directly with the SheetJS `read` method[^1].
For example, with `fetch`:
The following example shows the data flow using `fetch` to download files:
```mermaid
flowchart LR
server[(Remote\nFile)]
response(Response\nobject)
subgraph SheetJS operations
ab(XLSX Data\nArrayBuffer)
wb(((SheetJS\nWorkbook)))
end
server --> |`fetch`\nGET request| response
response --> |`arrayBuffer`\n\n| ab
ab --> |`read`\n\n| wb
```
```js
/* download data into an ArrayBuffer object */
@ -69,7 +82,8 @@ contents match the first worksheet. The table is generated using the SheetJS
### XMLHttpRequest
For downloading data, the `arraybuffer` response type generates an `ArrayBuffer`
that can be viewed as an `Uint8Array` and fed to `XLSX.read` using `array` type:
that can be viewed as an `Uint8Array` and fed to the SheetJS `read` method. For
legacy browsers, the option `type: "array"` should be specified:
```js
/* set up an async GET request */
@ -122,7 +136,7 @@ function SheetJSXHRDL() {
### fetch
For downloading data, `Response#arrayBuffer` resolves to an `ArrayBuffer` that
can be converted to `Uint8Array` and passed to `XLSX.read`:
can be converted to `Uint8Array` and passed to the SheetJS `read` method:
```js
fetch(url).then(function(res) {
@ -215,13 +229,14 @@ $.ajax({
### Wrapper Libraries
Before `fetch` shipped with browsers, there were various wrapper libraries to
simplify `XMLHttpRequest`. Due to limitations with `fetch`, these libraries
are still relevant.
simplify `XMLHttpRequest`. Due to limitations with `fetch`, these libraries are
still relevant.
#### axios
[`axios`](https://axios-http.com/) presents a Promise based interface. Setting
`responseType` to `arraybuffer` ensures the return type is an ArrayBuffer:
`responseType` to `arraybuffer` ensures the return type is an ArrayBuffer. The
`data` property of the result can be passed to the SheetJS `read` method:
```js
async function workbook_dl_axios(url) {
@ -491,7 +506,7 @@ npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz request
#### axios
When the `responseType` is `"arraybuffer"`, `axios` actually captures the data
in a NodeJS Buffer. `XLSX.read` will transparently handle Buffers:
in a NodeJS Buffer. The SheetJS `read` method handles NodeJS Buffer objects:
```js title="SheetJSAxios.js"
const XLSX = require("xlsx"), axios = require("axios");
@ -548,6 +563,8 @@ Other demos show network operations in special platforms:
- [React Native "Fetching Remote Data"](/docs/demos/mobile/reactnative#fetching-remote-data)
- [NativeScript "Fetching Remote Files"](/docs/demos/mobile/nativescript#fetching-remote-files)
- [AngularJS "Remote Files"](/docs/demos/frontend/angularjs#remote-files)
- [Dojo Toolkit "Parsing Remote Files"](/docs/demos/frontend/dojo#parsing-remote-files)
[^1]: See [`read` in "Reading Files"](/docs/api/parse-options)
[^2]: See [`sheet_to_html` in "Utilities"](/docs/api/utilities/html#html-table-output)

@ -219,17 +219,17 @@ This demo was tested in the following environments:
| OS | Type | Device | RN | Date |
|:-----------|:-----|:--------------------|:---------|:-----------|
| Android 34 | Sim | Pixel 3a | `0.72.7` | 2023-12-04 |
| iOS 17.0.1 | Sim | iPhone 15 Pro Max | `0.72.7` | 2023-12-04 |
| Android 29 | Real | NVIDIA Shield | `0.72.7` | 2023-12-04 |
| iOS 15.1 | Real | iPad Pro | `0.72.7` | 2023-12-04 |
| Android 34 | Sim | Pixel 3a | `0.73.1` | 2023-12-21 |
| iOS 17.2 | Sim | iPhone 15 Pro Max | `0.73.1` | 2023-12-21 |
| Android 29 | Real | NVIDIA Shield | `0.73.1` | 2023-12-21 |
| iOS 15.1 | Real | iPad Pro | `0.73.1` | 2023-12-21 |
:::
1) Create project:
```bash
npx -y react-native@0.72.7 init SheetJSRNFetch --version="0.72.7"
npx -y react-native@0.73.1 init SheetJSRNFetch --version="0.73.1"
```
2) Install shared dependencies:
@ -249,16 +249,16 @@ curl -LO https://docs.sheetjs.com/reactnative/App.tsx
**Android Testing**
4) Install or switch to Java 11[^6]
4) Install or switch to Java 17[^6]
:::note pass
When the demo was last tested on macOS, `java -version` displayed the following:
```
openjdk version "11.0.21" 2023-10-17 LTS
OpenJDK Runtime Environment Zulu11.68+17-CA (build 11.0.21+9-LTS)
OpenJDK 64-Bit Server VM Zulu11.68+17-CA (build 11.0.21+9-LTS, mixed mode)
openjdk version "17.0.9" 2023-10-17
OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)
OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode)
```
:::
@ -274,8 +274,14 @@ npx react-native run-android
If the initial launch fails with an error referencing the emulator, manually
start the emulator and try again.
Gradle errors typically stem from a Java version mismatch. Run `java -version`
and verify that the Java major version is 11.
Gradle errors typically stem from a Java version mismatch:
```
> Failed to apply plugin 'com.android.internal.application'.
> Android Gradle plugin requires Java 17 to run. You are currently using Java 11.
```
This error can be resolved by installing and switching to the requested version.
:::
@ -299,7 +305,9 @@ tapping "Import data from a spreadsheet", verify that the app shows new data:
:::warning pass
iOS testing requires macOS. It does not work on Windows or Linux.
**iOS testing can only be performed on Apple hardware running macOS!**
Xcode and iOS simulators are not available on Windows or Linux.
:::
@ -349,7 +357,7 @@ npx react-native run-android
13) Close any Android / iOS emulators.
14) Enable developer code signing certificates[^7]
14) Enable developer code signing certificates[^7].
15) Install `ios-deploy` through Homebrew:
@ -363,6 +371,67 @@ brew install ios-deploy
npx react-native run-ios
```
:::caution pass
When this demo was last tested, the build failed with the following error:
```
PhaseScriptExecution failed with a nonzero exit code
```
This was due to an error in the `react-native` package. The script
`node_modules/react-native/scripts/react-native-xcode.sh` must be edited.
Near the top of the script, there will be a `set` statement:
```bash title="node_modules/react-native/scripts/react-native-xcode.sh"
# Print commands before executing them (useful for troubleshooting)
# highlight-next-line
set -x -e
DEST=$CONFIGURATION_BUILD_DIR/$UNLOCALIZED_RESOURCES_FOLDER_PATH
```
The `-e` argument must be removed:
```bash title="node_modules/react-native/scripts/react-native-xcode.sh (edit line)"
# Print commands before executing them (useful for troubleshooting)
# highlight-next-line
set -x
DEST=$CONFIGURATION_BUILD_DIR/$UNLOCALIZED_RESOURCES_FOLDER_PATH
```
:::
:::info pass
By default, React Native generates applications that exclusively target iPhone.
On a physical iPad, a pixellated iPhone app will be run.
The "targeted device families" setting must be changed to support iPad:
A) Open the Xcode workspace:
```bash
open ./ios/SheetJSRNFetch.xcworkspace
```
B) Select the project in the left sidebar:
![Select the project](pathname:///reactnative/xcode-select-project.png)
C) Select the "SheetJSRNFetch" target in the sidebar.
![Settings](pathname:///reactnative/xcode-targets.png)
D) Select the "Build Settings" tab in the main area.
E) In the search bar below "Build Settings", type "tar"
F) Look for the "Targeted Device Families" row. Change the corresponding value
to "iPhone, iPad".
:::
## Local Files
:::warning pass
@ -987,6 +1056,6 @@ npx xlsx-cli /tmp/sheetjsw.xlsx
[^3]: See ["Array Output" in "Utility Functions"](/docs/api/utilities/array#array-output)
[^4]: See ["Array of Arrays Input" in "Utility Functions"](/docs/api/utilities/array#array-of-arrays-input)
[^5]: React-Native commit [`5b597b5`](https://github.com/facebook/react-native/commit/5b597b5ff94953accc635ed3090186baeecb3873) added the final piece required for `fetch` support. It landed in version `0.72.0-rc.1` and is available in official releases starting from `0.72.0`.
[^6]: When the demo was last tested, the Zulu11 distribution of Java 11 was installed through the macOS Brew package manager. [Direct downloads are available at `azul.com`](https://www.azul.com/downloads/?version=java-11-lts&package=jdk#zulu)
[^6]: When the demo was last tested, the Temurin distribution of Java 17 was installed through the macOS Brew package manager by running `brew install temurin17`. [Direct downloads are available at `adoptium.net`](https://adoptium.net/temurin/releases/?version=17)
[^7]: See ["Running On Device"](https://reactnative.dev/docs/running-on-device) in the React Native documentation
[^8]: Follow the ["React Native CLI Quickstart"](https://reactnative.dev/docs/environment-setup) for Android (and iOS, if applicable)

@ -178,7 +178,7 @@ npx cap init sheetjs-cap com.sheetjs.cap --web-dir=dist
npm run build
```
:::note
:::note pass
If prompted to create an Ionic account, type `N` and press Enter.

@ -184,11 +184,11 @@ This demo was tested in the following environments:
| OS and Version | Architecture | Electron | Date |
|:---------------|:-------------|:---------|:-----------|
| macOS 13.5.1 | `darwin-x64` | `26.1.0` | 2023-09-03 |
| macOS 13.5.1 | `darwin-x64` | `27.1.3` | 2023-12-09 |
| macOS 14.1.2 | `darwin-arm` | `27.1.3` | 2023-12-01 |
| Windows 10 | `win10-x64` | `26.1.0` | 2023-09-03 |
| Windows 10 | `win10-x64` | `27.1.3` | 2023-12-09 |
| Windows 11 | `win11-arm` | `27.1.3` | 2023-12-01 |
| Linux (HoloOS) | `linux-x64` | `27.0.0` | 2023-10-11 |
| Linux (HoloOS) | `linux-x64` | `27.1.3` | 2023-12-09 |
| Linux (Debian) | `linux-arm` | `27.1.3` | 2023-12-01 |
:::
@ -247,7 +247,7 @@ The app will show.
npm run make
```
This will create a package in the `out\make` folder.
This will create a package in the `out\make` folder and a standalone binary.
:::caution pass
@ -266,11 +266,13 @@ The program will run on ARM64 Windows.
5) Download [the test file `pres.numbers`](https://sheetjs.com/pres.numbers)
6) Re-launch the application in the test environment:
6) Launch the generated application:
```bash
npx -y electron .
```
| Architecture | Command |
|:-------------|:--------------------------------------------------------------|
| `darwin-x64` | `open ./out/sheetjs-electron-darwin-x64/sheetjs-electron.app` |
| `win10-x64` | `.\out\sheetjs-electron-win32-x64\sheetjs-electron.exe` |
| `linux-x64` | `./out/sheetjs-electron-linux-x64/sheetjs-electron` |
#### Electron API
@ -284,7 +286,7 @@ to write to `Untitled.xls` in the Downloads folder.
:::note pass
During the most recent Linux ARM64 test, the dialog did not have a default name.
In some tests, the dialog did not have a default name.
If there is no default name, enter `Untitled.xls` and click "Save".
@ -335,4 +337,4 @@ call is required to enable Developer Tools in the window.
:::
[^1]: See ["Makers"](https://www.electronforge.io/config/makers) in the Electron Forge documentation. On Linux, the demo generates `rpm` and `deb` distributables.
[^1]: See ["Makers"](https://www.electronforge.io/config/makers) in the Electron Forge documentation. On Linux, the demo generates `rpm` and `deb` distributables. On Arch Linux and the Steam Deck, `sudo pacman -Syu rpm-tools dpkg fakeroot` installed required packages.

@ -115,9 +115,9 @@ This demo was tested in the following environments:
|:---------------|:-------------|:---------|:-----------|
| macOS 13.5.2 | `darwin-x64` | `0.78.1` | 2023-09-27 |
| macOS 14.1.2 | `darwin-arm` | `0.82.0` | 2023-12-01 |
| Windows 10 | `win10-x64` | `0.78.1` | 2023-09-27 |
| Windows 10 | `win10-x64` | `0.82.0` | 2023-12-09 |
| Windows 11 | `win11-arm` | `0.82.0` | 2023-12-01 |
| Linux (HoloOS) | `linux-x64` | `0.78.1` | 2023-10-11 |
| Linux (HoloOS) | `linux-x64` | `0.82.0` | 2023-12-07 |
There is no official Linux ARM64 release. The community release[^1] was tested
and verified on 2023-09-27.

@ -299,7 +299,7 @@ This demo was tested in the following environments:
|:---------------|:-------------|:---------|:-----------|
| macOS 13.6 | `darwin-x64` | `v2.6.0` | 2023-11-05 |
| macOS 14.1.2 | `darwin-arm` | `v2.6.0` | 2023-12-01 |
| Windows 10 | `win10-x64` | `v2.5.1` | 2023-08-25 |
| Windows 10 | `win10-x64` | `v2.6.0` | 2023-12-09 |
| Windows 11 | `win11-arm` | `v2.6.0` | 2023-12-01 |
| Linux (HoloOS) | `linux-x64` | `v2.6.0` | 2023-10-11 |
| Linux (Debian) | `linux-arm` | `v2.6.0` | 2023-12-01 |

@ -192,11 +192,11 @@ This demo was tested in the following environments:
| OS and Version | Architecture | Server | Client | Date |
|:---------------|:-------------|:----------|:----------|:-----------|
| macOS 13.5.1 | `darwin-x64` | `v4.13.0` | `v3.11.0` | 2023-08-26 |
| macOS 13.5.1 | `darwin-x64` | `v4.14.1` | `v3.12.0` | 2023-12-13 |
| macOS 14.0 | `darwin-arm` | `v4.14.1` | `v3.12.0` | 2023-10-18 |
| Windows 10 | `win10-x64` | `v4.13.0` | `v3.11.0` | 2023-08-26 |
| Windows 10 | `win10-x64` | `v4.14.1` | `v3.12.0` | 2023-12-09 |
| Windows 11 | `win11-arm` | `v4.14.1` | `v3.12.0` | 2023-12-01 |
| Linux (HoloOS) | `linux-x64` | `v4.14.1` | `v3.12.0` | 2023-10-11 |
| Linux (HoloOS) | `linux-x64` | `v4.14.1` | `v3.12.0` | 2023-12-09 |
| Linux (Debian) | `linux-arm` | `v4.14.1` | `v3.12.0` | 2023-12-01 |
:::

@ -55,7 +55,7 @@ This demo was tested in the following deployments:
| `darwin-arm` | `4.0.0-rc.2` | `18.18.0` | 2023-12-01 |
| `win10-x64` | `4.0.0-rc.2` | `14.15.3` | 2023-10-09 |
| `win11-arm` | `4.0.0-rc.2` | `20.10.0` | 2023-12-01 |
| `linux-x64` | `4.0.0-rc.2` | `14.15.3` | 2023-10-11 |
| `linux-x64` | `4.0.0-rc.2` | `14.15.3` | 2023-12-07 |
| `linux-arm` | `4.0.0-rc.2` | `20.10.0` | 2023-12-01 |
</TabItem>
@ -67,7 +67,7 @@ This demo was tested in the following deployments:
| `darwin-arm` | `5.8.1` | `18.5.0` | 2023-12-01 |
| `win10-x64` | `5.8.1` | `18.5.0` | 2023-10-09 |
| `win11-arm` | `5.8.1` | `18.5.0` | 2023-12-01 |
| `linux-x64` | `5.8.1` | `18.5.0` | 2023-10-11 |
| `linux-x64` | `5.8.1` | `18.5.0` | 2023-12-07 |
| `linux-arm` | `5.8.1` | `18.5.0` | 2023-12-01 |
</TabItem>
@ -78,7 +78,7 @@ This demo was tested in the following deployments:
| `darwin-x64` | `2.1.2` | `20.8.0` | 2023-10-12 |
| `darwin-arm` | `2.3.0` | `21.3.0` | 2023-12-01 |
| `win10-x64` | `2.1.2` | `16.20.2` | 2023-10-09 |
| `linux-x64` | `2.1.2` | `20.8.0` | 2023-10-11 |
| `linux-x64` | `2.3.0` | `21.4.0` | 2023-12-07 |
| `linux-arm` | `2.3.0` | `21.3.0` | 2023-12-01 |
</TabItem>

@ -24,10 +24,10 @@ This demo was verified by NetSuite consultants in the following deployments:
| `@NScriptType` | `@NApiVersion` | Date |
|:----------------|:---------------|:-----------|
| ScheduledScript | 2.1 | 2023-08-18 |
| ScheduledScript | 2.1 | 2023-12-13 |
| Restlet | 2.1 | 2023-10-05 |
| Suitelet | 2.1 | 2023-10-27 |
| MapReduceScript | 2.1 | 2023-11-16 |
| Suitelet | 2.1 | 2023-12-22 |
| MapReduceScript | 2.1 | 2023-12-07 |
:::

@ -1,5 +1,5 @@
---
title: Data Processing in GitHub
title: Flat Data Processing in GitHub
sidebar_label: GitHub
pagination_prev: demos/local/index
pagination_next: demos/extensions/index
@ -8,15 +8,12 @@ pagination_next: demos/extensions/index
import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';
Many official data releases by governments and organizations include XLSX or
XLS files. Unfortunately some data sources do not retain older versions.
[Git](https://git-scm.com/) is a popular system for organizing a historical
record of text files and changes. Git can also store and track spreadsheets.
[GitHub](https://github.com/) hosts Git repositories and provides infrastructure
to run scheduled tasks. ["Flat Data"](https://octo.github.com/projects/flat-data)
explores storing and comparing versions of structured CSV and JSON data.
GitHub hosts Git repositories and provides infrastructure to execute workflows.
The ["Flat Data" project](https://octo.github.com/projects/flat-data) explores
storing and comparing versions of structured data using GitHub infrastructure.
[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.
@ -29,7 +26,7 @@ changes over time.
["Excel to CSV"](https://octo.github.com/projects/flat-data#:~:text=Excel) is an
official example that pulls XLSX workbooks from an endpoint and uses SheetJS to
parse the workbooks and generate CSV files:
parse the workbooks and generate CSV files.
:::
@ -38,8 +35,8 @@ The following diagram depicts the data dance:
```mermaid
sequenceDiagram
autonumber
participant R as GH Repo
participant A as GH Action
participant R as GitHub Repo
participant A as GitHub Action
participant S as Data Source
loop Regular Interval (cron)
A->>R: clone repo
@ -56,18 +53,30 @@ sequenceDiagram
## Flat Data
Many official data releases by governments and organizations include XLSX or
XLS files. Unfortunately some data sources do not retain older versions.
Software developers typically use version control systems such as Git to track
changes in source code.
The "Flat Data" project starts from the idea that the same version control
systems can be used to track changes in data. Third-party data sources can be
snapshotted at regular intervals and stored in Git repositories.
### Components
As a project from the company, the entire lifecycle uses GitHub offerings:
- GitHub offers free hosting for Git repositories
- GitHub Actions[^1] infrastructure runs tasks at regular intervals
- `githubocto/flat`[^2] library helps fetch data and automate post-processing
- `flat-postprocessing`[^3] library provides post-processing helper functions
- "Flat Viewer"[^4] displays structured CSV and JSON data from Git repositories
- GitHub.com[^1] offers free hosting for Git repositories
- GitHub Actions[^2] infrastructure runs tasks at regular intervals
- `githubocto/flat`[^3] library helps fetch data and automate post-processing
- `flat-postprocessing`[^4] library provides post-processing helper functions
- "Flat Viewer"[^5] displays structured CSV and JSON data from Git repositories
:::caution pass
A GitHub account is required. When the demo was last tested, "GitHub Free"
accounts had no Actions usage limits for public repositories[^5].
accounts had no Actions usage limits for public repositories[^6].
Private GitHub repositories can be used for processing data, but the Flat Viewer
will not be able to display private data.
@ -143,12 +152,12 @@ for more details.
The first argument to the post-processing script is the filename.
The SheetJS `readFile` method[^6] will read the file and generate a SheetJS
workbook object[^7]. After extracting the first worksheet, `sheet_to_csv`[^8]
The SheetJS `readFile` method[^7] will read the file and generate a SheetJS
workbook object[^8]. After extracting the first worksheet, `sheet_to_csv`[^9]
generates a CSV string.
After generating a CSV string, the string should be written to the filesystem
using `Deno.writeFileSync`[^9]. By convention, the CSV should preserve the file
using `Deno.writeFileSync`[^10]. By convention, the CSV should preserve the file
name stem and replace the extension with `.csv`:
<CodeBlock title="postprocess.ts" language="ts">{`\
@ -316,12 +325,13 @@ jobs:
The column chart in the Index column is a histogram.
[^1]: See ["GitHub Actions documentation"](https://docs.github.com/en/actions)
[^2]: See [`githubocto/flat`](https://github.com/githubocto/flat) repo on GitHub.
[^3]: See [`githubocto/flat-postprocessing`](https://github.com/githubocto/flat-postprocessing) repo on GitHub.
[^4]: The hosted version is available at <https://flatgithub.com/>
[^5]: See ["About billing for GitHub Actions"](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions) in the GitHub documentation.
[^6]: See [`readFile` in "Reading Files"](/docs/api/parse-options)
[^7]: See ["Workbook Object"](/docs/csf/book)
[^8]: See [`sheet_to_csv` in "CSV and Text"](/docs/api/utilities/csv#delimiter-separated-output)
[^9]: See [`Deno.writeFileSync`](https://deno.land/api?s=Deno.writeFileSync) in the Deno Runtime APIs documentation.
[^1]: See ["Repositories documentation"](https://docs.github.com/en/repositories) in the GitHub documentation.
[^2]: See ["GitHub Actions documentation"](https://docs.github.com/en/actions) in the GitHub documentation.
[^3]: See [`githubocto/flat`](https://github.com/githubocto/flat) repo on GitHub.
[^4]: See [`githubocto/flat-postprocessing`](https://github.com/githubocto/flat-postprocessing) repo on GitHub.
[^5]: The hosted version is available at <https://flatgithub.com/>
[^6]: See ["About billing for GitHub Actions"](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions) in the GitHub documentation.
[^7]: See [`readFile` in "Reading Files"](/docs/api/parse-options)
[^8]: See ["Workbook Object"](/docs/csf/book)
[^9]: See [`sheet_to_csv` in "CSV and Text"](/docs/api/utilities/csv#delimiter-separated-output)
[^10]: See [`Deno.writeFileSync`](https://deno.land/api?s=Deno.writeFileSync) in the Deno Runtime APIs documentation.

@ -28,7 +28,7 @@ flowchart LR
nfile --> |ExcelTools\nImport|data
```
:::note Tested Deployments
This demo was last tested by SheetJS users on 2023 October 3 in Maple 2023.

@ -1,347 +0,0 @@
---
title: Typed Arrays and ML
pagination_prev: demos/extensions/index
pagination_next: demos/engines/index
sidebar_custom_props:
summary: Parse and serialize Uint8Array data from TensorFlow
---
<head>
<script src="https://docs.sheetjs.com/tfjs/tf.min.js"></script>
</head>
Machine learning libraries in JS typically use "Typed Arrays". Typed Arrays are
not JS Arrays! With some data wrangling, translating between SheetJS worksheets
and typed arrays is straightforward.
This demo covers conversions between worksheets and Typed Arrays for use with
TensorFlow.js and other ML libraries.
:::info pass
Live code blocks on this page load the standalone build of version `4.10.0`.
For use in web frameworks, the `@tensorflow/tfjs` module should be used.
For use in NodeJS, the native bindings module is `@tensorflow/tfjs-node`.
:::
:::note pass
Each browser demo was tested in the following environments:
| Browser | Date | TF.js version |
|:------------|:-----------|:--------------|
| Chrome 116 | 2023-09-02 | `4.10.0` |
| Safari 16.6 | 2023-09-02 | `4.10.0` |
| Brave 1.57 | 2023-09-02 | `4.10.0` |
:::
## CSV Data Interchange
`tf.data.csv` generates a Dataset from CSV data. The function expects a URL.
Fortunately blob URLs are supported, making data import straightforward:
```js
function worksheet_to_csv_url(worksheet) {
/* generate CSV */
const csv = XLSX.utils.sheet_to_csv(worksheet);
/* CSV -> Uint8Array -> Blob */
const u8 = new TextEncoder().encode(csv);
const blob = new Blob([u8], { type: "text/csv" });
/* generate a blob URL */
return URL.createObjectURL(blob);
}
```
<details><summary><b>TF CSV Demo using XLSX files</b> (click to show)</summary>
This demo shows a simple model fitting using the "Boston Housing" dataset. The
[sample XLSX file](https://sheetjs.com/data/bht.xlsx) contains the data.
The demo first fetches the XLSX file and generates CSV text. A blob URL is
generated and fed to `tf.data.csv`. The rest of the demo follows the official
example in the TensorFlow documentation.
:::caution pass
If the live demo shows a message
```
ReferenceError: tf is not defined
```
please refresh the page. This is a known bug in the documentation generator.
:::
```jsx live
function SheetJSToTFJSCSV() {
const [output, setOutput] = React.useState("");
const doit = React.useCallback(async () => {
/* fetch file */
const f = await fetch("https://sheetjs.com/data/bht.xlsx");
const ab = await f.arrayBuffer();
/* parse file and get first worksheet */
const wb = XLSX.read(ab);
const ws = wb.Sheets[wb.SheetNames[0]];
/* generate CSV */
const csv = XLSX.utils.sheet_to_csv(ws);
/* generate blob URL */
const u8 = new TextEncoder().encode(csv);
const blob = new Blob([u8], {type: "text/csv"});
const url = URL.createObjectURL(blob);
/* feed to tfjs */
const dataset = tf.data.csv(url, {columnConfigs:{"medv":{isLabel:true}}});
/* this part mirrors the tf.data.csv docs */
const flat = dataset.map(({xs,ys}) => ({xs: Object.values(xs), ys: Object.values(ys)})).batch(10);
const model = tf.sequential();
model.add(tf.layers.dense({inputShape: [(await dataset.columnNames()).length - 1], units: 1}));
model.compile({ optimizer: tf.train.sgd(0.000001), loss: 'meanSquaredError' });
let base = output;
await model.fitDataset(flat, { epochs: 10, callbacks: { onEpochEnd: async (epoch, logs) => {
setOutput(base += "\n" + epoch + ":" + logs.loss);
}}});
model.summary();
});
return ( <pre>
<button onClick={doit}>Click to run</button>
{output}
</pre> );
}
```
</details>
In the other direction, `XLSX.read` will readily parse CSV exports.
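For example, a CSV string generated in a TF.js workflow can be parsed by passing
the string with the `"string"` type (a small sketch using made-up data):

```js
/* a CSV string, e.g. exported from a TF.js pipeline */
const csv = "sepal_length,sepal_width\n5.1,3.5\n4.9,3";

/* parse the CSV string into a SheetJS workbook */
const wb = XLSX.read(csv, { type: "string" });

/* pull the first worksheet and convert to an array of row objects */
const ws = wb.Sheets[wb.SheetNames[0]];
const rows = XLSX.utils.sheet_to_json(ws);
// [ { sepal_length: 5.1, sepal_width: 3.5 }, { sepal_length: 4.9, sepal_width: 3 } ]
```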
## JS Array Interchange
[The official Linear Regression tutorial](https://www.tensorflow.org/js/tutorials/training/linear_regression)
loads data from a JSON file:
```json
[
{
"Name": "chevrolet chevelle malibu",
"Miles_per_Gallon": 18,
"Cylinders": 8,
"Displacement": 307,
"Horsepower": 130,
"Weight_in_lbs": 3504,
"Acceleration": 12,
"Year": "1970-01-01",
"Origin": "USA"
},
{
"Name": "buick skylark 320",
"Miles_per_Gallon": 15,
"Cylinders": 8,
"Displacement": 350,
"Horsepower": 165,
"Weight_in_lbs": 3693,
"Acceleration": 11.5,
"Year": "1970-01-01",
"Origin": "USA"
},
// ...
]
```
In real use cases, data is stored in [spreadsheets](https://sheetjs.com/data/cd.xls):
![cd.xls screenshot](pathname:///files/cd.png)
Following the tutorial, the data fetching method is easily adapted. Differences
from the official example are highlighted below:
```js
/**
* Get the car data reduced to just the variables we are interested
* and cleaned of missing data.
*/
async function getData() {
// highlight-start
/* fetch file */
const carsDataResponse = await fetch('https://sheetjs.com/data/cd.xls');
/* get file data (ArrayBuffer) */
const carsDataAB = await carsDataResponse.arrayBuffer();
/* parse */
const carsDataWB = XLSX.read(carsDataAB);
/* get first worksheet */
const carsDataWS = carsDataWB.Sheets[carsDataWB.SheetNames[0]];
/* generate array of JS objects */
const carsData = XLSX.utils.sheet_to_json(carsDataWS);
// highlight-end
const cleaned = carsData.map(car => ({
mpg: car.Miles_per_Gallon,
horsepower: car.Horsepower,
}))
.filter(car => (car.mpg != null && car.horsepower != null));
return cleaned;
}
```
## Low-Level Operations
:::caution pass
Although low-level operations can be more efficient, JS or CSV interchange is
strongly recommended when possible.
:::
### Data Transposition
A typical dataset in a spreadsheet will start with one header row and represent
each data record in its own row. For example, the Iris dataset might look like this:
![Iris dataset](pathname:///files/iris.png)
`XLSX.utils.sheet_to_json` will translate this into an array of row objects:
```js
var aoo = [
{"sepal length": 5.1, "sepal width": 3.5, ...},
{"sepal length": 4.9, "sepal width": 3, ...},
...
];
```
TF.js and other libraries tend to operate on individual columns, equivalent to:
```js
var sepal_lengths = [5.1, 4.9, ...];
var sepal_widths = [3.5, 3, ...];
```
When a `tensor2d` is exported, the data will look different from the spreadsheet:
```js
var data_set_2d = [
[5.1, 4.9, ...],
[3.5, 3, ...],
...
]
```
This is the transpose of how people use spreadsheets!
#### Typed Arrays and Columns
A single typed array can be converted to a pure JS array with `Array.from`:
```js
var column = Array.from(dataset_typedarray);
```
Similarly, `Float32Array.from` generates a typed array from a normal array:
```js
var dataset = Float32Array.from(column);
```
### Exporting Datasets to a Worksheet
`XLSX.utils.aoa_to_sheet` can generate a worksheet from an array of arrays.
ML libraries typically provide APIs to pull an array of arrays, but the data
will be transposed relative to the spreadsheet layout. To export multiple data
sets, manually "transpose" the data:
```js
/* assuming data is an array of typed arrays */
var aoa = [];
for(var i = 0; i < data.length; ++i) {
for(var j = 0; j < data[i].length; ++j) {
if(!aoa[j]) aoa[j] = [];
aoa[j][i] = data[i][j];
}
}
/* aoa can be directly converted to a worksheet object */
var ws = XLSX.utils.aoa_to_sheet(aoa);
```
### Importing Data from a Spreadsheet
`sheet_to_json` with the option `header:1` will generate a row-major array of
arrays that can be transposed. However, it is more efficient to walk the sheet
manually:
```js
/* find worksheet range */
var range = XLSX.utils.decode_range(ws['!ref']);
var out = []
/* walk the columns */
for(var C = range.s.c; C <= range.e.c; ++C) {
/* create the typed array */
var ta = new Float32Array(range.e.r - range.s.r + 1);
/* walk the rows */
for(var R = range.s.r; R <= range.e.r; ++R) {
/* find the cell, skip it if the cell isn't numeric or boolean */
var cell = ws[XLSX.utils.encode_cell({r:R, c:C})];
if(!cell || cell.t != 'n' && cell.t != 'b') continue;
/* assign to the typed array */
ta[R - range.s.r] = cell.v;
}
out.push(ta);
}
```
If the data set has a header row, the loop can be adjusted to skip those rows.
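For comparison, a sketch of the `header: 1` approach mentioned above, assuming
`ws` is the worksheet from the previous snippet and every data cell is numeric:

```js
/* generate a row-major array of arrays (the first row is the header) */
var rows = XLSX.utils.sheet_to_json(ws, { header: 1 });
var header = rows[0], records = rows.slice(1);

/* transpose the rows into one typed array per column */
var columns = header.map(function(_, C) {
  return Float32Array.from(records, function(row) { return row[C]; });
});
```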
### TF.js Tensors
A single `Array#map` can pull individual named fields from the result, which
can be used to construct TensorFlow.js tensor objects:
```js
const aoo = XLSX.utils.sheet_to_json(worksheet);
const lengths = aoo.map(row => row["sepal length"]);
const tensor = tf.tensor1d(lengths);
```
`tf.Tensor` objects can be directly transposed using `transpose`:
```js
var aoo = XLSX.utils.sheet_to_json(worksheet);
// "x" and "y" are the fields we want to pull from the data
var data = aoo.map(row => ([row["x"], row["y"]]));
// create a tensor representing two column datasets
var tensor = tf.tensor2d(data).transpose();
// individual columns can be accessed
var col1 = tensor.slice([0,0], [1,tensor.shape[1]]).flatten();
var col2 = tensor.slice([1,0], [1,tensor.shape[1]]).flatten();
```
For exporting, `stack` can be used to collapse the columns into a linear array:
```js
/* pull the stacked data into a Float32Array */
var result = tf.stack([col1, col2]).transpose();
var shape = result.shape;
var f32 = result.dataSync();
/* construct an array of arrays of the data in spreadsheet order */
var aoa = [];
for(var j = 0; j < shape[0]; ++j) {
aoa[j] = [];
for(var i = 0; i < shape[1]; ++i) aoa[j][i] = f32[j * shape[1] + i];
}
/* add headers to the top */
aoa.unshift(["x", "y"]);
/* generate worksheet */
var worksheet = XLSX.utils.aoa_to_sheet(aoa);
```

@ -130,11 +130,11 @@ This demo was tested in the following deployments:
| Architecture | Version | Date |
|:-------------|:--------|:-----------|
| `darwin-x64` | `2.7.0` | 2023-12-05 |
| `darwin-arm` | `2.7.0` | 2023-10-18 |
| `win10-x64` | `2.7.0` | 2023-10-27 |
| `win11-arm` | `2.7.0` | 2023-12-01 |
| `linux-x64` | `2.7.0` | 2023-12-07 |
| `linux-arm` | `2.7.0` | 2023-12-01 |
:::

@ -17,8 +17,18 @@ result is a JAR.
:::caution pass
Rhino does not support Uint8Array, so NUMBERS files cannot be read or written.
:::
:::note Tested Deployments
This demo was tested in the following deployments:
| OpenJDK | Rhino | Date |
|:--------|:---------|:-----------|
| 21.0.1 | `1.7.14` | 2023-12-05 |
| 1.8.0 | `1.7.14` | 2023-12-05 |
:::
@ -118,12 +128,6 @@ This string can be loaded into the JS engine and processed:
## Complete Example
0) Ensure Java is installed.
1) Create a folder for the project:

@ -27,7 +27,7 @@ command-line tool for reading data from files.
:::note pass
Many QuickJS functions are not documented. The explanation was verified against
the latest release (commit `daa35bc`).
:::
@ -262,14 +262,14 @@ This demo was tested in the following deployments:
| Architecture | Git Commit | Date |
|:-------------|:-----------|:-----------|
| `darwin-x64` | `daa35bc` | 2023-12-09 |
| `darwin-arm` | `2788d71` | 2023-10-18 |
| `win10-x64` | `daa35bc` | 2023-12-09 |
| `win11-arm` | `03cc5ec` | 2023-12-01 |
| `linux-x64` | `03cc5ec` | 2023-12-07 |
| `linux-arm` | `03cc5ec` | 2023-12-01 |
When the demo was tested, commit `daa35bc` corresponded to the latest release.
:::
@ -285,7 +285,7 @@ tests were run entirely within Windows Subsystem for Linux.
```bash
git clone https://github.com/bellard/quickjs
cd quickjs
git checkout daa35bc
make
cd ..
```
@ -342,10 +342,10 @@ This demo was tested in the following environments:
| Git Commit | Date |
|:-----------|:-----------|
| `daa35bc` | 2023-12-09 |
| `2788d71` | 2023-12-09 |
When the demo was tested, commit `daa35bc` corresponded to the latest release.
:::
@ -354,7 +354,7 @@ When the demo was tested, commit `03cc5ec` corresponded to the latest commit.
```bash
git clone https://github.com/bellard/quickjs
cd quickjs
git checkout daa35bc
make
cd ..
cp quickjs/qjs .

@ -135,7 +135,7 @@ This demo was tested in the following deployments:
| `darwin-x64` | `c3ead3f` | 2023-11-04 |
| `darwin-arm` | `c3ead3f` | 2023-10-19 |
| `win10-x64` | `c3ead3f` | 2023-10-28 |
| `linux-x64` | `c3ead3f` | 2023-12-09 |
:::

@ -124,7 +124,7 @@ This demo was tested in the following deployments:
| `darwin-arm` | 2023-10-20 |
| `win10-x64` | 2023-10-28 |
| `win11-arm` | 2023-12-01 |
| `linux-x64` | 2023-12-07 |
| `linux-arm` | 2023-12-01 |
:::

@ -100,9 +100,9 @@ write_file("SheetJE.fods", $fods);
## Complete Example
:::note Tested Deployments
This demo was tested on 2023-12-05 against JE 0.066
:::
@ -131,5 +131,5 @@ curl -LO https://sheetjs.com/data/cd.xls
perl SheetJE.pl cd.xls
```
After a short wait, the contents will be displayed in CSV form. The script will
also generate the spreadsheet `SheetJE.fods` which can be opened in LibreOffice.

@ -125,14 +125,22 @@ generates a C library and a standalone CLI tool.
The simplest way to interact with the engine is to pass Base64 strings.
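Inside the engine, SheetJS can parse and generate Base64 strings directly. A
minimal sketch, assuming the host program exposes the file data as a global
string named `b64`:

```js
/* parse the Base64 string supplied by the host program */
var wb = XLSX.read(b64, { type: "base64" });

/* ... process the workbook ... */

/* generate a new XLSX file as a Base64 string for the host to save */
var out = XLSX.write(wb, { bookType: "xlsx", type: "base64" });
```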
:::note Tested Environments
This demo was tested in the following environments:
| Architecture | Commit | Date |
|:-------------|:----------|:-----------|
| `darwin-x64` | `ef4cb2b` | 2023-12-08 |
| `darwin-arm` | `ef4cb2b` | 2023-12-08 |
| `win11-x64` | `ef4cb2b` | 2023-12-08 |
| `win11-arm` | `ef4cb2b` | 2023-12-08 |
| `linux-x64` | `ef4cb2b` | 2023-12-08 |
| `linux-arm` | `ef4cb2b` | 2023-12-08 |
The Windows tests were run in WSL.
Debian and WSL require the `cmake`, `python3` and `python-is-python3` packages.
:::

@ -21,7 +21,7 @@ in the [issue tracker](https://git.sheetjs.com/sheetjs/docs.sheetjs.com/issues)
- [`XMLHttpRequest and fetch`](/docs/demos/net/network)
- [`Clipboard Data`](/docs/demos/local/clipboard)
- [`Web Workers`](/docs/demos/bigdata/worker)
- [`Typed Arrays`](/docs/demos/math)
- [`Local File Access`](/docs/demos/local/file)
- [`LocalStorage and SessionStorage`](/docs/demos/data/storageapi)
- [`Web SQL Database`](/docs/demos/data/websql)

@ -748,7 +748,7 @@ example of fetching data from a JSON Endpoint and generating a workbook.
[`x-spreadsheet`](/docs/demos/grid/xs) is an interactive data grid for
previewing and modifying structured data in the web browser.
["Typed Arrays and ML"](/docs/demos/ml) covers strategies for
["TensorFlow.js"](/docs/demos/math/tensorflow) covers strategies for
creating worksheets from ML library exports (datasets stored in Typed Arrays).
<details>

@ -652,7 +652,7 @@ export default function App() {
### Example: Data Loading
["Typed Arrays and ML"](/docs/demos/ml) covers strategies for
["TensorFlow.js"](/docs/demos/math/tensorflow) covers strategies for
generating typed arrays and tensors from worksheet data.
<details>

@ -1,6 +1,7 @@
---
sidebar_position: 8
title: Workbook Helpers
hide_table_of_contents: true
---
Many utility functions return worksheet objects. Worksheets cannot be written to
@ -9,10 +10,12 @@ workbook file formats directly. They must be added to a workbook object.
**Create a new workbook**
```js
var wb_sans_sheets = XLSX.utils.book_new();
```
With no arguments, the `book_new` utility function creates an empty workbook.
:::info pass
Spreadsheet software generally requires at least one worksheet and enforces the
requirement in the user interface. For example, if the last worksheet is deleted
@ -21,6 +24,29 @@ in the program, Apple Numbers will automatically create a new blank sheet.
The SheetJS [write functions](/docs/api/write-options) enforce the requirement.
They will throw errors when trying to export empty workbooks.
:::
_Single Worksheet_
:::tip pass
Version `0.20.1` introduced the one and two argument forms of `book_new`. It is
strongly recommended to [upgrade](/docs/getting-started/installation/).
:::
```js
var wb_with_sheet_named_Sheet1 = XLSX.utils.book_new(worksheet);
var wb_with_sheet_named_Blatte = XLSX.utils.book_new(worksheet, "Blatte");
```
`book_new` can accept one or two arguments.
If provided, the first argument is expected to be a worksheet object. It will
be added to the new workbook.
If provided, the second argument is the name of the worksheet. If omitted, the
default name "Sheet1" will be used.
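For example, a worksheet built from an array of arrays can be wrapped in a new
workbook and written in one pass (a short sketch; the sheet and file names are
arbitrary):

```js
/* build a worksheet from an array of arrays */
var ws = XLSX.utils.aoa_to_sheet([
  ["Name", "Value"],
  ["Alpha", 1],
  ["Beta", 2]
]);

/* create a workbook whose first (and only) sheet is named "Data" */
var wb = XLSX.utils.book_new(ws, "Data");

/* write the workbook to a file */
XLSX.writeFile(wb, "Data.xlsx");
```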
**Append a Worksheet to a Workbook**

@ -21,7 +21,7 @@ flowchart LR
wb(SheetJS\nWorkbook)
file[(workbook\nfile)]
html --> |table_to_sheet\n\n| ws
ws --> |book_new\n\n| wb
wb --> |writeFile\n\n| file
```

@ -116,7 +116,7 @@ _Exporting Formulae:_
_Workbook Operations:_
- `book_new` creates a workbook object
- `book_append_sheet` adds a worksheet to a workbook
**[Utility Functions](/docs/api/utilities)**

@ -37,15 +37,15 @@ building, reproducing official releases, and running NodeJS and browser tests.
These instructions were tested on the following platforms:
| Platform | Architecture | Test Date |
|:------------------------------|:-------------|:-----------|
| Linux (Steam Deck Holo x64) | `linux-x64` | 2023-11-27 |
| Linux (Ubuntu 18 AArch64) | `linux-arm` | 2023-12-01 |
| MacOS 10.13.6 (x64) | `darwin-x64` | 2023-09-30 |
| MacOS 14.1.2 (ARM64) | `darwin-arm` | 2023-12-01 |
| Windows 10 (x64) + WSL Ubuntu | `win10-x64` | 2023-11-27 |
| Windows 11 (x64) + WSL Ubuntu | `win11-x64` | 2023-10-14 |
| Windows 11 (ARM) + WSL Ubuntu | `win11-arm` | 2023-09-18 |
With some additional dependencies, the unminified scripts are reproducible and
tests will pass in Windows XP with NodeJS 5.10.0.
@ -525,28 +525,28 @@ echo 'export PATH="/usr/local/opt/gnu-sed/libexec/gnubin:$PATH"' >> ~/.profile
### Reproduce official builds
5) Run `git log` and search for the commit that matches a particular release
version. For example, version `0.20.1` can be found with:
```bash
git log | grep -B4 "version bump 0.20.1"
```
The output should look like:
```bash
$ git log | grep -B4 "version bump 0.20.1"
# highlight-next-line
commit 29d46c07a895bdfd948d15b5115529ae697ccb48 <-- this is the commit hash
Author: SheetJS <dev@sheetjs.com>
Date: Tue Dec 5 03:19:42 2023 -0500
version bump 0.20.1
```
6) Switch to that commit:
```bash
git checkout 29d46c07a895bdfd948d15b5115529ae697ccb48
```
7) Run the full build sequence
@ -593,36 +593,36 @@ The checksum for the CDN version can be computed with:
<TabItem value="wsl" label="Windows WSL">
```bash
curl -L https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/xlsx.full.min.js | md5sum -
```
</TabItem>
<TabItem value="osx" label="MacOS">
```bash
curl -k -L https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/xlsx.full.min.js | md5
```
</TabItem>
<TabItem value="l" label="Linux">
```bash
curl -L https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/xlsx.full.min.js | md5sum -
```
</TabItem>
</Tabs>
When the demo was last tested on macOS, against version `0.20.1`:
>
```bash
$ md5 dist/xlsx.full.min.js
# highlight-next-line
MD5 (dist/xlsx.full.min.js) = c5db4b1d2a1985a4ebfbaa500243f593
$ curl -k -L https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/xlsx.full.min.js | md5
# highlight-next-line
c5db4b1d2a1985a4ebfbaa500243f593
```
The two hashes should match.

@ -227,9 +227,11 @@ const config = {
{ from: '/docs/getting-started/demos/cli', to: '/docs/demos/desktop/cli/' },
{ from: '/docs/getting-started/demos/desktop', to: '/docs/demos/desktop/' },
/* bigdata */
{ from: '/docs/demos/ml', to: '/docs/demos/bigdata/ml/' },
{ from: '/docs/demos/worker', to: '/docs/demos/bigdata/worker/' },
{ from: '/docs/demos/stream', to: '/docs/demos/bigdata/stream/' },
/* math */
{ from: '/docs/demos/ml', to: '/docs/demos/math/' },
{ from: '/docs/demos/bigdata/ml', to: '/docs/demos/math/' },
/* installation */
{ from: '/docs/installation/standalone', to: '/docs/getting-started/installation/standalone/' },
{ from: '/docs/installation/frameworks', to: '/docs/getting-started/installation/frameworks/' },

@ -26,7 +26,7 @@
"prism-react-renderer": "1.3.5",
"react": "17.0.2",
"react-dom": "17.0.2",
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.0/xlsx-0.20.0.tgz"
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.1/xlsx-0.20.1.tgz"
},
"devDependencies": {
"@docusaurus/module-type-aliases": "2.4.1"

@ -1,6 +1,6 @@
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.1/package/types/index.d.ts"
import { read, utils, set_cptable, version } from 'https://cdn.sheetjs.com/xlsx-0.20.1/package/xlsx.mjs';
import * as cptable from 'https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/cpexcel.full.mjs';
set_cptable(cptable);
import * as Drash from "https://cdn.jsdelivr.net/gh/drashland/drash@v2.8.1/mod.ts";

@ -64,6 +64,10 @@ async function do_file(files) {
process_wb(XLSX.read(data));
}
(async() => {
process_wb(XLSX.read(await (await fetch("https://sheetjs.com/pres.numbers")).arrayBuffer()))
})();
var drop = document.getElementById('drop');
function handleDrop(e) {

@ -1,3 +1,3 @@
//const version = "0.20.1";
import { version } from "xlsx";
export default version;