---
title: Spreadsheet Processing in Mathematica
sidebar_label: Mathematica
description: Build complex data pipelines in Mathematica Notebooks. Seamlessly create datasets with SheetJS. Leverage the Mathematica ecosystem to analyze data from Excel workbooks.
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
---
import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

[Mathematica](https://mathematica.com) is a software system for mathematics and
scientific computing. It supports command-line tools and JavaScript extensions.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis
within Mathematica. We'll explore how to run an external tool to generate CSV
data from opaque spreadsheets and parse the data from Mathematica.

:::note Tested Deployments

This demo was last tested by SheetJS users on 2023 November 04 in Mathematica 13.

:::

## Integration Details
The [SheetJS NodeJS module](/docs/getting-started/installation/nodejs) can be
loaded in NodeJS scripts, including scripts invoked using the `"NodeJS"` mode
of the `ExternalEvaluate`[^1] Mathematica function.

However, the current cross-platform recommendation involves a dedicated
command-line tool that leverages SheetJS libraries to perform spreadsheet
processing.

### External Engines
The following diagram depicts the workbook waltz:
```mermaid
flowchart LR
subgraph `ExternalEvaluate`
file[(workbook\nfile)]
csvstr(CSV\nString)
end
data[(Dataset)]
file --> |NodeJS\nSheetJS Ops| csvstr
csvstr --> |ImportString\nMathematica| data
```

_Mathematica_

NodeJS can be activated from Mathematica using `RegisterExternalEvaluator`[^2].
Once activated, JavaScript code can be run using `ExternalEvaluate`[^3]. If the
NodeJS code returns CSV data, `ImportString`[^4] can generate a `Dataset`[^5].

_SheetJS_

For a file residing on the filesystem, the SheetJS `readFile` function[^6] can
generate a workbook object. The exact location can be determined by printing
`require("process").cwd()`[^7] in `ExternalEvaluate`:

```mathematica
In[1]:= ExternalEvaluate["NodeJS", "require('process').cwd()"]
Out[1]= "C:\Users\Me\Documents"
```

After pulling the first worksheet[^8], the SheetJS `sheet_to_csv` function[^9]
generates a CSV string.

_Complete Function_

The following function reads a file, parses the first worksheet and returns a
Dataset object assuming one header row.

<Tabs groupId="os">
<TabItem value="unix" label="Linux/MacOS">
```mathematica title="Complete Function"
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
  (* This was required in local testing *)
  RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];

  (* Generate CSV from first sheet *)
  csv:=ExternalEvaluate["NodeJS", StringJoin[
    (* module installed in home directory *)
    "var XLSX = require('xlsx');",
    (* read specified filename *)
    "var wb = XLSX.readFile('",filename,"');",
    (* grab first worksheet *)
    "var ws = wb.Sheets[wb.SheetNames[0]];",
    (* convert to CSV *)
    "XLSX.utils.sheet_to_csv(ws)"
  ]];

  (* Parse CSV into a dataset *)
  Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
```
</TabItem>
<TabItem value="win" label="Windows">
```mathematica title="Complete Function"
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
  (* This was required in local testing *)
  RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"];

  (* Generate CSV from first sheet *)
  csv:=ExternalEvaluate["NodeJS", StringJoin[
    (* module installed in home directory *)
    "var XLSX = require('xlsx');",
    (* read specified filename *)
    "var wb = XLSX.readFile('",filename,"');",
    (* grab first worksheet *)
    "var ws = wb.Sheets[wb.SheetNames[0]];",
    (* convert to CSV *)
    "XLSX.utils.sheet_to_csv(ws)"
  ]];

  (* Parse CSV into a dataset *)
  Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
```
</TabItem>
</Tabs>
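
The function above splices `filename` directly into a single-quoted JavaScript
string, so a name containing a backslash (common in Windows paths) or a single
quote would change the generated code. The following is a minimal sketch of one
way to guard against that. The helper names `SheetJSEscapeJS` and
`SheetJSReadFileJS` are hypothetical and are not part of SheetJS or Mathematica.
The sketch only builds the JavaScript source text and does not evaluate it.

```mathematica
(* hypothetical helper: escape backslashes and single quotes so the text can
   sit inside a single-quoted JavaScript string literal *)
SheetJSEscapeJS[s_String] := StringReplace[s, {"\\" -> "\\\\", "'" -> "\\'"}];

(* hypothetical helper: build the same JavaScript source as the function above,
   with the filename escaped first *)
SheetJSReadFileJS[filename_String] := StringJoin[
  "var XLSX = require('xlsx');",
  "var wb = XLSX.readFile('", SheetJSEscapeJS[filename], "');",
  "var ws = wb.Sheets[wb.SheetNames[0]];",
  "XLSX.utils.sheet_to_csv(ws)"
];

(* inspect the generated source for a Windows-style path *)
SheetJSReadFileJS["C:\\Users\\Me\\Documents\\pres.numbers"]
```

The resulting string could be passed as the second argument to
`ExternalEvaluate["NodeJS", ...]` in place of the inline `StringJoin`.
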
<details open>
<summary><b>How to run the example</b> (click to hide)</summary>

:::note Tested Deployments

This example was last tested on 2023 November 04 with Mathematica 13.3.

:::

0) Install NodeJS. When the demo was tested, version `20.9.0` was installed.
1) Install dependencies in the Home folder (`~` or `$HOME` or `%HOMEPATH%`):
<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz zeromq@6.0.0-beta.17`}
</CodeBlock>
2) Open a new Mathematica Notebook and register NodeJS. When the example was
tested, the commands were:

<Tabs groupId="os">
<TabItem value="unix" label="Linux/MacOS">
```mathematica
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"]
FindExternalEvaluators["NodeJS"]
```
</TabItem>
<TabItem value="win" label="Windows">
```mathematica
RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"]
FindExternalEvaluators["NodeJS"]
```
</TabItem>
</Tabs>

The second argument to `RegisterExternalEvaluator` should be the path to the
`node` or `node.exe` binary.

If NodeJS is registered, the value in the "Registered" column will be "True".

3) To determine the base folder, run `require("process").cwd()` from NodeJS:

```mathematica
ExternalEvaluate["NodeJS", "require('process').cwd()"]
```

4) Download [`pres.numbers`](https://sheetjs.com/pres.numbers) and move the file
to the base folder displayed in the previous step.

5) Copy and evaluate the "Complete Function" from the code block above.

6) Run the function and confirm the result is a proper Dataset:

```mathematica
SheetJSImportFileEE["pres.numbers"]
```

![SheetJSImportFileEE result](pathname:///mathematica/SheetJSImportFileEE.png)
</details>
### Command-Line Tools

The ["Command-Line Tools" demo](/docs/demos/cli) creates `xlsx-cli`, a
command-line tool that reads a spreadsheet file and generates CSV rows from the
first worksheet.

`ExternalEvaluate`[^10] can run command-line tools and capture standard output.
The following snippet processes `~/Downloads/pres.numbers` and pulls CSV data
into a variable in Mathematica:

```mathematica
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
```

`ImportString`[^11] can interpret the CSV data as a `Dataset`[^12]. Typically the
first row of the CSV output is the header row. The `HeaderLines`[^13] option
controls how Mathematica parses the data:

```mathematica
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
```
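
As a minimal sketch of how the header row shapes the result, the snippet below
parses a small inline CSV string instead of real `xlsx-cli` output. The CSV text
and the variable names `samplecsv` and `sample` are made up for illustration.

```mathematica
(* hypothetical CSV text standing in for the output of xlsx-cli *)
samplecsv = "Name,Index\nBill Clinton,42\nGeorgeW Bush,43";

(* the first line is treated as the header row *)
sample = ImportString[samplecsv, "Dataset", "HeaderLines" -> 1];

(* extract one column by header name as an ordinary list *)
Normal[sample[All, "Index"]]
```

Because the first line is consumed as a header, columns can be addressed by
name in later `Dataset` queries.
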

The following diagram depicts the workbook waltz:

```mermaid
flowchart LR
  subgraph `ExternalEvaluate`
    file[(workbook\nfile)]
    csvstr(CSV\nString)
  end
  data[(Dataset)]
  file --> |`xlsx-cli`\nSheetJS Ops| csvstr
  csvstr --> |ImportString\nMathematica| data
```

## Complete Demo
1) Create the standalone `xlsx-cli` binary[^14]:

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx nexe -t 14.15.3 xlsx-cli.js`}
</CodeBlock>

<Tabs groupId="os">
<TabItem value="unix" label="Linux/MacOS">
2) Move the generated `xlsx-cli` to a fixed location in `/usr/local/bin`:
```bash
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/
```
</TabItem>
<TabItem value="win" label="Windows">
2) Find the current directory:
```bash
cd
```
The generated binary will be `xlsx-cli.exe` in the displayed path.
</TabItem>
</Tabs>
### Reading a Local File
3) In a new Mathematica notebook, run the following snippet:
```mathematica
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
// highlight-next-line
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
```
<Tabs groupId="os">
<TabItem value="unix" label="Linux/MacOS">
</TabItem>
<TabItem value="win" label="Windows">
Change `/usr/local/bin/xlsx-cli` in the string to the path to the generated
`xlsx-cli.exe` binary. For example, if the path in step 2 was
`C:\Users\Me\Documents\`, then the code should be:
```mathematica
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
// highlight-next-line
"C:\\Users\\Me\\Documents\\xlsx-cli.exe " <> x
]], "Dataset", "HeaderLines" -> 1]
```
The `\` characters must be doubled.
</TabItem>
</Tabs>
4) Download https://sheetjs.com/pres.numbers and save it to the Downloads folder:

```bash
cd ~/Downloads/
curl -LO https://sheetjs.com/pres.numbers
```
5) In the Mathematica notebook, run the new function. If the file was saved to
the Downloads folder, the path will be `"~/Downloads/pres.numbers"` in macOS:
```mathematica
data = SheetJSImportFile["~/Downloads/pres.numbers"]
```
The result should be displayed in a concise table.
![SheetJSImportFile result](pathname:///mathematica/SheetJSImportFile.png)
### Reading from a URL
`FetchURL`[^15] downloads a file from a specified URL and returns a path to the
file. This function will be wrapped in a new function called `SheetJSImportURL`.
6) In the same notebook, run the following:
```mathematica
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
path = FetchURL[x];
SheetJSImportFile[path]
)];
```
7) Test by downloading the test file in the notebook:
```mathematica
data = SheetJSImportURL["https://sheetjs.com/pres.numbers"]
```
![SheetJSImportURL result](pathname:///mathematica/SheetJSImportURL.png)
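
Footnote 15 mentions that newer Mathematica releases ship additional URL
functions. The following is a sketch of a variant that uses the built-in
`URLDownload` rather than the `URLTools` package. It assumes that `URLDownload`
returns a `File[...]` wrapper around a temporary path and that
`SheetJSImportFile` from step 3 is already defined. The name
`SheetJSImportURLDownload` is hypothetical, and this variant was not part of the
tested demo.

```mathematica
(* hypothetical variant of SheetJSImportURL without the URLTools package *)
SheetJSImportURLDownload[x_String] := Module[{file},
  (* URLDownload saves the response to a temporary file and returns File[path] *)
  file = URLDownload[x];
  (* First extracts the path string from the File wrapper *)
  SheetJSImportFile[First[file]]
];

data = SheetJSImportURLDownload["https://sheetjs.com/pres.numbers"]
```

If the download fails, `URLDownload` does not return a `File` object, so a more
careful version would check the result before calling `SheetJSImportFile`.
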
[^1]: See [the `ExternalEvaluate` Node.js example](https://reference.wolfram.com/language/ref/ExternalEvaluate.html#:~:text=Evaluate%20a%20basic%20math%20function%20in%20JavaScript%20using%20Node.js%3A) in the Mathematica documentation.
[^2]: See [`RegisterExternalEvaluator`](https://reference.wolfram.com/language/ref/RegisterExternalEvaluator.html) in the Mathematica documentation.
[^3]: See [`ExternalEvaluate`](https://reference.wolfram.com/language/ref/ExternalEvaluate.html) in the Mathematica documentation.
[^4]: See [`ImportString`](https://reference.wolfram.com/language/ref/ImportString.html) in the Mathematica documentation.
[^5]: A [`Dataset`](https://reference.wolfram.com/language/ref/Dataset.html) will be created when using the [`"Dataset"` element in `ImportString`](https://reference.wolfram.com/language/ref/format/CSV.html)
[^6]: See [`readFile` in "Reading Files"](/docs/api/parse-options)
[^7]: See [`process.cwd()`](https://nodejs.org/api/process.html#processcwd) in the NodeJS documentation.
[^8]: The `Sheets` and `SheetNames` properties of workbook objects are described in ["Workbook Object"](/docs/csf/book)
[^9]: See [`sheet_to_csv` in "CSV and Text"](/docs/api/utilities/csv#delimiter-separated-output)
[^10]: See [`ExternalEvaluate`](https://reference.wolfram.com/language/ref/ExternalEvaluate.html) in the Mathematica documentation.
[^11]: See [`ImportString`](https://reference.wolfram.com/language/ref/ImportString.html) in the Mathematica documentation.
[^12]: A [`Dataset`](https://reference.wolfram.com/language/ref/Dataset.html) will be created when using the [`"Dataset"` element in `ImportString`](https://reference.wolfram.com/language/ref/format/CSV.html)
[^13]: See [`HeaderLines`](https://reference.wolfram.com/language/ref/HeaderLines.html) in the Mathematica documentation.
[^14]: See ["Command-line Tools"](/docs/demos/cli) for more details.
[^15]: Mathematica 11 introduced new methods including [`URLRead`](https://reference.wolfram.com/language/ref/URLRead.html).