---
title: Spreadsheet Processing in Mathematica
sidebar_label: Mathematica
description: Build complex data pipelines in Mathematica Notebooks. Seamlessly create datasets with SheetJS. Leverage the Mathematica ecosystem to analyze data from Excel workbooks.
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';

[Mathematica](https://mathematica.com) is a software system for mathematics and
scientific computing. It supports command-line tools and JavaScript extensions.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis
within Mathematica. We'll explore how to run an external tool to generate CSV
data from opaque spreadsheets and parse the data from Mathematica.

:::note

This demo was last tested on 2023 August 21 in Mathematica 13.2.1.

:::

## Integration Details

The [SheetJS NodeJS module](/docs/getting-started/installation/nodejs) can be
loaded in NodeJS scripts, including scripts invoked using the `"NodeJS"` mode
of the `ExternalEvaluate`[^1] Mathematica function.

:::caution pass

In local testing, there were incompatibilities with recent NodeJS versions.

**This is a Mathematica bug.**

:::

The current recommendation involves a dedicated command-line tool that leverages
SheetJS libraries to perform spreadsheet processing.

### Command-Line Tools

The ["Command-Line Tools" demo](/docs/demos/desktop/cli) creates `xlsx-cli`, a
command-line tool that reads a spreadsheet file and generates CSV rows from the
first worksheet.

`ExternalEvaluate`[^2] can run command-line tools and capture standard output.
The following snippet processes `~/Downloads/pres.numbers` and pulls CSV data
into a variable in Mathematica:

```mathematica
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
```

`ImportString`[^3] can interpret the CSV data as a `Dataset`[^4]. Typically the
first row of the CSV output is the header row. The `HeaderLines`[^5] option
controls how Mathematica parses the data:
```mathematica
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
```
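Once the CSV text is parsed, the resulting `Dataset` can be queried by column name. The following is a minimal sketch rather than part of the tested demo. It uses a small inline CSV string as a stand-in for the `xlsx-cli` output, the `Name` and `Index` column names and the sample rows are assumptions, and the `"CSV"` format is named explicitly so that the short string is not left to format detection:

```mathematica
(* stand-in for the CSV text captured from xlsx-cli (assumed columns and rows) *)
csvdata = "Name,Index\nBill Clinton,42\nGeorgeW Bush,43\nBarack Obama,44";

(* the first line is treated as the header row *)
data = ImportString[csvdata, {"CSV", "Dataset"}, "HeaderLines" -> 1];

(* pull one column by header name *)
data[All, "Name"]

(* keep rows where the Index column is greater than 42 *)
data[Select[#Index > 42 &]]
```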
The following diagram depicts the workbook waltz:
```mermaid
flowchart LR
subgraph SheetJS operations
file[(workbook\nfile)]
csv(CSV)
end
csvstr(CSV\nString)
data[(Dataset)]
file --> |`xlsx-cli`\nSheetJS Ops| csv
csv --> |ExternalEvaluate\nMathematica| csvstr
csvstr --> |ImportString\nMathematica| data
```
## Complete Demo

:::info pass

This demo was tested in macOS. The path names will differ in other platforms.

:::

1) Create the standalone `xlsx-cli` binary[^6]:

<CodeBlock language="bash">{`\
cd /tmp
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx nexe -t 14.15.3 xlsx-cli.js`}
</CodeBlock>

2) Move the generated `xlsx-cli` to a fixed location in `/usr/local/bin`:
```bash
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/
```
### Reading a Local File
3) In a new Mathematica notebook, run the following snippet:
```mathematica
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
```
4) Download <https://sheetjs.com/pres.numbers> and save it to the Downloads folder.
5) In the Mathematica notebook, run the new function. If the file was saved to
the Downloads folder, the path will be `"~/Downloads/pres.numbers"` in macOS:
```mathematica
data = SheetJSImportFile["~/Downloads/pres.numbers"]
```
The result should be displayed in a concise table.
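`SheetJSImportFile` builds the shell command by string concatenation, so a path that contains spaces would be split by the shell. One possible workaround, sketched here under assumptions and not part of the tested demo, passes the path as a separate argument with `RunProcess` instead of `ExternalEvaluate`. The name `SheetJSImportFileArgs` is hypothetical, and the sketch builds an absolute path because no shell is involved to expand `~`:

```mathematica
(* hypothetical variant: invoke xlsx-cli directly, without a shell *)
SheetJSImportFileArgs[path_String] := ImportString[
  (* arguments are passed as a list, so no shell quoting is needed *)
  RunProcess[{"/usr/local/bin/xlsx-cli", path}, "StandardOutput"],
  {"CSV", "Dataset"}, "HeaderLines" -> 1
]

(* build an absolute path; "~" is not expanded without a shell *)
file = FileNameJoin[{$HomeDirectory, "Downloads", "pres.numbers"}];
data = SheetJSImportFileArgs[file]
```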
### Reading from a URL
`FetchURL`[^7] downloads a file from a specified URL and returns a path to the
file. This function will be wrapped in a new function called `SheetJSImportURL`.

6) In the same notebook, run the following:

```mathematica
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
  path = FetchURL[x];
  SheetJSImportFile[path]
)];
```
7) Test by downloading the test file in the notebook:
```mathematica
data = SheetJSImportURL["https://sheetjs.com/pres.numbers"]
```
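`URLDownload`, one of the URL functions introduced in Mathematica 11, could stand in for `FetchURL`. The following is an untested sketch under that assumption. `URLDownload` returns a `File` object, so the path string is extracted before it is passed to `SheetJSImportFile`. The name `SheetJSImportURLDownload` is hypothetical:

```mathematica
(* hypothetical variant using URLDownload in place of FetchURL *)
SheetJSImportURLDownload[x_] := Module[{file},
  (* URLDownload saves to a temporary file and returns File["..."] *)
  file = URLDownload[x];
  (* First extracts the path string from the File object *)
  SheetJSImportFile[First[file]]
];

data = SheetJSImportURLDownload["https://sheetjs.com/pres.numbers"]
```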
[^1]: See [the `ExternalEvaluate` Node.js example](https://reference.wolfram.com/language/ref/ExternalEvaluate.html#:~:text=Evaluate%20a%20basic%20math%20function%20in%20JavaScript%20using%20Node.js%3A) in the Mathematica documentation.
[^2]: See [`ExternalEvaluate`](https://reference.wolfram.com/language/ref/ExternalEvaluate.html) in the Mathematica documentation.
[^3]: See [`ImportString`](https://reference.wolfram.com/language/ref/ImportString.html) in the Mathematica documentation.
[^4]: A [`Dataset`](https://reference.wolfram.com/language/ref/Dataset.html) will be created when using the [`"Dataset"` element in `ImportString`](https://reference.wolfram.com/language/ref/format/CSV.html).
[^5]: See [`HeaderLines`](https://reference.wolfram.com/language/ref/HeaderLines.html) in the Mathematica documentation.
[^6]: See ["Command-line Tools"](/docs/demos/desktop/cli) for more details.
[^7]: Mathematica 11 introduced new methods including [`URLRead`](https://reference.wolfram.com/language/ref/URLRead.html).