2023-04-22 23:25:24 +00:00
---
title: Spreadsheet Processing in Mathematica
sidebar_label: Mathematica
description: Build complex data pipelines in Mathematica Notebooks. Seamlessly create datasets with SheetJS. Leverage the Mathematica ecosystem to analyze data from Excel workbooks.
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
sidebar_custom_props:
  summary: Generate Mathematica-compatible CSVs from arbitrary workbooks
---
import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

[Mathematica](https://mathematica.com) is a software system for mathematics and
scientific computing. It supports command-line tools and JavaScript extensions.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis
within Mathematica. We'll explore how to run an external tool to generate CSV
data from opaque spreadsheets and parse the data from Mathematica.

:::note Tested Deployments

This demo was last tested by SheetJS users on 2023 November 04 in Mathematica 13.

:::

## Integration Details

The [SheetJS NodeJS module](/docs/getting-started/installation/nodejs) can be
loaded in NodeJS scripts, including scripts invoked using the `"NodeJS"` mode
of the `ExternalEvaluate`[^1] Mathematica function.

However, the current cross-platform recommendation involves a dedicated command
line tool that leverages SheetJS libraries to perform spreadsheet processing.

### External Engines
The following diagram depicts the workbook waltz:
```mermaid
flowchart LR
subgraph `ExternalEvaluate`
file[(workbook\nfile)]
csvstr(CSV\nString)
end
data[(Dataset)]
file --> |NodeJS\nSheetJS Ops| csvstr
csvstr --> |ImportString\nMathematica| data
```

_Mathematica_

NodeJS can be activated from Mathematica using `RegisterExternalEvaluator`[^2].
Once activated, JavaScript code can be run using `ExternalEvaluate`[^3]. If the
NodeJS code returns CSV data, `ImportString`[^4] can generate a `Dataset`[^5].

_SheetJS_

For a file residing on the filesystem, the SheetJS `readFile` function[^6] can
generate a workbook object. Relative filenames are resolved against the working
directory of the NodeJS process. That folder can be determined by evaluating
`require("process").cwd()`[^7] in `ExternalEvaluate`:

```mathematica
In[1]:= ExternalEvaluate["NodeJS", "require('process').cwd()"]
Out[1]= "C:\Users\Me\Documents"
```
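
NodeJS resolves relative filenames against this working directory, which is why
the workbook should sit in the reported folder when a bare filename is used. The
following sketch uses only NodeJS built-in modules to show the absolute path
that a relative filename maps to. `resolveFromCwd` is a hypothetical helper name
introduced for illustration.

```javascript
const path = require("path");

/* hypothetical helper, not part of SheetJS or NodeJS */
function resolveFromCwd(filename) {
  /* relative names are joined to the working directory; absolute paths pass through */
  return path.resolve(process.cwd(), filename);
}

/* print the absolute path that the bare name "pres.numbers" refers to */
console.log(resolveFromCwd("pres.numbers"));
```
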
After pulling the first worksheet[^8], the SheetJS `sheet_to_csv` function[^9]
generates a CSV string.
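
The SheetJS steps described above can be sketched as one NodeJS function.
`firstSheetToCSV` is a hypothetical name, and passing the library in as the
`XLSX` argument is an assumption made so the sketch can be read and exercised
without installing the module. The demo itself calls `require('xlsx')` directly.

```javascript
/* hypothetical helper; XLSX is expected to be the object returned by require("xlsx") */
function firstSheetToCSV(XLSX, filename) {
  /* parse the file into a workbook object */
  const wb = XLSX.readFile(filename);
  /* SheetNames is an ordered list; the first entry names the first worksheet */
  const ws = wb.Sheets[wb.SheetNames[0]];
  /* generate a CSV string from the worksheet */
  return XLSX.utils.sheet_to_csv(ws);
}

/* usage, assuming the module is installed and the file is in the working directory: */
/* console.log(firstSheetToCSV(require("xlsx"), "pres.numbers")); */
```
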

_Complete Function_

The following function reads a file, parses the first worksheet and returns a
`Dataset` object assuming one header row.

<Tabs groupId="os">
  <TabItem value="unix" label="Linux/MacOS">

```mathematica title="Complete Function"
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
  (* This was required in local testing *)
  RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];

  (* Generate CSV from first sheet *)
  csv:=ExternalEvaluate["NodeJS", StringJoin[
    (* module installed in home directory *)
    "var XLSX = require('xlsx');",
    (* read specified filename *)
    "var wb = XLSX.readFile('",filename,"');",
    (* grab first worksheet *)
    "var ws = wb.Sheets[wb.SheetNames[0]];",
    (* convert to CSV *)
    "XLSX.utils.sheet_to_csv(ws)"
  ]];

  (* Parse CSV into a dataset *)
  Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
```

  </TabItem>
  <TabItem value="win" label="Windows">

```mathematica title="Complete Function"
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
  (* This was required in local testing *)
  RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"];

  (* Generate CSV from first sheet *)
  csv:=ExternalEvaluate["NodeJS", StringJoin[
    (* module installed in home directory *)
    "var XLSX = require('xlsx');",
    (* read specified filename *)
    "var wb = XLSX.readFile('",filename,"');",
    (* grab first worksheet *)
    "var ws = wb.Sheets[wb.SheetNames[0]];",
    (* convert to CSV *)
    "XLSX.utils.sheet_to_csv(ws)"
  ]];

  (* Parse CSV into a dataset *)
  Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
```

  </TabItem>
</Tabs>

<details open>
  <summary><b>How to run the example</b> (click to hide)</summary>

:::note Tested Deployments

This example was last tested on 2023 November 04 with Mathematica 13.3.

:::

0) Install NodeJS. When the demo was tested, version `20.9.0` was installed.

1) Install dependencies in the Home folder (`~` or `$HOME` or `%HOMEPATH%`):

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz zeromq@6.0.0-beta.17`}
</CodeBlock>

2) Open a new Mathematica Notebook and register NodeJS. When the example was
tested, the commands were:

<Tabs groupId="os">
  <TabItem value="unix" label="Linux/MacOS">

```mathematica
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"]
FindExternalEvaluators["NodeJS"]
```

  </TabItem>
  <TabItem value="win" label="Windows">

```mathematica
RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"]
FindExternalEvaluators["NodeJS"]
```

  </TabItem>
</Tabs>

The second argument to `RegisterExternalEvaluator` should be the path to the
`node` or `node.exe` binary.

If NodeJS is registered, the value in the "Registered" column will be "True".

3) To determine the base folder, run `require("process").cwd()` from NodeJS:

```mathematica
ExternalEvaluate["NodeJS", "require('process').cwd()"]
```

4) Download [`pres.numbers`](https://docs.sheetjs.com/pres.numbers) and move
the file to the base folder shown in the previous step.

5) Copy and evaluate the "Complete Function" in the previous codeblock.

6) Run the function and confirm the result is a proper Dataset:

```mathematica
SheetJSImportFileEE["pres.numbers"]
```

![SheetJSImportFileEE result](pathname:///mathematica/SheetJSImportFileEE.png)

</details>
### Command-Line Tools

The ["Command-Line Tools" demo](/docs/demos/cli) creates `xlsx-cli`, a
command-line tool that reads a spreadsheet file and generates CSV rows from the
first worksheet.

`ExternalEvaluate`[^10] can run command-line tools and capture standard output.
The following snippet processes `~/Downloads/pres.numbers` and pulls CSV data
into a variable in Mathematica:

```mathematica
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
```

`ImportString`[^11] can interpret the CSV data as a `Dataset`[^12]. Typically the
first row of the CSV output is the header row. The `HeaderLines`[^13] option
controls how Mathematica parses the data:

```mathematica
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
```
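
As a rough picture of what `"HeaderLines" -> 1` asks for, the first CSV row
supplies the column names and each later row becomes one record. The sketch
below mimics that idea in JavaScript. It is not how Mathematica parses CSV:
`rowsWithHeader` is a hypothetical name, the sample data is invented, and the
naive `split` does not handle quoted fields or embedded commas.

```javascript
/* hypothetical helper: treat the first CSV row as the header row */
function rowsWithHeader(csv) {
  /* split into lines and drop empty ones (e.g. a trailing newline) */
  const lines = csv.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  /* the first row holds the column names */
  const header = lines[0].split(",");
  /* every remaining row becomes an object keyed by those names */
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(header.map((name, i) => [name, cells[i]]));
  });
}

/* sample data invented for illustration */
console.log(rowsWithHeader("Name,Index\nAlpha,1\nBeta,2"));
```
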

The following diagram depicts the workbook waltz:

```mermaid
flowchart LR
subgraph `ExternalEvaluate`
file[(workbook\nfile)]
csvstr(CSV\nString)
end
data[(Dataset)]
file --> |`xlsx-cli`\nSheetJS Ops| csvstr
csvstr --> |ImportString\nMathematica| data
```

## Complete Demo
1) Create the standalone `xlsx-cli` binary[^14]:

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx nexe -t 14.15.3 xlsx-cli.js`}
</CodeBlock>

<Tabs groupId="os">
  <TabItem value="unix" label="Linux/MacOS">

2) Move the generated `xlsx-cli` to a fixed location in `/usr/local/bin`:

```bash
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/
```

  </TabItem>
  <TabItem value="win" label="Windows">

2) Find the current directory:

```bash
cd
```

The generated binary will be `xlsx-cli.exe` in the displayed path.

  </TabItem>
</Tabs>

### Reading a Local File

3) In a new Mathematica notebook, run the following snippet:

```mathematica
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
  "Shell" -> "StandardOutput",
  // highlight-next-line
  "/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
```

<Tabs groupId="os">
  <TabItem value="unix" label="Linux/MacOS">
  </TabItem>
  <TabItem value="win" label="Windows">

Change `/usr/local/bin/xlsx-cli` in the string to the path to the generated
`xlsx-cli.exe` binary. For example, if the path in step 2 was
`C:\Users\Me\Documents\`, then the code should be:

```mathematica
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
  "Shell" -> "StandardOutput",
  // highlight-next-line
  "C:\\Users\\Me\\Documents\\xlsx-cli.exe " <> x
]], "Dataset", "HeaderLines" -> 1]
```

The `\` characters must be doubled.

  </TabItem>
</Tabs>

4) Download https://docs.sheetjs.com/pres.numbers and save it to the Downloads folder:

```bash
cd ~/Downloads/
curl -LO https://docs.sheetjs.com/pres.numbers
```

5) In the Mathematica notebook, run the new function. If the file was saved to
the Downloads folder, the path will be `"~/Downloads/pres.numbers"` in macOS:
```mathematica
data = SheetJSImportFile["~/Downloads/pres.numbers"]
```
The result should be displayed in a concise table.

![SheetJSImportFile result](pathname:///mathematica/SheetJSImportFile.png)

### Reading from a URL

`FetchURL`[^15] downloads a file from a specified URL and returns a path to the
file. This function will be wrapped in a new function called `SheetJSImportURL`.

6) In the same notebook, run the following:

```mathematica
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
  path = FetchURL[x];
  SheetJSImportFile[path]
)];
```

7) Test by downloading the test file in the notebook:

```mathematica
data = SheetJSImportURL["https://docs.sheetjs.com/pres.numbers"]
```

![SheetJSImportURL result](pathname:///mathematica/SheetJSImportURL.png)

[^1]: See [the `ExternalEvaluate` Node.js example](https://reference.wolfram.com/language/ref/ExternalEvaluate.html#:~:text=Evaluate%20a%20basic%20math%20function%20in%20JavaScript%20using%20Node.js%3A) in the Mathematica documentation.
[^2]: See [`RegisterExternalEvaluator`](https://reference.wolfram.com/language/ref/RegisterExternalEvaluator.html) in the Mathematica documentation.
[^3]: See [`ExternalEvaluate`](https://reference.wolfram.com/language/ref/ExternalEvaluate.html) in the Mathematica documentation.
[^4]: See [`ImportString`](https://reference.wolfram.com/language/ref/ImportString.html) in the Mathematica documentation.
[^5]: A [`Dataset`](https://reference.wolfram.com/language/ref/Dataset.html) will be created when using the [`"Dataset"` element in `ImportString`](https://reference.wolfram.com/language/ref/format/CSV.html).
[^6]: See [`readFile` in "Reading Files"](/docs/api/parse-options)
[^7]: See [`process.cwd()`](https://nodejs.org/api/process.html#processcwd) in the NodeJS documentation.
[^8]: The `Sheets` and `SheetNames` properties of workbook objects are described in ["Workbook Object"](/docs/csf/book)
[^9]: See [`sheet_to_csv` in "CSV and Text"](/docs/api/utilities/csv#delimiter-separated-output)
[^10]: See [`ExternalEvaluate`](https://reference.wolfram.com/language/ref/ExternalEvaluate.html) in the Mathematica documentation.
[^11]: See [`ImportString`](https://reference.wolfram.com/language/ref/ImportString.html) in the Mathematica documentation.
[^12]: A [`Dataset`](https://reference.wolfram.com/language/ref/Dataset.html) will be created when using the [`"Dataset"` element in `ImportString`](https://reference.wolfram.com/language/ref/format/CSV.html).
[^13]: See [`HeaderLines`](https://reference.wolfram.com/language/ref/HeaderLines.html) in the Mathematica documentation.
[^14]: See ["Command-line Tools"](/docs/demos/cli) for more details.
[^15]: Mathematica 11 introduced new methods including [`URLRead`](https://reference.wolfram.com/language/ref/URLRead.html).