---
title: Spreadsheet Processing in Mathematica
sidebar_label: Mathematica
description: Build complex data pipelines in Mathematica Notebooks. Seamlessly create datasets with SheetJS. Leverage the Mathematica ecosystem to analyze data from Excel workbooks.
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';

[Mathematica](https://mathematica.com) is a software system for mathematics and
scientific computing. It supports command-line tools and JavaScript extensions.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis
within Mathematica. We'll explore how to run an external tool to generate CSV
data from opaque spreadsheets and parse the data from Mathematica.

:::note

This demo was last tested on 2023 August 21 in Mathematica 13.2.1.

:::

## Integration Details

The [SheetJS NodeJS module](/docs/getting-started/installation/nodejs) can be
loaded in NodeJS scripts, including scripts invoked using the `"NodeJS"` mode
of the `ExternalEvaluate`[^1] Mathematica function.

:::caution pass

In local testing, there were incompatibilities with recent NodeJS versions.

**This is a Mathematica bug.**

:::

The current recommendation involves a dedicated command-line tool that leverages
SheetJS libraries to perform spreadsheet processing.

### Command-Line Tools

The ["Command-Line Tools" demo](/docs/demos/desktop/cli) creates `xlsx-cli`, a
command-line tool that reads a spreadsheet file and generates CSV rows from the
first worksheet.

`ExternalEvaluate`[^2] can run command-line tools and capture standard output.
The following snippet processes `~/Downloads/pres.numbers` and pulls CSV data
into a variable in Mathematica:

```mathematica
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
```

`ImportString`[^3] can interpret the CSV data as a `Dataset`[^4]. Typically the
first row of the CSV output is the header row. The `HeaderLines`[^5] option
controls how Mathematica parses the data:

```mathematica
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
```

The following diagram depicts the workbook waltz:

```mermaid
flowchart LR
  subgraph SheetJS operations
    file[(workbook\nfile)]
    csv(CSV)
  end
  csvstr(CSV\nString)
  data[(Dataset)]
  file --> |`xlsx-cli`\nSheetJS Ops| csv
  csv --> |ExternalEvaluate\nMathematica| csvstr
  csvstr --> |ImportString\nMathematica| data
```

## Complete Demo

:::info pass

This demo was tested in macOS. The path names will differ in other platforms.

:::

1) Create the standalone `xlsx-cli` binary[^6]:

<CodeBlock language="bash">{`\
cd /tmp
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx nexe -t 14.15.3 xlsx-cli.js`}
</CodeBlock>

2) Move the generated `xlsx-cli` to a fixed location in `/usr/local/bin`:

```bash
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/
```

### Reading a Local File

3) In a new Mathematica notebook, run the following snippet:

```mathematica
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
  "Shell" -> "StandardOutput",
  "/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
```

4) Download the test file `pres.numbers` and save it to the Downloads folder.

5) In the Mathematica notebook, run the new function. If the file was saved to
the Downloads folder, the path will be `"~/Downloads/pres.numbers"` in macOS:

```mathematica
data = SheetJSImportFile["~/Downloads/pres.numbers"]
```

The result should be displayed in a concise table.

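To confirm that the header row was picked up as expected, the imported result
can be inspected with a few standard functions. The following is a minimal
sketch, not part of the tested demo. It assumes that `data` holds the `Dataset`
from step 5 and that the import yields one association per row. The variable
name `rows` is hypothetical.

```mathematica
(* Assumption: `data` is the non-empty Dataset returned by SheetJSImportFile *)
rows = Normal[data];   (* convert the Dataset to an ordinary list of rows *)
Length[rows]           (* number of data rows; the header row is not counted *)
Keys[First[rows]]      (* column names, assuming each row is an association *)
```

If the column names look like data values instead, the worksheet may not have a
header row and the `"HeaderLines"` option should be adjusted.
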
### Reading from a URL

`FetchURL`[^7] downloads a file from a specified URL and returns a path to the
file. This function will be wrapped in a new function called `SheetJSImportURL`.

6) In the same notebook, run the following:

```mathematica
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
  path = FetchURL[x];
  SheetJSImportFile[path]
)];
```

7) Test by downloading the test file in the notebook:

```mathematica
data = SheetJSImportURL["https://sheetjs.com/pres.numbers"]
```

[^1]: See [the `ExternalEvaluate` Node.js example](https://reference.wolfram.com/language/ref/ExternalEvaluate.html#:~:text=Evaluate%20a%20basic%20math%20function%20in%20JavaScript%20using%20Node.js%3A) in the Mathematica documentation.
[^2]: See [`ExternalEvaluate`](https://reference.wolfram.com/language/ref/ExternalEvaluate.html) in the Mathematica documentation.
[^3]: See [`ImportString`](https://reference.wolfram.com/language/ref/ImportString.html) in the Mathematica documentation.
[^4]: A [`Dataset`](https://reference.wolfram.com/language/ref/Dataset.html) will be created when using the [`"Dataset"` element in `ImportString`](https://reference.wolfram.com/language/ref/format/CSV.html).
[^5]: See [`HeaderLines`](https://reference.wolfram.com/language/ref/HeaderLines.html) in the Mathematica documentation.
[^6]: See ["Command-line Tools"](/docs/demos/desktop/cli) for more details.
[^7]: Mathematica 11 introduced new methods including [`URLRead`](https://reference.wolfram.com/language/ref/URLRead.html).