docs.sheetjs.com/docz/docs/03-demos/32-extensions/09-mathematica.md
2023-09-05 14:04:23 -04:00


---
title: Spreadsheet Processing in Mathematica
sidebar_label: Mathematica
description: Build complex data pipelines in Mathematica Notebooks. Seamlessly create datasets with SheetJS. Leverage the Mathematica ecosystem to analyze data from Excel workbooks.
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';

Mathematica is a software system for mathematics and scientific computing. It supports command-line tools and JavaScript extensions.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis within Mathematica. We'll explore how to run an external tool to generate CSV data from opaque spreadsheets and parse the data in Mathematica.

:::note

This demo was last tested on 2023 August 21 in Mathematica 13.2.1.

:::

## Integration Details

The SheetJS NodeJS module can be loaded in NodeJS scripts, including scripts invoked using the "NodeJS" mode of the `ExternalEvaluate`[^1] Mathematica function.

:::caution pass

In local testing, there were incompatibilities with recent NodeJS versions.

This is a Mathematica bug.

:::

The current recommendation involves a dedicated command-line tool that leverages SheetJS libraries to perform spreadsheet processing.

### Command-Line Tools

The "Command-Line Tools" demo creates xlsx-cli, a command-line tool that reads a spreadsheet file and generates CSV rows from the first worksheet.

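`ExternalEvaluate`[^2] can run command-line tools and capture standard output. The following snippet processes `~/Downloads/pres.numbers` and pulls CSV data into a variable in Mathematica:

```mathematica
(* shell command that prints CSV rows from the first worksheet *)
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers";
(* run the command in a shell and capture its standard output as a string *)
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
```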

`ImportString`[^3] can interpret the CSV data as a `Dataset`[^4]. Typically the first row of the CSV output is the header row. The `HeaderLines`[^5] option tells Mathematica how many leading lines to treat as headers rather than data:

```mathematica
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
```
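To see what `"HeaderLines" -> 1` does without running `xlsx-cli`, the following sketch parses a small placeholder CSV string. The names `sample` and `ds` and the values are made up for illustration. Unlike the snippet above, it names the format explicitly with `{"CSV", "Dataset"}` instead of relying on format detection:

```mathematica
(* Placeholder CSV text standing in for xlsx-cli output (made-up values) *)
sample = "Name,Index\nAlice,1\nBob,2";

(* The first line is the header row *)
First[StringSplit[sample, "\n"]]

(* "HeaderLines" -> 1 uses the first line as column keys, not as data *)
ds = ImportString[sample, {"CSV", "Dataset"}, "HeaderLines" -> 1];

(* Column keys and the number of data rows *)
Keys[First[Normal[ds]]]
Length[ds]
```

Each row of `ds` is an association keyed by the header names, which is what lets later steps refer to columns by name.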

The following diagram depicts the workbook waltz:

```mermaid
flowchart LR
  subgraph SheetJS operations
    file[(workbook\nfile)]
    csv(CSV)
  end
  csvstr(CSV\nString)
  data[(Dataset)]
  file --> |`xlsx-cli`\nSheetJS Ops| csv
  csv --> |ExternalEvaluate\nMathematica| csvstr
  csvstr --> |ImportString\nMathematica| data
```

## Complete Demo

:::info pass

This demo was tested on macOS. The path names will differ on other platforms.

:::

1. Create the standalone `xlsx-cli` binary[^6]:

<CodeBlock language="bash">{`\
cd /tmp
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx nexe -t 14.15.3 xlsx-cli.js`}
</CodeBlock>

2. Move the generated `xlsx-cli` to a fixed location in `/usr/local/bin`:

   ```bash
   mkdir -p /usr/local/bin
   mv xlsx-cli /usr/local/bin/
   ```

### Reading a Local File

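1. In a new Mathematica notebook, run the following snippet:

   ```mathematica
   (* Run xlsx-cli on the file at path x (inserted into the shell command
      unquoted) and parse its CSV output as a Dataset.
      Block[{Print}, ...] makes Print inert while ExternalEvaluate runs,
      so any Print calls during the evaluation produce no output. *)
   SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
     "Shell" -> "StandardOutput",
     "/usr/local/bin/xlsx-cli " <> x
   ]], "Dataset", "HeaderLines" -> 1]
   ```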
2. Download https://sheetjs.com/pres.numbers and save it to the Downloads folder.

3. In the Mathematica notebook, run the new function. If the file was saved to the Downloads folder, the path will be `"~/Downloads/pres.numbers"` on macOS:

   ```mathematica
   data = SheetJSImportFile["~/Downloads/pres.numbers"]
   ```

The result should be displayed in a concise table.
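`SheetJSImportFile` hands its command to a shell, so a path containing spaces would be split into several arguments. One possible workaround, sketched below and not part of the tested demo, swaps in `RunProcess`, which starts the program directly and passes the path as a single argument. `SheetJSImportFileRP` is a hypothetical name. Because no shell is involved, `~` is not expanded, so the sketch builds the full path from `$HomeDirectory`:

```mathematica
(* Hypothetical variant of SheetJSImportFile: RunProcess runs xlsx-cli
   without a shell, so the path is passed as one argument (spaces are
   safe, but "~" is NOT expanded; pass a full path) *)
SheetJSImportFileRP[path_String] := ImportString[
  RunProcess[{"/usr/local/bin/xlsx-cli", path}, "StandardOutput"],
  "Dataset", "HeaderLines" -> 1
]

(* Build the full path instead of using "~/Downloads/pres.numbers" *)
data = SheetJSImportFileRP[FileNameJoin[{$HomeDirectory, "Downloads", "pres.numbers"}]]
```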

### Reading from a URL

`FetchURL`[^7] downloads a file from a specified URL and returns a path to the file. A new function called `SheetJSImportURL` will wrap `FetchURL` and pass the downloaded file to `SheetJSImportFile`.

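1. In the same notebook, run the following:

   ```mathematica
   (* FetchURL is provided by the Utilities`URLTools` package *)
   Needs["Utilities`URLTools`"];
   (* Download the file at URL x, then parse the local copy *)
   SheetJSImportURL[x_] := Module[{path},
     path = FetchURL[x];
     SheetJSImportFile[path]
   ];
   ```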
2. Test by downloading the test file in the notebook:

   ```mathematica
   data = SheetJSImportURL["https://sheetjs.com/pres.numbers"]
   ```
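Newer Mathematica versions also include `URLDownload`, which can stand in for the legacy `FetchURL` call. The following is a hedged sketch, not tested as part of this demo: `SheetJSImportURL2` is a hypothetical name, the file is saved under `$TemporaryDirectory`, and taking the file name from the last URL segment assumes the URL has no query string or fragment:

```mathematica
(* Hypothetical variant of SheetJSImportURL that uses URLDownload
   instead of the legacy FetchURL *)
SheetJSImportURL2[url_String] := Module[{path},
  (* Reuse the last URL segment as the file name in the temp directory;
     assumes the URL has no query string or fragment *)
  path = FileNameJoin[{$TemporaryDirectory, Last[StringSplit[url, "/"]]}];
  URLDownload[url, path];
  (* SheetJSImportFile is the function defined earlier in this demo *)
  SheetJSImportFile[path]
]

data = SheetJSImportURL2["https://sheetjs.com/pres.numbers"]
```

Keeping the original file name preserves the extension, which is helpful if a format cannot be detected from the file contents alone.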

[^1]: See the `ExternalEvaluate` Node.js example in the Mathematica documentation.

[^2]: See `ExternalEvaluate` in the Mathematica documentation.

[^3]: See `ImportString` in the Mathematica documentation.

[^4]: A `Dataset` will be created when using the `"Dataset"` element in `ImportString`.

[^5]: See `HeaderLines` in the Mathematica documentation.

[^6]: See "Command-Line Tools" for more details.

[^7]: Mathematica 11 introduced new methods including `URLRead`.