---
title: Spreadsheet Data in Pandas
sidebar_label: Python + Pandas
description: Process structured data in Python with Pandas. Seamlessly integrate spreadsheets into your workflow with SheetJS. Analyze complex Excel spreadsheets with confidence.
pagination_prev: demos/index
pagination_next: demos/frontend/index
---

import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

Pandas[^1] is a Python software library for data analysis.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses SheetJS to process data from a spreadsheet and translate to the
Pandas DataFrame format. We'll explore how to load SheetJS from Python scripts,
generate DataFrames from workbooks, and write DataFrames back to workbooks.

The ["Complete Example"](#complete-example) includes a wrapper library that
simplifies importing and exporting spreadsheets.

:::info pass

Pandas includes limited support for reading spreadsheets (`pandas.read_excel`)
and writing XLSX spreadsheets (`pandas.DataFrame.to_excel`).

**SheetJS supports common spreadsheet formats that Pandas cannot process.**

SheetJS operations also offer more flexibility in processing complex worksheets.

:::

:::note Tested Environments

This demo was tested in the following deployments:

| Architecture | JS Engine       | Pandas | Python | Date       |
|:-------------|:----------------|:-------|:-------|:-----------|
| `darwin-x64` | Duktape `2.7.0` | 2.0.3  | 3.11.7 | 2024-01-29 |
| `linux-x64`  | Duktape `2.7.0` | 1.5.3  | 3.11.3 | 2024-01-29 |

:::

## Integration Details

[`sheetjs.py`](pathname:///pandas/sheetjs.py) is a wrapper script that provides
helper methods for reading and writing spreadsheets. Installation notes are
included in the ["Complete Example"](#complete-example) section.

### JS in Python

JS code cannot be directly evaluated in Python implementations.

To run JS code from Python, JavaScript engines[^2] can be embedded in Python
modules or dynamically loaded using the `ctypes` foreign function library[^3].
This demo uses `ctypes` with the [Duktape engine](/docs/demos/engines/duktape).
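Loading a C library with `ctypes` follows the same pattern regardless of the library. You open the shared object, declare each function's argument and return types, and then call it. The minimal sketch below assumes a POSIX system and uses the C standard library's `strlen` as a stand-in. It only illustrates the mechanism and is not the Duktape binding code in `sheetjs.py`.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If no name is found, CDLL(None) falls back
# to the symbols already loaded into the Python process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Return the UTF-8 byte length of `text` as computed by C strlen."""
    return libc.strlen(text.encode("utf8"))


print(c_strlen("SheetJS"))  # 7
```

Declaring `argtypes` and `restype` matters for any library. Without them, `ctypes` assumes C `int` return values, which truncates pointers such as an engine context handle on 64-bit platforms.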

### Wrapper

The script exports a class named `SheetJSWrapper`. It is a context manager that
initializes the Duktape engine and executes the SheetJS scripts on entry. All
work should be performed within the context:

```python title="Complete Example"
#!/usr/bin/env python3
from sheetjs import SheetJSWrapper

with SheetJSWrapper() as sheetjs:

  # Parse file
  wb = sheetjs.read_file("pres.numbers")
  print("Loaded file pres.numbers")

  # Get first worksheet name
  first_ws_name = wb.get_sheet_names()[0]
  print(f"Reading from sheet {first_ws_name}")

  # Generate DataFrame from first worksheet
  df = wb.get_df(first_ws_name)
  df.info()

  # Export DataFrame to XLSB
  sheetjs.write_df(df, "SheetJSPandas.xlsb", sheet_name="DataFrame")
```
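All work must happen inside the `with` block because of how a context manager works: the engine exists only between `__enter__` and `__exit__`. The sketch below illustrates that lifecycle with a hypothetical `EngineContext` class. Its `create`, `evaluate` and `destroy` callables are placeholders for real engine bindings, and the sketch exercises them with recording stubs. This is not the actual `SheetJSWrapper` implementation.

```python
from typing import Any, Callable, Iterable


class EngineContext:
    """Hypothetical context manager mirroring a create/eval/destroy lifecycle."""

    def __init__(
        self,
        create: Callable[[], Any],
        evaluate: Callable[[Any, str], Any],
        destroy: Callable[[Any], None],
        scripts: Iterable[str] = (),
    ) -> None:
        self._create = create
        self._evaluate = evaluate
        self._destroy = destroy
        self._scripts = list(scripts)
        self.ctx: Any = None

    def __enter__(self) -> "EngineContext":
        # Initialize the engine, then run setup scripts before the body runs
        self.ctx = self._create()
        try:
            for source in self._scripts:
                self._evaluate(self.ctx, source)
        except BaseException:
            # __exit__ is not called when __enter__ fails, so clean up here
            self._release()
            raise
        return self

    def run(self, source: str) -> Any:
        if self.ctx is None:
            raise RuntimeError("engine used outside of the `with` block")
        return self._evaluate(self.ctx, source)

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Release the engine whether or not the block raised an exception
        self._release()
        return False  # do not suppress exceptions

    def _release(self) -> None:
        if self.ctx is not None:
            self._destroy(self.ctx)
            self.ctx = None


# Stub callables that only record what happens
events = []


def create():
    events.append("create")
    return object()  # stand-in for an engine handle


def evaluate(ctx, source):
    events.append(f"eval:{source}")


def destroy(ctx):
    events.append("destroy")


with EngineContext(create, evaluate, destroy, ["shim", "xlsx"]) as engine:
    engine.run("work")

print(events)  # ['create', 'eval:shim', 'eval:xlsx', 'eval:work', 'destroy']
```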

### Reading Files

`sheetjs.read_file` accepts a path to a spreadsheet file. It will parse the file
and return an object representing the workbook.

The `get_sheet_names` method of the workbook returns a list of sheet names.

The `get_df` method of the workbook generates a DataFrame from a worksheet. A
specific worksheet can be selected by passing its name.

For example, the following code reads `pres.numbers` and generates a DataFrame
from the second worksheet:

```python title="Generating a DataFrame from the second worksheet"
from sheetjs import SheetJSWrapper

with SheetJSWrapper() as sheetjs:
  # Parse file
  wb = sheetjs.read_file("pres.numbers")

  # Generate DataFrame from second worksheet
  ws_name = wb.get_sheet_names()[1]
  df = wb.get_df(ws_name)

  # Print metadata (df.info writes directly to stdout)
  df.info()
```

Under the hood, `sheetjs.py` performs the following steps:

```mermaid
flowchart LR
  file[(workbook\nfile)]
  subgraph SheetJS operations
    bytes(Byte\nstring)
    wb((SheetJS\nWorkbook))
    csv(CSV\nstring)
  end
  subgraph Pandas operations
    stream(CSV\nStream)
    df[(Pandas\nDataFrame)]
  end
  file --> |`open`/`read`\nPython ops| bytes
  bytes --> |`XLSX.read`\nParse Bytes| wb
  wb --> |`sheet_to_csv`\nExtract Data| csv
  csv --> |`StringIO`\nPython ops| stream
  stream --> |`read_csv`\nParse CSV| df
```

1) Pure Python operations read the spreadsheet file and generate a byte string.

2) SheetJS libraries parse the string and generate a clean CSV.

- The `read` method[^4] parses file bytes into a SheetJS workbook object[^5]
- After selecting a worksheet, `sheet_to_csv`[^6] generates a CSV string

3) Python operations convert the CSV string to a stream object.[^7]

4) The Pandas `read_csv` method[^8] ingests the stream and generates a DataFrame.
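The Python ends of this pipeline (steps 1, 3 and 4) can be sketched without a JS engine. In the minimal sketch below, `read_bytes` and `csv_to_df` are hypothetical helper names. The CSV text is an illustrative stand-in for the output of `sheet_to_csv`, and the sketch omits step 2, which hands the bytes to Duktape.

```python
import io

import pandas as pd


def read_bytes(path: str) -> bytes:
    """Step 1: read the raw spreadsheet file with pure Python file operations."""
    with open(path, "rb") as f:
        return f.read()


def csv_to_df(csv_text: str) -> pd.DataFrame:
    """Steps 3 and 4: wrap the CSV text in a stream and let Pandas parse it."""
    return pd.read_csv(io.StringIO(csv_text))


# Illustrative stand-in for the string `sheet_to_csv` would return in step 2
sample_csv = "Name,Index\nAlice,1\nBob,2\n"
df = csv_to_df(sample_csv)
df.info()
```

Because the data passes through CSV text, Pandas infers each column type from the values. The `dtype` parameter of `read_csv` can override that inference for specific columns.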

### Writing Files

`sheetjs.write_df` accepts a DataFrame and a path. It will attempt to export
the data to a spreadsheet file.

For example, the following code exports a DataFrame to `SheetJSPandas.xlsb`:

```python title="Exporting a DataFrame to XLSB"
with SheetJSWrapper() as sheetjs:
  # Export DataFrame to XLSB
  sheetjs.write_df(df, "SheetJSPandas.xlsb", sheet_name="DataFrame")
```

Under the hood, `sheetjs.py` performs the following steps:

```mermaid
flowchart LR
  subgraph Pandas operations
    df[(Pandas\nDataFrame)]
    json(JSON\nString)
  end
  subgraph SheetJS operations
    aoo(array of\nobjects)
    wb((SheetJS\nWorkbook))
    u8a(File\nbytes)
  end
  file[(workbook\nfile)]
  df --> |`to_json`\nPandas ops| json
  json --> |`JSON.parse`\nJS Engine| aoo
  aoo --> |`json_to_sheet`\nSheetJS Ops| wb
  wb --> |`XLSX.write`\nUint8Array| u8a
  u8a --> |`open`/`write`\nPython ops| file
```

1) The Pandas DataFrame `to_json` method[^9] generates a JSON string.

2) JS engine operations translate the JSON string to an array of objects.

3) SheetJS libraries process the data array and generate file bytes.

- The `json_to_sheet` method[^10] creates a SheetJS sheet object from the data.
- The `book_new` method[^11] creates a SheetJS workbook that includes the sheet.
- The `write` method[^12] generates the spreadsheet file bytes.

4) Pure Python operations write the bytes to file.
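Steps 1 and 2 can be checked from Python alone. The sketch below assumes `orient="records"`, which yields the array-of-objects shape that `json_to_sheet` consumes. The options used by `sheetjs.py` may differ. Python's `json.loads` stands in for `JSON.parse`.

```python
import json

import pandas as pd

# Illustrative DataFrame with the same column layout as the sample output
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Index": [1, 2]})

# Step 1: serialize one JSON object per row, keyed by column name
payload = df.to_json(orient="records")

# Step 2 stand-in: the JS engine would call JSON.parse on this string.
# Python's json.loads produces the equivalent list of dicts.
rows = json.loads(payload)
print(rows)  # [{'Name': 'Alice', 'Index': 1}, {'Name': 'Bob', 'Index': 2}]
```

The `records` layout omits the DataFrame index, so call `df.reset_index()` first if the index should become a worksheet column. By default, datetime columns are written as epoch milliseconds. Passing `date_format="iso"` writes ISO 8601 strings instead.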

## Complete Example

This example will extract data from an Apple Numbers spreadsheet and generate a
DataFrame. The DataFrame will be exported to the binary XLSB spreadsheet format.

0) Install Pandas:

```bash
sudo python3 -m pip install pandas
```

:::caution pass

On Arch Linux-based platforms including the Steam Deck, the install may fail:

```
error: externally-managed-environment
```

In these situations, Pandas must be installed through the package manager:

```bash
sudo pacman -Syu python-pandas
```

:::

1) Build the Duktape shared library:

```bash
curl -LO https://duktape.org/duktape-2.7.0.tar.xz
tar -xJf duktape-2.7.0.tar.xz
cd duktape-2.7.0
make -f Makefile.sharedlibrary
cd ..
```

2) Copy the shared library to the current folder. When the demo was last tested,
the shared library file name differed by platform:

| OS     | name                      |
|:-------|:--------------------------|
| Darwin | `libduktape.207.20700.so` |
| Linux  | `libduktape.so.207.20700` |

```bash
cp duktape-*/libduktape.* .
```

3) Download the SheetJS Standalone scripts and move them to the project directory:

<ul>
<li><a href={`https://cdn.sheetjs.com/xlsx-${current}/package/dist/shim.min.js`}>shim.min.js</a></li>
<li><a href={`https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js`}>xlsx.full.min.js</a></li>
</ul>

<CodeBlock language="bash">{`\
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/shim.min.js
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js`}
</CodeBlock>

4) Download the following test scripts and files:

- [`pres.numbers` test file](https://sheetjs.com/pres.numbers)
- [`sheetjs.py` script](pathname:///pandas/sheetjs.py)
- [`SheetJSPandas.py` script](pathname:///pandas/SheetJSPandas.py)

```bash
curl -LO https://sheetjs.com/pres.numbers
curl -LO https://docs.sheetjs.com/pandas/sheetjs.py
curl -LO https://docs.sheetjs.com/pandas/SheetJSPandas.py
```
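A short check can confirm that the files from steps 3 and 4 are in the project directory before the script runs. `missing_files` and `REQUIRED` are hypothetical names and are not part of the demo scripts. The check leaves out the Duktape library because its file name varies by platform.

```python
from pathlib import Path

# Files downloaded in steps 3 and 4
REQUIRED = [
    "shim.min.js",
    "xlsx.full.min.js",
    "pres.numbers",
    "sheetjs.py",
    "SheetJSPandas.py",
]


def missing_files(folder: str = ".") -> list[str]:
    """Return the names of required files that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("All files present" if not missing else f"Missing: {', '.join(missing)}")
```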

5) Edit the `sheetjs.py` script.

The `lib` variable declares the path to the library:

```python title="sheetjs.py (edit highlighted line)"
# highlight-next-line
lib = "libduktape.207.20700.so"
```

<Tabs groupId="triple">
  <TabItem value="darwin-x64" label="MacOS">

The name of the library is `libduktape.207.20700.so`:

```python title="sheetjs.py (change highlighted line)"
# highlight-next-line
lib = "libduktape.207.20700.so"
```

  </TabItem>
  <TabItem value="linux-x64" label="Linux">

The name of the library is `libduktape.so.207.20700`:

```python title="sheetjs.py (change highlighted line)"
# highlight-next-line
lib = "libduktape.so.207.20700"
```

  </TabItem>
</Tabs>
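Instead of editing the line by hand, the script could pick the name at runtime. The sketch below is a hypothetical helper and is not part of the published `sheetjs.py`. It maps `platform.system()` values to the file names from step 2. Those names reflect Duktape 2.7.0 builds at the time of testing and may differ elsewhere.

```python
import platform
from typing import Optional

# Shared library names observed when this demo was last tested (Duktape 2.7.0).
# Other Duktape versions or build setups may produce different names.
DUKTAPE_LIBS = {
    "Darwin": "libduktape.207.20700.so",
    "Linux": "libduktape.so.207.20700",
}


def duktape_lib_name(system: Optional[str] = None) -> str:
    """Return the expected Duktape library file name for an OS name.

    `system` defaults to platform.system(), which reports "Darwin" on macOS
    and "Linux" on Linux.
    """
    system = system or platform.system()
    if system not in DUKTAPE_LIBS:
        raise RuntimeError(f"No known Duktape library name for {system!r}")
    return DUKTAPE_LIBS[system]


print(duktape_lib_name("Linux"))  # libduktape.so.207.20700
```

In `sheetjs.py`, the highlighted assignment could then read `lib = duktape_lib_name()`.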

6) Run the script:

```bash
python3 SheetJSPandas.py pres.numbers
```

If successful, the script will display DataFrame metadata:

```
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Index   5 non-null      int64
dtypes: int64(1), object(1)
```

It will also export the DataFrame to `SheetJSPandas.xlsb`. The file can be
inspected with a spreadsheet editor that supports XLSB files.

[^1]: The official documentation site is <https://pandas.pydata.org/> and the official distribution point is <https://pypi.org/project/pandas/>
[^2]: See ["Other Languages"](/docs/demos/engines/) for more examples.
[^3]: See [`ctypes`](https://docs.python.org/3/library/ctypes.html) in the Python documentation.
[^4]: See [`read` in "Reading Files"](/docs/api/parse-options)
[^5]: See ["Workbook Object"](/docs/csf/book)
[^6]: See [`sheet_to_csv` in "Utilities"](/docs/api/utilities/csv#delimiter-separated-output)
[^7]: See [the examples in "IO tools"](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) in the Pandas documentation.
[^8]: See [`pandas.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) in the Pandas documentation.
[^9]: See [`pandas.DataFrame.to_json`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html) in the Pandas documentation.
[^10]: See [`json_to_sheet` in "Utilities"](/docs/api/utilities/array#array-of-objects-input)
[^11]: See [`book_new` in "Utilities"](/docs/api/utilities/wb)
[^12]: See [`write` in "Writing Files"](/docs/api/write-options)