---
title: Spreadsheet Data in Pandas
sidebar_label: Python (Pandas)
description: Process structured data in Python with Pandas. Seamlessly integrate spreadsheets into your workflow with SheetJS. Analyze complex Excel spreadsheets with confidence.
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
---
import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
Pandas[^1] is a Python software library for data analysis.
[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.
This demo uses SheetJS to process data from a spreadsheet and translate to the
Pandas DataFrame format. We'll explore how to load SheetJS from Python scripts,
generate DataFrames from workbooks, and write DataFrames back to workbooks.
:::note
This demo was tested in the following deployments:
| Architecture | V8 version | Pandas | Python | Date |
|:-------------|:--------------|:-------|:-------|:-----------|
| `darwin-x64` | `11.5.150.16` | 2.0.3 | 3.11.4 | 2023-07-29 |
:::
:::info pass
Pandas includes limited support for reading spreadsheets (`pandas.read_excel`)
and writing XLSX spreadsheets (`pandas.DataFrame.to_excel`).
The SheetJS approach supports many common spreadsheet formats that are not
supported by the current set of Pandas codecs and offers greater flexibility in
processing complex worksheets.
:::
## Integration Details
JS code cannot be evaluated directly by the Python interpreter. To run JS code
from Python, JavaScript engines[^2] can be embedded in CPython modules.
### Loading SheetJS
This demo uses the `STPyV8` module[^3] to access the V8 JavaScript engine.
_Initialize V8_
The engine library provides a convenient context manager `JSContext` for context
resource management. Within the context, the `eval` method can evaluate code:
```py
from STPyV8 import JSContext
# Initialize JS context
with JSContext() as ctxt:
    # Run code
    res = ctxt.eval("'Sheet' + 'JS'")
    # print result
    print(res)
```
`STPyV8` handles data interchange for common types. Arrays and JS objects can be
translated to Python `list` and `dict` respectively. The following `convert`
function is used in the test suite[^4]:
```py
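# from `tests/test_Wrapper.py` in the STPyV8 library
# License: Apache 2.0
from STPyV8 import JSArray, JSObject  # wrapper types checked by `convert`

def convert(obj):
    # JS arrays become Python lists (converted recursively)
    if isinstance(obj, JSArray):
        return [convert(v) for v in obj]
    # JS objects become Python dicts keyed by property name
    if isinstance(obj, JSObject):
        return dict([[str(k), convert(obj.__getattr__(str(k)))] for k in obj.__dir__()])
    # primitives are returned unchanged
    return obj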
```
_Loading the Library_
The [SheetJS Standalone scripts](/docs/getting-started/installation/standalone)
can be parsed and evaluated from the JS engine. Once evaluated, the `XLSX`
variable is available as a global.
Assuming the standalone library is in the same directory as the source file,
the script can be evaluated with `eval`:
```py
# Within a JSContext, open `xlsx.full.min.js` and evaluate
with open("xlsx.full.min.js") as f:
    ctxt.eval(f.read())
```
### Reading Files
The following diagram depicts the spreadsheet salsa:
```mermaid
flowchart LR
file[(workbook\nfile)]
subgraph SheetJS operations
base64(Base64\nstring)
wb((SheetJS\nWorkbook))
aoo(array of\nobjects)
end
subgraph Pandas operations
lod(list of\nrecords)
df[(Pandas\nDataFrame)]
end
file --> |`open`/`read`\nPython ops| base64
base64 --> |`XLSX.read`\nParse Bytes| wb
wb --> |`sheet_to_json`\nExtract Data| aoo
aoo --> |`convert`\nPython ops|lod
lod --> |`from_records`\nPandas ops| df
```
At a high level:
1) Pure Python operations read the file and generate a Base64 string
2) SheetJS libraries parse the string and generate JS records
3) JS engine operations translate the rows to a Python `list` of `dict`s
4) Pandas operations translate the Python data to a DataFrame
#### Read files
Base64-encoded strings are a safe format for passing file data between Python
and the JS engine:
```py
from base64 import b64encode
with open(path, mode="rb") as f:
    file_bytes = f.read()
b64 = b64encode(file_bytes)
```
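
`b64encode` returns a `bytes` object, and the snippet above passes that value along as-is. As a minimal sketch, assuming a text string is preferred on the JS side, the result can be decoded to an ASCII `str`. The helper name `to_base64_text` and the sample bytes are hypothetical and used only for illustration:

```python
from base64 import b64decode, b64encode

def to_base64_text(file_bytes: bytes) -> str:
    """Encode raw bytes as Base64 and return the result as an ASCII string."""
    return b64encode(file_bytes).decode("ascii")

# Placeholder bytes for illustration; a real script would read them from a file
sample = b"\x00\x01binary\xff data"
b64_text = to_base64_text(sample)

# Decoding the string recovers the original bytes exactly
round_trip = b64decode(b64_text)
```

Since Base64 output uses only ASCII characters, the `decode("ascii")` step cannot fail on valid `b64encode` output.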
#### Parse bytes
From JS code, `XLSX.read`[^5] parses the Base64 string:
```py
wb = ctxt.eval("(b64 => XLSX.read(b64, {type: 'base64', dense: true}))")(b64)
```
The `wb` object follows the "Common Spreadsheet Format"[^6], an in-memory format
for representing workbooks, worksheets, cells, and spreadsheet features.
#### Get First Worksheet
As explained in the "Workbook Object"[^7] section:
- the `SheetNames` property is an ordered list of the sheet names in the workbook
- the `Sheets` property of the workbook object is an object whose keys are sheet
names and whose values are sheet objects.
For use in Python, the `SheetNames` array must be converted to a `list`:
```py
sheet_names = convert(wb.SheetNames)
first_sheet_name = sheet_names[0]
```
Since utility functions will process the worksheet object from JavaScript, it is
preferable not to convert the object:
```py
first_sheet = wb.Sheets[first_sheet_name] # do not convert
```
#### Generate List of Records
In JavaScript, the equivalent of the "`list` of `dict`s" or "`list` of records"
is "array of objects". They can be created with `XLSX.utils.sheet_to_json`[^8]:
```py
rows = convert(ctxt.eval("(ws => XLSX.utils.sheet_to_json(ws))")(first_sheet))
```
#### Generate Pandas DataFrame
`rows` is a `list` of `dict` objects. `from_records`[^9] understands this data
shape and generates a proper DataFrame:
```py
import pandas as pd

df = pd.DataFrame.from_records(rows)
```
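
A minimal sketch of this step, using hand-written records in place of converted `sheet_to_json` output. The sample rows and column names are hypothetical. By default `sheet_to_json` omits keys for blank cells, so the third record deliberately lacks one key to show how Pandas fills the gap:

```python
import pandas as pd

# Hypothetical records shaped like converted `sheet_to_json` output
rows = [
    {"Name": "Ada", "Index": 1},
    {"Name": "Grace", "Index": 2},
    {"Name": "Linus"},  # no "Index" key, as if the cell were blank
]

df = pd.DataFrame.from_records(rows)

# Column names come from the record keys; the absent value is marked missing
print(df.columns.tolist())
print(df["Index"].isna().tolist())
```

If blank cells should carry an explicit value instead, the `defval` option of `sheet_to_json` can supply one on the JS side.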
### Writing Files
The writing process looks similar to the reading process in reverse:
```mermaid
flowchart LR
subgraph Pandas operations
df[(Pandas\nDataFrame)]
json(JSON\nString)
end
subgraph SheetJS operations
aoo(array of\nobjects)
wb((SheetJS\nWorkbook))
base64(Base64\nstring)
end
file[(workbook\nfile)]
df --> |`to_json`\nPandas ops| json
json --> |`JSON.parse`\nJS Engine| aoo
aoo --> |`json_to_sheet`\nSheetJS Ops| wb
wb --> |`XLSX.write`\nBase64| base64
base64 --> |`open`/`write`\nPython ops| file
```
At a high level:
1) Pandas operations translate the DataFrame to a JSON string
2) JS engine operations translate the JSON string to an array of objects
3) SheetJS libraries convert the array to a worksheet and generate a Base64-encoded workbook
4) Pure Python operations decode the Base64 string and write the bytes to file
#### Generate JSON
`DataFrame#to_json`[^10] with the option `orient="records"` generates a JSON
string that encodes an array of objects:
```py
json = df.to_json(orient="records")
```
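
A short sketch of the serialization step, assuming a small hand-built DataFrame. The sample data and the variable name `payload` are hypothetical. The standard `json` module is used here only to confirm that the string decodes to a list of records; naming the string something other than `json` avoids shadowing that module when it is imported:

```python
import json

import pandas as pd

# Hypothetical two-row DataFrame for illustration
df = pd.DataFrame({"Name": ["Ada", "Grace"], "Index": [1, 2]})

# One JSON object per row, wrapped in a JSON array
payload = df.to_json(orient="records")

# Decode in Python to inspect the shape that `JSON.parse` would see
records = json.loads(payload)
print(records)
```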
#### Generate Worksheet
In JavaScript, `JSON.parse` will interpret the string as an array of objects.
`XLSX.utils.json_to_sheet`[^11] generates a SheetJS worksheet object:
```py
sheet = ctxt.eval("(json => XLSX.utils.json_to_sheet(JSON.parse(json)) )")(json)
```
#### Export Enhancements
At this point, there are many options for improving the appearance of the sheet.
For example, the "Export Tutorial"[^12] shows how to adjust column widths.
:::tip pass
[SheetJS Pro](https://sheetjs.com/pro) offers additional styling options such as
cell styling and frozen rows.
"Pro Edit" offers a special approach for inserting data into an existing file.
:::
#### Generate Workbook
`XLSX.utils.book_new`[^13] creates a new workbook and `XLSX.utils.book_append_sheet`[^14]
appends a worksheet to the workbook. The new worksheet will be called "Export":
```py
book = ctxt.eval("""((ws, name) => {
  const wb = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(wb, ws, name);
  return wb;
})""")(sheet, "Export")
```

:::note pass
The JavaScript code in the string literal is reproduced below:

```js
(ws, name) => {
  const wb = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(wb, ws, name);
  return wb;
}
```

:::
#### Generate File
`XLSX.write`[^15] with the option `type: "base64"` generates the file data and
returns it as a Base64 string. The `bookType: "xls"` option selects the legacy
XLS format:
```py
b64 = ctxt.eval("(wb => XLSX.write(wb, {type:'base64', bookType:'xls'}))")(book)
```
With the Base64 string, standard Python operations can create a file:
```py
from base64 import b64decode
raw = b64decode(b64)
with open("export.xls", mode="wb") as f:
    f.write(raw)
```
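
The decode-and-write step can be wrapped in a small helper. This is a sketch: the function name `write_base64_file` is hypothetical, and the payload is placeholder bytes rather than real `XLSX.write` output. A temporary directory keeps the example from leaving files behind:

```python
import os
import tempfile
from base64 import b64decode, b64encode

def write_base64_file(b64, path):
    """Decode a Base64 payload (str or bytes) and write the raw bytes to path."""
    raw = b64decode(b64)
    with open(path, mode="wb") as f:
        f.write(raw)
    return len(raw)

# Placeholder payload standing in for the Base64 string from the JS engine
payload = b"placeholder workbook bytes"
b64 = b64encode(payload).decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "export.xls")
    written = write_base64_file(b64, out_path)
    # Read the file back to confirm the bytes on disk match the payload
    with open(out_path, mode="rb") as f:
        on_disk = f.read()
```

Opening the file with `mode="wb"` matters: text mode would reject `bytes` and could alter line endings on some platforms.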
## Complete Demo
This example will extract data from an Apple Numbers spreadsheet and generate a
DataFrame. The DataFrame will be exported to a legacy XLS spreadsheet.
### Engine Setup
0) Follow the official installation instructions[^16].
Instructions for macOS 12:
- Install `boost-python3` package using `brew`:
```bash
brew install boost-python3
```
- Identify python version:
```bash
python3 --version
```
:::note pass
When the demo was last tested, the version was `3.11.4`.
:::
- [Download latest release](https://github.com/cloudflare/stpyv8/releases):
```bash
curl -LO https://github.com/cloudflare/stpyv8/releases/download/v11.5.150.16/stpyv8-macos-12-python-3.11.zip
```
- Extract ZIP file and enter folder:
```bash
unzip stpyv8-macos-12-python-3.11.zip
cd stpyv8-macos-12-3.11
```
- Move `icudtl.dat` to `/Library/Application Support/STPyV8/`:
```bash
sudo mkdir -p /Library/Application\ Support/STPyV8
sudo mv icudtl.dat /Library/Application\ Support/STPyV8/
```
- Install wheel:
```bash
sudo python3 -m pip install --upgrade *.whl
cd ..
```