---
title: Modern Spreadsheets in Stata
sidebar_label: Stata
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
---
import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
export const b = {style: {color:"blue"}};

[Stata](https://www.stata.com/) is a statistical software package. It offers a
robust C-based extension system.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis
within Stata. We'll create a Stata native extension that loads the
[Duktape](/docs/demos/engines/duktape) JavaScript engine and uses the SheetJS
library to read data from spreadsheets and convert it to a Stata-friendly format.
```mermaid
flowchart LR
ofile[(workbook\nXLSB file)]
nfile[(clean file\nXLSX)]
data[[Stata\nVariables]]
ofile --> |Stata Extension\nSheetJS + Duktape| nfile
nfile --> |Stata command\nimport excel|data
```
The demo will read [a Numbers workbook](https://sheetjs.com/pres.numbers) and
generate variables for each column. A sample Stata session is shown below:
![Stata commands](pathname:///stata/commands.png)
:::info pass
This demo covers Stata extensions. For directly processing Stata DTA files, the
["Stata DTA Codec"](/docs/constellation/dta) works in the browser or NodeJS.
:::
:::note Tested Deployments
This demo was last tested by SheetJS users on 2023 November 15.
:::
:::info pass
Stata has limited support for processing spreadsheets through the `import excel`
command[^1]. At the time of writing, it lacked support for XLSB, NUMBERS, and
other common spreadsheet formats.
SheetJS libraries help fill the gap by normalizing spreadsheets to a form that
Stata can understand.
:::
## Integration Details
The current recommendation involves a native plugin that reads arbitrary files
and generates clean XLSX files that Stata can import.
The extension function ultimately pairs the SheetJS `read`[^2] and `write`[^3]
methods to read data from the old file and write a new file:
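```js
/* parse the original file bytes into a SheetJS workbook object */
var wb = XLSX.read(original_file_data, {type: "buffer"});
/* export the workbook as XLSX; `type: "array"` yields an ArrayBuffer */
var new_file_data = XLSX.write(wb, {type: "array", bookType: "xlsx"});
```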
The extension function `cleanfile` will take one or two arguments:

- `plugin call cleanfile, "pres.numbers"` will generate `sheetjs.tmp.xlsx` from
  the first argument (`"pres.numbers"`) and print instructions to load the file.
- `plugin call cleanfile, "pres.numbers" verbose` will additionally print the
  CSV contents of each worksheet in the workbook.
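
One way the plugin could interpret these two call forms is sketched below. It
assumes Stata forwards the text after the comma as `argv`, with the file name
(quotes removed) in `argv[0]` and the optional `verbose` flag in `argv[1]`. The
`cleanfile_args` struct and `parse_cleanfile_args` helper are hypothetical and
are not taken from `cleanfile.c`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the parsed plugin arguments. */
struct cleanfile_args {
  const char *filename; /* workbook path, e.g. "pres.numbers" */
  int verbose;          /* nonzero if CSV previews should be printed */
};

/* Parse the arguments forwarded by `plugin call cleanfile, ...`.
   Assumes argv[0] is the file name (quotes already removed) and that the
   optional argv[1] is the literal flag "verbose".
   Returns 0 on success and 1 if the arguments cannot be used. */
int parse_cleanfile_args(int argc, char *argv[], struct cleanfile_args *out) {
  assert(out != NULL);
  if (argc < 1 || argc > 2) return 1;                  /* one or two arguments */
  if (argv[0] == NULL || argv[0][0] == '\0') return 1; /* need a file name */
  out->filename = argv[0];
  out->verbose = 0;
  if (argc == 2) {
    if (strcmp(argv[1], "verbose") != 0) return 1;     /* unknown flag */
    out->verbose = 1;
  }
  return 0;
}
```

Inside `stata_call`, a nonzero result from a helper like this would be reported
with `SF_error` before returning.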
```mermaid
flowchart LR
ofile{{File\nName}}
subgraph JS Operations
ojbuf[(Buffer\nFile Bytes)]
wb(((SheetJS\nWorkbook)))
njbuf[(Buffer\nXLSX bytes)]
end
obuf[(File\nbytes)]
nbuf[(New file\nbytes)]
nfile[(XLSX\nFile)]
ofile --> |C\nRead File| obuf
obuf --> |Duktape\nBuffer Ops| ojbuf
ojbuf --> |SheetJS\n`read`| wb
wb --> |SheetJS\n`write`| njbuf
njbuf --> |Duktape\nBuffer Ops| nbuf
nbuf --> |C\nWrite File| nfile
```
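
The two "C" steps at the ends of the diagram only need standard `stdio.h`
calls. The sketch below assumes the whole file fits in memory;
`read_file_bytes` and `write_file_bytes` are hypothetical names, not functions
from `cleanfile.c`. Both helpers open files in binary mode (`"rb"` and `"wb"`)
so that Windows does not translate line endings inside ZIP-based formats such
as XLSX.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc'd buffer.
   On success returns the buffer and stores its size in *len;
   returns NULL on failure. The caller must free() the buffer. */
unsigned char *read_file_bytes(const char *path, size_t *len) {
  assert(path != NULL && len != NULL);
  FILE *f = fopen(path, "rb");              /* binary mode: no newline translation */
  if (f == NULL) return NULL;
  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long size = ftell(f);
  if (size < 0) { fclose(f); return NULL; }
  rewind(f);
  unsigned char *buf = malloc(size > 0 ? (size_t)size : 1); /* avoid malloc(0) */
  if (buf == NULL) { fclose(f); return NULL; }
  if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
    free(buf);
    fclose(f);
    return NULL;
  }
  fclose(f);
  *len = (size_t)size;
  return buf;
}

/* Write len bytes to path, replacing any existing file.
   Returns 0 on success and 1 on failure. */
int write_file_bytes(const char *path, const unsigned char *data, size_t len) {
  FILE *f = fopen(path, "wb");
  if (f == NULL) return 1;
  size_t written = fwrite(data, 1, len, f);
  if (fclose(f) != 0 || written != len) return 1;
  return 0;
}
```

In the plugin, the buffer returned by `read_file_bytes` is what the "Duktape
Buffer Ops" step would copy into a JavaScript buffer, and the XLSX bytes coming
back from SheetJS would be handed to `write_file_bytes`.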
### C Extensions
Stata C extensions are shared libraries or DLLs that use special Stata functions
for parsing arguments and returning values.

Arguments are passed to the `stata_call` function in the DLL.

`SF_display` and `SF_error` display text and error messages, respectively.
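
A skeleton of the entry point is sketched below. It is a sketch under stated
assumptions rather than the demo's code: a real plugin includes Stata's
`stplugin.h` (and is built together with `stplugin.c`) instead of the stand-in
definitions at the top, which exist only so the example compiles on its own.
The error message and the use of return code 198 (Stata's "invalid syntax"
code) are illustrative choices.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins so this sketch compiles without the Stata plugin SDK.
   In a real plugin, delete this block and #include "stplugin.h",
   which provides ST_retcode, STDLL, SF_display and SF_error. */
typedef int ST_retcode;
#define STDLL ST_retcode
static ST_retcode SF_display(char *s) { assert(s != NULL); fputs(s, stdout); return 0; }
static ST_retcode SF_error(char *s) { assert(s != NULL); fputs(s, stderr); return 0; }

/* Entry point that Stata invokes for `plugin call cleanfile, ...`.
   argv holds the arguments listed after the comma. */
STDLL stata_call(int argc, char *argv[]) {
  char msg[512];
  if (argc < 1) {
    SF_error("cleanfile: expected a file name\n");
    return 198; /* Stata's "invalid syntax" return code */
  }
  /* snprintf truncates long paths instead of overflowing msg */
  snprintf(msg, sizeof msg, "Reading %s\n", argv[0]);
  SF_display(msg);
  /* ... read the file, run SheetJS in Duktape, write sheetjs.tmp.xlsx ... */
  return 0;
}
```

Returning 0 tells Stata the call succeeded, while a nonzero value signals
failure.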
### Duktape JS Engine
This demo uses the [Duktape JavaScript engine](/docs/demos/engines/duktape). The
SheetJS + Duktape demo covers engine integration in more detail.

The [SheetJS Standalone scripts](/docs/getting-started/installation/standalone)
can be loaded in Duktape by reading the source from the filesystem.
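
Duktape's `duk_peval_string` evaluates a NUL-terminated C string, so the
standalone script has to be read into memory and terminated before it is
evaluated. This is a minimal sketch, assuming the script is saved as
`xlsx.full.min.js` in the working directory. `read_script_source` is a
hypothetical helper, and the Duktape calls appear only in a comment because
`duktape.h` is not part of this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a script file into a NUL-terminated string.
   Returns NULL on failure; the caller must free() the result.

   Intended use inside the plugin (requires duktape.h, not compiled here):
     char *src = read_script_source("xlsx.full.min.js");
     if (src == NULL || duk_peval_string(ctx, src) != 0) { report the error }
     duk_pop(ctx);  pops the evaluation result or error
     free(src);
*/
char *read_script_source(const char *path) {
  assert(path != NULL);
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;
  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long size = ftell(f);
  if (size < 0) { fclose(f); return NULL; }
  rewind(f);
  char *src = malloc((size_t)size + 1);     /* +1 for the terminating NUL */
  if (src == NULL) { fclose(f); return NULL; }
  size_t got = fread(src, 1, (size_t)size, f);
  fclose(f);
  if (got != (size_t)size) { free(src); return NULL; }
  src[size] = '\0';
  return src;
}
```

Unlike the workbook bytes, which may contain NUL bytes and must be tracked with
an explicit length, the script source is plain text, so a terminated string is
enough.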
## Complete Demo
:::info pass
This demo was tested on Windows x64 and macOS x64. The path names and build
commands will differ on other platforms and operating systems.
:::
The [`cleanfile.c`](pathname:///stata/cleanfile.c) extension defines one plugin
function. It can be chained with `import excel`:
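```stata
* load the cleanfile plugin
program cleanfile, plugin
* convert pres.numbers to sheetjs.tmp.xlsx and print each worksheet as CSV
plugin call cleanfile, "pres.numbers" verbose
* unload the plugin
program drop cleanfile
* import the first worksheet, using the first row as variable names
import excel "sheetjs.tmp.xlsx", firstrow
```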
### Create Plugin
<pre>
. plugin call cleanfile, "pres.numbers" verbose{'\n'}
Worksheet 0 Name: Sheet1{'\n'}
Name,Index{'\n'}
Bill Clinton,42{'\n'}
GeorgeW Bush,43{'\n'}
Barack Obama,44{'\n'}
Donald Trump,45{'\n'}
Joseph Biden,46{'\n'}
{'\n'}
Saved to `sheetjs.tmp.xlsx`{'\n'}
import excel "sheetjs.tmp.xlsx", firstrow will read the first sheet and use headers{'\n'}
for more help, see import excel
</pre>

17) Close the plugin:

```stata
program drop cleanfile
```

18) Clear the current session:

```stata
clear
```
19) In the result of Step 16, click the `import excel "sheetjs.tmp.xlsx", firstrow` link.

<pre>
. import excel "sheetjs.tmp.xlsx", firstrow{'\n'}
(2 vars, 5 obs)
</pre>

20) Open the Data Editor (in Browse or Edit mode) and compare to the screenshot:

```stata
browse Name Index
```

![Data Editor showing data from the file](pathname:///stata/data-editor.png)

[^1]: Run `help import excel` in Stata or see ["import excel"](https://www.stata.com/manuals/dimportexcel.pdf) in the Stata documentation.

[^2]: See [`read` in "Reading Files"](/docs/api/parse-options).

[^3]: See [`write` in "Writing Files"](/docs/api/write-options).