---
title: Modern Spreadsheets in Stata
sidebar_label: Stata
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
sidebar_custom_props:
  summary: Generate Stata-compatible XLSX workbooks from incompatible files
---

import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

export const b = {style: {color:"blue"}};

[Stata](https://www.stata.com/) is a statistical software package. It offers a
robust C-based extension system.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis
within Stata. We'll create a Stata native extension that loads the
[Duktape](/docs/demos/engines/duktape) JavaScript engine and uses the SheetJS
library to read data from spreadsheets and convert it to a Stata-friendly format.

```mermaid
flowchart LR
  ofile[(workbook\nXLSB file)]
  nfile[(clean file\nXLSX)]
  data[[Stata\nVariables]]
  ofile --> |Stata Extension\nSheetJS + Duktape| nfile
  nfile --> |Stata command\nimport excel|data
  linkStyle 0 color:blue,stroke:blue;
```

The demo will read [a NUMBERS workbook](https://docs.sheetjs.com/pres.numbers)
and generate variables for each column. A sample Stata session is shown below:

![Stata commands](pathname:///stata/commands.png)

:::info pass

This demo covers Stata extensions. For directly processing Stata DTA files, the
["Stata DTA Codec"](/docs/constellation/dta) works in the browser or NodeJS.

:::

:::note Tested Deployments

This demo was tested in the following deployments:

| Architecture | Version           | Date       |
|:-------------|:------------------|:-----------|
| `darwin-x64` | `18.0`            | 2024-04-10 |
| `darwin-arm` | `18.5` (StataNow) | 2024-12-15 |
| `win10-x64`  | `18.0`            | 2024-04-10 |
| `win11-arm`  | `18.5` (StataNow) | 2024-12-15 |
| `linux-x64`  | `18.0`            | 2024-04-25 |

:::

:::info pass

Stata has limited support for processing spreadsheets through the `import excel`
command[^1]. At the time of writing, it lacked support for XLSB, NUMBERS, and
other common spreadsheet formats.

SheetJS libraries help fill the gap by normalizing spreadsheets to a form that
Stata can understand.

:::

## Integration Details

```mermaid
flowchart LR
  ofile{{File\nName}}
  subgraph JS Operations
    ojbuf[(Buffer\nFile Bytes)]
    wb(((SheetJS\nWorkbook)))
    njbuf[(Buffer\nXLSX bytes)]
  end
  obuf[(File\nbytes)]
  nbuf[(New file\nbytes)]
  nfile[(XLSX\nFile)]
  ofile --> |C\nRead File| obuf
  obuf --> |Duktape\nBuffer Ops| ojbuf
  ojbuf --> |SheetJS\n`read`| wb
  wb --> |SheetJS\n`write`| njbuf
  njbuf --> |Duktape\nBuffer Ops| nbuf
  nbuf --> |C\nWrite File| nfile
  linkStyle 2,3 color:blue,stroke:blue;
```

The current recommendation involves a native plugin that reads arbitrary files
and generates clean XLSX files that Stata can import.

The extension function ultimately pairs the SheetJS `read`[^2] and `write`[^3]
methods to read data from the old file and write a new file:

```js title="Code executed by Duktape within the Stata extension (snippet)"
/* `original_file_data` is a sideloaded Duktape `Buffer` */
// highlight-start
var wb = XLSX.read(original_file_data, {type: "buffer"});
var new_file_data = XLSX.write(wb, {type: "array", bookType: "xlsx"});
// highlight-end
/* `new_file_data` will be pulled into the extension and saved */
```

The extension function `cleanfile` will take one or two arguments:

`plugin call cleanfile, "pres.numbers"` will generate `sheetjs.tmp.xlsx` from
the first argument (`"pres.numbers"`) and print instructions to load the file.

`plugin call cleanfile, "pres.numbers" verbose` will additionally print CSV
contents of each worksheet in the workbook.

### C Extensions

Stata C extensions are shared libraries or DLLs that use special Stata methods
for parsing arguments and returning values.

#### Structure

Arguments are passed to the `stata_call` function in the plugin.[^4] The
function receives the argument count and an array of C strings:

```c title="stata_call declaration"
STDLL stata_call(int argc, char *argv[]);
```

For example, `argc` is 2 and `argv` has two C strings in the following command:

```stata title="Sample plugin invocation with arguments"
plugin call cleanfile, "pres.numbers" verbose
*       arguments start
*               argv[0] ^^^^^^^^^^^^
*               argv[1]               ^^^^^^^
*               argc = 2
```

#### Communication

`SF_display` and `SF_error` display text and error messages respectively.

Message text follows the "Stata Markup and Control Language"[^5].

`{stata ...}` is a special directive that displays the arguments and creates a
clickable link. Clicking the link will run the string.

For example, a plugin may attempt to print a link:

```c title="SF_display C plugin example"
SF_display("{stata import excel \"sheetjs.tmp.xlsx\", firstrow} will read the first sheet and use headers\n");
```

The function will print the following text to the terminal:

<pre>
<span {...b}>import excel "sheetjs.tmp.xlsx", firstrow</span> will read the first sheet and use headers
</pre>

The blue text is clickable. When a user clicks the text, the command
`import excel "sheetjs.tmp.xlsx", firstrow` will be executed.

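The argument handling and link generation described in this section can be sketched in plain C. This is a minimal sketch under stated assumptions, not the code from `cleanfile.c`: the helper names `wants_verbose` and `format_import_link` are hypothetical, and the plugin is assumed to hand the finished string to `SF_display`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the optional second argument is "verbose" */
static int wants_verbose(int argc, char *argv[]) {
  return argc >= 2 && strcmp(argv[1], "verbose") == 0;
}

/*
 * Hypothetical helper: writes a SMCL message containing a clickable
 * `import excel` command for `path` into `out`.
 * Returns 0 on success and -1 if `out` is too small for the message.
 */
static int format_import_link(char *out, size_t len, const char *path) {
  int n = snprintf(out, len,
    "{stata import excel \"%s\", firstrow} will read the first sheet and use headers\n",
    path);
  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Inside `stata_call`, a plugin could fill a local buffer with `format_import_link(msg, sizeof msg, "sheetjs.tmp.xlsx")` and pass `msg` to `SF_display` when the call returns 0. The path is interpolated verbatim, so this sketch assumes file names without quotes or braces.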
### Duktape JS Engine
This demo uses the [Duktape JavaScript engine](/docs/demos/engines/duktape). The
SheetJS + Duktape demo covers engine integration in more detail.
The [SheetJS Standalone scripts](/docs/getting-started/installation/standalone)
can be loaded in Duktape by reading the source from the filesystem.
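Reading a script from the filesystem and evaluating it can be sketched as follows. This is an illustrative sketch rather than the code in `cleanfile.c`: `read_file` and `eval_file` are hypothetical helpers, and the `USE_DUKTAPE` macro is an assumption used so that the file-reading part compiles without `duktape.h`.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: reads an entire file into a NUL-terminated heap buffer.
 * On success, returns the buffer (caller must free) and stores the size in *len.
 * Returns NULL if the file cannot be opened or fully read.
 */
static char *read_file(const char *path, size_t *len) {
  FILE *f = fopen(path, "rb");
  if(!f) return NULL;
  if(fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long sz = ftell(f);
  if(sz < 0) { fclose(f); return NULL; }
  rewind(f);
  char *buf = malloc((size_t)sz + 1);
  if(!buf) { fclose(f); return NULL; }
  size_t got = fread(buf, 1, (size_t)sz, f);
  fclose(f);
  if(got != (size_t)sz) { free(buf); return NULL; }
  buf[got] = '\0';
  *len = got;
  return buf;
}

#ifdef USE_DUKTAPE /* assumption: defined only when building against Duktape */
#include "duktape.h"

/* Hypothetical helper: evaluates a script file in `ctx`; returns 0 on success */
static int eval_file(duk_context *ctx, const char *path) {
  size_t len = 0;
  char *src = read_file(path, &len);
  if(!src) return -1;
  int rc = duk_peval_lstring(ctx, src, len); /* non-zero signals a JS error */
  duk_pop(ctx); /* discard the result or error value left on the stack */
  free(src);
  return rc ? -1 : 0;
}
#endif
```

With a context from `duk_create_heap_default()`, a plugin could call `eval_file` on a local copy of a standalone script (for example `xlsx.full.min.js`, assumed to sit next to the plugin) before running any code that uses the `XLSX` global.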
## Complete Demo
:::info pass
This demo was tested in Windows x64 and macOS x64. The path names and build
commands will differ in other platforms and operating systems.
:::
The [`cleanfile.c`](pathname:///stata/cleanfile.c) extension defines one plugin
function. It can be chained with `import excel`:
```stata
program cleanfile, plugin
plugin call cleanfile, "pres.numbers" verbose
program drop cleanfile
import excel "sheetjs.tmp.xlsx", firstrow
```
### Create Plugin
<pre>
. plugin call cleanfile, "pres.numbers" verbose
Worksheet 0 Name: Sheet1
Name,Index
Bill Clinton,42
GeorgeW Bush,43
Barack Obama,44
Donald Trump,45
Joseph Biden,46
Saved to `sheetjs.tmp.xlsx`
import excel "sheetjs.tmp.xlsx", firstrow will read the first sheet and use headers
for more help, see import excel
</pre>

17) Close the plugin:

```stata
program drop cleanfile
```

18) Clear the current session:

```stata
clear
```

19) In the result of Step 16, click the link on
**`import excel "sheetjs.tmp.xlsx", firstrow`**

<pre>
. import excel "sheetjs.tmp.xlsx", firstrow
(2 vars, 5 obs)
</pre>

20) Open the Data Editor (in Browse or Edit mode) and compare to the screenshot:

```stata
browse Name Index
```

![Data Editor showing data from the file](pathname:///stata/data-editor.png)

:::info pass

In the terminal version of Stata, `browse` does not work:

```
. browse Name Index
command browse is unrecognized
r(199);
```

The `codebook` command will display details.