---
title: Browser Automation
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Headless automation involves controlling "headless browsers" to access websites
and submit or download data.  It is also possible to automate browsers using
custom browser extensions.

The [SheetJS standalone script](/docs/getting-started/installation/standalone) can be added to
any website by inserting a `SCRIPT` tag.  Headless browsers usually provide
utility functions for running custom snippets in the browser and passing data
back to the automation script.

## Use Case

This demo focuses on exporting table data to a workbook.  Headless browsers do
not generally support passing objects between the browser context and the
automation script, so the file data must be generated in the browser context
and sent back to the automation script for saving in the file system.
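
The following sketch illustrates why strings are preferred over objects. It assumes that the bridge between the browser context and the automation script behaves like a JSON round trip. That is a simplification, and the hypothetical `roundTrip` helper is not part of any automation library:

```js
/* Sketch: simulate a bridge that serializes values as JSON (an assumption) */
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

/* four sample bytes, as a typed array and as a binary string */
const bytes = new Uint8Array([0x50, 0x4B, 0x03, 0x04]);
const binstr = Array.from(bytes, (b) => String.fromCharCode(b)).join("");

const bytesAfter = roundTrip(bytes);
const binstrAfter = roundTrip(binstr);

/* the typed array comes back as a plain object with index keys */
console.log(bytesAfter instanceof Uint8Array); // false
/* the string comes back unchanged */
console.log(binstrAfter === binstr); // true
```

Under this assumption, string outputs such as binary strings and Base64 strings arrive intact, while typed arrays lose their type.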

```mermaid
sequenceDiagram
  autonumber off
  actor U as User
  participant C as Controller
  participant B as Browser
  U->>C: run script
  rect rgba(255, 0, 0, 0.25)
    C->>B: launch browser
    B->>C: ready
    C->>B: load URL
    B->>C: site loaded
  end
  rect rgba(0, 127, 0, 0.25)
    C->>B: add SheetJS script
    B->>C: script loaded
  end
  rect rgba(255, 0, 0, 0.25)
    C->>B: ask for file
    Note over B: scrape tables
    Note over B: generate workbook
    B->>C: file bytes
  end
  rect rgba(0, 127, 0, 0.25)
    C->>U: save file
  end
```

Steps:

1) Launch the headless browser and load the target site.

2) Add the standalone SheetJS build to the page in a `SCRIPT` tag.

3) Add a script to the page (in the browser context) that will:

- Make a workbook object from the first table using `XLSX.utils.table_to_book`
- Generate the bytes for an XLSB file using `XLSX.write`
- Send the bytes back to the automation script

4) When the automation context receives the data, save it to a file.

This demo exports data from <https://sheetjs.com/demos/table>.

:::note

It is also possible to parse files from the browser context, but parsing from
the automation context is more efficient and strongly recommended.

:::

## Puppeteer

Puppeteer enables headless Chromium automation for NodeJS.  Releases ship with
an installer script.  Installation is straightforward:

```bash
npm i https://cdn.sheetjs.com/xlsx-latest/xlsx-latest.tgz puppeteer
```

<Tabs>
  <TabItem value="nodejs" label="NodeJS">

Binary strings are the favored data type.  They can be safely passed from the
browser context to the automation script.  NodeJS provides an API to write
binary strings to file (`fs.writeFileSync` using encoding `binary`).
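
As a minimal sketch of why the `binary` encoding matters, the snippet below writes a short hand-made binary string (not a real workbook) to a temporary file and reads it back. The file name `SheetJSBinaryString.bin` and the sample string are illustrative assumptions:

```js
const fs = require("fs");
const os = require("os");
const path = require("path");

/* binary string: every character code is one byte in the range 0-255 */
const bin = "PK\x03\x04\xFF\x00";

/* "binary" (an alias of "latin1") maps each character to exactly one byte */
const buf = Buffer.from(bin, "binary");

/* the default "utf8" encoding would expand \xFF into two bytes */
const utf8Length = Buffer.from(bin, "utf8").length;

/* write with the "binary" encoding and read the raw bytes back */
const file = path.join(os.tmpdir(), "SheetJSBinaryString.bin"); // hypothetical name
fs.writeFileSync(file, bin, { encoding: "binary" });
const written = fs.readFileSync(file);

console.log(written.equals(buf));   // true
console.log(written.length);        // 6
console.log(utf8Length);            // 7
```

Omitting the encoding would write the UTF-8 form of the string, which corrupts any byte above `0x7F`.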

To run the example, after installing the packages, save the following script to
`SheetJSPuppeteer.js` and run `node SheetJSPuppeteer.js`.  Steps are commented:

```js title="SheetJSPuppeteer.js"
const fs = require("fs");
const puppeteer = require('puppeteer');
(async () => {
  /* (1) Load the target page */
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  page.on("console", msg => console.log("PAGE LOG:", msg.text()));
  await page.setViewport({width: 1920, height: 1080});
  await page.goto('https://sheetjs.com/demos/table');

  /* (2) Load the standalone SheetJS build from the CDN */
  await page.addScriptTag({ url: 'https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js' });

  /* (3) Run the snippet in browser and return data */
  const bin = await page.evaluate(() => {
    /* NOTE: this function will be evaluated in the browser context.
       `page`, `fs` and `puppeteer` are not available.
       `XLSX` will be available thanks to step 2 */

    /* find first table */
    var table = document.body.getElementsByTagName('table')[0];

    /* call table_to_book on first table */
    var wb = XLSX.utils.table_to_book(table);

    /* generate XLSB and return binary string */
    return XLSX.write(wb, {type: "binary", bookType: "xlsb"});
  });

  /* (4) write data to file */
  fs.writeFileSync("SheetJSPuppeteer.xlsb", bin, { encoding: "binary" });

  await browser.close();
})();
```

This script will generate `SheetJSPuppeteer.xlsb` which can be opened in Excel.

  </TabItem>
  <TabItem value="deno" label="Deno">

:::caution

Deno Puppeteer is a fork. It is not officially supported by the Puppeteer team.

:::

Installation is straightforward:

```bash
env PUPPETEER_PRODUCT=chrome deno run -A --unstable https://deno.land/x/puppeteer@14.1.1/install.ts
```

Base64 strings are the favored data type.  They can be safely passed from the
browser context to the automation script.  Deno can decode the Base64 strings
and write the decoded `Uint8Array` data to file with `Deno.writeFileSync`.
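
For illustration, the decoding step can be sketched with the standard `atob` global instead of the `std` helper. The `b64ToBytes` function name and the sample string are hypothetical, and the script below uses the `decode` import rather than this helper:

```js
/* Sketch: decode a Base64 string to bytes using the standard `atob` global */
function b64ToBytes(b64) {
  const bin = atob(b64); // binary string, one character per byte
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; ++i) out[i] = bin.charCodeAt(i);
  return out;
}

/* "UEsDBA==" is the Base64 encoding of the bytes 0x50 0x4B 0x03 0x04 */
const bytes = b64ToBytes("UEsDBA==");
console.log(Array.from(bytes)); // [ 80, 75, 3, 4 ]
```

The resulting `Uint8Array` is the kind of value that `Deno.writeFileSync` accepts as file data.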

To run the example, after installing the packages, save the following script to
`SheetJSPuppeteer.ts` and run `deno run -A --unstable SheetJSPuppeteer.ts`.

```js title="SheetJSPuppeteer.ts"
import puppeteer from "https://deno.land/x/puppeteer@14.1.1/mod.ts";
import { decode } from "https://deno.land/std/encoding/base64.ts"

/* (1) Load the target page */
const browser = await puppeteer.launch();
const page = await browser.newPage();
page.on("console", msg => console.log("PAGE LOG:", msg.text()));
await page.setViewport({width: 1920, height: 1080});
await page.goto('https://sheetjs.com/demos/table');

/* (2) Load the standalone SheetJS build from the CDN */
await page.addScriptTag({ url: 'https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js' });

/* (3) Run the snippet in browser and return data */
const b64 = await page.evaluate(() => {
  /* NOTE: this function will be evaluated in the browser context.
     `page` and `puppeteer` are not available.
     `XLSX` will be available thanks to step 2 */

  /* find first table */
  var table = document.body.getElementsByTagName('table')[0];

  /* call table_to_book on first table */
  var wb = XLSX.utils.table_to_book(table);

  /* generate XLSB and return Base64 string */
  return XLSX.write(wb, {type: "base64", bookType: "xlsb"});
});
/* (4) write data to file */
Deno.writeFileSync("SheetJSPuppeteer.xlsb", decode(b64));

await browser.close();
```

This script will generate `SheetJSPuppeteer.xlsb` which can be opened in Excel.

  </TabItem>
</Tabs>


## Playwright

Playwright presents a unified scripting framework for Chromium, WebKit, and
other browsers.  It draws inspiration from Puppeteer.  In fact, the example
code is almost identical!

```bash
npm i https://cdn.sheetjs.com/xlsx-latest/xlsx-latest.tgz playwright
```

To run the example, after installing the packages, save the following script to
`SheetJSPlaywright.js` and run `node SheetJSPlaywright.js`.  Differences from
the Puppeteer example are highlighted below:

```js title="SheetJSPlaywright.js"
const fs = require("fs");
// highlight-next-line
const { webkit } = require('playwright'); // import desired browser
(async () => {
  /* (1) Load the target page */
  // highlight-next-line
  const browser = await webkit.launch(); // launch desired browser
  const page = await browser.newPage();
  page.on("console", msg => console.log("PAGE LOG:", msg.text()));
  // highlight-next-line
  await page.setViewportSize({width: 1920, height: 1080}); // different name :(
  await page.goto('https://sheetjs.com/demos/table');

  /* (2) Load the standalone SheetJS build from the CDN */
  await page.addScriptTag({ url: 'https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js' });

  /* (3) Run the snippet in browser and return data */
  const bin = await page.evaluate(() => {
    /* NOTE: this function will be evaluated in the browser context.
       `page`, `fs` and the browser engine are not available.
       `XLSX` will be available thanks to step 2 */

    /* find first table */
    var table = document.body.getElementsByTagName('table')[0];

    /* call table_to_book on first table */
    var wb = XLSX.utils.table_to_book(table);

    /* generate XLSB and return binary string */
    return XLSX.write(wb, {type: "binary", bookType: "xlsb"});
  });

  /* (4) write data to file */
  fs.writeFileSync("SheetJSPlaywright.xlsb", bin, { encoding: "binary" });

  await browser.close();
})();
```


## PhantomJS

PhantomJS is a headless web browser powered by WebKit.

:::warning

This information is provided for legacy deployments.  PhantomJS development has
been suspended and there are known vulnerabilities, so new projects should use
alternatives.  For WebKit automation, new projects should use Playwright.

:::

Binary strings are the favored data type.  They can be safely passed from the
browser context to the automation script.  PhantomJS provides an API to write
binary strings to file (`fs.write` using mode `wb`).

To run the example, save the following script to `SheetJSPhantom.js` in the same
folder as `phantomjs.exe` or `phantomjs` and run

```bash
./phantomjs SheetJSPhantom.js     ## MacOS / Linux
.\phantomjs.exe SheetJSPhantom.js ## windows
```

The steps are marked in the comments:

```js title="SheetJSPhantom.js"
var page = require('webpage').create();
page.onConsoleMessage = function(msg) { console.log(msg); };

/* (1) Load the target page */
page.open('https://sheetjs.com/demos/table', function() {

  /* (2) Load the standalone SheetJS build from the CDN */
  page.includeJs("https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js", function() {

    /* (3) Run the snippet in browser and return data */
    var bin = page.evaluateJavaScript([ "function(){",

      /* find first table */
      "var table = document.body.getElementsByTagName('table')[0];",

      /* call table_to_book on first table */
      "var wb = XLSX.utils.table_to_book(table);",

      /* generate XLSB file and return binary string */
      "return XLSX.write(wb, {type: 'binary', bookType: 'xlsb'});",
    "}" ].join(""));

    /* (4) write data to file */
    require("fs").write("SheetJSPhantomJS.xlsb", bin, "wb");

    phantom.exit();
  });
});
```

:::caution

PhantomJS is very finicky and will hang if there are script errors.  It is
strongly recommended to add verbose logging and to lint scripts before use.

:::