2022-05-16 03:26:04 +00:00
|
|
|
---
|
2024-04-12 07:11:07 +00:00
|
|
|
title: Data Import
|
2022-05-16 03:26:04 +00:00
|
|
|
sidebar_position: 1
|
2023-02-11 07:13:53 +00:00
|
|
|
pagination_prev: demos/index
|
2022-05-16 03:26:04 +00:00
|
|
|
---
|
|
|
|
|
2022-06-11 17:21:26 +00:00
|
|
|
import current from '/version.js';
|
|
|
|
import Tabs from '@theme/Tabs';
|
|
|
|
import TabItem from '@theme/TabItem';
|
2023-04-27 09:12:19 +00:00
|
|
|
import CodeBlock from '@theme/CodeBlock';
|
2022-06-11 17:21:26 +00:00
|
|
|
|
2023-08-01 00:28:19 +00:00
|
|
|
:::tip pass
|
|
|
|
|
|
|
|
The ["Import Tutorial"](/docs/getting-started/examples/import) is a gentle
|
|
|
|
introduction to data import and analysis.
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
2022-05-16 03:26:04 +00:00
|
|
|
## Parsing Workbooks
|
|
|
|
|
2022-06-11 17:21:26 +00:00
|
|
|
### API
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
_Extract data from spreadsheet bytes_
|
|
|
|
|
|
|
|
```js
|
|
|
|
var workbook = XLSX.read(data, opts);
|
|
|
|
```
|
|
|
|
|
|
|
|
The `read` method can extract data from spreadsheet bytes stored in a JS string,
|
|
|
|
"binary string", NodeJS buffer or typed array (`Uint8Array` or `ArrayBuffer`).
|
|
|
|
|
|
|
|
|
|
|
|
_Read spreadsheet bytes from a local file and extract data_
|
|
|
|
|
|
|
|
```js
|
|
|
|
var workbook = XLSX.readFile(filename, opts);
|
|
|
|
```
|
|
|
|
|
|
|
|
The `readFile` method attempts to read a spreadsheet file at the supplied path.
|
|
|
|
|
2022-10-30 05:45:37 +00:00
|
|
|
The second `opts` argument is optional. ["Parsing Options"](/docs/api/parse-options)
|
2022-05-16 03:26:04 +00:00
|
|
|
covers the supported properties and behaviors.
|
|
|
|
|
2024-04-14 07:40:38 +00:00
|
|
|
:::danger pass
|
2022-06-11 17:21:26 +00:00
|
|
|
|
|
|
|
Browsers generally do not allow reading files by specifying filename (it is a
|
|
|
|
security risk), and running `XLSX.readFile` in the browser will throw an error.
|
|
|
|
|
|
|
|
Deno scripts must be invoked with `--allow-read` to read from the filesystem.
|
|
|
|
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
2022-05-16 03:26:04 +00:00
|
|
|
#### Examples
|
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
Here are a few common scenarios (click on each subtitle to see the code).
|
|
|
|
|
2022-10-30 05:45:37 +00:00
|
|
|
The [demos](/docs/demos) cover special deployments in more detail.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-06-11 17:21:26 +00:00
|
|
|
### Example: Local File
|
|
|
|
|
|
|
|
`XLSX.readFile` supports reading local files in platforms like NodeJS. In other
|
|
|
|
platforms like React Native, `XLSX.read` should be called with file data.
|
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
In-browser processing where users drag-and-drop files or use a file element are
|
|
|
|
covered in [the "User Submissions" example.](#example-user-submissions)
|
|
|
|
|
2022-06-11 17:21:26 +00:00
|
|
|
<Tabs>
|
|
|
|
<TabItem value="nodejs" label="NodeJS">
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
`readFile` uses `fs.readFileSync` under the hood:
|
|
|
|
|
|
|
|
```js
|
|
|
|
var XLSX = require("xlsx");
|
|
|
|
|
|
|
|
var workbook = XLSX.readFile("test.xlsx");
|
|
|
|
```
|
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
For Node ESM, `fs` must be loaded manually:
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
```js
|
2022-06-21 12:26:53 +00:00
|
|
|
import * as fs from "fs";
|
2023-01-09 05:08:30 +00:00
|
|
|
import { readFile, set_fs } from "xlsx";
|
2022-06-21 12:26:53 +00:00
|
|
|
set_fs(fs);
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
const workbook = readFile("test.xlsx");
|
2022-05-16 03:26:04 +00:00
|
|
|
```
|
|
|
|
|
2022-06-11 17:21:26 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="electron" label="Electron">
|
|
|
|
|
|
|
|
`readFile` can be used in the renderer process:
|
|
|
|
|
|
|
|
```js
|
|
|
|
/* From the renderer process */
|
|
|
|
var XLSX = require("xlsx");
|
|
|
|
|
|
|
|
var workbook = XLSX.readFile(path);
|
|
|
|
```
|
|
|
|
|
2024-04-12 01:04:37 +00:00
|
|
|
Electron APIs have changed over time. The [`electron` demo](/docs/demos/desktop/electron)
|
2022-06-11 17:21:26 +00:00
|
|
|
shows a complete example and details the required version-specific settings.
|
|
|
|
|
|
|
|
</TabItem>
|
|
|
|
<TabItem value="reactnative" label="React Native">
|
|
|
|
|
2024-04-12 01:04:37 +00:00
|
|
|
[The React Native Demo](/docs/demos/mobile/reactnative#rn-file-plugins) covers tested plugins.
|
2022-06-11 17:21:26 +00:00
|
|
|
|
|
|
|
</TabItem>
|
|
|
|
<TabItem value="extendscript" label="Photoshop">
|
|
|
|
|
|
|
|
`readFile` wraps the `File` logic in Photoshop and other ExtendScript targets.
|
|
|
|
The specified path should be an absolute path:
|
|
|
|
|
|
|
|
```js
|
|
|
|
#include "xlsx.extendscript.js"
|
|
|
|
|
|
|
|
/* Read test.xlsx from the Documents folder */
|
|
|
|
var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");
|
|
|
|
```
|
|
|
|
|
|
|
|
For user-configurable paths, `openDialog` can show a file picker:
|
|
|
|
|
|
|
|
```js
|
|
|
|
#include "xlsx.extendscript.js"
|
|
|
|
|
|
|
|
/* Ask user to select path */
|
|
|
|
var thisFile = File.openDialog("Select a spreadsheet");
|
|
|
|
var workbook = XLSX.readFile(thisFile.absoluteURI);
|
|
|
|
```
|
|
|
|
|
2024-10-26 03:17:31 +00:00
|
|
|
The ["ExtendScript" demo](/docs/demos/extensions/extendscript) includes complete
|
2024-04-12 01:04:37 +00:00
|
|
|
examples for Photoshop and InDesign.
|
2022-06-11 17:21:26 +00:00
|
|
|
|
2022-07-10 22:17:34 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="deno" label="Deno">
|
|
|
|
|
|
|
|
`readFile` uses `Deno.readFileSync` under the hood:
|
|
|
|
|
2023-05-07 13:58:36 +00:00
|
|
|
<CodeBlock language="ts">{`\
|
2022-07-10 22:17:34 +00:00
|
|
|
// @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts"
|
|
|
|
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs';
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2024-03-22 04:45:40 +00:00
|
|
|
const workbook = XLSX.readFile("test.xlsx");`}
|
|
|
|
</CodeBlock>
|
2022-07-10 22:17:34 +00:00
|
|
|
|
2023-09-24 03:59:48 +00:00
|
|
|
:::note pass
|
2022-07-10 22:17:34 +00:00
|
|
|
|
2022-07-31 23:45:34 +00:00
|
|
|
Applications reading files must be invoked with the `--allow-read` flag.
|
|
|
|
|
|
|
|
:::
|
2022-07-10 22:17:34 +00:00
|
|
|
|
2022-07-09 23:59:44 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="bun" label="Bun">
|
|
|
|
|
2022-11-08 05:43:21 +00:00
|
|
|
Bun `readFileSync` output should be wrapped in a `Buffer`:
|
2022-07-09 23:59:44 +00:00
|
|
|
|
|
|
|
```js
|
|
|
|
import { readFileSync } from 'fs'
|
|
|
|
import { read } from './xlsx.mjs'
|
|
|
|
|
|
|
|
const workbook = read(Buffer.from(readFileSync(path)));
|
|
|
|
```
|
|
|
|
|
2022-06-11 17:21:26 +00:00
|
|
|
</TabItem>
|
|
|
|
</Tabs>
|
|
|
|
|
|
|
|
|
|
|
|
### Example: User Submissions
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
This example focuses on user-submitted files through a drag-and-drop event, HTML
|
|
|
|
file input element, or network request.
|
|
|
|
|
|
|
|
<Tabs>
|
|
|
|
<TabItem value="browser" label="Browser">
|
|
|
|
|
|
|
|
**For modern websites targeting Chrome 76+**, `File#arrayBuffer` is recommended:
|
|
|
|
|
|
|
|
<Tabs>
|
|
|
|
<TabItem value="dnd" label="Drag and Drop">
|
|
|
|
|
|
|
|
Assume `drop_dom_element` is the DOM element that will listen for changes:
|
|
|
|
|
|
|
|
```html
|
|
|
|
<div id="drop_dom_element">Drop files here</div>
|
|
|
|
```
|
|
|
|
|
|
|
|
The event property is `e.dataTransfer`. The code snippet highlights the
|
|
|
|
difference between the drag-and-drop example and the file input example:
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
|
|
|
|
```js
|
|
|
|
// XLSX is a global from the standalone script
|
|
|
|
|
|
|
|
async function handleDropAsync(e) {
|
|
|
|
e.stopPropagation(); e.preventDefault();
|
2022-06-27 02:05:36 +00:00
|
|
|
// highlight-next-line
|
2022-05-16 03:26:04 +00:00
|
|
|
const f = e.dataTransfer.files[0];
|
|
|
|
/* f is a File */
|
|
|
|
const data = await f.arrayBuffer();
|
|
|
|
/* data is an ArrayBuffer */
|
|
|
|
const workbook = XLSX.read(data);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
}
|
|
|
|
drop_dom_element.addEventListener("drop", handleDropAsync, false);
|
|
|
|
```
|
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="file" label="HTML File Input Element">
|
|
|
|
|
|
|
|
Starting with an HTML INPUT element with `type="file"`:
|
|
|
|
|
|
|
|
```html
|
|
|
|
<input type="file" id="input_dom_element">
|
|
|
|
```
|
|
|
|
|
|
|
|
The event property is `e.target`. The code snippet highlights the difference
|
|
|
|
between the drag-and-drop example and the file input example:
|
|
|
|
|
|
|
|
```js
|
|
|
|
// XLSX is a global from the standalone script
|
|
|
|
|
|
|
|
async function handleFileAsync(e) {
|
|
|
|
// highlight-next-line
|
|
|
|
const file = e.target.files[0];
|
|
|
|
const data = await file.arrayBuffer();
|
|
|
|
/* data is an ArrayBuffer */
|
|
|
|
const workbook = XLSX.read(data);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
}
|
|
|
|
input_dom_element.addEventListener("change", handleFileAsync, false);
|
|
|
|
```
|
|
|
|
|
|
|
|
</TabItem>
|
|
|
|
</Tabs>
|
|
|
|
|
2024-04-08 04:47:04 +00:00
|
|
|
https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.
|
2022-06-27 02:05:36 +00:00
|
|
|
|
|
|
|
|
|
|
|
**For maximal compatibility (IE10+)**, the `FileReader` approach is recommended:
|
|
|
|
|
|
|
|
<Tabs>
|
|
|
|
<TabItem value="dnd" label="Drag and Drop">
|
|
|
|
|
|
|
|
Assume `drop_dom_element` is the DOM element that will listen for changes:
|
|
|
|
|
|
|
|
```html
|
|
|
|
<div id="drop_dom_element">Drop files here</div>
|
|
|
|
```
|
|
|
|
|
|
|
|
The event property is `e.dataTransfer`. The code snippet highlights the
|
|
|
|
difference between the drag-and-drop example and the file input example:
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
```js
|
|
|
|
function handleDrop(e) {
|
|
|
|
e.stopPropagation(); e.preventDefault();
|
2022-06-27 02:05:36 +00:00
|
|
|
// highlight-next-line
|
2022-05-16 03:26:04 +00:00
|
|
|
var f = e.dataTransfer.files[0];
|
|
|
|
/* f is a File */
|
|
|
|
var reader = new FileReader();
|
|
|
|
reader.onload = function(e) {
|
|
|
|
var data = e.target.result;
|
|
|
|
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
|
|
|
|
var workbook = XLSX.read(data);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
};
|
|
|
|
reader.readAsArrayBuffer(f);
|
|
|
|
}
|
|
|
|
drop_dom_element.addEventListener("drop", handleDrop, false);
|
|
|
|
```
|
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="file" label="HTML File Input Element">
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
Starting with an HTML INPUT element with `type="file"`:
|
|
|
|
|
|
|
|
```html
|
|
|
|
<input type="file" id="input_dom_element">
|
|
|
|
```
|
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
The event property is `e.target`. The code snippet highlights the difference
|
|
|
|
between the drag-and-drop example and the file input example:
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
```js
|
|
|
|
function handleFile(e) {
|
2022-06-27 02:05:36 +00:00
|
|
|
// highlight-next-line
|
2022-05-16 03:26:04 +00:00
|
|
|
var file = e.target.files[0];
|
|
|
|
var reader = new FileReader();
|
|
|
|
reader.onload = function(e) {
|
|
|
|
var data = e.target.result;
|
|
|
|
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
|
|
|
|
var workbook = XLSX.read(e.target.result);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
};
|
|
|
|
reader.readAsArrayBuffer(file);
|
|
|
|
}
|
|
|
|
input_dom_element.addEventListener("change", handleFile, false);
|
|
|
|
```
|
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
</TabItem>
|
|
|
|
</Tabs>
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2023-01-09 17:25:32 +00:00
|
|
|
The [`oldie` demo](/docs/demos/frontend/legacy#internet-explorer) shows an IE-compatible fallback scenario.
|
2022-06-11 17:21:26 +00:00
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="nodejs" label="NodeJS">
|
2022-06-11 17:21:26 +00:00
|
|
|
|
|
|
|
`read` can accept a NodeJS buffer. `readFile` can read files generated by a
|
2022-10-19 10:05:59 +00:00
|
|
|
HTTP POST request body parser like **`formidable`**:
|
2022-06-11 17:21:26 +00:00
|
|
|
|
|
|
|
```js
|
|
|
|
const XLSX = require("xlsx");
|
|
|
|
const http = require("http");
|
|
|
|
const formidable = require("formidable");
|
|
|
|
|
|
|
|
const server = http.createServer((req, res) => {
|
|
|
|
const form = new formidable.IncomingForm();
|
|
|
|
form.parse(req, (err, fields, files) => {
|
|
|
|
/* grab the first file */
|
|
|
|
const f = Object.entries(files)[0][1];
|
|
|
|
const path = f.filepath;
|
|
|
|
const workbook = XLSX.readFile(path);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
});
|
|
|
|
}).listen(process.env.PORT || 7262);
|
|
|
|
```
|
|
|
|
|
2024-04-12 01:04:37 +00:00
|
|
|
The [`server` demo](/docs/demos/net/server) includes more advanced examples.
|
2022-06-11 17:21:26 +00:00
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="deno" label="Deno">
|
|
|
|
|
2022-11-13 20:45:13 +00:00
|
|
|
Drash is a HTTP server framework for Deno. In a `POST` request handler, the
|
|
|
|
body parser can pull file data into a `Uint8Array`:
|
2022-06-27 02:05:36 +00:00
|
|
|
|
2023-05-07 13:58:36 +00:00
|
|
|
<CodeBlock language="ts">{`\
|
2022-06-27 02:05:36 +00:00
|
|
|
// @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts"
|
|
|
|
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs';
|
|
|
|
/* load the codepage support library for extended support with older formats */
|
|
|
|
import * as cptable from 'https://cdn.sheetjs.com/xlsx-${current}/package/dist/cpexcel.full.mjs';
|
|
|
|
XLSX.set_cptable(cptable);
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2023-10-19 05:23:55 +00:00
|
|
|
import * as Drash from "https://cdn.jsdelivr.net/gh/drashland/drash@v2.8.1/mod.ts";
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2022-06-27 02:05:36 +00:00
|
|
|
class SheetResource extends Drash.Resource {
|
|
|
|
public paths = ["/"];
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2022-06-27 02:05:36 +00:00
|
|
|
public POST(request: Drash.Request, response: Drash.Response) {
|
|
|
|
// highlight-next-line
|
|
|
|
const file = request.bodyParam<Drash.Types.BodyFile>("file");
|
|
|
|
if (!file) throw new Error("File is required!");
|
|
|
|
// highlight-next-line
|
2022-11-13 20:45:13 +00:00
|
|
|
var wb = XLSX.read(file.content);
|
2022-06-27 02:05:36 +00:00
|
|
|
var html = XLSX.utils.sheet_to_html(wb.Sheets[wb.SheetNames[0]]);
|
|
|
|
return response.html(html);
|
|
|
|
}
|
|
|
|
}
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2022-06-27 02:05:36 +00:00
|
|
|
const server = new Drash.Server({ hostname: "", port: 7262, protocol: "http",
|
|
|
|
resources: [
|
|
|
|
// highlight-next-line
|
|
|
|
SheetResource,
|
|
|
|
],
|
|
|
|
});
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2024-03-22 04:45:40 +00:00
|
|
|
server.run();`}
|
|
|
|
</CodeBlock>
|
2022-06-27 02:05:36 +00:00
|
|
|
|
2023-09-24 03:59:48 +00:00
|
|
|
:::note pass
|
2022-06-27 02:05:36 +00:00
|
|
|
|
|
|
|
Deno must be run with the `--allow-net` flag to enable network requests:
|
|
|
|
|
|
|
|
```bash
|
2022-08-07 07:48:40 +00:00
|
|
|
deno run --allow-net test-server.ts
|
2022-06-27 02:05:36 +00:00
|
|
|
```
|
|
|
|
|
2023-05-03 03:40:40 +00:00
|
|
|
To test, submit a POST request to `http://localhost:7262` with an attachment:
|
2022-06-27 02:05:36 +00:00
|
|
|
|
|
|
|
```bash
|
|
|
|
curl -X POST -F "file=@test.xlsx" http://localhost:7262/
|
|
|
|
```
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
|
|
|
|
|
|
|
</TabItem>
|
|
|
|
</Tabs>
|
|
|
|
|
2022-06-11 17:21:26 +00:00
|
|
|
|
|
|
|
### Example: Remote File
|
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
This example focuses on fetching files ("Ajax" in browser parlance) using APIs
|
|
|
|
like `XMLHttpRequest` and `fetch` as well as third-party libraries.
|
|
|
|
|
|
|
|
<Tabs>
|
|
|
|
<TabItem value="browser" label="Browser">
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
For modern websites targeting Chrome 42+, `fetch` is recommended:
|
|
|
|
|
|
|
|
```js
|
|
|
|
// XLSX is a global from the standalone script
|
|
|
|
|
|
|
|
(async() => {
|
2024-04-26 04:16:13 +00:00
|
|
|
const url = "https://docs.sheetjs.com/pres.xlsx";
|
2022-05-16 03:26:04 +00:00
|
|
|
const data = await (await fetch(url)).arrayBuffer();
|
|
|
|
/* data is an ArrayBuffer */
|
|
|
|
const workbook = XLSX.read(data);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
})();
|
|
|
|
```
|
|
|
|
|
|
|
|
For broader support, the `XMLHttpRequest` approach is recommended:
|
|
|
|
|
|
|
|
```js
|
2024-04-26 04:16:13 +00:00
|
|
|
var url = "https://docs.sheetjs.com/pres.xlsx";
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
/* set up async GET request */
|
|
|
|
var req = new XMLHttpRequest();
|
|
|
|
req.open("GET", url, true);
|
|
|
|
req.responseType = "arraybuffer";
|
|
|
|
|
|
|
|
req.onload = function(e) {
|
|
|
|
var workbook = XLSX.read(req.response);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
};
|
|
|
|
|
|
|
|
req.send();
|
|
|
|
```
|
|
|
|
|
2024-04-12 01:04:37 +00:00
|
|
|
The [HTTP Downloads demo](/docs/demos/net/network) includes examples using
|
|
|
|
browser APIs and wrapper libraries.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2024-04-08 04:47:04 +00:00
|
|
|
https://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="nodejs" label="NodeJS">
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2023-05-07 13:58:36 +00:00
|
|
|
NodeJS releases starting from version 18.0 have native support for fetch:
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
```js
|
|
|
|
const XLSX = require("xlsx");
|
|
|
|
|
2024-04-26 04:16:13 +00:00
|
|
|
const url = "https://docs.sheetjs.com/pres.xlsx";
|
2022-05-16 03:26:04 +00:00
|
|
|
const data = await (await fetch(url)).arrayBuffer();
|
|
|
|
/* data is an ArrayBuffer */
|
|
|
|
const workbook = XLSX.read(data);
|
|
|
|
```
|
|
|
|
|
|
|
|
For broader compatibility, third-party modules are recommended.
|
|
|
|
|
2022-10-19 10:05:59 +00:00
|
|
|
**`request`** requires a `null` encoding to yield Buffers:
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
```js
|
|
|
|
var XLSX = require("xlsx");
|
|
|
|
var request = require("request");
|
|
|
|
|
2024-04-26 04:16:13 +00:00
|
|
|
var url = "https://docs.sheetjs.com/pres.xlsx";
|
2022-05-16 03:26:04 +00:00
|
|
|
request({url: url, encoding: null}, function(err, resp, body) {
|
|
|
|
var workbook = XLSX.read(body);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
});
|
|
|
|
```
|
|
|
|
|
2022-10-19 10:05:59 +00:00
|
|
|
**`axios`** works the same way in browser and in NodeJS:
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
```js
|
|
|
|
const XLSX = require("xlsx");
|
|
|
|
const axios = require("axios");
|
|
|
|
|
2024-04-26 04:16:13 +00:00
|
|
|
const url = "https://docs.sheetjs.com/pres.xlsx";
|
2022-05-16 03:26:04 +00:00
|
|
|
(async() => {
|
|
|
|
const res = await axios.get(url, {responseType: "arraybuffer"});
|
|
|
|
/* res.data is a Buffer */
|
|
|
|
const workbook = XLSX.read(res.data);
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
})();
|
2022-07-24 10:22:17 +00:00
|
|
|
```
|
|
|
|
|
|
|
|
</TabItem>
|
|
|
|
<TabItem value="bun" label="Bun">
|
|
|
|
|
2022-10-30 05:45:37 +00:00
|
|
|
Bun has native support for `fetch`. Using the [NodeJS package](/docs/getting-started/installation/nodejs):
|
2022-07-24 10:22:17 +00:00
|
|
|
|
|
|
|
```js
|
2023-01-09 05:08:30 +00:00
|
|
|
import * as XLSX from 'xlsx';
|
2022-07-24 10:22:17 +00:00
|
|
|
/* load the codepage support library for extended support with older formats */
|
|
|
|
import * as cptable from 'xlsx/dist/cpexcel.full.mjs';
|
|
|
|
XLSX.set_cptable(cptable);
|
|
|
|
|
2024-04-26 04:16:13 +00:00
|
|
|
const url = "https://docs.sheetjs.com/pres.xlsx";
|
2022-07-24 10:22:17 +00:00
|
|
|
// highlight-next-line
|
|
|
|
const data = await (await fetch(url)).arrayBuffer();
|
|
|
|
/* data is an ArrayBuffer */
|
|
|
|
const workbook = XLSX.read(data);
|
2022-05-16 03:26:04 +00:00
|
|
|
```
|
|
|
|
|
2022-06-27 02:05:36 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="deno" label="Deno">
|
|
|
|
|
2022-07-24 10:22:17 +00:00
|
|
|
Deno has native support for `fetch`.
|
2022-06-27 02:05:36 +00:00
|
|
|
|
2023-05-07 13:58:36 +00:00
|
|
|
<CodeBlock language="ts">{`\
|
2022-06-27 02:05:36 +00:00
|
|
|
// @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts"
|
|
|
|
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs';
|
|
|
|
/* load the codepage support library for extended support with older formats */
|
|
|
|
import * as cptable from 'https://cdn.sheetjs.com/xlsx-${current}/package/dist/cpexcel.full.mjs';
|
|
|
|
XLSX.set_cptable(cptable);
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2024-04-26 04:16:13 +00:00
|
|
|
const url = "https://docs.sheetjs.com/pres.xlsx";
|
2022-06-27 02:05:36 +00:00
|
|
|
// highlight-next-line
|
|
|
|
const data = await (await fetch(url)).arrayBuffer();
|
|
|
|
/* data is an ArrayBuffer */
|
2024-03-22 04:45:40 +00:00
|
|
|
const workbook = XLSX.read(data);`}
|
|
|
|
</CodeBlock>
|
2022-06-27 02:05:36 +00:00
|
|
|
|
2023-09-24 03:59:48 +00:00
|
|
|
:::note pass
|
2022-06-27 02:05:36 +00:00
|
|
|
|
|
|
|
Deno must be run with the `--allow-net` flag to enable network requests:
|
|
|
|
|
2023-08-20 20:39:35 +00:00
|
|
|
```bash
|
2022-08-07 07:48:40 +00:00
|
|
|
deno run --allow-net test-fetch.ts
|
2022-06-27 02:05:36 +00:00
|
|
|
```
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="electron" label="Electron">
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
The `net` module in the main process can make HTTP/HTTPS requests to external
|
|
|
|
resources. Responses should be manually concatenated using `Buffer.concat`:
|
|
|
|
|
|
|
|
```js
|
|
|
|
const XLSX = require("xlsx");
|
|
|
|
const { net } = require("electron");
|
|
|
|
|
2024-04-26 04:16:13 +00:00
|
|
|
const url = "https://docs.sheetjs.com/pres.xlsx";
|
2022-05-16 03:26:04 +00:00
|
|
|
const req = net.request(url);
|
|
|
|
req.on("response", (res) => {
|
|
|
|
const bufs = []; // this array will collect all of the buffers
|
|
|
|
res.on("data", (chunk) => { bufs.push(chunk); });
|
|
|
|
res.on("end", () => {
|
|
|
|
const workbook = XLSX.read(Buffer.concat(bufs));
|
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
|
|
|
});
|
|
|
|
});
|
|
|
|
req.end();
|
|
|
|
```
|
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
</TabItem>
|
|
|
|
</Tabs>
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-06-11 17:21:26 +00:00
|
|
|
### Example: Readable Streams
|
|
|
|
|
2023-09-24 03:59:48 +00:00
|
|
|
:::caution pass
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
The recommended approach is to buffer streams in memory and process once all of
|
|
|
|
the data has been collected. A proper streaming parse is technically impossible.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2024-04-08 04:47:04 +00:00
|
|
|
<details>
|
|
|
|
<summary><b>Technical details</b> (click to show)</summary>
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary
|
|
|
|
and XML entries. The ZIP file format stores the table of contents ("end of
|
|
|
|
central directory" record) at the end of the file, so a proper parse of a ZIP
|
|
|
|
file requires scanning from the end. Streams do not provide random access into
|
|
|
|
the data, so the only correct approach involves buffering the entire stream.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format".
|
|
|
|
It is a container format that can hold multiple "files" and "folders". It also
|
|
|
|
has a table of contents ("directory sectors") but these can be placed anywhere
|
|
|
|
in the file! The only correct approach involves buffering enough of the stream
|
|
|
|
to find the full table of contents, but the added complexity has little benefit
|
|
|
|
when testing against real-world files generated by various versions of Excel and
|
|
|
|
other tools.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
</details>
|
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
:::
|
|
|
|
|
|
|
|
|
|
|
|
<Tabs>
|
|
|
|
<TabItem value="browser" label="Browser">
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
When dealing with `ReadableStream`, the easiest approach is to buffer the stream
|
|
|
|
and process the whole thing at the end:
|
|
|
|
|
|
|
|
```js
|
|
|
|
// XLSX is a global from the standalone script
|
|
|
|
|
2022-11-13 20:45:13 +00:00
|
|
|
async function buffer_RS(stream) {
|
2022-05-16 03:26:04 +00:00
|
|
|
/* collect data */
|
|
|
|
const buffers = [];
|
|
|
|
const reader = stream.getReader();
|
|
|
|
for(;;) {
|
|
|
|
const res = await reader.read();
|
|
|
|
if(res.value) buffers.push(res.value);
|
|
|
|
if(res.done) break;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* concat */
|
|
|
|
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));
|
|
|
|
|
|
|
|
let off = 0;
|
|
|
|
for(const u8 of buffers) {
|
|
|
|
out.set(u8, off);
|
|
|
|
off += u8.length;
|
|
|
|
}
|
|
|
|
|
|
|
|
return out;
|
|
|
|
}
|
|
|
|
|
2022-11-13 20:45:13 +00:00
|
|
|
const data = await buffer_RS(stream);
|
2022-05-16 03:26:04 +00:00
|
|
|
/* data is Uint8Array */
|
2022-11-13 20:45:13 +00:00
|
|
|
const workbook = XLSX.read(data);
|
2022-05-16 03:26:04 +00:00
|
|
|
```
|
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="nodejs" label="NodeJS">
|
|
|
|
|
|
|
|
When dealing with Readable Streams, the easiest approach is to buffer the stream
|
|
|
|
and process the whole thing at the end:
|
|
|
|
|
|
|
|
```js
|
|
|
|
var XLSX = require("xlsx");
|
|
|
|
|
|
|
|
function process_RS(stream, cb) {
|
|
|
|
var buffers = [];
|
|
|
|
stream.on("data", function(data) { buffers.push(data); });
|
|
|
|
stream.on("end", function() {
|
|
|
|
var buffer = Buffer.concat(buffers);
|
2022-11-13 20:45:13 +00:00
|
|
|
var workbook = XLSX.read(buffer);
|
2022-06-21 12:26:53 +00:00
|
|
|
|
|
|
|
/* DO SOMETHING WITH workbook IN THE CALLBACK */
|
|
|
|
cb(workbook);
|
|
|
|
});
|
|
|
|
}
|
2023-06-04 08:05:50 +00:00
|
|
|
```
|
|
|
|
|
|
|
|
In recent versions of NodeJS, Promises are preferred:
|
|
|
|
|
|
|
|
```js
|
|
|
|
var XLSX = require("xlsx");
|
|
|
|
|
|
|
|
/* async_RS reads a stream and returns a Promise resolving to a workbook */
|
|
|
|
const async_RS = (stream) => new Promise((res, rej) => {
|
|
|
|
var buffers = [];
|
|
|
|
stream.on("data", function(data) { buffers.push(data); });
|
|
|
|
stream.on("end", function() {
|
|
|
|
const buf = Buffer.concat(buffers);
|
|
|
|
const wb = XLSX.read(buf);
|
|
|
|
res(wb);
|
|
|
|
});
|
|
|
|
});
|
2022-06-21 12:26:53 +00:00
|
|
|
```
|
|
|
|
|
2022-07-06 05:38:24 +00:00
|
|
|
</TabItem>
|
|
|
|
<TabItem value="deno" label="Deno">
|
|
|
|
|
2022-11-08 05:43:21 +00:00
|
|
|
In addition to the browser `ReadableStream` API, Deno has a `Reader` class.
|
2022-07-06 05:38:24 +00:00
|
|
|
|
|
|
|
For these streams, `std` provides a `readAll` method to collect data into a
|
|
|
|
`Uint8Array`. This example reads from a file using `Deno.open` and prints the
|
|
|
|
worksheet names array:
|
|
|
|
|
2023-05-07 13:58:36 +00:00
|
|
|
<CodeBlock language="ts">{`\
|
2022-07-06 05:38:24 +00:00
|
|
|
// @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts"
|
|
|
|
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs';
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2022-07-06 05:38:24 +00:00
|
|
|
import { readAll } from "https://deno.land/std/streams/conversion.ts";
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2022-07-06 05:38:24 +00:00
|
|
|
/* Simple Deno.Reader from a file */
|
|
|
|
const file = await Deno.open("test.xlsx", {read: true});
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2022-07-06 05:38:24 +00:00
|
|
|
/* \`content\` will be a Uint8Array holding the full contents of the stream */
|
2022-12-01 01:13:00 +00:00
|
|
|
const content = await readAll(file);
|
2023-05-07 13:58:36 +00:00
|
|
|
\n\
|
2022-07-06 05:38:24 +00:00
|
|
|
/* Since this is a Uint8Array, \`XLSX.read\` "just works" */
|
|
|
|
const wb = XLSX.read(content);
|
2024-03-22 04:45:40 +00:00
|
|
|
console.log(wb.SheetNames);`}
|
|
|
|
</CodeBlock>
|
2022-07-06 05:38:24 +00:00
|
|
|
|
2022-06-21 12:26:53 +00:00
|
|
|
</TabItem>
|
|
|
|
</Tabs>
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2022-10-30 05:45:37 +00:00
|
|
|
More detailed examples are covered in the [included demos](/docs/demos/)
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
## Processing JSON and JS Data
|
|
|
|
|
|
|
|
JSON and JS data tend to represent single worksheets. This section will use a
|
|
|
|
few utility functions to generate workbooks.
|
|
|
|
|
|
|
|
_Create a new Workbook_
|
|
|
|
|
|
|
|
```js
|
|
|
|
var workbook = XLSX.utils.book_new();
|
|
|
|
```
|
|
|
|
|
2023-06-25 09:36:58 +00:00
|
|
|
The [`book_new` utility function](/docs/api/utilities/wb) creates an empty
|
|
|
|
workbook with no worksheets.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
#### API
|
|
|
|
|
|
|
|
_Create a worksheet from an array of arrays of JS values_
|
|
|
|
|
|
|
|
```js
|
|
|
|
var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);
|
|
|
|
```
|
|
|
|
|
|
|
|
The `aoa_to_sheet` utility function walks an "array of arrays" in row-major
|
|
|
|
order, generating a worksheet object. The following snippet generates a sheet
|
|
|
|
with cell `A1` set to the string `A1`, cell `B1` set to `B1`, etc:
|
|
|
|
|
|
|
|
```js
|
|
|
|
var worksheet = XLSX.utils.aoa_to_sheet([
|
|
|
|
["A1", "B1", "C1"],
|
|
|
|
["A2", "B2", "C2"],
|
|
|
|
["A3", "B3", "C3"]
|
|
|
|
]);
|
|
|
|
```
|
|
|
|
|
2024-04-12 01:04:37 +00:00
|
|
|
["Array of Arrays Input"](/docs/api/utilities/array#array-of-arrays-input) describes
|
2023-05-18 22:41:23 +00:00
|
|
|
the function and the optional `opts` argument in more detail.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
|
|
|
|
_Create a worksheet from an array of JS objects_
|
|
|
|
|
|
|
|
```js
|
|
|
|
var worksheet = XLSX.utils.json_to_sheet(jsa, opts);
|
|
|
|
```
|
|
|
|
|
|
|
|
The `json_to_sheet` utility function walks an array of JS objects in order,
|
|
|
|
generating a worksheet object. By default, it will generate a header row and
|
|
|
|
one row per object in the array. The optional `opts` argument has settings to
|
|
|
|
control the column order and header output.
|
|
|
|
|
2024-04-12 01:04:37 +00:00
|
|
|
["Array of Objects Input"](/docs/api/utilities/array#array-of-objects-input) describes
|
2022-11-08 05:43:21 +00:00
|
|
|
the function and the optional `opts` argument in more detail.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
#### Examples
|
|
|
|
|
2023-07-26 20:18:07 +00:00
|
|
|
["Export Tutorial"](/docs/getting-started/examples/export) contains a detailed
|
2023-06-29 21:07:52 +00:00
|
|
|
example of fetching data from a JSON Endpoint and generating a workbook.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2023-02-28 11:40:44 +00:00
|
|
|
[`x-spreadsheet`](/docs/demos/grid/xs) is an interactive data grid for
|
|
|
|
previewing and modifying structured data in the web browser.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
2024-01-03 06:47:00 +00:00
|
|
|
["TensorFlow.js"](/docs/demos/math/tensorflow) covers strategies for
|
2022-07-08 00:04:16 +00:00
|
|
|
creating worksheets from ML library exports (datasets stored in Typed Arrays).
|
|
|
|
|
2022-05-16 03:26:04 +00:00
|
|
|
<details>
|
|
|
|
<summary><b>Records from a database query (SQL or no-SQL)</b> (click to show)</summary>
|
|
|
|
|
2023-02-24 07:46:48 +00:00
|
|
|
The [`data` demo](/docs/demos/data/) includes examples of working with
|
2022-05-16 03:26:04 +00:00
|
|
|
databases and query results.
|
|
|
|
|
|
|
|
</details>
|
|
|
|
|
|
|
|
|
|
|
|
## Processing HTML Tables
|
|
|
|
|
|
|
|
#### API
|
|
|
|
|
|
|
|
_Create a worksheet by scraping an HTML TABLE in the page_
|
|
|
|
|
|
|
|
```js
|
|
|
|
var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);
|
|
|
|
```
|
|
|
|
|
|
|
|
The `table_to_sheet` utility function takes a DOM TABLE element and iterates
|
|
|
|
through the rows to generate a worksheet. The `opts` argument is optional.
|
2023-05-18 22:41:23 +00:00
|
|
|
["HTML Table Input"](/docs/api/utilities/html#html-table-input) describes the
|
|
|
|
function in more detail.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
_Create a workbook by scraping an HTML TABLE in the page_
|
|
|
|
|
|
|
|
```js
|
|
|
|
var workbook = XLSX.utils.table_to_book(dom_element, opts);
|
|
|
|
```
|
|
|
|
|
|
|
|
The `table_to_book` utility function follows the same logic as `table_to_sheet`.
|
|
|
|
After generating a worksheet, it creates a blank workbook and appends the
|
|
|
|
spreadsheet.
|
|
|
|
|
|
|
|
The options argument supports the same options as `table_to_sheet`, with the
|
|
|
|
addition of a `sheet` property to control the worksheet name. If the property
|
|
|
|
is missing or no options are specified, the default name `Sheet1` is used.
|
|
|
|
|
|
|
|
#### Examples
|
|
|
|
|
2024-04-12 01:04:37 +00:00
|
|
|
The [Headless Demo](/docs/demos/net/headless) includes examples of
|
2022-07-07 04:05:14 +00:00
|
|
|
server-side spreadsheet generation from HTML TABLE elements using headless
|
|
|
|
Chromium ("Puppeteer") and other browsers ("Playwright")
|
|
|
|
|
2022-05-16 03:26:04 +00:00
|
|
|
Here are a few common scenarios (click on each subtitle to see the code):
|
|
|
|
|
|
|
|
<details>
|
|
|
|
<summary><b>HTML TABLE element in a webpage</b> (click to show)</summary>
|
|
|
|
|
2023-04-27 09:12:19 +00:00
|
|
|
<CodeBlock language="html">{`\
|
2022-07-07 04:05:14 +00:00
|
|
|
<!-- include the standalone script and shim -->
|
2023-04-27 09:12:19 +00:00
|
|
|
<script src="https://cdn.sheetjs.com/xlsx-${current}/package/dist/shim.min.js"></script>
|
|
|
|
<script src="https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js"></script>
|
|
|
|
\n\
|
2022-05-16 03:26:04 +00:00
|
|
|
<!-- example table with id attribute -->
|
|
|
|
<table id="tableau">
|
|
|
|
<tr><td>Sheet</td><td>JS</td></tr>
|
|
|
|
<tr><td>12345</td><td>67</td></tr>
|
|
|
|
</table>
|
2023-04-27 09:12:19 +00:00
|
|
|
\n\
|
2022-05-16 03:26:04 +00:00
|
|
|
<!-- this block should appear after the table HTML and the standalone script -->
|
|
|
|
<script type="text/javascript">
|
|
|
|
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));
|
2023-04-27 09:12:19 +00:00
|
|
|
\n\
|
2022-05-16 03:26:04 +00:00
|
|
|
/* DO SOMETHING WITH workbook HERE */
|
2023-04-27 09:12:19 +00:00
|
|
|
XLSX.writeFile(workbook, "SheetJSHTMLExport.xlsx");
|
|
|
|
</script>`}
|
|
|
|
</CodeBlock>
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
Multiple tables on a web page can be converted to individual worksheets:
|
|
|
|
|
|
|
|
```js
|
|
|
|
/* create new workbook */
|
|
|
|
var workbook = XLSX.utils.book_new();
|
|
|
|
|
|
|
|
/* convert table "table1" to worksheet named "Sheet1" */
|
|
|
|
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
|
|
|
|
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");
|
|
|
|
|
|
|
|
/* convert table "table2" to worksheet named "Sheet2" */
|
|
|
|
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
|
|
|
|
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");
|
|
|
|
|
|
|
|
/* workbook now has 2 worksheets */
|
|
|
|
```
|
|
|
|
|
|
|
|
Alternatively, the HTML code can be extracted and parsed:
|
|
|
|
|
|
|
|
```js
|
|
|
|
var htmlstr = document.getElementById("tableau").outerHTML;
|
|
|
|
var workbook = XLSX.read(htmlstr, {type:"string"});
|
|
|
|
```
|
|
|
|
|
|
|
|
</details>
|
|
|
|
|
|
|
|
<details>
|
|
|
|
<summary><b>Chrome/Chromium Extension</b> (click to show)</summary>
|
|
|
|
|
2024-04-12 01:04:37 +00:00
|
|
|
The ["Chrome and Chromium" demo](/docs/demos/extensions/chromium/) includes a
|
|
|
|
complete example and enumerates the required permissions and other settings.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
In an extension, it is recommended to generate the workbook in a content script
|
|
|
|
and pass the object back to the extension:
|
|
|
|
|
|
|
|
```js
|
|
|
|
/* in the worker script */
|
|
|
|
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
|
|
|
|
/* pass a message like { sheetjs: true } from the extension to scrape */
|
|
|
|
if(!msg || !msg.sheetjs) return;
|
|
|
|
/* create a new workbook */
|
|
|
|
var workbook = XLSX.utils.book_new();
|
|
|
|
/* loop through each table element */
|
|
|
|
var tables = document.getElementsByTagName("table")
|
|
|
|
for(var i = 0; i < tables.length; ++i) {
|
|
|
|
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
|
|
|
|
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
|
|
|
|
}
|
|
|
|
/* pass back to the extension */
|
|
|
|
return cb(workbook);
|
|
|
|
});
|
|
|
|
```
|
|
|
|
|
|
|
|
</details>
|
|
|
|
|
|
|
|
<details>
|
|
|
|
<summary><b>NodeJS HTML Tables without a browser</b> (click to show)</summary>
|
|
|
|
|
|
|
|
NodeJS does not include a DOM implementation and Puppeteer requires a hefty
|
2023-05-18 22:41:23 +00:00
|
|
|
Chromium build. The ["Synthetic DOM"](/docs/demos/net/dom) demo includes
|
|
|
|
examples for NodeJS.
|
2022-05-16 03:26:04 +00:00
|
|
|
|
|
|
|
</details>
|