2023-05-18 22:41:23 +00:00
|
|
|
---
|
|
|
|
title: Synthetic DOM
|
|
|
|
---
|
|
|
|
|
|
|
|
import current from '/version.js';
|
|
|
|
import CodeBlock from '@theme/CodeBlock';
|
|
|
|
|
|
|
|
`table_to_book` / `table_to_sheet` / `sheet_add_dom` act on HTML DOM elements.
|
|
|
|
Traditionally there is no DOM in server-side environments.
|
|
|
|
|
|
|
|
:::note
|
|
|
|
|
|
|
|
The most robust approach for server-side processing is to automate a headless
|
|
|
|
web browser. ["Browser Automation"](/docs/demos/net/headless) includes demos.
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
|
|
|
This demo covers synthetic DOM implementations for non-browser platforms.
|
|
|
|
|
|
|
|
## NodeJS
|
|
|
|
|
|
|
|
### JSDOM
|
|
|
|
|
|
|
|
JSDOM is a DOM implementation for NodeJS. Given an HTML string, a reference to
|
|
|
|
the table element plays nice with the SheetJS DOM methods:
|
|
|
|
|
|
|
|
```js
|
|
|
|
const XLSX = require("xlsx");
|
|
|
|
const { JSDOM } = require("jsdom");
|
|
|
|
|
|
|
|
/* parse HTML */
|
|
|
|
const dom = new JSDOM(html_string);
|
|
|
|
/* get first TABLE element */
|
|
|
|
const tbl = dom.window.document.querySelector("table");
|
|
|
|
/* generate workbook */
|
|
|
|
const workbook = XLSX.utils.table_to_book(tbl);
|
|
|
|
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
|
|
|
|
```
|
|
|
|
|
2023-05-20 21:37:10 +00:00
|
|
|
<details><summary><b>Complete Demo</b> (click to show)</summary>
|
2023-05-18 22:41:23 +00:00
|
|
|
|
|
|
|
:::note
|
|
|
|
|
|
|
|
This demo was last tested on 2023 May 18 against JSDOM `22.0.0`
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
|
|
|
1) Install SheetJS and JSDOM libraries:
|
|
|
|
|
|
|
|
<CodeBlock language="bash">{`\
|
|
|
|
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz jsdom@22.0.0`}
|
|
|
|
</CodeBlock>
|
|
|
|
|
|
|
|
2) Save the following script to `SheetJSDOM.js`:
|
|
|
|
|
|
|
|
```js title="SheetJSDOM.js"
|
|
|
|
const XLSX = require("xlsx");
|
|
|
|
const { readFileSync } = require("fs");
|
|
|
|
const { JSDOM } = require("jsdom");
|
|
|
|
|
|
|
|
/* obtain HTML string. This example reads from SheetJSTable.html */
|
|
|
|
const html_str = readFileSync("SheetJSTable.html", "utf8");
|
|
|
|
/* get first TABLE element */
|
|
|
|
const doc = new JSDOM(html_str).window.document.querySelector("table");
|
|
|
|
/* generate workbook */
|
|
|
|
const workbook = XLSX.utils.table_to_book(doc);
|
|
|
|
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
|
|
|
|
```
|
|
|
|
|
|
|
|
3) Download [the sample `SheetJSTable.html`](pathname:///dom/SheetJSTable.html):
|
|
|
|
|
|
|
|
```bash
|
|
|
|
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
|
|
|
|
```
|
|
|
|
|
|
|
|
4) Run the script:
|
|
|
|
|
|
|
|
```bash
|
|
|
|
node SheetJSDOM.js
|
|
|
|
```
|
|
|
|
|
|
|
|
The script will create a file `SheetJSDOM.xlsx` that can be opened.
|
|
|
|
|
2023-05-20 21:37:10 +00:00
|
|
|
</details>
|
|
|
|
|
|
|
|
### XMLDOM
|
|
|
|
|
|
|
|
XMLDOM provides a DOM framework for NodeJS. Given an HTML string, a reference to
|
|
|
|
the table element works with the SheetJS DOM methods after patching the object.
|
|
|
|
|
|
|
|
<details><summary><b>Complete Demo</b> (click to show)</summary>
|
|
|
|
|
|
|
|
:::note
|
|
|
|
|
|
|
|
This demo was last tested on 2023 May 18 against XMLDOM `0.8.7`
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
|
|
|
1) Install SheetJS and XMLDOM libraries:
|
|
|
|
|
|
|
|
<CodeBlock language="bash">{`\
|
|
|
|
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz @xmldom/xmldom@0.8.7`}
|
|
|
|
</CodeBlock>
|
|
|
|
|
|
|
|
2) Save the following codeblock to `SheetJSXMLDOM.js`:
|
|
|
|
|
|
|
|
```js title="SheetJSXMLDOM.js"
|
|
|
|
const XLSX = require("xlsx");
|
|
|
|
const { DOMParser, XMLSerializer } = require("@xmldom/xmldom");
|
|
|
|
|
|
|
|
(async() => {
|
|
|
|
const text = await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text();
|
|
|
|
const doc = new DOMParser().parseFromString( text, "text/html");
|
|
|
|
const tbl = doc.getElementsByTagName("table")[0];
|
|
|
|
|
|
|
|
/* patch XMLDOM */
|
|
|
|
tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
|
|
|
|
tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")))
|
|
|
|
Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
|
|
|
|
var outerHTML = new XMLSerializer().serializeToString(this);
|
|
|
|
if(outerHTML.match(/</g).length == 1) return "";
|
|
|
|
return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
|
|
|
|
}});
|
|
|
|
|
|
|
|
const workbook = XLSX.utils.table_to_book(tbl);
|
|
|
|
XLSX.writeFile(workbook, "SheetJSXMLDOM.xlsx");
|
|
|
|
})();
|
|
|
|
```
|
|
|
|
|
|
|
|
3) Run the script:
|
|
|
|
|
|
|
|
```bash
|
|
|
|
node SheetJSXMLDOM.js
|
|
|
|
```
|
|
|
|
|
|
|
|
The script will create a file `SheetJSXMLDOM.xlsx` that can be opened.
|
|
|
|
|
|
|
|
|
2023-05-18 22:41:23 +00:00
|
|
|
</details>
|
|
|
|
|
|
|
|
### CheerioJS
|
|
|
|
|
|
|
|
:::caution
|
|
|
|
|
|
|
|
Cheerio does not support a number of fundamental properties out of the box. They
|
|
|
|
can be shimmed, but it is strongly recommended to use a more compliant library.
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
|
|
|
CheerioJS provides a DOM-like framework for NodeJS. Given an HTML string, a
|
|
|
|
reference to the table element works with the SheetJS DOM methods with some
|
|
|
|
prototype fixes. [`SheetJSCheerio.js`](pathname:///dom/SheetJSCheerio.js) is a
|
|
|
|
complete script.
|
|
|
|
|
|
|
|
<details><summary><b>Complete Demo</b> (click to show)</summary>
|
|
|
|
|
|
|
|
:::note
|
|
|
|
|
|
|
|
This demo was last tested on 2023 May 18 against Cheerio `1.0.0-rc.12`
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
|
|
|
1) Install SheetJS and CheerioJS libraries:
|
|
|
|
|
|
|
|
<CodeBlock language="bash">{`\
|
|
|
|
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz cheerio@1.0.0-rc.12`}
|
|
|
|
</CodeBlock>
|
|
|
|
|
|
|
|
2) Download [the sample script `SheetJSCheerio.js`](pathname:///dom/SheetJSCheerio.js):
|
|
|
|
|
|
|
|
```bash
|
|
|
|
curl -LO https://docs.sheetjs.com/dom/SheetJSCheerio.js
|
|
|
|
```
|
|
|
|
|
|
|
|
3) Download [the sample `SheetJSTable.html`](pathname:///dom/SheetJSTable.html):
|
|
|
|
|
|
|
|
```bash
|
|
|
|
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
|
|
|
|
```
|
|
|
|
|
|
|
|
4) Run the script:
|
|
|
|
|
|
|
|
```bash
|
|
|
|
node SheetJSCheerio.js
|
|
|
|
```
|
|
|
|
|
|
|
|
The script will create a file `SheetJSCheerio.xlsx` that can be opened.
|
|
|
|
|
|
|
|
</details>
|
|
|
|
|
|
|
|
## Other Platforms
|
|
|
|
|
|
|
|
### DenoDOM
|
|
|
|
|
|
|
|
DenoDOM provides a DOM framework for Deno. Given an HTML string, a reference to
|
|
|
|
the table element works with the SheetJS DOM methods after patching the object.
|
|
|
|
|
|
|
|
This example fetches [a sample table](pathname:///dom/SheetJSTable.html):
|
|
|
|
|
|
|
|
```ts title="SheetJSDenoDOM.ts"
|
|
|
|
// @deno-types="https://cdn.sheetjs.com/xlsx-0.19.3/package/types/index.d.ts"
|
|
|
|
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.19.3/package/xlsx.mjs';
|
|
|
|
|
|
|
|
import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.38/deno-dom-wasm.ts';
|
|
|
|
|
|
|
|
const doc = new DOMParser().parseFromString(
|
|
|
|
await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text(),
|
|
|
|
"text/html",
|
|
|
|
)!;
|
|
|
|
// highlight-start
|
|
|
|
const tbl = doc.querySelector("table");
|
|
|
|
|
|
|
|
/* patch DenoDOM element */
|
|
|
|
tbl.rows = tbl.querySelectorAll("tr");
|
|
|
|
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"))
|
|
|
|
|
|
|
|
/* generate workbook */
|
|
|
|
const workbook = XLSX.utils.table_to_book(tbl);
|
|
|
|
// highlight-end
|
|
|
|
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
|
|
|
|
```
|
|
|
|
|
|
|
|
<details open><summary><b>Complete Demo</b> (click to hide)</summary>
|
|
|
|
|
|
|
|
:::note
|
|
|
|
|
|
|
|
This demo was last tested on 2023 May 18 against DenoDOM `0.1.38`
|
|
|
|
|
|
|
|
:::
|
|
|
|
|
|
|
|
1) Save the previous codeblock to `SheetJSDenoDOM.ts`.
|
|
|
|
|
|
|
|
2) Run the script with `--allow-net` and `--allow-write` entitlements:
|
|
|
|
|
|
|
|
```bash
|
|
|
|
deno run --allow-net --allow-write SheetJSDenoDOM.ts
|
|
|
|
```
|
|
|
|
|
|
|
|
The script will create a file `SheetJSDenoDOM.xlsx` that can be opened.
|
|
|
|
|
|
|
|
</details>
|