docs.sheetjs.com/docz/docs/03-demos/03-net/09-dom.md
2023-09-11 00:44:15 -04:00

title
Synthetic DOM

import current from '/version.js'; import CodeBlock from '@theme/CodeBlock';

SheetJS offers three methods to directly process HTML DOM TABLE elements:

  • table_to_sheet generates a SheetJS worksheet from a TABLE element
  • table_to_book generates a SheetJS workbook from a TABLE element
  • sheet_add_dom adds data from a TABLE element to an existing worksheet

These methods work in the web browser. NodeJS and other server-side platforms traditionally lack a DOM implementation, but third-party modules fill the gap.

:::tip pass

The most robust approach for server-side processing is to automate a headless web browser. "Browser Automation" includes demos.

:::

This demo covers synthetic DOM implementations for non-browser platforms.

Integration Details

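The SheetJS DOM methods rely on DOM properties that some synthetic DOM implementations do not provide. The missing properties can be patched before calling the SheetJS methods.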

Table rows

The rows property of TABLE elements is a list of TR row children. This list automatically updates when rows are added and deleted.

SheetJS does not mutate rows. Assuming there are no nested tables, the rows property can be created using getElementsByTagName:

tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
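
Since getElementsByTagName returns every descendant TR, the one-line patch also picks up rows that belong to nested tables. When nested tables are possible, one option is to walk only the direct children of the TABLE and of its row groups (THEAD, TBODY, TFOOT). The following is a minimal sketch, assuming elements expose childNodes, nodeType and tagName. The names directRows and el are hypothetical, and the plain objects stand in for DOM elements so that the snippet runs without a DOM library:

```javascript
/* hypothetical helper: collect the TR elements that belong to `tbl` itself */
function directRows(tbl) {
  const rows = [];
  /* keep element nodes only (nodeType 1), skipping text and comments */
  const elements = node => Array.from(node.childNodes).filter(n => n.nodeType === 1);
  for (const child of elements(tbl)) {
    const tag = child.tagName.toLowerCase();
    if (tag === "tr") rows.push(child);
    /* rows are usually wrapped in THEAD / TBODY / TFOOT groups */
    else if (tag === "thead" || tag === "tbody" || tag === "tfoot") {
      for (const row of elements(child)) {
        if (row.tagName.toLowerCase() === "tr") rows.push(row);
      }
    }
  }
  return rows;
}

/* stand-in element factory, for illustration only */
const el = (tagName, ...childNodes) => ({ nodeType: 1, tagName, childNodes });

/* outer table with two rows; the first row holds a nested table */
const inner = el("table", el("tr", el("td")));
const outer = el("table", el("tbody", el("tr", el("td", inner)), el("tr", el("td"))));

console.log(directRows(outer).length); // 2
```

With a real library the result would be assigned in the same way as above, for example tbl.rows = directRows(tbl). Rows are returned in document order.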

Row cells

The cells property of TR elements is a list of TD cell children. This list automatically updates when cells are added and deleted.

SheetJS does not mutate cells. Assuming there are no nested tables, the cells property can be created using getElementsByTagName:

tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")));
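
The two patches can be combined into one helper that only fills in properties the library does not already provide. This is a minimal sketch, not part of the SheetJS API: patchTable and makeStub are hypothetical names, and the stub objects stand in for synthetic DOM elements so that the snippet runs without a DOM library:

```javascript
/* hypothetical helper: add `rows` and `cells` when the DOM library omits them */
function patchTable(tbl) {
  /* skip the patch if the implementation already provides `rows` */
  if (tbl.rows == null) tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
  Array.from(tbl.rows).forEach(row => {
    if (row.cells == null) row.cells = Array.from(row.getElementsByTagName("td"));
  });
  return tbl;
}

/* stand-in elements, for illustration only */
function makeStub(tag, kids) {
  return {
    tagName: tag,
    kids: kids || [],
    getElementsByTagName(name) {
      /* depth-first search over all descendants, like the DOM method */
      const out = [];
      (function walk(node) {
        node.kids.forEach(k => { if (k.tagName === name) out.push(k); walk(k); });
      })(this);
      return out;
    }
  };
}

const tbl = patchTable(makeStub("table", [
  makeStub("tr", [makeStub("td"), makeStub("td")]),
  makeStub("tr", [makeStub("td")])
]));

console.log(tbl.rows.length, tbl.rows[0].cells.length, tbl.rows[1].cells.length); // 2 2 1
```

In a real script, the patched element would then be passed to a SheetJS method such as table_to_book.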

NodeJS

JSDOM

JSDOM is a DOM implementation for NodeJS. The synthetic DOM elements are compatible with SheetJS methods.

The following example scrapes the first table from the file SheetJSTable.html and generates an XLSX workbook:

const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string.  This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");

// highlight-start
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");

/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);
// highlight-end

XLSX.writeFile(workbook, "SheetJSDOM.xlsx");

Complete Demo

:::note

This demo was last tested on 2023 September 10 against JSDOM 22.1.0

:::

  1. Install SheetJS and JSDOM libraries:

npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz jsdom@22.1.0

  2. Save the previous codeblock to SheetJSDOM.js.

  3. Download the sample SheetJSTable.html:

curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html

  4. Run the script:

node SheetJSDOM.js

The script will create a file SheetJSDOM.xlsx that can be opened in a spreadsheet editor.

HappyDOM

HappyDOM provides a DOM framework for NodeJS. For the tested version (11.0.2), the following patches were needed:

  • TABLE rows property (explained above)
  • TR cells property (explained above)

Complete Demo

:::note

This demo was last tested on 2023 September 10 against HappyDOM 11.0.2

:::

  1. Install SheetJS and HappyDOM libraries:

npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz happy-dom@11.0.2

  2. Download the sample script SheetJSHappyDOM.js:

curl -LO https://docs.sheetjs.com/dom/SheetJSHappyDOM.js

  3. Run the script:

node SheetJSHappyDOM.js

The script will create a file SheetJSHappyDOM.xlsx that can be opened in a spreadsheet editor.

XMLDOM

XMLDOM provides a DOM framework for NodeJS. For the tested version (0.8.10), the following patches were needed:

  • TABLE rows property (explained above)
  • TR cells property (explained above)
  • Element innerHTML property:
Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
	var outerHTML = new XMLSerializer().serializeToString(this);
	if(outerHTML.match(/</g).length == 1) return "";
	return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});
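
The string handling inside this getter can be checked without XMLDOM. The following sketch copies the getter body into a standalone function and runs it on hand-written strings. The name innerFromOuter and the sample inputs are hypothetical, and the inputs only imitate what a serializer might produce:

```javascript
/* hypothetical standalone version of the getter body above */
function innerFromOuter(outerHTML) {
  /* a single "<" means a self-closing element such as <td/> */
  if (outerHTML.match(/</g).length == 1) return "";
  return outerHTML
    /* drop the closing tag of the element */
    .slice(0, outerHTML.lastIndexOf("</"))
    /* drop the opening tag, allowing ">" inside quoted attribute values */
    .replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}

console.log(JSON.stringify(innerFromOuter('<td/>')));                    // ""
console.log(JSON.stringify(innerFromOuter('<td t="s">a<b>c</b></td>'))); // "a<b>c</b>"
console.log(JSON.stringify(innerFromOuter('<td title="x>y">1</td>')));   // "1"
```

The third input shows why the pattern matches quoted attribute values explicitly: a plain search for the first ">" would cut the opening tag in the middle of the attribute.
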

Complete Demo

:::note

This demo was last tested on 2023 September 10 against XMLDOM 0.8.10

:::

  1. Install SheetJS and XMLDOM libraries:

npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz @xmldom/xmldom@0.8.10

  2. Download the sample script SheetJSXMLDOM.js:

curl -LO https://docs.sheetjs.com/dom/SheetJSXMLDOM.js

  3. Run the script:

node SheetJSXMLDOM.js

The script will create a file SheetJSXMLDOM.xlsx that can be opened in a spreadsheet editor.

CheerioJS

:::caution pass

Cheerio does not support a number of fundamental properties out of the box. They can be shimmed, but it is strongly recommended to use a more compliant library.

:::

CheerioJS provides a DOM-like framework for NodeJS. Many features were missing. SheetJSCheerio.js implements the missing features to ensure that SheetJS DOM methods can process TABLE elements.
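
The general idea of such a shim can be illustrated with a wrapper object that exposes DOM-style members over a parse tree. The sketch below is not the contents of SheetJSCheerio.js and does not cover everything the SheetJS methods use. It assumes parsed nodes are plain objects with type, name, attribs and children fields. The names shim and tag are hypothetical, and the nodes are written by hand so that the snippet runs without Cheerio:

```javascript
/* hypothetical shim: wrap a parse-tree node in an object with DOM-style members */
function shim(node) {
  /* collect wrapped descendants whose tag name is in `tags` */
  const byTag = (n, tags, out = []) => {
    (n.children || []).forEach(c => {
      if (c.type === "tag" && tags.includes(c.name)) out.push(shim(c));
      byTag(c, tags, out);
    });
    return out;
  };
  return {
    node,
    getAttribute: name => (name in node.attribs ? node.attribs[name] : null),
    hasAttribute: name => name in node.attribs,
    /* computed on each access, so the lists reflect the current tree */
    get rows() { return byTag(node, ["tr"]); },
    get cells() { return byTag(node, ["td", "th"]); }
  };
}

/* hand-written nodes standing in for parser output (the shape is an assumption) */
const tag = (name, attribs, ...children) => ({ type: "tag", name, attribs, children });
const table = tag("table", {},
  tag("tr", {}, tag("th", { colspan: "2" }), tag("td", {})),
  tag("tr", {}, tag("td", {})));

const tbl = shim(table);
console.log(tbl.rows.length, tbl.rows[0].cells.length, tbl.rows[0].cells[0].getAttribute("colspan")); // 2 2 2
```

A wrapper keeps the parsed nodes untouched, at the cost of creating new objects on every property access. As with the patches above, this sketch assumes there are no nested tables.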

Complete Demo

:::note

This demo was last tested on 2023 September 10 against Cheerio 1.0.0-rc.12

:::

  1. Install SheetJS and CheerioJS libraries:

npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz cheerio@1.0.0-rc.12

  2. Download the sample script SheetJSCheerio.js:

curl -LO https://docs.sheetjs.com/dom/SheetJSCheerio.js

  3. Download the sample SheetJSTable.html:

curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html

  4. Run the script:

node SheetJSCheerio.js

The script will create a file SheetJSCheerio.xlsx that can be opened in a spreadsheet editor.

Other Platforms

DenoDOM

DenoDOM provides a DOM framework for Deno. For the tested version (0.1.38), the following patches were needed:

  • TABLE rows property (explained above)
  • TR cells property (explained above)

This example fetches a sample table:

// @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs';

import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.38/deno-dom-wasm.ts';

const doc = new DOMParser().parseFromString(
  await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text(),
  "text/html",
)!;
// highlight-start
const tbl = doc.querySelector("table");

/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"));

/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
// highlight-end
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");

Complete Demo

:::note

This demo was last tested on 2023 September 10 against DenoDOM 0.1.38

:::

  1. Save the previous codeblock to SheetJSDenoDOM.ts.

  2. Run the script with the --allow-net and --allow-write permissions:

deno run --allow-net --allow-write SheetJSDenoDOM.ts

The script will create a file SheetJSDenoDOM.xlsx that can be opened in a spreadsheet editor.