docs.sheetjs.com/docz/docs/03-demos/03-net/09-dom.md
2023-05-20 17:43:13 -04:00

6.3 KiB

title
Synthetic DOM

import current from '/version.js'; import CodeBlock from '@theme/CodeBlock';

table_to_book / table_to_sheet / sheet_add_dom act on HTML DOM elements. Traditionally there is no DOM in server-side environments.

:::note

The most robust approach for server-side processing is to automate a headless web browser. "Browser Automation" includes demos.

:::

This demo covers synthetic DOM implementations for non-browser platforms.

NodeJS

JSDOM

JSDOM is a DOM implementation for NodeJS. Given an HTML string, a reference to the table element plays nice with the SheetJS DOM methods:

const XLSX = require("xlsx");
const { JSDOM } = require("jsdom");

/* parse HTML */
const dom = new JSDOM(html_string);
/* get first TABLE element */
const tbl = dom.window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
Complete Demo (click to show)

:::note

This demo was last tested on 2023 May 18 against JSDOM 22.0.0

:::

  1. Install SheetJS and JSDOM libraries:

{\ npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz jsdom@22.0.0}

  1. Save the following script to SheetJSDOM.js:
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string.  This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
  1. Download the sample SheetJSTable.html:
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
  1. Run the script:
node SheetJSDOM.js

The script will create a file SheetJSDOM.xlsx that can be opened.

XMLDOM

XMLDOM provides a DOM framework for NodeJS. Given an HTML string, a reference to the table element works with the SheetJS DOM methods after patching the object.

Complete Demo (click to show)

:::note

This demo was last tested on 2023 May 18 against XMLDOM 0.8.7

:::

  1. Install SheetJS and XMLDOM libraries:

{\ npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz @xmldom/xmldom@0.8.7}

  1. Save the following codeblock to SheetJSXMLDOM.js:
const XLSX = require("xlsx");
const { DOMParser, XMLSerializer } = require("@xmldom/xmldom");

(async() => {
const text = await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text();
const doc = new DOMParser().parseFromString( text, "text/html");
const tbl = doc.getElementsByTagName("table")[0];

/* patch XMLDOM */
tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")))
Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
	var outerHTML = new XMLSerializer().serializeToString(this);
	if(outerHTML.match(/</g).length == 1) return "";
	return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});

const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSXMLDOM.xlsx");
})();
  1. Run the script:
node SheetJSXMLDOM.js

The script will create a file SheetJSXMLDOM.xlsx that can be opened.

CheerioJS

:::caution

Cheerio does not support a number of fundamental properties out of the box. They can be shimmed, but it is strongly recommended to use a more compliant library.

:::

CheerioJS provides a DOM-like framework for NodeJS. Given an HTML string, a reference to the table element works with the SheetJS DOM methods with some prototype fixes. SheetJSCheerio.js is a complete script.

Complete Demo (click to show)

:::note

This demo was last tested on 2023 May 18 against Cheerio 1.0.0-rc.12

:::

  1. Install SheetJS and CheerioJS libraries:

{\ npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz cheerio@1.0.0-rc.12}

  1. Download the sample script SheetJSCheerio.js:
curl -LO https://docs.sheetjs.com/dom/SheetJSCheerio.js
  1. Download the sample SheetJSTable.html:
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
  1. Run the script:
node SheetJSCheerio.js

The script will create a file SheetJSCheerio.xlsx that can be opened.

Other Platforms

DenoDOM

DenoDOM provides a DOM framework for Deno. Given an HTML string, a reference to the table element works with the SheetJS DOM methods after patching the object.

This example fetches a sample table:

// @deno-types="https://cdn.sheetjs.com/xlsx-0.19.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.19.3/package/xlsx.mjs';

import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.38/deno-dom-wasm.ts';

const doc = new DOMParser().parseFromString(
	await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text(),
  "text/html",
)!;
// highlight-start
const tbl = doc.querySelector("table");

/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"))

/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
// highlight-end
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
Complete Demo (click to hide)

:::note

This demo was last tested on 2023 May 18 against DenoDOM 0.1.38

:::

  1. Save the previous codeblock to SheetJSDenoDOM.ts.

  2. Run the script with --allow-net and --allow-write entitlements:

deno run --allow-net --allow-write SheetJSDenoDOM.ts

The script will create a file SheetJSDenoDOM.xlsx that can be opened.