synthetic-dom

parent 23cb01e318
commit a36dee9eeb

`docz/docs/03-demos/03-net/09-dom.md` (Normal file, 186 lines)

@@ -0,0 +1,186 @@
---
title: Synthetic DOM
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';

`table_to_book` / `table_to_sheet` / `sheet_add_dom` act on HTML DOM elements.
Traditionally there is no DOM in server-side environments.

:::note

The most robust approach for server-side processing is to automate a headless
web browser. ["Browser Automation"](/docs/demos/net/headless) includes demos.

:::

This demo covers synthetic DOM implementations for non-browser platforms.

## NodeJS

### JSDOM

JSDOM is a DOM implementation for NodeJS. Given an HTML string, a reference to
the table element works with the SheetJS DOM methods:

```js
const XLSX = require("xlsx");
const { JSDOM } = require("jsdom");

/* parse HTML (`html_string` is assumed to hold the HTML source) */
const dom = new JSDOM(html_string);
/* get first TABLE element */
const tbl = dom.window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
```

<details open><summary><b>Complete Demo</b> (click to hide)</summary>

:::note

This demo was last tested on 2023 May 18 against JSDOM `22.0.0`

:::

1) Install SheetJS and JSDOM libraries:

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz jsdom@22.0.0`}
</CodeBlock>

2) Save the following script to `SheetJSDOM.js`:

```js title="SheetJSDOM.js"
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string. This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");
/* get first TABLE element */
const tbl = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
```

3) Download [the sample `SheetJSTable.html`](pathname:///dom/SheetJSTable.html):

```bash
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
```

4) Run the script:

```bash
node SheetJSDOM.js
```

The script will create a file `SheetJSDOM.xlsx` that can be opened in a
spreadsheet app.

</details>

### CheerioJS

:::caution

Cheerio does not support a number of fundamental DOM properties out of the box.
They can be shimmed, but it is strongly recommended to use a more compliant
library.

:::

CheerioJS provides a DOM-like framework for NodeJS. Given an HTML string, a
reference to the table element works with the SheetJS DOM methods with some
prototype fixes. [`SheetJSCheerio.js`](pathname:///dom/SheetJSCheerio.js) is a
complete script.

<details><summary><b>Complete Demo</b> (click to show)</summary>

:::note

This demo was last tested on 2023 May 18 against Cheerio `1.0.0-rc.12`

:::

1) Install SheetJS and CheerioJS libraries:

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz cheerio@1.0.0-rc.12`}
</CodeBlock>

2) Download [the sample script `SheetJSCheerio.js`](pathname:///dom/SheetJSCheerio.js):

```bash
curl -LO https://docs.sheetjs.com/dom/SheetJSCheerio.js
```

3) Download [the sample `SheetJSTable.html`](pathname:///dom/SheetJSTable.html):

```bash
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
```

4) Run the script:

```bash
node SheetJSCheerio.js
```

The script will create a file `SheetJSCheerio.xlsx` that can be opened in a
spreadsheet app.

</details>

## Other Platforms

### DenoDOM

DenoDOM provides a DOM framework for Deno. Given an HTML string, a reference to
the table element works with the SheetJS DOM methods after patching the object.

This example fetches [a sample table](pathname:///dom/SheetJSTable.html):

```ts title="SheetJSDenoDOM.ts"
// @deno-types="https://cdn.sheetjs.com/xlsx-0.19.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.19.3/package/xlsx.mjs';

import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.38/deno-dom-wasm.ts';

const doc = new DOMParser().parseFromString(
  await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text(),
  "text/html",
)!;
// highlight-start
const tbl = doc.querySelector("table");

/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"));

/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
// highlight-end
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
```
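
The DenoDOM patch assigns two properties: `rows` on the table and `cells` on each row. The same fix can be wrapped in a reusable helper for other DOM-like libraries that implement `querySelectorAll`. The sketch below is a hypothetical `patchTableRows` function, not part of SheetJS or DenoDOM. It assumes `rows` and `cells` are the only missing pieces, which may not hold for every library (the Cheerio section notes that Cheerio needs several prototype fixes). The mock objects only stand in for real elements so the logic can run without a DOM.

```javascript
/*
 * Hypothetical helper (not part of SheetJS): add `rows` and `cells`
 * to a table from a DOM-like library that lacks them.
 * Assumes elements implement `querySelectorAll`.
 */
function patchTableRows(tbl) {
  if (!tbl.rows) {
    /* copy into a real array so `forEach` and indexing always work */
    tbl.rows = Array.from(tbl.querySelectorAll("tr"));
  }
  for (const row of tbl.rows) {
    if (!row.cells) row.cells = Array.from(row.querySelectorAll("td, th"));
  }
  return tbl;
}

/* Mock node standing in for a DenoDOM or Cheerio element.
 * Its querySelectorAll only matches direct children by tag name,
 * which is enough for this sketch (real DOMs match all descendants). */
function mockNode(tagName, children = []) {
  return {
    tagName,
    children,
    querySelectorAll(selector) {
      const tags = selector.split(",").map(s => s.trim().toUpperCase());
      return children.filter(child => tags.includes(child.tagName));
    },
  };
}

const table = mockNode("TABLE", [
  mockNode("TR", [mockNode("TH"), mockNode("TH")]),
  mockNode("TR", [mockNode("TD"), mockNode("TD")]),
]);
patchTableRows(table);
console.log(table.rows.length, table.rows[0].cells.length); // 2 2
```

One difference from a browser: `querySelectorAll("tr")` also matches rows of nested tables, while the native `rows` collection only contains the table's own rows.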

<details open><summary><b>Complete Demo</b> (click to hide)</summary>

:::note

This demo was last tested on 2023 May 18 against DenoDOM `0.1.38`

:::

1) Save the previous code block to `SheetJSDenoDOM.ts`.

2) Run the script with `--allow-net` and `--allow-write` entitlements:

```bash
deno run --allow-net --allow-write SheetJSDenoDOM.ts
```

The script will create a file `SheetJSDenoDOM.xlsx` that can be opened in a
spreadsheet app.

</details>

@@ -702,8 +702,8 @@ var worksheet = XLSX.utils.aoa_to_sheet([
]);
```

["Array of Arrays Input"](/docs/api/utilities#array-of-arrays-input) describes
the function and the optional `opts` argument in more detail.

_Create a worksheet from an array of JS objects_
@@ -752,7 +752,8 @@ var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);

The `table_to_sheet` utility function takes a DOM TABLE element and iterates
through the rows to generate a worksheet. The `opts` argument is optional.
["HTML Table Input"](/docs/api/utilities/html#html-table-input) describes the
function in more detail.

@@ -860,19 +861,7 @@ chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
<summary><b>NodeJS HTML Tables without a browser</b> (click to show)</summary>

NodeJS does not include a DOM implementation and Puppeteer requires a hefty
Chromium build. The ["Synthetic DOM"](/docs/demos/net/dom) demo includes
examples for NodeJS.

</details>

@@ -23,41 +23,60 @@ formats, the library will guess the number format.
| WK\* | | Binary encoding |
| WQ\* / WB\* / QPW | | Binary encoding |
| DBF | | Implied by field types |
| HTML | * | Special override |
| CSV | * | N/A |
| PRN | * | N/A |
| DIF | * | N/A |
| RTF | * | N/A |

Asterisks (*) mark formats that mix content and presentation. Writers will use
formatted values if cell objects include formatted text or number formats.
Parsers may guess number formats for special values.

The letter R (R) marks features parsed but not written in the format.

</details>

This example generates a worksheet with common number formats. `sheet_to_html`
uses the number formats in generating the HTML table. The "Export!" button
generates a workbook with number formatting in the file format selected from
the drop-down.

```jsx live
function SheetJSSimpleNF(props) {
  const [ws, setWS] = React.useState();
  const fmt = React.useRef(null);

  /* when the page is loaded, create worksheet and show table */
  React.useEffect(() => {
    /* Create worksheet from simple data */
    const ws = XLSX.utils.aoa_to_sheet([
      ["General",   54337 ],
      ["Currency",  3.5   ],
      ["Thousands", 7262  ],
      ["Percent",   0.0219],
    ]);

    /* assign number formats */
    ws["B2"].z = '"$"#,##0.00_);\\("$"#,##0.00\\)';
    ws["B3"].z = '#,##0';
    ws["B4"].z = "0.00%";

    setWS(ws);
  }, []);

  const xport = (fmt) => {
    /* Export to file (start a download) */
    const wb = XLSX.utils.book_new();
    XLSX.utils.book_append_sheet(wb, ws, "Formats");
    XLSX.writeFile(wb, `SheetJSSimpleNF.${fmt}`);
  };

  const fmts = ["xlsx", "xls", "csv", "xlsb", "html", "ods"];
  return ( <>
    <select ref={fmt}>{fmts.map(fmt => (<option key={fmt} value={fmt}>{fmt}</option>))}</select>
    <button onClick={()=>xport(fmt.current.value)}><b>Export!</b></button>
    <div dangerouslySetInnerHTML={{__html: ws && XLSX.utils.sheet_to_html(ws) || "" }}/>
  </> );
}
```

@@ -70,9 +89,8 @@ To simplify editing, the applications will store the underlying values and the
number formats separately. For example, `$3.50` will be represented as the value
`3.5` with a number format that mandates a `$` sigil and 2 decimal places.

CSV and other formats only support the formatted text. Applications reading CSV
files are expected to interpret the values as numbers or dates.
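
The `$3.50` example can be written out as a SheetJS-style cell object to show where the value and the number format live: `v` holds the value `3.5`, `z` holds the number format and `w` holds the formatted text. The sketch builds the worksheet as a plain object without calling the library. The format string is an illustrative choice, and the hypothetical `toCSVText` helper models a text-only format that keeps the formatted text and drops the format.

```javascript
/* A worksheet as a plain object: cell addresses map to cell objects.
 * Field names follow the SheetJS cell object
 * (t = type, v = value, z = number format, w = formatted text). */
const ws = {
  "!ref": "A1:B1",
  A1: { t: "s", v: "Currency" },
  B1: { t: "n", v: 3.5, z: '"$"#,##0.00', w: "$3.50" },
};

/* Hypothetical helper: a text-only format keeps only the displayed text,
 * so the underlying value and the number format are not preserved. */
function toCSVText(cell) {
  return cell.w !== undefined ? cell.w : String(cell.v);
}

console.log(ws.B1.v);          // 3.5 (underlying value)
console.log(toCSVText(ws.B1)); // $3.50 (formatted text only)
```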

### Dates and Times

@@ -115,18 +133,17 @@ function SheetJSExtractNF(props) {
  return ( <>
    <input type="file" onChange={async(e) => {
      /* parse workbook with cellNF: true */
      const wb = XLSX.read(await e.target.files[0].arrayBuffer(), {cellNF: true});

      /* look at each cell in each worksheet */
      const formats = {};
      wb.SheetNames.forEach(n => {
        var ws = wb.Sheets[n]; if(!ws || !ws["!ref"]) return;
        var ref = XLSX.utils.decode_range(ws["!ref"]);
        for(var R = 0; R <= ref.e.r; ++R) for(var C = 0; C <= ref.e.c; ++C) {
          var addr = XLSX.utils.encode_cell({r:R,c:C});
          if(!ws[addr] || !ws[addr].z || formats[ws[addr].z]) continue;
          /* when a new format is found, save the address */
          formats[ws[addr].z] = `'${n}'!${addr}`;
          setRows(Object.entries(formats));
        }

@@ -216,3 +233,6 @@ set of formats as "Accounting". The exact formats in `en-US` are listed below:
For other locales, the formats can be discovered by creating a file with the
desired format and testing with [the Number Format Strings demo](#number-format-strings).

### HTML Override

[**This feature is discussed in the HTML utilities section**](/docs/api/utilities/html#value-override)

`docz/docs/08-api/07-utilities/07-html.md` (Normal file, 388 lines)

@@ -0,0 +1,388 @@
---
sidebar_position: 7
title: HTML
---

HTML is a common format for presenting data on the web. While the general read
functions (`XLSX.read` and `XLSX.readFile`) can parse HTML strings and the write
functions (`XLSX.write` and `XLSX.writeFile`) can generate HTML strings, the
utility functions in this section operate directly on DOM elements.

:::note

SheetJS CE primarily focuses on data and number formatting.

[SheetJS Pro](https://sheetjs.com/pro) supports CSS text and cell styles in the
HTML format and HTML table utilities.

:::

## HTML Table Input

### Create New Sheet

**Create a worksheet or workbook from a TABLE element**

```js
var ws = XLSX.utils.table_to_sheet(elt, opts);
var wb = XLSX.utils.table_to_book(elt, opts);
```

`XLSX.utils.table_to_sheet` takes a table DOM element and returns a worksheet
resembling the input table. Numbers are parsed. All other data will be stored
as strings.

`XLSX.utils.table_to_book` produces a minimal workbook based on the worksheet.
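
A workbook object pairs an ordered `SheetNames` list with a `Sheets` map from names to worksheets. The sketch below builds a one-sheet workbook by hand to show that shape. `minimalWorkbook` is a hypothetical helper, and the default name `"Sheet1"` is an assumption for illustration, so inspect `wb.SheetNames` on the result of `table_to_book` rather than relying on a specific name.

```javascript
/* Hypothetical helper mirroring the shape of a one-sheet workbook:
 * SheetNames lists the sheet order, Sheets maps names to worksheets. */
function minimalWorkbook(ws, name = "Sheet1") {
  return { SheetNames: [name], Sheets: { [name]: ws } };
}

/* worksheet for a one-row table: "Name" | "Index" */
const ws = {
  "!ref": "A1:B1",
  A1: { t: "s", v: "Name" },
  B1: { t: "s", v: "Index" },
};

const wb = minimalWorkbook(ws);
console.log(wb.SheetNames);                      // [ 'Sheet1' ]
console.log(wb.Sheets[wb.SheetNames[0]] === ws); // true
```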

Both functions accept an options argument:

| Option Name | Default  | Description                                         |
| :---------- | :------: | :-------------------------------------------------- |
| `raw`       |          | If true, every cell will hold raw strings           |
| `dateNF`    |  FMT 14  | Use specified date format in string output          |
| `cellDates` |  false   | Store dates as type `d` (default is `n`)            |
| `sheetRows` |    0     | If >0, read the first `sheetRows` rows of the table |
| `display`   |  false   | If true, hidden rows and cells will not be parsed   |

Exporting a table to a spreadsheet file in the web browser involves 3 steps:
"find the table", "generate a workbook object", and "export to file".

For example, if the HTML table has `id` attribute set to `sheetjs`:

```html
<table id="sheetjs">
<tr><th>Name</th><th>Index</th></tr>
<tr><td>Barack Obama</td><td>44</td></tr>
<tr><td>Donald Trump</td><td>45</td></tr>
<tr><td>Joseph Biden</td><td>46</td></tr>
</table>
```

`document.getElementById("sheetjs")` is a live reference to the table.

```js
/* find the table element in the page */
var tbl = document.getElementById('sheetjs');
/* create a workbook */
var wb = XLSX.utils.table_to_book(tbl);
/* export to file */
XLSX.writeFile(wb, "SheetJSTable.xlsx");
```

<details open><summary><b>Demo</b> (click to hide)</summary>

This HTML table has `id` set to `sheetjs`:

<table id="sheetjs">
<tr><th>Name</th><th>Index</th></tr>
<tr><td>Barack Obama</td><td>44</td></tr>
<tr><td>Donald Trump</td><td>45</td></tr>
<tr><td>Joseph Biden</td><td>46</td></tr>
</table>

```jsx live
function SheetJSExportTable() { return ( <button onClick={() => {
  /* find the table element in the page */
  var tbl = document.getElementById('sheetjs');
  /* create a workbook */
  var wb = XLSX.utils.table_to_book(tbl);
  /* export to file */
  XLSX.writeFile(wb, "SheetJSTable.xlsx");
}}><b>Export XLSX!</b></button> ); }
```
</details>

### Add to Sheet

**Add data from a TABLE element to an existing worksheet**

```js
XLSX.utils.sheet_add_dom(ws, elt, opts);
```

`XLSX.utils.sheet_add_dom` takes a table DOM element and updates an existing
worksheet object. It follows the same process as `table_to_sheet` and accepts
an options argument:

| Option Name | Default  | Description                                         |
| :---------- | :------: | :-------------------------------------------------- |
| `raw`       |          | If true, every cell will hold raw strings           |
| `dateNF`    |  FMT 14  | Use specified date format in string output          |
| `cellDates` |  false   | Store dates as type `d` (default is `n`)            |
| `sheetRows` |    0     | If >0, read the first `sheetRows` rows of the table |
| `display`   |  false   | If true, hidden rows and cells will not be parsed   |
| `origin`    |          | Use specified cell as starting point (see below)    |

`origin` is expected to be one of:

| `origin`         | Description                                               |
| :--------------- | :-------------------------------------------------------- |
| (cell object)    | Use specified cell (cell object)                          |
| (string)         | Use specified cell (A1-Style cell)                        |
| (number >= 0)    | Start from the first column at specified row (0-indexed)  |
| -1               | Append to bottom of worksheet starting on first column    |
| (default)        | Start from cell `A1`                                      |
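
The `origin` rules above amount to choosing a starting cell. The hypothetical `resolveOrigin` function below restates the table in code. It is a sketch of the documented behavior, not the SheetJS implementation, and it treats a worksheet without a `!ref` range as empty (starting at `A1`), which is an assumption. The small address parser only handles plain addresses such as `B3`.

```javascript
/* Hypothetical sketch of the documented `origin` rules (not SheetJS code). */

/* parse an A1-style address into 0-indexed row and column */
function decodeCell(addr) {
  const m = /^([A-Z]+)(\d+)$/.exec(addr);
  if (!m) throw new Error("Unsupported address: " + addr);
  let c = 0;
  for (const ch of m[1]) c = c * 26 + (ch.charCodeAt(0) - 64);
  return { r: Number(m[2]) - 1, c: c - 1 };
}

function resolveOrigin(ws, origin) {
  if (origin === undefined) return { r: 0, c: 0 };                 /* default: A1 */
  if (typeof origin === "string") return decodeCell(origin);         /* "B3" style */
  if (typeof origin === "object") return { r: origin.r, c: origin.c }; /* {r, c} */
  if (origin === -1) {
    /* append below the current range, starting on the first column */
    if (!ws["!ref"]) return { r: 0, c: 0 };                          /* assumption */
    const end = ws["!ref"].split(":").pop();
    return { r: decodeCell(end).r + 1, c: 0 };
  }
  if (origin >= 0) return { r: origin, c: 0 };                       /* row index */
  throw new Error("Unsupported origin: " + origin);
}

const ws = { "!ref": "A1:B3" };
console.log(resolveOrigin(ws, -1));   // { r: 3, c: 0 }  (cell A4)
console.log(resolveOrigin(ws, "C2")); // { r: 1, c: 2 }
```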

A common use case for `sheet_add_dom` involves adding multiple tables to a
single worksheet, usually with a few blank rows between tables:

```js
/* get "table1" and create worksheet */
const table1 = document.getElementById('table1');
const ws = XLSX.utils.table_to_sheet(table1);

/* get "table2" and append to the worksheet */
const table2 = document.getElementById('table2');
// highlight-next-line
XLSX.utils.sheet_add_dom(ws, table2, {origin: -1});
```
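
Blank rows between tables come from the way `origin: -1` appends below the worksheet range: extending the range with empty rows pushes the next append further down. The sketch below reproduces the logic of the `create_gap_rows` helper from the multi-table example without the library. `decodeRange` and `encodeRange` are hypothetical stand-ins for `XLSX.utils.decode_range` and `XLSX.utils.encode_range` that only handle simple ranges such as `A1:C7`.

```javascript
/* Hypothetical stand-ins for XLSX.utils.decode_range / encode_range.
 * They only handle simple ranges such as "A1:C7". */
function decodeCell(addr) {
  const m = /^([A-Z]+)(\d+)$/.exec(addr);
  let c = 0;
  for (const ch of m[1]) c = c * 26 + (ch.charCodeAt(0) - 64);
  return { r: Number(m[2]) - 1, c: c - 1 };
}
function encodeCell({ r, c }) {
  let col = "";
  for (let n = c + 1; n > 0; n = Math.floor((n - 1) / 26)) {
    col = String.fromCharCode(65 + ((n - 1) % 26)) + col;
  }
  return col + (r + 1);
}
function decodeRange(ref) {
  const [s, e = s] = ref.split(":");
  return { s: decodeCell(s), e: decodeCell(e) };
}
function encodeRange(range) {
  return encodeCell(range.s) + ":" + encodeCell(range.e);
}

/* Same logic as create_gap_rows: move the last row of the range down */
function createGapRows(ws, nrows) {
  const ref = decodeRange(ws["!ref"]);
  ref.e.r += nrows;
  ws["!ref"] = encodeRange(ref);
}

/* "Table 1" header in A1 plus a 2-row table ends at B3 */
const ws = { "!ref": "A1:B3" };
createGapRows(ws, 1);
console.log(ws["!ref"]); // A1:B4, so the next origin: -1 append starts in row 5
```

With one gap row after the first table and two after the second, the data lands in rows 2-3, 6-7 and 11-12, which matches the cell labels used in the example tables.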

<details><summary><b>Multi-table Export Example</b> (click to show)</summary>

This demo creates a worksheet that should look like the screenshot below:

![Multi-Table Export in Excel](pathname:///files/multitable.png)

The `create_gap_rows` helper function expands the worksheet range, adding blank
rows between the data tables.

```jsx live
function MultiTable() {
  const headers = ["Table 1", "Table 2", "Table 3"];

  /* Callback invoked when the button is clicked */
  const xport = React.useCallback(async () => {
    /* This function creates gap rows */
    function create_gap_rows(ws, nrows) {
      var ref = XLSX.utils.decode_range(ws["!ref"]); // get original range
      ref.e.r += nrows;                              // add to ending row
      ws["!ref"] = XLSX.utils.encode_range(ref);     // reassign range
    }

    /* first table */
    const ws = XLSX.utils.aoa_to_sheet([[headers[0]]]);
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table1'), {origin: -1});
    create_gap_rows(ws, 1); // one row gap after first table

    /* second table */
    XLSX.utils.sheet_add_aoa(ws, [[headers[1]]], {origin: -1});
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table2'), {origin: -1});
    create_gap_rows(ws, 2); // two rows gap after second table

    /* third table */
    XLSX.utils.sheet_add_aoa(ws, [[headers[2]]], {origin: -1});
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table3'), {origin: -1});

    /* create workbook and export */
    const wb = XLSX.utils.book_new();
    XLSX.utils.book_append_sheet(wb, ws, "Export");
    XLSX.writeFile(wb, "SheetJSMultiTableExport.xlsx");
  }, []);

  return ( <>
    <button onClick={xport}><b>Export XLSX!</b></button><br/><br/>
    <b>{headers[0]}</b><br/>
    <table id="table1">
      <tr><td>A2</td><td>B2</td></tr>
      <tr><td>A3</td><td>B3</td></tr>
    </table>
    <b>{headers[1]}</b><br/>
    <table id="table2">
      <tr><td>A6</td><td>B6</td><td>C6</td></tr>
      <tr><td>A7</td><td>B7</td><td>C7</td></tr>
    </table>
    <br/>
    <b>{headers[2]}</b><br/>
    <table id="table3">
      <tr><td>A11</td><td>B11</td></tr>
      <tr><td>A12</td><td>B12</td></tr>
    </table>
  </> );
}
```

</details>

### HTML Strings

**Create a worksheet or workbook from an HTML string**

`table_to_book` / `table_to_sheet` / `sheet_add_dom` act on HTML DOM elements.
Starting from an HTML string, there are two parsing approaches:

A) Table Phantasm: create a DIV whose `innerHTML` is set to the HTML string,
generate a worksheet from the table element, then remove the DIV:

```js
/* create element from the source */
var elt = document.createElement("div");
elt.innerHTML = html_source;
document.body.appendChild(elt);

/* generate worksheet */
var ws = XLSX.utils.table_to_sheet(elt.getElementsByTagName("TABLE")[0]);

/* remove element */
document.body.removeChild(elt);
```

<details><summary><b>Phantasm Demo</b> (click to show)</summary>

The `html` variable in the demo is an editable HTML string.

```jsx live
function SheetJSTablePhantasm() {
  /* HTML stored as a string */
  const html = `\
<table>
<tr><th>Name</th><th>Index</th></tr>
<tr><td>Barack Obama</td><td>44</td></tr>
<tr><td>Donald Trump</td><td>45</td></tr>
<tr><td>Joseph Biden</td><td>46</td></tr>
</table>
`;
  return ( <>
    <button onClick={() => {
      /* create element from the source */
      var elt = document.createElement("div");
      elt.innerHTML = html;
      document.body.appendChild(elt);

      /* generate workbook */
      var tbl = elt.getElementsByTagName("TABLE")[0];
      var wb = XLSX.utils.table_to_book(tbl);

      /* remove element */
      document.body.removeChild(elt);

      /* generate file */
      XLSX.writeFile(wb, "SheetJSTablePhantasm.xlsx");
    }}><b>Export XLSX!</b></button>
    <pre><b>HTML:</b><br/>{html}</pre>
  </>);
}
```

</details>

B) Raw HTML: use `XLSX.read` to read the text in the same manner as CSV.

```js
var wb = XLSX.read(html_source, { type: "string" });
var ws = wb.Sheets[wb.SheetNames[0]];
```

<details><summary><b>Raw HTML Demo</b> (click to show)</summary>

The `html` variable in the demo is an editable HTML string.

```jsx live
function SheetJSRawHTMLToXLSX() {
  /* HTML stored as a string */
  const html = `\
<table>
<tr><th>Name</th><th>Index</th></tr>
<tr><td>Barack Obama</td><td>44</td></tr>
<tr><td>Donald Trump</td><td>45</td></tr>
<tr><td>Joseph Biden</td><td>46</td></tr>
</table>
`;
  return ( <>
    <button onClick={() => {
      /* read HTML string */
      var wb = XLSX.read(html, {type: "string"});

      /* generate file */
      XLSX.writeFile(wb, "SheetJSRawHTML.xlsx");
    }}><b>Export XLSX!</b></button>
    <pre><b>HTML:</b><br/>{html}</pre>
  </>);
}
```

</details>

### Value Override

When the `raw: true` option is specified, the parser will generate text cells.
When the option is not specified or when it is set to false, the parser will
try to interpret the text of each TD element.

To override the conversion for a specific cell, the following data attributes
can be added to the individual TD elements:

| Attribute | Description                                                  |
|:----------|:-------------------------------------------------------------|
| `data-t`  | Override [Cell Type](/docs/csf/cell#data-types)              |
| `data-v`  | Override Cell Value                                          |
| `data-z`  | Override [Number Format](/docs/csf/features/nf)              |

For example:

```html
<!-- Parser interprets the value as `new Date("2012-12-03")` with the default date format -->
<td>2012-12-03</td>

<!-- String cell "2012-12-03" -->
<td data-t="s">2012-12-03</td>

<!-- Numeric cell with the correct date code and General format -->
<td data-t="n" data-v="41246">2012-12-03</td>

<!-- Traditional Excel date 2012-12-03 with number format yyyy-mm-dd -->
<td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td>
```
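
The `data-v` value `41246` in the last two examples is the Excel date code for 2012-12-03. When generating such markup, the code can be computed from the calendar date. The sketch below assumes the 1900 date system and dates on or after 1900-03-01, where the date code equals the number of days since 1899-12-30 (that offset absorbs Excel's 1900 leap-year quirk). `excelDateCode`, `escapeAttr` and `dateCell` are hypothetical helpers, not SheetJS functions.

```javascript
/* Hypothetical helpers for emitting TD elements with value overrides.
 * Assumes the 1900 date system and dates on or after 1900-03-01. */
const MS_PER_DAY = 24 * 60 * 60 * 1000;
/* date code 0 for dates on or after 1900-03-01 in the 1900 system */
const EPOCH = Date.UTC(1899, 11, 30);

/* "YYYY-MM-DD" -> whole-day Excel date code (UTC avoids DST shifts) */
function excelDateCode(isoDate) {
  const [y, m, d] = isoDate.split("-").map(Number);
  return (Date.UTC(y, m - 1, d) - EPOCH) / MS_PER_DAY;
}

/* escape characters that would break a double-quoted attribute */
function escapeAttr(s) {
  return String(s).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

/* numeric date cell with an explicit number format */
function dateCell(isoDate, numFmt = "yyyy-mm-dd") {
  const code = excelDateCode(isoDate);
  return `<td data-t="n" data-v="${code}" data-z="${escapeAttr(numFmt)}">${isoDate}</td>`;
}

console.log(excelDateCode("2012-12-03")); // 41246
console.log(dateCell("2012-12-03"));
// <td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td>
```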

<details open><summary><b>HTML Value Examples</b> (click to hide)</summary>

```jsx live
function SheetJSHTMLValueOverride() {
  /* HTML stored as a string */
  const html = `\
<table>
<tr><th>Cell</th><th>data-t</th><th>data-v</th><th>data-z</th></tr>
<tr><td>2012-12-03</td><td/><td/><td/></tr>
<tr><td data-t="s">2012-12-03</td><td>s</td><td/><td/></tr>
<tr><td data-t="n" data-v="41246">2012-12-03</td><td>n</td><td>41246</td><td/></tr>
<tr><td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td><td>n</td><td>41246</td><td>yyyy-mm-dd</td></tr>
</table>
`;
  return ( <>
    <button onClick={() => {
      /* create element from the source */
      var elt = document.createElement("div");
      elt.innerHTML = html;
      document.body.appendChild(elt);

      /* generate workbook */
      var tbl = elt.getElementsByTagName("TABLE")[0];
      var wb = XLSX.utils.table_to_book(tbl);

      /* remove element */
      document.body.removeChild(elt);

      /* generate file */
      XLSX.writeFile(wb, "SheetJSHTMLValueOverride.xlsx");
    }}><b>Export XLSX!</b></button>
    <pre><b>HTML String:</b><br/>{html}<br/><b>TABLE:</b></pre>
    <div dangerouslySetInnerHTML={{__html: html}}/>
  </>);
}
```

</details>

### Synthetic DOM

`table_to_book` / `table_to_sheet` / `sheet_add_dom` act on HTML DOM elements.
Traditionally there is no DOM in server-side environments, including NodeJS.

:::note

The simplest approach for server-side processing is to automate a headless web
browser. ["Browser Automation"](/docs/demos/net/headless) covers some browsers.

:::

Some ecosystems provide DOM-like frameworks that are compatible with SheetJS.
Examples are included in the ["Synthetic DOM"](/docs/demos/net/dom) demo.
@ -276,187 +276,11 @@ function SheetJSHeaderOrder() {
|
||||
|
||||
## HTML Table Input
|
||||
|
||||
**Create a worksheet or workbook from a TABLE element**
|
||||
|
||||
```js
|
||||
var ws = XLSX.utils.table_to_sheet(elt, opts);
|
||||
var wb = XLSX.utils.table_to_book(elt, opts);
|
||||
```
|
||||
|
||||
`XLSX.utils.table_to_sheet` takes a table DOM element and returns a worksheet
|
||||
resembling the input table. Numbers are parsed. All other data will be stored
|
||||
as strings.
|
||||
|
||||
`XLSX.utils.table_to_book` produces a minimal workbook based on the worksheet.
|
||||
|
||||
Both functions accept options arguments:
|
||||
|
||||
| Option Name | Default | Description |
|
||||
| :---------- | :------: | :-------------------------------------------------- |
|
||||
|`raw` | | If true, every cell will hold raw strings |
|
||||
|`dateNF` | FMT 14 | Use specified date format in string output |
|
||||
|`cellDates` | false | Store dates as type `d` (default is `n`) |
|
||||
|`sheetRows` | 0 | If >0, read the first `sheetRows` rows of the table |
|
||||
|`display` | false | If true, hidden rows and cells will not be parsed |
|
||||
|
||||
|
||||
To generate the example sheet, assuming the table has ID `sheetjs`:
|
||||
|
||||
```js
|
||||
var tbl = document.getElementById('sheetjs');
|
||||
var ws = XLSX.utils.table_to_sheet(tbl);
|
||||
```
|
||||
|
||||
:::note
|
||||
|
||||
`table_to_book` and `table_to_sheet` act on HTML DOM elements. Starting from
|
||||
an HTML string, there are two parsing approaches:
|
||||
|
||||
A) Table Phantasm: create a DIV with the desired HTML.
|
||||
|
||||
```js
|
||||
/* create element from the source */
|
||||
var elt = document.createElement("div");
|
||||
elt.innerHTML = html_source;
|
||||
document.body.appendChild(elt);
|
||||
|
||||
/* generate worksheet */
|
||||
var ws = XLSX.utils.table_to_sheet(elt.getElementsByTagName("TABLE")[0]);
|
||||
|
||||
/* remove element */
|
||||
document.body.removeChild(elt);
|
||||
```
|
||||
|
||||
B) Raw HTML: use `XLSX.read` to read the text in the same manner as CSV.
|
||||
|
||||
```js
|
||||
var wb = XLSX.read(html_source, { type: "string" });
|
||||
var ws = wb.Sheets[wb.SheetNames[0]];
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
**Add data from a TABLE element to an existing worksheet**
|
||||
|
||||
```js
|
||||
XLSX.utils.sheet_add_dom(ws, elt, opts);
|
||||
```
|
||||
|
||||
`XLSX.utils.sheet_add_dom` takes a table DOM element and updates an existing
|
||||
worksheet object. It follows the same process as `table_to_sheet` and accepts
|
||||
an options argument:
|
||||
|
||||
| Option Name | Default | Description |
|
||||
| :---------- | :------: | :-------------------------------------------------- |
|
||||
|`raw` | | If true, every cell will hold raw strings |
|
||||
|`dateNF` | FMT 14 | Use specified date format in string output |
|
||||
|`cellDates` | false | Store dates as type `d` (default is `n`) |
|
||||
|`sheetRows` | 0 | If >0, read the first `sheetRows` rows of the table |
|
||||
|`display` | false | If true, hidden rows and cells will not be parsed |
|
||||
|
||||
`origin` is expected to be one of:
|
||||
|
||||
| `origin` | Description |
|
||||
| :--------------- | :-------------------------------------------------------- |
|
||||
| (cell object) | Use specified cell (cell object) |
|
||||
| (string) | Use specified cell (A1-Style cell) |
|
||||
| (number >= 0) | Start from the first column at specified row (0-indexed) |
|
||||
| -1 | Append to bottom of worksheet starting on first column |
|
||||
| (default) | Start from cell `A1` |
|
||||
|
||||
|
||||
A common use case for `sheet_add_dom` involves adding multiple tables to a
|
||||
single worksheet, usually with a few blank rows in between each table:
|
||||
|
||||
![Multi-Table Export in Excel](pathname:///files/multitable.png)
|
||||
|
||||
```jsx live
function MultiTable() {
  const headers = ["Table 1", "Table 2", "Table 3"];

  /* Callback invoked when the button is clicked */
  const xport = React.useCallback(async () => {
    /* This function creates gap rows */
    function create_gap_rows(ws, nrows) {
      var ref = XLSX.utils.decode_range(ws["!ref"]); // get original range
      ref.e.r += nrows;                              // add to ending row
      ws["!ref"] = XLSX.utils.encode_range(ref);     // reassign range
    }

    /* first table */
    const ws = XLSX.utils.aoa_to_sheet([[headers[0]]]);
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table1'), {origin: -1});
    create_gap_rows(ws, 1); // one row gap after first table

    /* second table */
    XLSX.utils.sheet_add_aoa(ws, [[headers[1]]], {origin: -1});
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table2'), {origin: -1});
    create_gap_rows(ws, 2); // two rows gap after second table

    /* third table */
    XLSX.utils.sheet_add_aoa(ws, [[headers[2]]], {origin: -1});
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table3'), {origin: -1});

    /* create workbook and export */
    const wb = XLSX.utils.book_new();
    XLSX.utils.book_append_sheet(wb, ws, "Export");
    XLSX.writeFile(wb, "SheetJSMultiTableExport.xlsx");
  }, []);

  return (
    <>
      <button onClick={xport}><b>Export XLSX!</b></button><br/><br/>
      <b>{headers[0]}</b><br/>
      <table id="table1">
        <tbody>
          <tr><td>A2</td><td>B2</td></tr>
          <tr><td>A3</td><td>B3</td></tr>
        </tbody>
      </table>
      <b>{headers[1]}</b><br/>
      <table id="table2">
        <tbody>
          <tr><td>A6</td><td>B6</td><td>C6</td></tr>
          <tr><td>A7</td><td>B7</td><td>C7</td></tr>
        </tbody>
      </table>
      <br/>
      <b>{headers[2]}</b><br/>
      <table id="table3">
        <tbody>
          <tr><td>A11</td><td>B11</td></tr>
          <tr><td>A12</td><td>B12</td></tr>
        </tbody>
      </table>
    </>
  );
}
```
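The `origin` rules in the table above, together with the `create_gap_rows` trick, can be modeled in plain JavaScript to predict where each block lands. This is a hypothetical sketch, not the SheetJS implementation: `decodeA1`, `resolveOrigin` and `layout` are illustrative names, and the model assumes that appending to an empty worksheet starts at `A1` and that `origin: -1` starts one row past the ending row of `!ref`.

```javascript
// Hypothetical model of the documented `origin` rules (not the SheetJS source).
// Cells are 0-indexed {r, c} objects, like SheetJS cell address objects.

// Convert an A1-style address such as "B3" to {r: 2, c: 1}
function decodeA1(addr) {
  const m = /^([A-Z]+)([1-9][0-9]*)$/.exec(addr.toUpperCase());
  if (!m) throw new Error(`Invalid address: ${addr}`);
  let c = 0;
  for (const ch of m[1]) c = c * 26 + (ch.charCodeAt(0) - 64);
  return { r: Number(m[2]) - 1, c: c - 1 };
}

// Resolve an `origin` value to a starting cell.
// `lastRow` is the 0-indexed ending row of the worksheet range (-1 if empty).
function resolveOrigin(origin, lastRow) {
  if (origin === undefined) return { r: 0, c: 0 };          // default: A1
  if (typeof origin === "string") return decodeA1(origin);  // A1-style cell
  if (origin === -1) return { r: lastRow + 1, c: 0 };       // append to bottom
  if (typeof origin === "number" && origin >= 0) return { r: origin, c: 0 };
  if (typeof origin === "object" && origin !== null) return { r: origin.r, c: origin.c };
  throw new Error(`Unsupported origin: ${origin}`);
}

// Replay the MultiTable example: each step appends `rows` rows at origin -1,
// and `gap` mimics create_gap_rows by extending the ending row of the range.
function layout(steps) {
  let lastRow = -1; // assumption: the worksheet starts empty
  const starts = [];
  for (const { rows, gap = 0 } of steps) {
    const { r } = resolveOrigin(-1, lastRow);
    starts.push(`A${r + 1}`);
    lastRow = r + rows - 1 + gap;
  }
  return starts;
}

console.log(layout([
  { rows: 1 },         // "Table 1" header
  { rows: 2, gap: 1 }, // table1, then one gap row
  { rows: 1 },         // "Table 2" header
  { rows: 2, gap: 2 }, // table2, then two gap rows
  { rows: 1 },         // "Table 3" header
  { rows: 2 },         // table3
]));
// → ["A1", "A2", "A5", "A6", "A10", "A11"]
```

With one gap row after the first table and two after the second, the model places the headers at `A1`, `A5` and `A10` and the table bodies at `A2`, `A6` and `A11`, matching the cell labels in the live example.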
[**This has been moved to a separate page**](/docs/api/utilities/html#html-table-input)
### Value Override

When the `raw: true` option is specified, the parser will generate text cells.
When the option is not specified or is set to false, the parser will try to
interpret the text of each TD element.

To override the conversion for a specific cell, the following data attributes
can be added to individual TD elements:

| Attribute | Description                                                  |
|:----------|:-------------------------------------------------------------|
| `data-t`  | Override [Cell Type](/docs/csf/cell#data-types)              |
| `data-v`  | Override Cell Value                                          |
| `data-z`  | Override [Number Format](/docs/csf/features/nf)              |

For example:

```html
<!-- Parser interprets value as `new Date("2012-12-03")` with the default date format -->
<td>2012-12-03</td>

<!-- String cell "2012-12-03" -->
<td data-t="s">2012-12-03</td>

<!-- Numeric cell with the correct date code and General format -->
<td data-t="n" data-v="41246">2012-12-03</td>

<!-- Traditional Excel Date 2012-12-03 with style yyyy-mm-dd -->
<td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td>
```
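The `data-v` value for a date cell is the Excel date code. A minimal sketch for generating it and the matching markup, assuming the default 1900 date system and dates on or after 1900-03-01. `excelSerial` and `dateTD` are hypothetical helpers, not SheetJS functions:

```javascript
// Hypothetical helpers (not part of SheetJS) for building a TD element with
// explicit data-t / data-v / data-z overrides.

// Excel 1900 date system serial number for a calendar date (no time part).
// 25569 is the serial number for 1970-01-01. Dates before 1900-03-01 would
// be off by one, since Excel treats 1900 as a leap year.
function excelSerial(year, month, day) {
  return Date.UTC(year, month - 1, day) / 86400000 + 25569;
}

// Markup for a numeric date cell with a date number format
function dateTD(year, month, day, fmt = "yyyy-mm-dd") {
  // zero-pad to YYYY-MM-DD for the visible text
  const text = [year, month, day]
    .map((n, i) => String(n).padStart(i ? 2 : 4, "0"))
    .join("-");
  return `<td data-t="n" data-v="${excelSerial(year, month, day)}" data-z="${fmt}">${text}</td>`;
}

console.log(excelSerial(2012, 12, 3)); // 41246
console.log(dateTD(2012, 12, 3));
// <td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td>
```

The generated markup matches the last example above, so the parser receives a number with an explicit date format instead of guessing from the text.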
[**This has been moved to a separate page**](/docs/api/utilities/html#value-override)

## Delimiter-Separated Output
@@ -51,6 +51,14 @@ The following are described in [`A1` Utilities](/docs/csf/general#utilities)
- `encode_cell` / `decode_cell` convert cell addresses.
- `encode_range` / `decode_range` convert cell ranges.

The following are described in the ["HTML" section of "Utility Functions"](/docs/api/utilities/html):

**Reading from HTML:**

- `table_to_sheet` converts a DOM TABLE element to a worksheet.
- `table_to_book` converts a DOM TABLE element to a workbook.
- `sheet_add_dom` adds data from a DOM TABLE element to an existing worksheet.

The following are described in the [Utility Functions](/docs/api/utilities):

**Constructing:**
@@ -62,10 +70,8 @@ The following are described in the [Utility Functions](/docs/api/utilities):

- `aoa_to_sheet` converts an array of arrays of JS data to a worksheet.
- `json_to_sheet` converts an array of JS objects to a worksheet.
- `table_to_sheet` converts a DOM TABLE element to a worksheet.
- `sheet_add_aoa` adds an array of arrays of JS data to an existing worksheet.
- `sheet_add_json` adds an array of JS objects to an existing worksheet.
- `sheet_add_dom` adds data from a DOM TABLE element to an existing worksheet.

**Exporting:**
23
docz/static/dom/SheetJSCheerio.js
Normal file
@@ -0,0 +1,23 @@
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const cheerio = require("cheerio");

/* obtain HTML string. This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");
/* get first TABLE element */
const $ = cheerio.load(html_str);
const doc = $("TABLE").first()[0];

/* patch the Cheerio node prototype with the DOM properties and methods used by SheetJS */
Object.defineProperty(doc.__proto__, "tagName", { get: function() { return Object.entries(this).find(r => r[0] == "tagName" || r[0] == "name")[1].toUpperCase(); }});
Object.defineProperty(doc.__proto__, "rows", { get: function() { return $(this).children("tbody").children("tr"); }});
Object.defineProperty(doc.__proto__, "cells", { get: function() { return $(this).children("td, th"); }});
Object.defineProperty(doc.__proto__, "ownerDocument", { get: function() { return {}; }});
doc.__proto__.hasAttribute = function(name) { return Object.hasOwnProperty.call(this.attribs, name); };
doc.__proto__.getAttribute = function(name) { return this.attribs[name]; };
Object.defineProperty(doc.__proto__, "innerHTML", { get: function() { return $(this).prop('innerHTML'); }});
doc.__proto__.getElementsByTagName = function(name) { return $(this).children(name); };

/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);
XLSX.writeFile(workbook, "SheetJSCheerio.xlsx");
46
docz/static/dom/SheetJSTable.html
Normal file
@@ -0,0 +1,46 @@
<!DOCTYPE html>
<!-- vim: set ts=2: -->
<html>

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="Content-Security-Policy" content="script-src 'self' https:">
  <meta name="robots" content="noindex">
  <title>SheetJS Table Example</title>
</head>

<body>
  <table id="data-table">
    <tbody>
      <tr>
        <td id="data-table-A1"><span contenteditable="true">This</span></td>
        <td id="data-table-B1"><span contenteditable="true">is</span></td>
        <td id="data-table-C1"><span contenteditable="true">a</span></td>
        <td id="data-table-D1"><span contenteditable="true">Test</span></td>
      </tr>
      <tr>
        <td id="data-table-A2"><span contenteditable="true">வணக்கம்</span></td>
        <td id="data-table-B2"><span contenteditable="true">สวัสดี</span></td>
        <td id="data-table-C2"><span contenteditable="true">你好</span></td>
        <td id="data-table-D2"><span contenteditable="true">가지마</span></td>
      </tr>
      <tr>
        <td id="data-table-A3"><span contenteditable="true">1</span></td>
        <td id="data-table-B3"><span contenteditable="true">2</span></td>
        <td id="data-table-C3"><span contenteditable="true">3</span></td>
        <td id="data-table-D3"><span contenteditable="true">4</span></td>
      </tr>
      <tr>
        <td id="data-table-A4"><span contenteditable="true">Click</span></td>
        <td id="data-table-B4"><span contenteditable="true">to</span></td>
        <td id="data-table-C4"><span contenteditable="true">edit</span></td>
        <td id="data-table-D4"><span contenteditable="true">cells</span></td>
      </tr>
      <tr>
        <td colspan="4"><a href="https://sheetjs.com">Generated by SheetJS</a></td>
      </tr>
    </tbody>
  </table>
</body>

</html>