389 lines
12 KiB
Markdown
389 lines
12 KiB
Markdown
|
---
|
||
|
sidebar_position: 7
|
||
|
title: HTML
|
||
|
---
|
||
|
|
||
|
HTML is a common format for presenting data in the web. While the general read
|
||
|
functions (`XLSX.read` and `XLSX.readFile`) can parse HTML strings and the write
|
||
|
functions (`XLSX.write` and `XLSX.writeFile`) can generate HTML strings, the
|
||
|
utility functions in this section can use DOM features.
|
||
|
|
||
|
:::note
|
||
|
|
||
|
SheetJS CE primarily focuses on data and number formatting.
|
||
|
|
||
|
[SheetJS Pro](https://sheetjs.com/pro) supports CSS text and cell styles in the
|
||
|
HTML format and HTML table utilities.
|
||
|
|
||
|
:::
|
||
|
|
||
|
## HTML Table Input
|
||
|
|
||
|
### Create New Sheet
|
||
|
|
||
|
**Create a worksheet or workbook from a TABLE element**
|
||
|
|
||
|
```js
|
||
|
var ws = XLSX.utils.table_to_sheet(elt, opts);
|
||
|
var wb = XLSX.utils.table_to_book(elt, opts);
|
||
|
```
|
||
|
|
||
|
`XLSX.utils.table_to_sheet` takes a table DOM element and returns a worksheet
|
||
|
resembling the input table. Numbers are parsed. All other data will be stored
|
||
|
as strings.
|
||
|
|
||
|
`XLSX.utils.table_to_book` produces a minimal workbook based on the worksheet.
|
||
|
|
||
|
Both functions accept options arguments:
|
||
|
|
||
|
| Option Name | Default | Description |
|
||
|
| :---------- | :------: | :-------------------------------------------------- |
|
||
|
|`raw` | | If true, every cell will hold raw strings |
|
||
|
|`dateNF` | FMT 14 | Use specified date format in string output |
|
||
|
|`cellDates` | false | Store dates as type `d` (default is `n`) |
|
||
|
|`sheetRows` | 0 | If >0, read the first `sheetRows` rows of the table |
|
||
|
|`display` | false | If true, hidden rows and cells will not be parsed |
|
||
|
|
||
|
Exporting a table to a spreadsheet file in the web browser involves 3 steps:
|
||
|
"find the table", "generate a workbook object", and "export to file".
|
||
|
|
||
|
For example, if the HTML table has `id` attribute set to `sheetjs`:
|
||
|
|
||
|
```html
|
||
|
<table id="sheetjs">
|
||
|
<tr><th>Name</th><th>Index</th></tr>
|
||
|
<tr><td>Barack Obama</td><td>44</td></tr>
|
||
|
<tr><td>Donald Trump</td><td>45</td></tr>
|
||
|
<tr><td>Joseph Biden</td><td>46</td></tr>
|
||
|
</table>
|
||
|
```
|
||
|
|
||
|
`document.getElementById("sheetjs")` is a live reference to the table.
|
||
|
|
||
|
```js
|
||
|
/* find the table element in the page */
|
||
|
var tbl = document.getElementById('sheetjs');
|
||
|
/* create a workbook */
|
||
|
var wb = XLSX.utils.table_to_book(tbl);
|
||
|
/* export to file */
|
||
|
XLSX.writeFile(wb, "SheetJSTable.xlsx");
|
||
|
```
|
||
|
|
||
|
<details open><summary><b>Demo</b> (click to hide)</summary>
|
||
|
|
||
|
This HTML table has id set to `sheetjs`:
|
||
|
|
||
|
<table id="sheetjs">
|
||
|
<tr><th>Name</th><th>Index</th></tr>
|
||
|
<tr><td>Barack Obama</td><td>44</td></tr>
|
||
|
<tr><td>Donald Trump</td><td>45</td></tr>
|
||
|
<tr><td>Joseph Biden</td><td>46</td></tr>
|
||
|
</table>
|
||
|
|
||
|
```jsx live
|
||
|
function SheetJSExportTable() { return ( <button onClick={() => {
|
||
|
/* find the table element in the page */
|
||
|
var tbl = document.getElementById('sheetjs');
|
||
|
/* create a workbook */
|
||
|
var wb = XLSX.utils.table_to_book(tbl);
|
||
|
/* export to file */
|
||
|
XLSX.writeFile(wb, "SheetJSTable.xlsx");
|
||
|
}}><b>Export XLSX!</b></button> ); }
|
||
|
```
|
||
|
</details>
|
||
|
|
||
|
### Add to Sheet
|
||
|
|
||
|
**Add data from a TABLE element to an existing worksheet**
|
||
|
|
||
|
```js
|
||
|
XLSX.utils.sheet_add_dom(ws, elt, opts);
|
||
|
```
|
||
|
|
||
|
`XLSX.utils.sheet_add_dom` takes a table DOM element and updates an existing
|
||
|
worksheet object. It follows the same process as `table_to_sheet` and accepts
|
||
|
an options argument:
|
||
|
|
||
|
| Option Name | Default | Description |
|
||
|
| :---------- | :------: | :-------------------------------------------------- |
|
||
|
|`raw` | | If true, every cell will hold raw strings |
|
||
|
|`dateNF` | FMT 14 | Use specified date format in string output |
|
||
|
|`cellDates` | false | Store dates as type `d` (default is `n`) |
|
||
|
|`sheetRows` | 0 | If >0, read the first `sheetRows` rows of the table |
|
||
|
|`display` | false | If true, hidden rows and cells will not be parsed |
|
||
|
|
||
|
`origin` is expected to be one of:
|
||
|
|
||
|
| `origin` | Description |
|
||
|
| :--------------- | :-------------------------------------------------------- |
|
||
|
| (cell object) | Use specified cell (cell object) |
|
||
|
| (string) | Use specified cell (A1-Style cell) |
|
||
|
| (number >= 0) | Start from the first column at specified row (0-indexed) |
|
||
|
| -1 | Append to bottom of worksheet starting on first column |
|
||
|
| (default) | Start from cell `A1` |
|
||
|
|
||
|
|
||
|
A common use case for `sheet_add_dom` involves adding multiple tables to a
|
||
|
single worksheet, usually with a few blank rows in between each table:
|
||
|
|
||
|
```js
|
||
|
/* get "table1" and create worksheet */
|
||
|
const table1 = document.getElementById('table1');
|
||
|
const ws = XLSX.utils.table_to_sheet(table1);
|
||
|
|
||
|
/* get "table2" and append to the worksheet */
|
||
|
const table2 = document.getElementById('table2');
|
||
|
// highlight-next-line
|
||
|
XLSX.utils.sheet_add_dom(ws, table2, {origin: -1});
|
||
|
```
|
||
|
|
||
|
<details><summary><b>Multi-table Export Example</b> (click to show)</summary>
|
||
|
|
||
|
This demo creates a worksheet that should look like the screenshot below:
|
||
|
|
||
|
![Multi-Table Export in Excel](pathname:///files/multitable.png)
|
||
|
|
||
|
The `create_gap_rows` helper function expands the worksheet range, adding blank
|
||
|
rows between the data tables.
|
||
|
|
||
|
```jsx live
|
||
|
function MultiTable() {
|
||
|
const headers = ["Table 1", "Table2", "Table 3"];
|
||
|
|
||
|
/* Callback invoked when the button is clicked */
|
||
|
const xport = React.useCallback(async () => {
|
||
|
/* This function creates gap rows */
|
||
|
function create_gap_rows(ws, nrows) {
|
||
|
var ref = XLSX.utils.decode_range(ws["!ref"]); // get original range
|
||
|
ref.e.r += nrows; // add to ending row
|
||
|
ws["!ref"] = XLSX.utils.encode_range(ref); // reassign row
|
||
|
}
|
||
|
|
||
|
/* first table */
|
||
|
const ws = XLSX.utils.aoa_to_sheet([[headers[0]]]);
|
||
|
XLSX.utils.sheet_add_dom(ws, document.getElementById('table1'), {origin: -1});
|
||
|
create_gap_rows(ws, 1); // one row gap after first table
|
||
|
|
||
|
/* second table */
|
||
|
XLSX.utils.sheet_add_aoa(ws, [[headers[1]]], {origin: -1});
|
||
|
XLSX.utils.sheet_add_dom(ws, document.getElementById('table2'), {origin: -1});
|
||
|
create_gap_rows(ws, 2); // two rows gap after second table
|
||
|
|
||
|
/* third table */
|
||
|
XLSX.utils.sheet_add_aoa(ws, [[headers[2]]], {origin: -1});
|
||
|
XLSX.utils.sheet_add_dom(ws, document.getElementById('table3'), {origin: -1});
|
||
|
|
||
|
/* create workbook and export */
|
||
|
const wb = XLSX.utils.book_new();
|
||
|
XLSX.utils.book_append_sheet(wb, ws, "Export");
|
||
|
XLSX.writeFile(wb, "SheetJSMultiTablexport.xlsx");
|
||
|
});
|
||
|
|
||
|
return ( <>
|
||
|
<button onClick={xport}><b>Export XLSX!</b></button><br/><br/>
|
||
|
<b>{headers[0]}</b><br/>
|
||
|
<table id="table1">
|
||
|
<tr><td>A2</td><td>B2</td></tr>
|
||
|
<tr><td>A3</td><td>B3</td></tr>
|
||
|
</table>
|
||
|
<b>{headers[1]}</b><br/>
|
||
|
<table id="table2">
|
||
|
<tr><td>A6</td><td>B6</td><td>C6</td></tr>
|
||
|
<tr><td>A7</td><td>B7</td><td>C7</td></tr>
|
||
|
</table>
|
||
|
<br/>
|
||
|
<b>{headers[2]}</b><br/>
|
||
|
<table id="table3">
|
||
|
<tr><td>A11</td><td>B11</td></tr>
|
||
|
<tr><td>A12</td><td>B12</td></tr>
|
||
|
</table>
|
||
|
</> );
|
||
|
}
|
||
|
```
|
||
|
|
||
|
</details>
|
||
|
|
||
|
### HTML Strings
|
||
|
|
||
|
**Create a worksheet or workbook from HTML string**
|
||
|
|
||
|
`table_to_book` / `table_to_sheet` / `sheet_add_dom` act on HTML DOM elements.
|
||
|
Starting from an HTML string, there are two parsing approaches:
|
||
|
|
||
|
A) Table Phantasm: create a DIV whose `innerHTML` is set to the HTML string,
|
||
|
generate worksheet using the DOM element, then remove the DIV:
|
||
|
|
||
|
```js
|
||
|
/* create element from the source */
|
||
|
var elt = document.createElement("div");
|
||
|
elt.innerHTML = html_source;
|
||
|
document.body.appendChild(elt);
|
||
|
|
||
|
/* generate worksheet */
|
||
|
var ws = XLSX.utils.table_to_sheet(elt.getElementsByTagName("TABLE")[0]);
|
||
|
|
||
|
/* remove element */
|
||
|
document.body.removeChild(elt);
|
||
|
```
|
||
|
|
||
|
<details><summary><b>Phantasm Demo</b> (click to show)</summary>
|
||
|
|
||
|
The `html` variable in the demo is an editable HTML string
|
||
|
|
||
|
```jsx live
|
||
|
function SheetJSTablePhantasm() {
|
||
|
/* HTML stored as a string */
|
||
|
const html = `\
|
||
|
<table>
|
||
|
<tr><th>Name</th><th>Index</th></tr>
|
||
|
<tr><td>Barack Obama</td><td>44</td></tr>
|
||
|
<tr><td>Donald Trump</td><td>45</td></tr>
|
||
|
<tr><td>Joseph Biden</td><td>46</td></tr>
|
||
|
</table>
|
||
|
`;
|
||
|
return ( <>
|
||
|
<button onClick={() => {
|
||
|
/* create element from the source */
|
||
|
var elt = document.createElement("div");
|
||
|
elt.innerHTML = html;
|
||
|
document.body.appendChild(elt);
|
||
|
|
||
|
/* generate workbook */
|
||
|
var tbl = elt.getElementsByTagName("TABLE")[0];
|
||
|
var wb = XLSX.utils.table_to_book(tbl);
|
||
|
|
||
|
/* remove element */
|
||
|
document.body.removeChild(elt);
|
||
|
|
||
|
/* generate file */
|
||
|
XLSX.writeFile(wb, "SheetJSTablePhantasm.xlsx");
|
||
|
}}><b>Export XLSX!</b></button>
|
||
|
<pre><b>HTML:</b><br/>{html}</pre>
|
||
|
</>);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
</details>
|
||
|
|
||
|
B) Raw HTML: use `XLSX.read` to read the text in the same manner as CSV.
|
||
|
|
||
|
```js
|
||
|
var wb = XLSX.read(html_source, { type: "string" });
|
||
|
var ws = wb.Sheets[wb.SheetNames[0]];
|
||
|
```
|
||
|
|
||
|
<details><summary><b>Raw HTML Demo</b> (click to show)</summary>
|
||
|
|
||
|
The `html` variable in the demo is an editable HTML string
|
||
|
|
||
|
```jsx live
|
||
|
function SheetJSRawHTMLToXLSX() {
|
||
|
/* HTML stored as a string */
|
||
|
const html = `\
|
||
|
<table>
|
||
|
<tr><th>Name</th><th>Index</th></tr>
|
||
|
<tr><td>Barack Obama</td><td>44</td></tr>
|
||
|
<tr><td>Donald Trump</td><td>45</td></tr>
|
||
|
<tr><td>Joseph Biden</td><td>46</td></tr>
|
||
|
</table>
|
||
|
`;
|
||
|
return ( <>
|
||
|
<button onClick={() => {
|
||
|
/* read HTML string */
|
||
|
var wb = XLSX.read(html, {type: "string"});
|
||
|
|
||
|
/* generate file */
|
||
|
XLSX.writeFile(wb, "SheetJSRawHTML.xlsx");
|
||
|
}}><b>Export XLSX!</b></button>
|
||
|
<pre><b>HTML:</b><br/>{html}</pre>
|
||
|
</>);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
</details>
|
||
|
|
||
|
### Value Override
|
||
|
|
||
|
When the `raw: true` option is specified, the parser will generate text cells.
|
||
|
When the option is not specified or when it is set to false, the parser will
|
||
|
try to interpret the text of each TD element.
|
||
|
|
||
|
To override the conversion for a specific cell, the following data attributes
|
||
|
can be added to the individual TD elements:
|
||
|
|
||
|
| Attribute | Description |
|
||
|
|:----------|:-------------------------------------------------------------|
|
||
|
| `data-t` | Override [Cell Type](/docs/csf/cell#data-types) |
|
||
|
| `data-v` | Override Cell Value |
|
||
|
| `data-z` | Override [Number Format](/docs/csf/features/nf) |
|
||
|
|
||
|
For example:
|
||
|
|
||
|
```html
|
||
|
<!-- Parser interprets value as `new Date("2012-12-03")` default date format -->
|
||
|
<td>2012-12-03</td>
|
||
|
|
||
|
<!-- String cell "2012-12-03" -->
|
||
|
<td data-t="s">2012-12-03</td>
|
||
|
|
||
|
<!-- Numeric cell with the correct date code and General format -->
|
||
|
<td data-t="n" data-v="41246">2012-12-03</td>
|
||
|
|
||
|
<!-- Traditional Excel Date 2012-12-03 with style yyyy-mm-dd -->
|
||
|
<td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td>
|
||
|
```
|
||
|
|
||
|
<details open><summary><b>HTML Value Examples</b> (click to hide)</summary>
|
||
|
|
||
|
```jsx live
|
||
|
function SheetJSHTMLValueOverride() {
|
||
|
/* HTML stored as a string */
|
||
|
const html = `\
|
||
|
<table>
|
||
|
<tr><th>Cell</th><th>data-t</th><th>data-v</th><th>data-z</th></tr>
|
||
|
<tr><td>2012-12-03</td><td/><td/><td/></tr>
|
||
|
<tr><td data-t="s">2012-12-03</td><td>s</td><td/><td/></tr>
|
||
|
<tr><td data-t="n" data-v="41246">2012-12-03</td><td>n</td><td>41246</td><td/></tr>
|
||
|
<tr><td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td><td>n</td><td>41246</td><td>yyyy-mm-dd</td></tr>
|
||
|
</table>
|
||
|
`;
|
||
|
return ( <>
|
||
|
<button onClick={() => {
|
||
|
/* create element from the source */
|
||
|
var elt = document.createElement("div");
|
||
|
elt.innerHTML = html;
|
||
|
document.body.appendChild(elt);
|
||
|
|
||
|
/* generate workbook */
|
||
|
var tbl = elt.getElementsByTagName("TABLE")[0];
|
||
|
var wb = XLSX.utils.table_to_book(tbl);
|
||
|
|
||
|
/* remove element */
|
||
|
document.body.removeChild(elt);
|
||
|
|
||
|
/* generate file */
|
||
|
XLSX.writeFile(wb, "SheetJSHTMLValueOverride.xlsx");
|
||
|
}}><b>Export XLSX!</b></button>
|
||
|
<pre><b>HTML String:</b><br/>{html}<br/><b>TABLE:</b></pre>
|
||
|
<div dangerouslySetInnerHTML={{__html: html}}/>
|
||
|
</>);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
</details>
|
||
|
|
||
|
### Synthetic DOM
|
||
|
|
||
|
`table_to_book` / `table_to_sheet` / `sheet_add_dom` act on HTML DOM elements.
|
||
|
Traditionally there is no DOM in server-side environments including NodeJS.
|
||
|
|
||
|
:::note
|
||
|
|
||
|
The simplest approach for server-side processing is to automate a headless web
|
||
|
browser. ["Browser Automation"](/docs/demos/net/headless) covers some browsers.
|
||
|
|
||
|
:::
|
||
|
|
||
|
Some ecosystems provide DOM-like frameworks that are compatible with SheetJS.
|
||
|
Examples are included in the ["Synthetic DOM"](/docs/demos/net/dom) demo
|