14 KiB
sidebar_position | title |
---|---|
5 | HTML |
HTML is a common format for presenting data in the web. While the general read
functions (XLSX.read
and XLSX.readFile
) can parse HTML strings and the write
functions (XLSX.write
and XLSX.writeFile
) can generate HTML strings, the
utility functions in this section can use DOM features.
:::note
SheetJS CE primarily focuses on data and number formatting.
SheetJS Pro supports CSS text and cell styles in the HTML format and HTML table utilities.
:::
HTML Table Output
var html = XLSX.utils.sheet_to_html(ws, opts);
As an alternative to the writeFile
HTML type, XLSX.utils.sheet_to_html
also
produces HTML output. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
id |
Specify the id attribute for the TABLE element |
|
editable |
false | If true, set contenteditable="true" for every TD |
header |
Override header | |
footer |
Override footer |
Starting from the sample file pres.numbers
:
function SheetJSHTML() {
const url = "https://sheetjs.com/pres.numbers";
const [__html, setHTML] = React.useState("");
React.useEffect(() => { (async() => {
/* download file and parse */
const wb = XLSX.read(await (await fetch(url)).arrayBuffer());
/* get the first worksheet */
const ws = wb.Sheets[wb.SheetNames[0]];
/* generate HTML */
const html = XLSX.utils.sheet_to_html(ws);
setHTML(html);
})(); }, []);
return ( <>
<b>XLSX.utils.sheet_to_html(ws)</b>
<div dangerouslySetInnerHTML={{__html}}/>
</> );
}
Implementation Details
The generated table will include special data attributes on each TD
element:
Attribute | Description |
---|---|
data-t |
Override Cell Type |
data-v |
Override Cell Value |
data-z |
Override Number Format |
External cell links will be written as A
tags wrapping the cell contents.
HTML Table Input
Create New Sheet
Create a worksheet or workbook from a TABLE element
var ws = XLSX.utils.table_to_sheet(elt, opts);
var wb = XLSX.utils.table_to_book(elt, opts);
XLSX.utils.table_to_sheet
takes a table DOM element and returns a worksheet
resembling the input table. Numbers are parsed. All other data will be stored
as strings.
XLSX.utils.table_to_book
produces a minimal workbook based on the worksheet.
Both functions accept options arguments:
Option Name | Default | Description |
---|---|---|
raw |
If true, every cell will hold raw strings | |
dateNF |
FMT 14 | Use specified date format in string output |
cellDates |
false | Store dates as type d (default is n ) |
sheetRows |
0 | If >0, read the first sheetRows rows of the table |
display |
false | If true, hidden rows and cells will not be parsed |
Exporting a table to a spreadsheet file in the web browser involves 3 steps: "find the table", "generate a workbook object", and "export to file".
For example, if the HTML table has id
attribute set to sheetjs
:
<table id="sheetjs">
<tr><th>Name</th><th>Index</th></tr>
<tr><td>Barack Obama</td><td>44</td></tr>
<tr><td>Donald Trump</td><td>45</td></tr>
<tr><td>Joseph Biden</td><td>46</td></tr>
</table>
document.getElementById("sheetjs")
is a live reference to the table.
/* find the table element in the page */
var tbl = document.getElementById('sheetjs');
/* create a workbook */
var wb = XLSX.utils.table_to_book(tbl);
/* export to file */
XLSX.writeFile(wb, "SheetJSTable.xlsx");
Demo (click to hide)
This HTML table has id set to sheetjs
:
Name | Index |
---|---|
Barack Obama | 44 |
Donald Trump | 45 |
Joseph Biden | 46 |
function SheetJSExportTable() { return ( <button onClick={() => {
/* find the table element in the page */
var tbl = document.getElementById('sheetjs');
/* create a workbook */
var wb = XLSX.utils.table_to_book(tbl);
/* export to file */
XLSX.writeFile(wb, "SheetJSTable.xlsx");
}}><b>Export XLSX!</b></button> ); }
Add to Sheet
Add data from a TABLE element to an existing worksheet
XLSX.utils.sheet_add_dom(ws, elt, opts);
XLSX.utils.sheet_add_dom
takes a table DOM element and updates an existing
worksheet object. It follows the same process as table_to_sheet
and accepts
an options argument:
Option Name | Default | Description |
---|---|---|
raw |
If true, every cell will hold raw strings | |
dateNF |
FMT 14 | Use specified date format in string output |
cellDates |
false | Store dates as type d (default is n ) |
sheetRows |
0 | If >0, read the first sheetRows rows of the table |
display |
false | If true, hidden rows and cells will not be parsed |
origin
is expected to be one of:
origin |
Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-Style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
A common use case for sheet_add_dom
involves adding multiple tables to a
single worksheet, usually with a few blank rows in between each table:
/* get "table1" and create worksheet */
const table1 = document.getElementById('table1');
const ws = XLSX.utils.table_to_sheet(table1);
/* get "table2" and append to the worksheet */
const table2 = document.getElementById('table2');
// highlight-next-line
XLSX.utils.sheet_add_dom(ws, table2, {origin: -1});
Multi-table Export Example (click to show)
This demo creates a worksheet that should look like the screenshot below:
The create_gap_rows
helper function expands the worksheet range, adding blank
rows between the data tables.
function MultiTable() {
const headers = ["Table 1", "Table2", "Table 3"];
/* Callback invoked when the button is clicked */
const xport = React.useCallback(async () => {
/* This function creates gap rows */
function create_gap_rows(ws, nrows) {
var ref = XLSX.utils.decode_range(ws["!ref"]); // get original range
ref.e.r += nrows; // add to ending row
ws["!ref"] = XLSX.utils.encode_range(ref); // reassign row
}
/* first table */
const ws = XLSX.utils.aoa_to_sheet([[headers[0]]]);
XLSX.utils.sheet_add_dom(ws, document.getElementById('table1'), {origin: -1});
create_gap_rows(ws, 1); // one row gap after first table
/* second table */
XLSX.utils.sheet_add_aoa(ws, [[headers[1]]], {origin: -1});
XLSX.utils.sheet_add_dom(ws, document.getElementById('table2'), {origin: -1});
create_gap_rows(ws, 2); // two rows gap after second table
/* third table */
XLSX.utils.sheet_add_aoa(ws, [[headers[2]]], {origin: -1});
XLSX.utils.sheet_add_dom(ws, document.getElementById('table3'), {origin: -1});
/* create workbook and export */
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, ws, "Export");
XLSX.writeFile(wb, "SheetJSMultiTablexport.xlsx");
});
return ( <>
<button onClick={xport}><b>Export XLSX!</b></button><br/><br/>
<b>{headers[0]}</b><br/>
<table id="table1">
<tr><td>A2</td><td>B2</td></tr>
<tr><td>A3</td><td>B3</td></tr>
</table>
<b>{headers[1]}</b><br/>
<table id="table2">
<tr><td>A6</td><td>B6</td><td>C6</td></tr>
<tr><td>A7</td><td>B7</td><td>C7</td></tr>
</table>
<br/>
<b>{headers[2]}</b><br/>
<table id="table3">
<tr><td>A11</td><td>B11</td></tr>
<tr><td>A12</td><td>B12</td></tr>
</table>
</> );
}
HTML Strings
Create a worksheet or workbook from HTML string
table_to_book
/ table_to_sheet
/ sheet_add_dom
act on HTML DOM elements.
Starting from an HTML string, there are two parsing approaches:
A) Table Phantasm: create a DIV whose innerHTML
is set to the HTML string,
generate worksheet using the DOM element, then remove the DIV:
/* create element from the source */
var elt = document.createElement("div");
elt.innerHTML = html_source;
document.body.appendChild(elt);
/* generate worksheet */
var ws = XLSX.utils.table_to_sheet(elt.getElementsByTagName("TABLE")[0]);
/* remove element */
document.body.removeChild(elt);
Phantasm Demo (click to show)
The html
variable in the demo is an editable HTML string
function SheetJSTablePhantasm() {
/* HTML stored as a string */
const html = `\
<table>
<tr><th>Name</th><th>Index</th></tr>
<tr><td>Barack Obama</td><td>44</td></tr>
<tr><td>Donald Trump</td><td>45</td></tr>
<tr><td>Joseph Biden</td><td>46</td></tr>
</table>
`;
return ( <>
<button onClick={() => {
/* create element from the source */
var elt = document.createElement("div");
elt.innerHTML = html;
document.body.appendChild(elt);
/* generate workbook */
var tbl = elt.getElementsByTagName("TABLE")[0];
var wb = XLSX.utils.table_to_book(tbl);
/* remove element */
document.body.removeChild(elt);
/* generate file */
XLSX.writeFile(wb, "SheetJSTablePhantasm.xlsx");
}}><b>Export XLSX!</b></button>
<pre><b>HTML:</b><br/>{html}</pre>
</>);
}
B) Raw HTML: use XLSX.read
to read the text in the same manner as CSV.
var wb = XLSX.read(html_source, { type: "string" });
var ws = wb.Sheets[wb.SheetNames[0]];
Raw HTML Demo (click to show)
The html
variable in the demo is an editable HTML string
function SheetJSRawHTMLToXLSX() {
/* HTML stored as a string */
const html = `\
<table>
<tr><th>Name</th><th>Index</th></tr>
<tr><td>Barack Obama</td><td>44</td></tr>
<tr><td>Donald Trump</td><td>45</td></tr>
<tr><td>Joseph Biden</td><td>46</td></tr>
</table>
`;
return ( <>
<button onClick={() => {
/* read HTML string */
var wb = XLSX.read(html, {type: "string"});
/* generate file */
XLSX.writeFile(wb, "SheetJSRawHTML.xlsx");
}}><b>Export XLSX!</b></button>
<pre><b>HTML:</b><br/>{html}</pre>
</>);
}
Value Override
When the raw: true
option is specified, the parser will generate text cells.
When the option is not specified or when it is set to false, the parser will
try to interpret the text of each TD element.
To override the conversion for a specific cell, the following data attributes can be added to the individual TD elements:
Attribute | Description |
---|---|
data-t |
Override Cell Type |
data-v |
Override Cell Value |
data-z |
Override Number Format |
For example:
<!-- Parser interprets value as `new Date("2012-12-03")` default date format -->
<td>2012-12-03</td>
<!-- String cell "2012-12-03" -->
<td data-t="s">2012-12-03</td>
<!-- Numeric cell with the correct date code and General format -->
<td data-t="n" data-v="41246">2012-12-03</td>
<!-- Traditional Excel Date 2012-12-03 with style yyyy-mm-dd -->
<td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td>
HTML Value Examples (click to hide)
function SheetJSHTMLValueOverride() {
/* HTML stored as a string */
const html = `\
<table>
<tr><th>Cell</th><th>data-t</th><th>data-v</th><th>data-z</th></tr>
<tr><td>2012-12-03</td><td/><td/><td/></tr>
<tr><td data-t="s">2012-12-03</td><td>s</td><td/><td/></tr>
<tr><td data-t="n" data-v="41246">2012-12-03</td><td>n</td><td>41246</td><td/></tr>
<tr><td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td><td>n</td><td>41246</td><td>yyyy-mm-dd</td></tr>
</table>
`;
return ( <>
<button onClick={() => {
/* create element from the source */
var elt = document.createElement("div");
elt.innerHTML = html;
document.body.appendChild(elt);
/* generate workbook */
var tbl = elt.getElementsByTagName("TABLE")[0];
var wb = XLSX.utils.table_to_book(tbl);
/* remove element */
document.body.removeChild(elt);
/* generate file */
XLSX.writeFile(wb, "SheetJSHTMLValueOverride.xlsx");
}}><b>Export XLSX!</b></button>
<pre><b>HTML String:</b><br/>{html}<br/><b>TABLE:</b></pre>
<div dangerouslySetInnerHTML={{__html: html}}/>
</>);
}
Synthetic DOM
table_to_book
/ table_to_sheet
/ sheet_add_dom
act on HTML DOM elements.
Traditionally there is no DOM in server-side environments including NodeJS.
:::note
The simplest approach for server-side processing is to automate a headless web browser. "Browser Automation" covers some browsers.
:::
Some ecosystems provide DOM-like frameworks that are compatible with SheetJS. Examples are included in the "Synthetic DOM" demo