docs.sheetjs.com/docz/docs/08-api/07-utilities/03-html.md
SheetJS 92e3c5aa72 mdx cleanup in preparation for v2
2024-04-08 00:57:39 -04:00

| sidebar_position | title |
| --- | --- |
| 3 | HTML |

HTML is a common format for presenting data on the web. While the general read functions (XLSX.read and XLSX.readFile) can parse HTML strings and the write functions (XLSX.write and XLSX.writeFile) can generate HTML strings, the utility functions in this section can use DOM features.

:::note pass

SheetJS CE primarily focuses on data and number formatting.

SheetJS Pro supports CSS text and cell styles in the HTML format and HTML table utilities.

:::

HTML Table Output

Display worksheet data in an HTML table

var html = XLSX.utils.sheet_to_html(ws, opts);

As an alternative to the writeFile HTML type, XLSX.utils.sheet_to_html also produces HTML output. The function takes an options argument:

| Option Name | Default | Description |
| --- | --- | --- |
| id | | Specify the id attribute for the TABLE element |
| editable | false | If true, set contenteditable="true" for every TD |
| header | | Override header |
| footer | | Override footer |

Starting from the sample file pres.numbers:

function SheetJSHTML() {
  const url = "https://sheetjs.com/pres.numbers";
  const [__html, setHTML] = React.useState("");
  React.useEffect(() => { (async() => {
    /* download file and parse */
    const wb = XLSX.read(await (await fetch(url)).arrayBuffer());
    /* get the first worksheet */
    const ws = wb.Sheets[wb.SheetNames[0]];

    /* generate HTML */
    const html = XLSX.utils.sheet_to_html(ws);

    setHTML(html);
  })(); }, []);
  return ( <>
    <b>XLSX.utils.sheet_to_html(ws)</b>
    <div dangerouslySetInnerHTML={{__html}}/>
  </> );
}

Implementation Details

The generated table will include special data attributes on each TD element:

| Attribute | Description |
| --- | --- |
| data-t | Override Cell Type |
| data-v | Override Cell Value |
| data-z | Override Number Format |

External cell links will be written as A tags wrapping the cell contents.

HTML Table Input

Create New Sheet

Create a worksheet or workbook from a TABLE element

var ws = XLSX.utils.table_to_sheet(elt, opts);
var wb = XLSX.utils.table_to_book(elt, opts);

XLSX.utils.table_to_sheet takes a table DOM element and returns a worksheet resembling the input table. Numbers are parsed. All other data will be stored as strings.

XLSX.utils.table_to_book produces a minimal workbook based on the worksheet.

Both functions accept options arguments:

| Option Name | Default | Description |
| --- | --- | --- |
| raw | | If true, every cell will hold raw strings |
| dateNF | FMT 14 | Use specified date format in string output |
| cellDates | false | Store dates as type d (default is n) |
| sheetRows | 0 | If >0, read the first sheetRows rows of the table |
| display | false | If true, hidden rows and cells will not be parsed |
| UTC | false | If true, dates are interpreted as UTC ** |

** UTC option is explained in "Dates"

Exporting a table to a spreadsheet file in the web browser involves 3 steps: "find the table", "generate a workbook object", and "export to file".

For example, if the HTML table has id attribute set to sheetjs:

<table id="sheetjs">
  <tr><th>Name</th><th>Index</th></tr>
  <tr><td>Barack Obama</td><td>44</td></tr>
  <tr><td>Donald Trump</td><td>45</td></tr>
  <tr><td>Joseph Biden</td><td>46</td></tr>
</table>

document.getElementById("sheetjs") is a live reference to the table.

/* find the table element in the page */
var tbl = document.getElementById('sheetjs');
/* create a workbook */
var wb = XLSX.utils.table_to_book(tbl);
/* export to file */
XLSX.writeFile(wb, "SheetJSTable.xlsx");
Demo (click to hide)

This HTML table has id set to sheetjs:

| Name | Index |
| --- | --- |
| Barack Obama | 44 |
| Donald Trump | 45 |
| Joseph Biden | 46 |

function SheetJSExportTable() { return ( <button onClick={() => {
  /* find the table element in the page */
  var tbl = document.getElementById('sheetjs');
  /* create a workbook */
  var wb = XLSX.utils.table_to_book(tbl);
  /* export to file */
  XLSX.writeFile(wb, "SheetJSTable.xlsx");
}}><b>Export XLSX!</b></button> ); }

Add to Sheet

Add data from a TABLE element to an existing worksheet

XLSX.utils.sheet_add_dom(ws, elt, opts);

XLSX.utils.sheet_add_dom takes a table DOM element and updates an existing worksheet object. It follows the same process as table_to_sheet and accepts an options argument:

| Option Name | Default | Description |
| --- | --- | --- |
| raw | | If true, every cell will hold raw strings |
| dateNF | FMT 14 | Use specified date format in string output |
| cellDates | false | Store dates as type d (default is n) |
| sheetRows | 0 | If >0, read the first sheetRows rows of the table |
| display | false | If true, hidden rows and cells will not be parsed |
| UTC | false | If true, dates are interpreted as UTC ** |

** UTC option is explained in "Dates"

The origin option is expected to be one of:

| origin | Description |
| --- | --- |
| (cell object) | Use specified cell (cell object) |
| (string) | Use specified cell (A1-Style cell) |
| (number >= 0) | Start from the first column at specified row (0-indexed) |
| -1 | Append to bottom of worksheet starting on first column |
| (default) | Start from cell A1 |

A common use case for sheet_add_dom involves adding multiple tables to a single worksheet, usually with a few blank rows between tables:

/* get "table1" and create worksheet */
const table1 = document.getElementById('table1');
const ws = XLSX.utils.table_to_sheet(table1);

/* get "table2" and append to the worksheet */
const table2 = document.getElementById('table2');
// highlight-next-line
XLSX.utils.sheet_add_dom(ws, table2, {origin: -1});
Multi-table Export Example (click to show)

This demo creates a worksheet that should look like the screenshot below:

Multi-Table Export in Excel

The create_gap_rows helper function expands the worksheet range, adding blank rows between the data tables.

function MultiTable() {
  const headers = ["Table 1", "Table 2", "Table 3"];

  /* Callback invoked when the button is clicked */
  const xport = React.useCallback(async () => {
    /* This function creates gap rows */
    function create_gap_rows(ws, nrows) {
      var ref = XLSX.utils.decode_range(ws["!ref"]);       // get original range
      ref.e.r += nrows;                                    // add to ending row
      ws["!ref"] = XLSX.utils.encode_range(ref);           // reassign range
    }

    /* first table */
    const ws = XLSX.utils.aoa_to_sheet([[headers[0]]]);
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table1'), {origin: -1});
    create_gap_rows(ws, 1); // one row gap after first table

    /* second table */
    XLSX.utils.sheet_add_aoa(ws, [[headers[1]]], {origin: -1});
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table2'), {origin: -1});
    create_gap_rows(ws, 2); // two rows gap after second table

    /* third table */
    XLSX.utils.sheet_add_aoa(ws, [[headers[2]]], {origin: -1});
    XLSX.utils.sheet_add_dom(ws, document.getElementById('table3'), {origin: -1});

    /* create workbook and export */
    const wb = XLSX.utils.book_new();
    XLSX.utils.book_append_sheet(wb, ws, "Export");
    XLSX.writeFile(wb, "SheetJSMultiTablexport.xlsx");
  });

  return ( <>
    <button onClick={xport}><b>Export XLSX!</b></button><br/><br/>
    <b>{headers[0]}</b><br/>
    <table id="table1">
      <tr><td>A2</td><td>B2</td></tr>
      <tr><td>A3</td><td>B3</td></tr>
    </table>
    <b>{headers[1]}</b><br/>
    <table id="table2">
      <tr><td>A6</td><td>B6</td><td>C6</td></tr>
      <tr><td>A7</td><td>B7</td><td>C7</td></tr>
    </table>
    <br/>
    <b>{headers[2]}</b><br/>
    <table id="table3">
      <tr><td>A11</td><td>B11</td></tr>
      <tr><td>A12</td><td>B12</td></tr>
    </table>
  </> );
}

HTML Strings

Create a worksheet or workbook from HTML string

table_to_book / table_to_sheet / sheet_add_dom act on HTML DOM elements. There are two approaches for parsing an HTML string:

A) Table Phantasm: create a DIV whose innerHTML is set to the HTML string, generate worksheet using the DOM element, then remove the DIV:

/* create element from the source */
var elt = document.createElement("div");
elt.innerHTML = html_source;
document.body.appendChild(elt);

/* generate worksheet */
var ws = XLSX.utils.table_to_sheet(elt.getElementsByTagName("TABLE")[0]);

/* remove element */
document.body.removeChild(elt);
Phantasm Demo (click to show)

The html variable in the demo is an editable HTML string

function SheetJSTablePhantasm() {
  /* HTML stored as a string */
  const html = `\
<table>
  <tr><th>Name</th><th>Index</th></tr>
  <tr><td>Barack Obama</td><td>44</td></tr>
  <tr><td>Donald Trump</td><td>45</td></tr>
  <tr><td>Joseph Biden</td><td>46</td></tr>
</table>
`;
  return ( <>
    <button onClick={() => {
      /* create element from the source */
      var elt = document.createElement("div");
      elt.innerHTML = html;
      document.body.appendChild(elt);

      /* generate workbook */
      var tbl = elt.getElementsByTagName("TABLE")[0];
      var wb = XLSX.utils.table_to_book(tbl);

      /* remove element */
      document.body.removeChild(elt);

      /* generate file */
      XLSX.writeFile(wb, "SheetJSTablePhantasm.xlsx");
    }}><b>Export XLSX!</b></button>
    <pre><b>HTML:</b><br/>{html}</pre>
  </>);
}

B) Raw HTML: use XLSX.read to read the text in the same manner as CSV.

var wb = XLSX.read(html_source, { type: "string" });
var ws = wb.Sheets[wb.SheetNames[0]];
Raw HTML Demo (click to show)

The html variable in the demo is an editable HTML string

function SheetJSRawHTMLToXLSX() {
  /* HTML stored as a string */
  const html = `\
<table>
  <tr><th>Name</th><th>Index</th></tr>
  <tr><td>Barack Obama</td><td>44</td></tr>
  <tr><td>Donald Trump</td><td>45</td></tr>
  <tr><td>Joseph Biden</td><td>46</td></tr>
</table>
`;
  return ( <>
    <button onClick={() => {
      /* read HTML string */
      var wb = XLSX.read(html, {type: "string"});

      /* generate file */
      XLSX.writeFile(wb, "SheetJSRawHTML.xlsx");
    }}><b>Export XLSX!</b></button>
    <pre><b>HTML:</b><br/>{html}</pre>
  </>);
}

Value Override

When the raw: true option is specified, the parser will generate text cells. When the option is not specified or when it is set to false, the parser will try to interpret the text of each TD element.

To override the conversion for a specific cell, the following data attributes can be added to the individual TD elements:

| Attribute | Description |
| --- | --- |
| data-t | Override Cell Type |
| data-v | Override Cell Value |
| data-z | Override Number Format |

For example:

<!-- Parser interprets the value as `new Date("2012-12-03")` with the default date format -->
<td>2012-12-03</td>

<!-- String cell "2012-12-03" -->
<td data-t="s">2012-12-03</td>

<!-- Numeric cell with the correct date code and General format -->
<td data-t="n" data-v="41246">2012-12-03</td>

<!-- Traditional Excel Date 2012-12-03 with style yyyy-mm-dd -->
<td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td>
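
The value 41246 in the last two cells is the date code for 2012-12-03. As a quick check, the sketch below counts days from 1899-12-30, which lines up with Excel date codes in the default 1900 date system for dates after February 1900. dateCode is a hypothetical helper written for this example, not a library function.

```javascript
/* hypothetical helper: convert a calendar date to a date code.
   Assumes the default 1900 date system and a date after 1900-02-28. */
function dateCode(year, month, day) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const epoch = Date.UTC(1899, 11, 30);          // day 0 (months are 0-indexed)
  const target = Date.UTC(year, month - 1, day); // UTC avoids time zone drift
  return (target - epoch) / MS_PER_DAY;
}

const code = dateCode(2012, 12, 3);
console.log(code); // 41246

/* rebuild the TD markup from the example using the computed code */
const td = `<td data-t="n" data-v="${code}" data-z="yyyy-mm-dd">2012-12-03</td>`;
console.log(td);
```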
HTML Value Examples (click to hide)
function SheetJSHTMLValueOverride() {
  /* HTML stored as a string */
  const html = `\
<table>
  <tr><th>Cell</th><th>data-t</th><th>data-v</th><th>data-z</th></tr>
  <tr><td>2012-12-03</td><td/><td/><td/></tr>
  <tr><td data-t="s">2012-12-03</td><td>s</td><td/><td/></tr>
  <tr><td data-t="n" data-v="41246">2012-12-03</td><td>n</td><td>41246</td><td/></tr>
  <tr><td data-t="n" data-v="41246" data-z="yyyy-mm-dd">2012-12-03</td><td>n</td><td>41246</td><td>yyyy-mm-dd</td></tr>
</table>
`;
  return ( <>
    <button onClick={() => {
      /* create element from the source */
      var elt = document.createElement("div");
      elt.innerHTML = html;
      document.body.appendChild(elt);

      /* generate workbook */
      var tbl = elt.getElementsByTagName("TABLE")[0];
      var wb = XLSX.utils.table_to_book(tbl);

      /* remove element */
      document.body.removeChild(elt);

      /* generate file */
      XLSX.writeFile(wb, "SheetJSHTMLValueOverride.xlsx");
    }}><b>Export XLSX!</b></button>
    <pre><b>HTML String:</b><br/>{html}<br/><b>TABLE:</b></pre>
    <div dangerouslySetInnerHTML={{__html: html}}/>
  </>);
}

Synthetic DOM

table_to_book / table_to_sheet / sheet_add_dom act on HTML DOM elements. Traditionally, server-side environments such as NodeJS do not provide a DOM.

:::note pass

The simplest approach for server-side processing is to automate a headless web browser. "Browser Automation" covers some browsers.

:::

Some ecosystems provide DOM-like frameworks that are compatible with SheetJS. Examples are included in the "Synthetic DOM" demo.