diff --git a/index.html b/index.html
index 303d2bb..8782f02 100644
--- a/index.html
+++ b/index.html
@@ -75,22 +75,13 @@
Parser and writer for various spreadsheet formats. Pure-JS cleanroom
-implementation from official specifications, related documents, and test files.
-Emphasis on parsing and writing robustness, cross-format feature compatibility
-with a unified JS representation, and ES3/ES5 browser compatibility back to IE6.
-This is the community version. We also offer a pro version with performance
-enhancements, additional features like styling, and dedicated support.
-Community Translations of this README:
+The SheetJS Community Edition offers battle-tested open-source solutions for
+extracting useful data from almost any complex spreadsheet and generating new
+spreadsheets that will work with legacy and modern software alike.
+SheetJS Pro offers solutions beyond data processing:
+Edit complex templates with ease; let out your inner Picasso with styling; make
+custom sheets with images/graphs/PivotTables; evaluate formula expressions and
+port calculations to web apps; automate common spreadsheet tasks, and much more!
@@ -111,19 +102,20 @@ enhancements, additional features like styling, and dedicated support.
In the browser, just add a script tag:
+Getting Started
+Standalone Browser Scripts
+The complete browser standalone build is saved to dist/xlsx.full.min.js and
+can be directly added to a page with a script tag:
<script type="text/javascript" src="dist/xlsx.full.min.js"></script>
unpkg
makes the latest version available at:
For example, unpkg
makes the latest version available at:
<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
With npm:
-$ npm install xlsx
The complete single-file version is generated at dist/xlsx.full.min.js
A slimmer build is generated at dist/xlsx.mini.min.js
. Compared to full build:
With bower:
$ bower install js-xlsx
ECMAScript Modules
+The ECMAScript Module build is saved to xlsx.mjs and can be directly added to
+a page with a script tag using type=module:
<script type="module">
+import { read, writeFileXLSX } from "./xlsx.mjs";
+
+/* load the codepage support library for extended support with older formats */
+import { set_cptable } from "./xlsx.mjs";
+import * as cptable from './dist/cpexcel.full.mjs';
+set_cptable(cptable);
+</script>
The npm package also exposes the module
+with the module
parameter, supported in Angular and other projects:
import { read, writeFileXLSX } from "xlsx";
+
+/* load the codepage support library for extended support with older formats */
+import { set_cptable } from "xlsx";
+import * as cptable from 'xlsx/dist/cpexcel.full.mjs';
+set_cptable(cptable);
Deno
+The sheetjs
package is hosted by Deno:
// @deno-types="https://deno.land/x/sheetjs/types/index.d.ts"
+import * as XLSX from 'https://deno.land/x/sheetjs/xlsx.mjs'
+
+/* load the codepage support library for extended support with older formats */
+import * as cptable from 'https://deno.land/x/sheetjs/dist/cpexcel.full.mjs';
+XLSX.set_cptable(cptable);
NodeJS
+With npm:
+$ npm install xlsx
By default, the module supports require
:
var XLSX = require("xlsx");
The module also ships with xlsx.mjs
for use with import
:
import * as XLSX from 'xlsx/xlsx.mjs';
+
+/* load 'fs' for readFile and writeFile support */
+import * as fs from 'fs';
+XLSX.set_fs(fs);
+
+/* load the codepage support library for extended support with older formats */
+import * as cpexcel from 'xlsx/dist/cpexcel.full.mjs';
+XLSX.set_cptable(cpexcel);
Photoshop and InDesign
+dist/xlsx.extendscript.js
is an ExtendScript build for Photoshop and InDesign
+that is included in the npm
package. It can be directly referenced with a
+#include
directive:
#include "xlsx.extendscript.js"
+
+For broad compatibility with JavaScript engines, the library is written using
+ECMAScript 3 language dialect as well as some ES5 features like Array#forEach
.
+Older browsers require shims to provide missing functions.
To use the shim, add the shim before the script tag that loads xlsx.js
:
<!-- add the shim first -->
+<script type="text/javascript" src="shim.min.js"></script>
+<!-- after the shim is referenced, add the library -->
+<script type="text/javascript" src="xlsx.full.min.js"></script>
The script also includes IE_LoadFile
and IE_SaveFile
for loading and saving
+files in Internet Explorer versions 6-9. The xlsx.extendscript.js
script
+bundles the shim in a format suitable for Photoshop and other Adobe products.
Most scenarios involving spreadsheets and data can be broken into 5 parts:
+Acquire Data: Data may be stored anywhere: local or remote files,
+databases, HTML TABLE, or even generated programmatically in the web browser.
+Extract Data: For spreadsheet files, this involves parsing raw bytes to
+read the cell data. For general JS data, this involves reshaping the data.
+Process Data: From generating summary statistics to cleaning data
+records, this step is the heart of the problem.
+Package Data: This can involve making a new spreadsheet or serializing
+with JSON.stringify or writing XML or simply flattening data for UI tools.
+Release Data: Spreadsheet files can be uploaded to a server or written
+locally. Data can be presented to users in an HTML TABLE or data grid.
+A common problem involves generating a valid spreadsheet export from data stored
+in an HTML table. In this example, an HTML TABLE on the page will be scraped,
+a row will be added to the bottom with the date of the report, and a new file
+will be generated and downloaded locally. XLSX.writeFile
takes care of
+packaging the data and attempting a local download:
// Acquire Data (reference to the HTML table)
+var table_elt = document.getElementById("my-table-id");
+
+// Extract Data (create a workbook object from the table)
+var workbook = XLSX.utils.table_to_book(table_elt);
+
+// Process Data (add a new row)
+var ws = workbook.Sheets["Sheet1"];
+XLSX.utils.sheet_add_aoa(ws, [["Created "+new Date().toISOString()]], {origin:-1});
+
+// Package and Release Data (`writeFile` tries to write and save an XLSB file)
+XLSX.writeFile(workbook, "Report.xlsb");
This library tries to simplify steps 2 and 4 with functions to extract useful
+data from spreadsheet files (read
/ readFile
) and generate new spreadsheet
+files from data (write
/ writeFile
). Additional utility functions like
+table_to_book
work with other common data sources like HTML tables.
This documentation and various demo projects cover a number of common scenarios
+and approaches for steps 1 and 5.
+Utility functions help with step 3.
+Data processing should fit in any workflow
+The library does not impose a separate lifecycle. It fits nicely in websites
+and apps built using any framework. The plain JS data objects play nice with
+Web Workers and future APIs.
+"Acquiring and Extracting Data" describes
+solutions for common data import scenarios.
+"Writing Workbooks" describes solutions for common data
+export scenarios involving actual spreadsheet files.
+"Utility Functions" details utility functions for
+translating JSON Arrays and other common JS structures into worksheet objects.
+JavaScript is a powerful language for data processing
+The "Common Spreadsheet Format" is a simple object
+representation of the core concepts of a workbook. The various functions in the
+library provide low-level tools for working with the object.
+For friendly JS processing, there are utility functions for converting parts of
+a worksheet to/from an Array of Arrays. The following example combines powerful
+JS Array methods with a network request library to download data, select the
+information we want and create a workbook file:
+The goal is to generate an XLSX workbook of US President names and birthdays.
+Acquire Data
+Raw Data
+https://theunitedstates.io/congress-legislators/executive.json has the desired +data. For example, John Adams:
+{
+ "id": { /* (data omitted) */ },
+ "name": {
+ "first": "John", // <-- first name
+ "last": "Adams" // <-- last name
+ },
+ "bio": {
+ "birthday": "1735-10-19", // <-- birthday
+ "gender": "M"
+ },
+ "terms": [
+ { "type": "viceprez", /* (other fields omitted) */ },
+ { "type": "viceprez", /* (other fields omitted) */ },
+ { "type": "prez", /* (other fields omitted) */ } // <-- look for "prez"
+ ]
+}
Filtering for Presidents
+The dataset includes Aaron Burr, a Vice President who was never President!
+Array#filter
creates a new array with the desired rows. A President served
+at least one term with type
set to "prez"
. To test if a particular row has
+at least one "prez"
term, Array#some
is another native JS function. The
+complete filter would be:
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
Lining up the data
+For this example, the name will be the first name combined with the last name
+(row.name.first + " " + row.name.last
) and the birthday will be the subfield
+row.bio.birthday
. Using Array#map
, the dataset can be massaged in one call:
const rows = prez.map(row => ({
+ name: row.name.first + " " + row.name.last,
+ birthday: row.bio.birthday
+}));
The result is an array of "simple" objects with no nesting:
+[
+ { name: "George Washington", birthday: "1732-02-22" },
+ { name: "John Adams", birthday: "1735-10-19" },
+ // ... one row per President
+]
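The filter and map steps can be tried without a network request. The following is a minimal sketch, assuming a hand-written three-record sample (`sample_data` is a hypothetical name, and the records are trimmed to the fields used here); it is not the real dataset.

```javascript
// Hand-written sample that mirrors the shape of the dataset.
// Assumption: only the fields used below are included.
const sample_data = [
  { name: { first: "George", last: "Washington" }, bio: { birthday: "1732-02-22" },
    terms: [ { type: "prez" }, { type: "prez" } ] },
  { name: { first: "John", last: "Adams" }, bio: { birthday: "1735-10-19" },
    terms: [ { type: "viceprez" }, { type: "viceprez" }, { type: "prez" } ] },
  { name: { first: "Aaron", last: "Burr" }, bio: { birthday: "1756-02-06" },
    terms: [ { type: "viceprez" } ] }
];

// keep only the rows with at least one "prez" term
const sample_prez = sample_data.filter(row => row.terms.some(term => term.type === "prez"));

// flatten each record to a simple object with no nesting
const sample_rows = sample_prez.map(row => ({
  name: row.name.first + " " + row.name.last,
  birthday: row.bio.birthday
}));

console.log(sample_rows);
```

Aaron Burr is dropped by the filter because none of his terms has the `"prez"` type.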
Extract Data
+With the cleaned dataset, XLSX.utils.json_to_sheet
generates a worksheet:
const worksheet = XLSX.utils.json_to_sheet(rows);
XLSX.utils.book_new
creates a new workbook and XLSX.utils.book_append_sheet
+appends a worksheet to the workbook. The new worksheet will be called "Dates":
const workbook = XLSX.utils.book_new();
+XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
Process Data
+Fixing headers
+By default, json_to_sheet
creates a worksheet with a header row. In this case,
+the headers come from the JS object keys: "name" and "birthday".
The headers are in cells A1 and B1. XLSX.utils.sheet_add_aoa
can write text
+values to the existing worksheet starting at cell A1:
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
Fixing Column Widths
+Some of the names are longer than the default column width. Column widths are
+set by setting the "!cols"
worksheet property.
The following line sets the width of column A to approximately 10 characters:
+worksheet["!cols"] = [ { wch: 10 } ]; // set column A width to 10 characters
One Array#reduce
call over rows
can calculate the maximum width:
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
+worksheet["!cols"] = [ { wch: max_width } ];
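The width calculation can be checked in isolation. This is a small sketch, assuming two sample rows in the flattened shape (`width_rows` and `sample_cols` are hypothetical names); the second argument to `reduce` acts as a floor of 10 characters.

```javascript
// Sample rows in the flattened { name, birthday } shape
const width_rows = [
  { name: "George Washington", birthday: "1732-02-22" },
  { name: "John Adams", birthday: "1735-10-19" }
];

// start from a floor of 10 characters and widen to the longest name
const sample_width = width_rows.reduce((w, r) => Math.max(w, r.name.length), 10);

// same shape as the array assigned to the "!cols" worksheet property
const sample_cols = [ { wch: sample_width } ];

console.log(sample_cols); // wch is 17, the length of "George Washington"
```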
Note: If the starting point was a file or HTML table, XLSX.utils.sheet_to_json
+will generate an array of JS objects.
Package and Release Data
+XLSX.writeFile
creates a spreadsheet file and tries to write it to the system.
+In the browser, it will try to prompt the user to download the file. In NodeJS,
+it will write to the local directory.
XLSX.writeFile(workbook, "Presidents.xlsx");
Complete Example
+// Uncomment the next line for use in NodeJS:
+// const XLSX = require("xlsx"), axios = require("axios");
+
+(async() => {
+ /* fetch JSON data and parse */
+ const url = "https://theunitedstates.io/congress-legislators/executive.json";
+ const raw_data = (await axios(url, {responseType: "json"})).data;
+
+ /* filter for the Presidents */
+ const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
+
+ /* flatten objects */
+ const rows = prez.map(row => ({
+ name: row.name.first + " " + row.name.last,
+ birthday: row.bio.birthday
+ }));
+
+ /* generate worksheet and workbook */
+ const worksheet = XLSX.utils.json_to_sheet(rows);
+ const workbook = XLSX.utils.book_new();
+ XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
+
+ /* fix headers */
+ XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
+
+ /* calculate column width */
+ const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
+ worksheet["!cols"] = [ { wch: max_width } ];
+
+ /* create an XLSX file and try to save to Presidents.xlsx */
+ XLSX.writeFile(workbook, "Presidents.xlsx");
+})();
For use in the web browser, assuming the snippet is saved to snippet.js
,
+script tags should be used to include the axios
and xlsx
standalone builds:
<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
+<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
+<script src="snippet.js"></script>
File formats are implementation details
+The parser covers a wide gamut of common spreadsheet file formats to ensure that
+"HTML-saved-as-XLS" files work as well as actual XLS or XLSX files.
+The writer supports a number of common output formats for broad compatibility
+with the data ecosystem.
+To the greatest extent possible, data processing code should not have to worry
+about the specific file formats involved.
The demos
directory includes sample projects for:
Platforms and Integrations
deno
electron application
nw.js application
Chrome / Chromium extensions
Download a Google Sheet locally
Adobe ExtendScript
Headless Browsers
canvas-datagrid
internet explorer
Other examples are included in the showcase.
-The node version automatically requires modules for additional features. Some -of these modules are rather large in size and are only needed in special -circumstances, so they do not ship with the core. For browser use, they must -be included directly:
-<!-- international support from js-codepage -->
-<script src="dist/cpexcel.js"></script>
An appropriate version for each dependency is included in the dist/ directory.
-The complete single-file version is generated at dist/xlsx.full.min.js
A slimmer build is generated at dist/xlsx.mini.min.js
. Compared to full build:
Webpack and Browserify builds include optional modules by default. Webpack can
-be configured to remove support with resolve.alias
:
/* uncomment the lines below to remove support */
- resolve: {
- alias: { "./dist/cpexcel.js": "" } // <-- omit international support
- }
Since the library uses functions like Array#forEach
, older browsers require
-shims to provide missing functions.
To use the shim, add the shim before the script tag that loads xlsx.js
:
<!-- add the shim first -->
-<script type="text/javascript" src="shim.min.js"></script>
-<!-- after the shim is referenced, add the library -->
-<script type="text/javascript" src="xlsx.full.min.js"></script>
The script also includes IE_LoadFile
and IE_SaveFile
for loading and saving
-files in Internet Explorer versions 6-9. The xlsx.extendscript.js
script
-bundles the shim in a format suitable for Photoshop and other Adobe products.
API
+Extract data from spreadsheet bytes
+var workbook = XLSX.read(data, opts);
The read
method can extract data from spreadsheet bytes stored in a JS string,
+"binary string", NodeJS buffer or typed array (Uint8Array
or ArrayBuffer
).
Read spreadsheet bytes from a local file and extract data
+var workbook = XLSX.readFile(filename, opts);
The readFile
method attempts to read a spreadsheet file at the supplied path.
+Browsers generally do not allow reading files in this way (it is deemed a
+security risk), and attempts to read files in this way will throw an error.
The second opts
argument is optional. "Parsing Options"
+covers the supported properties and behaviors.
Examples
+Here are a few common scenarios (click on each subtitle to see the code):
Prior to SheetJS, APIs for processing spreadsheet files were format-specific. -Third-party libraries either supported one format, or they involved a separate -set of classes for each supported file type. Even though XLSB was introduced in -Excel 2007, nothing outside of SheetJS or Excel supported the format.
-To promote a format-agnostic view, SheetJS starts from a pure-JS representation -that we call the "Common Spreadsheet Format". -Emphasizing a uniform object representation enables new features like format -conversion (reading an XLSX template and saving as XLS) and circumvents the mess -of classes. By abstracting the complexities of the various formats, tools -need not worry about the specific file type!
-A simple object representation combined with careful coding practices enables -use cases in older browsers and in alternative environments like ExtendScript -and Web Workers. It is always tempting to use the latest and greatest features, -but they tend to require the latest versions of browsers, limiting usability.
-Utility functions capture common use cases like generating JS objects or HTML. -Most simple operations should only require a few lines of code. More complex -operations generally should be straightforward to implement.
-Excel pushes the XLSX format as default starting in Excel 2007. However, there -are other formats with more appealing properties. For example, the XLSB format -is spiritually similar to XLSX but files often tend up taking less than half the -space and open much faster! Even though an XLSX writer is available, other -format writers are available so users can take advantage of the unique -characteristics of each format.
-The primary focus of the Community Edition is correct data interchange, focused -on extracting data from any compatible data representation and exporting data in -various formats suitable for any third party interface.
-For parsing, the first step is to read the file. This involves acquiring the -data and feeding it into the library. Here are a few common scenarios:
-readFile
is only available in server environments. Browsers have no API for
-reading arbitrary files given a path, so another strategy must be used.
if(typeof require !== 'undefined') XLSX = require('xlsx');
-var workbook = XLSX.readFile('test.xlsx');
-/* DO SOMETHING WITH workbook HERE */
readFile
wraps the File
logic in Photoshop and other ExtendScript targets.
-The specified path should be an absolute path:
#include "xlsx.extendscript.js"
-/* Read test.xlsx from the Documents folder */
-var workbook = XLSX.readFile(Folder.myDocuments + '/' + 'test.xlsx');
-/* DO SOMETHING WITH workbook HERE */
The extendscript
demo includes a more complex example.
The table_to_book
and table_to_sheet
utility functions take a DOM TABLE
-element and iterate through the child nodes.
var workbook = XLSX.utils.table_to_book(document.getElementById('tableau'));
-/* DO SOMETHING WITH workbook HERE */
Multiple tables on a web page can be converted to individual worksheets:
-/* create new workbook */
-var workbook = XLSX.utils.book_new();
+ Local file in a NodeJS server (click to show)
+readFile
uses fs.readFileSync
under the hood:
+var XLSX = require("xlsx");
-/* convert table 'table1' to worksheet named "Sheet1" */
-var ws1 = XLSX.utils.table_to_sheet(document.getElementById('table1'));
-XLSX.utils.book_append_sheet(workbook, ws1, "Sheet1");
+var workbook = XLSX.readFile("test.xlsx");
+For Node ESM, the readFile
helper is not enabled. Instead, fs.readFileSync
+should be used to read the file data as a Buffer
for use with XLSX.read
:
+import { readFileSync } from "fs";
+import { read } from "xlsx/xlsx.mjs";
-/* convert table 'table2' to worksheet named "Sheet2" */
-var ws2 = XLSX.utils.table_to_sheet(document.getElementById('table2'));
-XLSX.utils.book_append_sheet(workbook, ws2, "Sheet2");
-
-/* workbook now has 2 worksheets */
-Alternatively, the HTML code can be extracted and parsed:
-var htmlstr = document.getElementById('tableau').outerHTML;
-var workbook = XLSX.read(htmlstr, {type:'string'});
+const buf = readFileSync("test.xlsx");
+/* buf is a Buffer */
+const workbook = read(buf);
Note: for a more complete example that works in older browsers, check the demo
-at http://oss.sheetjs.com/sheetjs/ajax.html. The xhr
demo
-includes more examples with XMLHttpRequest
and fetch
.
readFile
uses Deno.readFileSync
under the hood:
// @deno-types="https://deno.land/x/sheetjs/types/index.d.ts"
+import * as XLSX from 'https://deno.land/x/sheetjs/xlsx.mjs'
+
+const workbook = XLSX.readFile("test.xlsx");
Applications reading files must be invoked with the --allow-read flag. The
+deno demo has more examples.
For modern websites targeting Chrome 76+, File#arrayBuffer
is recommended:
// XLSX is a global from the standalone script
+
+async function handleDropAsync(e) {
+ e.stopPropagation(); e.preventDefault();
+ const f = e.dataTransfer.files[0];
+ /* f is a File */
+ const data = await f.arrayBuffer();
+ /* data is an ArrayBuffer */
+ const workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+}
+drop_dom_element.addEventListener("drop", handleDropAsync, false);
For maximal compatibility, the FileReader
API should be used:
function handleDrop(e) {
+ e.stopPropagation(); e.preventDefault();
+ var f = e.dataTransfer.files[0];
+ /* f is a File */
+ var reader = new FileReader();
+ reader.onload = function(e) {
+ var data = e.target.result;
+ /* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
+ var workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+ };
+ reader.readAsArrayBuffer(f);
+}
+drop_dom_element.addEventListener("drop", handleDrop, false);
https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.
+Starting with an HTML INPUT element with type="file"
:
<input type="file" id="input_dom_element">
For modern websites targeting Chrome 76+, Blob#arrayBuffer
is recommended:
// XLSX is a global from the standalone script
+
+async function handleFileAsync(e) {
+ const file = e.target.files[0];
+ const data = await file.arrayBuffer();
+ /* data is an ArrayBuffer */
+ const workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+}
+input_dom_element.addEventListener("change", handleFileAsync, false);
For broader support (including IE10+), the FileReader
approach is recommended:
function handleFile(e) {
+ var file = e.target.files[0];
+ var reader = new FileReader();
+ reader.onload = function(e) {
+ var data = e.target.result;
+ /* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
+ var workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+ };
+ reader.readAsArrayBuffer(file);
+}
+input_dom_element.addEventListener("change", handleFile, false);
The oldie
demo shows an IE-compatible fallback scenario.
For modern websites targeting Chrome 42+, fetch
is recommended:
// XLSX is a global from the standalone script
+
+(async() => {
+ const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
+ const data = await (await fetch(url)).arrayBuffer();
+ /* data is an ArrayBuffer */
+ const workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+})();
For broader support, the XMLHttpRequest
approach is recommended:
var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
/* set up async GET request */
@@ -493,108 +723,139 @@ includes more examples with XMLHttpRequest
and fetch
.<
var workbook = XLSX.read(req.response);
/* DO SOMETHING WITH workbook HERE */
-}
+};
req.send();
The xhr
demo includes a longer discussion and more examples.
http://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.
For modern browsers, Blob#arrayBuffer
can read data from files:
async function handleDropAsync(e) {
- e.stopPropagation(); e.preventDefault();
- const f = evt.dataTransfer.files[0];
- const data = await f.arrayBuffer();
- const workbook = XLSX.read(data);
+ Local file in a Photoshop or InDesign plugin (click to show)
+readFile
wraps the File
logic in Photoshop and other ExtendScript targets.
+The specified path should be an absolute path:
+#include "xlsx.extendscript.js"
- /* DO SOMETHING WITH workbook HERE */
-}
-drop_dom_element.addEventListener('drop', handleDropAsync, false);
-For maximal compatibility, the FileReader
API should be used:
-function handleDrop(e) {
- e.stopPropagation(); e.preventDefault();
- var f = e.dataTransfer.files[0];
- var reader = new FileReader();
- reader.onload = function(e) {
- var workbook = XLSX.read(e.target.result);
+/* Read test.xlsx from the Documents folder */
+var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");
+The extendscript
demo includes a more complex example.
+
+
+ Local file in an Electron app (click to show)
+readFile
can be used in the renderer process:
+/* From the renderer process */
+var XLSX = require("xlsx");
+
+var workbook = XLSX.readFile(path);
+Electron APIs have changed over time. The electron
demo
+shows a complete example and details the required version-specific settings.
+
+
+ Local file in a mobile app with React Native (click to show)
+The react
demo includes a sample React Native app.
+Since React Native does not provide a way to read files from the filesystem, a
+third-party library must be used. The following libraries have been tested:
+
+The base64
encoding returns strings compatible with the base64
type:
+import XLSX from "xlsx";
+import { FileSystem } from "react-native-file-access";
+
+const b64 = await FileSystem.readFile(path, "base64");
+/* b64 is a base64 string */
+const workbook = XLSX.read(b64, {type: "base64"});
+
+The ascii
encoding returns binary strings compatible with the binary
type:
+import XLSX from "xlsx";
+import { readFile } from "react-native-fs";
+
+const bstr = await readFile(path, "ascii");
+/* bstr is a binary string */
+const workbook = XLSX.read(bstr, {type: "binary"});
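The difference between the two string types is easier to see on a few raw bytes. The following is a minimal sketch, assuming NodeJS `Buffer` as a stand-in for the file-reading libraries above; the sample bytes are arbitrary and do not form a real spreadsheet.

```javascript
// Assumption: arbitrary sample bytes, not a valid workbook
const bytes = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0x00]);

// "base64" type: the bytes encoded with the base64 alphabet
const b64 = bytes.toString("base64");

// "binary" type: one character per byte, with char codes from 0 to 255
const bstr = bytes.toString("binary");

// both representations decode back to the same bytes
const from_b64 = Buffer.from(b64, "base64");
const from_bstr = Buffer.from(bstr, "binary");

console.log(b64, bstr.length, from_b64.equals(from_bstr));
```

A binary string has exactly one character per byte, while a base64 string is about a third longer than the data it encodes.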
+
+
+ NodeJS Server File Uploads (click to show)
+read can accept a NodeJS buffer. readFile can read files generated by an
+HTTP POST request body parser like formidable:
+const XLSX = require("xlsx");
+const http = require("http");
+const formidable = require("formidable");
+
+const server = http.createServer((req, res) => {
+ const form = new formidable.IncomingForm();
+ form.parse(req, (err, fields, files) => {
+ /* grab the first file */
+ const f = Object.entries(files)[0][1];
+ const path = f.filepath;
+ const workbook = XLSX.readFile(path);
/* DO SOMETHING WITH workbook HERE */
- };
- reader.readAsArrayBuffer(f);
-}
-drop_dom_element.addEventListener('drop', handleDrop, false);
+ });
+}).listen(process.env.PORT || 7262);
The server
demo has more advanced examples.
Data from file input elements can be processed using the same APIs as in the -drag-and-drop example.
-Using Blob#arrayBuffer
:
async function handleFileAsync(e) {
- const file = e.target.files[0];
- const data = await file.arrayBuffer();
- const workbook = XLSX.read(data);
+ Download files in a NodeJS process (click to show)
+Node 18.0 and later expose a global fetch by default (Node 17.5 requires the --experimental-fetch flag):
+const XLSX = require("xlsx");
+
+(async() => {
+ const data = await (await fetch(url)).arrayBuffer();
+ /* data is an ArrayBuffer */
+ const workbook = XLSX.read(data);
+})();
+For broader compatibility, third-party modules are recommended.
+request
requires a null
encoding to yield Buffers:
+var XLSX = require("xlsx");
+var request = require("request");
+
+request({url: url, encoding: null}, function(err, resp, body) {
+ var workbook = XLSX.read(body);
/* DO SOMETHING WITH workbook HERE */
-}
-input_dom_element.addEventListener('change', handleFileAsync, false);
-Using FileReader
:
-function handleFile(e) {
- var files = e.target.files, f = files[0];
- var reader = new FileReader();
- reader.onload = function(e) {
- var workbook = XLSX.read(e.target.result);
+});
+axios
works the same way in browser and in NodeJS:
+const XLSX = require("xlsx");
+const axios = require("axios");
+
+(async() => {
+ const res = await axios.get(url, {responseType: "arraybuffer"});
+ /* res.data is a Buffer */
+ const workbook = XLSX.read(res.data);
+
+ /* DO SOMETHING WITH workbook HERE */
+})();
+
+
+ Download files in an Electron app (click to show)
+The net
module in the main process can make HTTP/HTTPS requests to external
+resources. Responses should be manually concatenated using Buffer.concat
:
+const XLSX = require("xlsx");
+const { net } = require("electron");
+
+const req = net.request(url);
+req.on("response", (res) => {
+ const bufs = []; // this array will collect all of the buffers
+ res.on("data", (chunk) => { bufs.push(chunk); });
+ res.on("end", () => {
+ const workbook = XLSX.read(Buffer.concat(bufs));
/* DO SOMETHING WITH workbook HERE */
- };
- reader.readAsArrayBuffer(f);
-}
-input_dom_element.addEventListener('change', handleFile, false);
-The oldie
demo shows an IE-compatible fallback scenario.
+ });
+});
+req.end();
More specialized cases, including mobile app file processing, are covered in the -included demos
-Note that older versions of IE do not support HTML5 File API, so the Base64 mode -is used for testing.
On OSX you can get the Base64 encoding with:
-$ <target_file base64 | pbcopy
On Windows XP and up you can get the Base64 encoding using certutil
:
> certutil -encode target_file target_file.b64
(note: You have to open the file and remove the header and footer lines)
-The most common and interesting formats (XLS, XLSX/M, XLSB, ODS) are ultimately -ZIP or CFB containers of files. Neither format puts the directory structure at -the beginning of the file: ZIP files place the Central Directory records at the -end of the logical file, while CFB files can place the storage info anywhere in -the file! As a result, to properly handle these formats, a streaming function -would have to buffer the entire file before commencing. That belies the -expectations of streaming, so we do not provide any streaming read API.
-When dealing with Readable Streams, the easiest approach is to buffer the stream -and process the whole thing at the end. This can be done with a temporary file -or by explicitly concatenating the stream:
-var fs = require('fs');
-var XLSX = require('xlsx');
-function process_RS(stream/*:ReadStream*/, cb/*:(wb:Workbook)=>void*/)/*:void*/{
+and process the whole thing at the end:
+var fs = require("fs");
+var XLSX = require("xlsx");
+
+function process_RS(stream, cb) {
var buffers = [];
- stream.on('data', function(data) { buffers.push(data); });
- stream.on('end', function() {
+ stream.on("data", function(data) { buffers.push(data); });
+ stream.on("end", function() {
var buffer = Buffer.concat(buffers);
var workbook = XLSX.read(buffer, {type:"buffer"});
@@ -602,26 +863,281 @@ or by explicitly concatenating the stream:
cb(workbook);
});
}
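The buffering approach can also be wrapped in a Promise. This is a sketch under stated assumptions: `collect_stream` is a hypothetical helper, and the demo stream is built from two in-memory chunks instead of a real file. The `XLSX.read` call is left as a comment so the snippet runs with the NodeJS standard library alone.

```javascript
const { Readable } = require("stream");

// Hypothetical helper: collect every chunk of a Readable into a single Buffer
function collect_stream(stream) {
  return new Promise((resolve, reject) => {
    const buffers = [];
    stream.on("data", (chunk) => { buffers.push(Buffer.from(chunk)); });
    stream.on("end", () => { resolve(Buffer.concat(buffers)); });
    stream.on("error", reject);
  });
}

// Demo stream built from two in-memory chunks (stand-in for a real file stream)
const demo_stream = Readable.from([Buffer.from("PK"), Buffer.from("\x03\x04")]);
const collected = collect_stream(demo_stream);

collected.then((buf) => {
  console.log(buf.length);
  // with the library loaded: var workbook = XLSX.read(buf, {type: "buffer"});
});
```

The `error` handler matters in practice: without it, a failed read leaves the Promise pending.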
-More robust solutions are available using modules like concat-stream
.
- Writing to filesystem first (click to show)
-This example uses tempfile
to generate file names:
-var fs = require('fs'), tempfile = require('tempfile');
-var XLSX = require('xlsx');
-function process_RS(stream/*:ReadStream*/, cb/*:(wb:Workbook)=>void*/)/*:void*/{
- var fname = tempfile('.sheetjs');
- console.log(fname);
- var ostream = fs.createWriteStream(fname);
- stream.pipe(ostream);
- ostream.on('finish', function() {
- var workbook = XLSX.readFile(fname);
- fs.unlinkSync(fname);
+ ReadableStream in the browser (click to show)
+When dealing with ReadableStream
, the easiest approach is to buffer the stream
+and process the whole thing at the end:
+// XLSX is a global from the standalone script
- /* DO SOMETHING WITH workbook IN THE CALLBACK */
- cb(workbook);
+async function process_RS(stream) {
+ /* collect data */
+ const buffers = [];
+ const reader = stream.getReader();
+ for(;;) {
+ const res = await reader.read();
+ if(res.value) buffers.push(res.value);
+ if(res.done) break;
+ }
+
+ /* concat */
+ const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));
+
+ let off = 0;
+ for(const u8 of buffers) {
+ out.set(u8, off);
+ off += u8.length;
+ }
+
+ return out;
+}
+
+const data = await process_RS(stream);
+/* data is Uint8Array */
+const workbook = XLSX.read(data);
+
+More detailed examples are covered in the included demos.
+
+
+Processing JSON and JS Data
+JSON and JS data tend to represent single worksheets. This section will use a
+few utility functions to generate workbooks:
+Create a new Workbook
+var workbook = XLSX.utils.book_new();
+The book_new
utility function creates an empty workbook with no worksheets.
+Append a Worksheet to a Workbook
+XLSX.utils.book_append_sheet(workbook, worksheet, sheet_name);
+The book_append_sheet
utility function appends a worksheet to the workbook.
+The third argument specifies the desired worksheet name. Multiple worksheets can
+be added to a workbook by calling the function multiple times.
+API
+Create a worksheet from an array of arrays of JS values
+var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);
+The aoa_to_sheet
utility function walks an "array of arrays" in row-major
+order, generating a worksheet object. The following snippet generates a sheet
+with cell A1
set to the string A1
, cell B1
set to B1
, etc:
+var worksheet = XLSX.utils.aoa_to_sheet([
+ ["A1", "B1", "C1"],
+ ["A2", "B2", "C2"],
+ ["A3", "B3", "C3"]
+])
+"Array of Arrays Input" describes the function and the
+optional opts
argument in more detail.
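The row-major walk can be illustrated without the library. The following is a minimal sketch, not the library's implementation: `colName` and `aoaToAddressMap` are hypothetical helpers that pair each value in the array of arrays with the A1-style address it would occupy.

```javascript
// Hypothetical helper: convert a 0-indexed column number to spreadsheet letters
function colName(c) {
  let s = "";
  for (let n = c + 1; n > 0; n = Math.floor((n - 1) / 26)) {
    s = String.fromCharCode(65 + ((n - 1) % 26)) + s;
  }
  return s;
}

// Hypothetical helper: walk rows first, then columns, recording each value by address
function aoaToAddressMap(aoa) {
  const out = {};
  aoa.forEach((row, r) => {
    row.forEach((value, c) => {
      out[colName(c) + (r + 1)] = value;
    });
  });
  return out;
}

const cells = aoaToAddressMap([
  ["A1", "B1", "C1"],
  ["A2", "B2", "C2"],
  ["A3", "B3", "C3"]
]);
console.log(cells.B1, cells.C3);
```

The outer array index becomes the row number and the inner array index becomes the column letter, which is why the first inner array fills row 1.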
+Create a worksheet from an array of JS objects
+var worksheet = XLSX.utils.json_to_sheet(jsa, opts);
+The json_to_sheet
utility function walks an array of JS objects in order,
+generating a worksheet object. By default, it will generate a header row and
+one row per object in the array. The optional opts
argument has settings to
+control the column order and header output.
+"Array of Objects Input" describes the function and
+the optional opts
argument in more detail.
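The default behavior described above (one header row, then one row per object) can be sketched in plain JS. This is an illustration under assumptions, not the library's code: `objectsToRows` is a hypothetical helper, and its optional `header` parameter only stands in for the column-order setting mentioned above.

```javascript
// Hypothetical helper: flatten an array of objects into a header row plus data rows
function objectsToRows(objects, header) {
  // Column order: any explicitly requested keys first,
  // then remaining keys in order of first appearance
  const keys = header ? header.slice() : [];
  objects.forEach(obj => Object.keys(obj).forEach(k => {
    if (!keys.includes(k)) keys.push(k);
  }));
  // One row per object; missing fields are left undefined
  const body = objects.map(obj => keys.map(k => obj[k]));
  return [keys].concat(body);
}

const rows = objectsToRows([
  { name: "Ada", age: 36 },
  { name: "Grace", age: 45, city: "NYC" }
]);
console.log(rows[0]);
```

The resulting array of arrays has the same shape that `aoa_to_sheet` accepts, which is one way to reason about how the two utilities relate.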
+Examples
+"Zen of SheetJS" contains a detailed example "Get Data
+from a JSON Endpoint and Generate a Workbook"
+x-spreadsheet
is an interactive
+data grid for previewing and modifying structured data in the web browser. The
+xspreadsheet
demo includes a sample script with the
+xtos
function for converting from an x-spreadsheet data object to a workbook.
+https://oss.sheetjs.com/sheetjs/x-spreadsheet is a live demo.
+
+ Records from a database query (SQL or no-SQL) (click to show)
+The database
demo includes examples of working with
+databases and query results.
+
+
+ Numerical Computations with TensorFlow.js (click to show)
+@tensorflow/tfjs
and other libraries expect data in simple
+arrays, well-suited for worksheets where each column is a data vector. That is
+the transpose of how most people use spreadsheets, where each row is a vector.
+When recovering data from tfjs
, the returned data points are stored in a typed
+array. An array of arrays can be constructed with loops. Array#unshift
can
+prepend a title row before the conversion:
+const XLSX = require("xlsx");
+const tf = require('@tensorflow/tfjs');
+
+/* suppose xs and ys are vectors (1D tensors) -> tfarr will be a typed array */
+const tfdata = tf.stack([xs, ys]).transpose();
+const shape = tfdata.shape;
+const tfarr = tfdata.dataSync();
+
+/* construct the array of arrays */
+const aoa = [];
+for(let j = 0; j < shape[0]; ++j) {
+ aoa[j] = [];
+ for(let i = 0; i < shape[1]; ++i) aoa[j][i] = tfarr[j * shape[1] + i];
+}
+/* add headers to the top */
+aoa.unshift(["x", "y"]);
+
+/* generate worksheet */
+const worksheet = XLSX.utils.aoa_to_sheet(aoa);
+The array
demo shows a complete example.
+
+
+
+ Processing HTML Tables
+API
+Create a worksheet by scraping an HTML TABLE in the page
+var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);
+The table_to_sheet
utility function takes a DOM TABLE element and iterates
+through the rows to generate a worksheet. The opts
argument is optional.
+"HTML Table Input" describes the function in more detail.
+Create a workbook by scraping an HTML TABLE in the page
+var workbook = XLSX.utils.table_to_book(dom_element, opts);
+The table_to_book
utility function follows the same logic as table_to_sheet
.
+After generating a worksheet, it creates a blank workbook and appends the
+worksheet.
+The options argument supports the same options as table_to_sheet
, with the
+addition of a sheet
property to control the worksheet name. If the property
+is missing or no options are specified, the default name Sheet1
is used.
+Examples
+Here are a few common scenarios (click on each subtitle to see the code):
+
+ HTML TABLE element in a webpage (click to show)
+<!-- include the standalone script and shim. this uses the UNPKG CDN -->
+<script src="https://unpkg.com/xlsx/dist/shim.min.js"></script>
+<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
+
+<!-- example table with id attribute -->
+<table id="tableau">
+ <tr><td>Sheet</td><td>JS</td></tr>
+ <tr><td>12345</td><td>67</td></tr>
+</table>
+
+<!-- this block should appear after the table HTML and the standalone script -->
+<script type="text/javascript">
+ var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));
+
+ /* DO SOMETHING WITH workbook HERE */
+</script>
+Multiple tables on a web page can be converted to individual worksheets:
+/* create new workbook */
+var workbook = XLSX.utils.book_new();
+
+/* convert table "table1" to worksheet named "Sheet1" */
+var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
+XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");
+
+/* convert table "table2" to worksheet named "Sheet2" */
+var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
+XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");
+
+/* workbook now has 2 worksheets */
+Alternatively, the HTML code can be extracted and parsed:
+var htmlstr = document.getElementById("tableau").outerHTML;
+var workbook = XLSX.read(htmlstr, {type:"string"});
+
+
+ Chrome/Chromium Extension (click to show)
+The chrome
demo shows a complete example and details the
+required permissions and other settings.
+In an extension, it is recommended to generate the workbook in a content script
+and pass the object back to the extension:
+/* in the worker script */
+chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
+ /* pass a message like { sheetjs: true } from the extension to scrape */
+ if(!msg || !msg.sheetjs) return;
+ /* create a new workbook */
+ var workbook = XLSX.utils.book_new();
+ /* loop through each table element */
+ var tables = document.getElementsByTagName("table")
+ for(var i = 0; i < tables.length; ++i) {
+ var worksheet = XLSX.utils.table_to_sheet(tables[i]);
+ XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
+ }
+ /* pass back to the extension */
+ return cb(workbook);
+});
+
+
+ Server-Side HTML Tables with Headless Chrome (click to show)
+The headless
demo includes a complete demo to convert HTML
+files to XLSB workbooks. The core idea is to add the script to the page, parse
+the table in the page context, generate a base64
workbook and send it back
+for further processing:
+const XLSX = require("xlsx");
+const { readFileSync } = require("fs"), puppeteer = require("puppeteer");
+
+const url = `https://sheetjs.com/demos/table`;
+
+/* get the standalone build source (node_modules/xlsx/dist/xlsx.full.min.js) */
+const lib = readFileSync(require.resolve("xlsx/dist/xlsx.full.min.js"), "utf8");
+
+(async() => {
+ /* start browser and go to web page */
+ const browser = await puppeteer.launch();
+ const page = await browser.newPage();
+ await page.goto(url, {waitUntil: "networkidle2"});
+
+ /* inject library */
+ await page.addScriptTag({content: lib});
+
+ /* this function `s5s` will be called by the script below, receiving the Base64-encoded file */
+ await page.exposeFunction("s5s", async(b64) => {
+ const workbook = XLSX.read(b64, {type: "base64" });
+
+ /* DO SOMETHING WITH workbook HERE */
});
-}
+
+ /* generate XLSB file in webpage context and send back result */
+ await page.addScriptTag({content: `
+ /* call table_to_book on first table */
+ var workbook = XLSX.utils.table_to_book(document.querySelector("TABLE"));
+
+ /* generate XLSB file */
+ var b64 = XLSX.write(workbook, {type: "base64", bookType: "xlsb"});
+
+ /* call "s5s" hook exposed from the node process */
+ window.s5s(b64);
+ `});
+
+ /* cleanup */
+ await browser.close();
+})();
+
+
+ Server-Side HTML Tables with Headless WebKit (click to show)
+The headless
demo includes a complete demo to convert HTML
+files to XLSB workbooks using PhantomJS. The core idea
+is to add the script to the page, parse the table in the page context, generate
+a binary
workbook and send it back for further processing:
+var XLSX = require('xlsx');
+var page = require('webpage').create();
+
+/* this code will be run in the page */
+var code = [ "function(){",
+ /* call table_to_book on first table */
+ "var wb = XLSX.utils.table_to_book(document.body.getElementsByTagName('table')[0]);",
+
+ /* generate XLSB file and return binary string */
+ "return XLSX.write(wb, {type: 'binary', bookType: 'xlsb'});",
+"}" ].join("");
+
+page.open('https://sheetjs.com/demos/table', function() {
+ /* Load the browser script from the UNPKG CDN */
+ page.includeJs("https://unpkg.com/xlsx/dist/xlsx.full.min.js", function() {
+ /* The code will return an XLSB file encoded as binary string */
+ var bin = page.evaluateJavaScript(code);
+
+ var workbook = XLSX.read(bin, {type: "binary"});
+ /* DO SOMETHING WITH workbook HERE */
+
+ phantom.exit();
+ });
+});
+
+
+ NodeJS HTML Tables without a browser (click to show)
+NodeJS does not include a DOM implementation and Puppeteer requires a hefty
+Chromium build. jsdom
is a lightweight alternative:
+const XLSX = require("xlsx");
+const { readFileSync } = require("fs");
+const { JSDOM } = require("jsdom");
+
+/* obtain HTML string. This example reads from test.html */
+const html_str = readFileSync("test.html", "utf8");
+/* get first TABLE element */
+const doc = new JSDOM(html_str).window.document.querySelector("table");
+/* generate workbook */
+const workbook = XLSX.utils.table_to_book(doc);
@@ -694,56 +1210,74 @@ files and output the contents in various formats. The source is available at
Working with the WorkbookXLSX.utils.sheet_to_formulae
generates a list of formulae
-
- Writing WorkbooksFor writing, the first step is to generate output data. The helper functions
-write
and writeFile
will produce the data in various formats suitable for
-dissemination. The second step is to actual share the data with the end point.
-Assuming workbook
is a workbook object:
+ Packaging and Releasing Data
+
+
+ Writing Workbooks
+API
+Generate spreadsheet bytes (file) from data
+var data = XLSX.write(workbook, opts);
+The write
method attempts to package data from the workbook into a file in
+memory. By default, XLSX files are generated, but that can be controlled with
+the bookType
property of the opts
argument. Based on the type
option,
+the data can be stored as a "binary string", JS string, Uint8Array
or Buffer.
+The second opts
argument is required. "Writing Options"
+covers the supported properties and behaviors.
+Generate and attempt to save file
+XLSX.writeFile(workbook, filename, opts);
+The writeFile
method packages the data and attempts to save the new file. The
+export file format is determined by the extension of filename
(SheetJS.xlsx
+signals XLSX export, SheetJS.xlsb
signals XLSB export, etc).
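When `write` is used instead of `writeFile`, the format is not inferred from a file name, so `bookType` has to be supplied. The following is a minimal sketch of deriving it from an extension. `extensionOf` and `writeOptionsFor` are hypothetical helpers, and the sketch deliberately handles only the two extensions named above.

```javascript
// Hypothetical helper: pull the lowercase extension out of a file name
function extensionOf(filename) {
  const idx = filename.lastIndexOf(".");
  return idx === -1 ? "" : filename.slice(idx + 1).toLowerCase();
}

// Assumption: only these two extensions are handled in this sketch
const KNOWN_TYPES = new Map([["xlsx", "xlsx"], ["xlsb", "xlsb"]]);

// Hypothetical helper: build options for an in-memory write that mirror the file name
function writeOptionsFor(filename) {
  const ext = extensionOf(filename);
  if (!KNOWN_TYPES.has(ext)) throw new Error("unhandled extension: " + ext);
  return { bookType: KNOWN_TYPES.get(ext), type: "buffer" };
}

const opts = writeOptionsFor("SheetJS.xlsb");
console.log(opts.bookType);
```

The returned object could be passed as the `opts` argument of `XLSX.write`; rejecting unknown extensions is a design choice of this sketch, not documented library behavior.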
+The writeFile
method uses platform-specific APIs to initiate the file save. In
+NodeJS, fs.writeFileSync
can create a file. In the web browser, a download is
+attempted using the HTML5 download
attribute, with fallbacks for IE.
+Generate and attempt to save an XLSX file
+XLSX.writeFileXLSX(workbook, filename, opts);
+The writeFile
method embeds a number of different export functions. This is
+great for developer experience but not amenable to dead code elimination using
+the current toolset. When only XLSX exports are needed, this method avoids
+referencing the other export codecs.
+The third opts
argument is optional. "Writing Options"
+covers the supported properties and behaviors.
+Examples
- nodejs write a file (click to show)
-XLSX.writeFile
uses fs.writeFileSync
in server environments:
-if(typeof require !== 'undefined') XLSX = require('xlsx');
+ Local file in a NodeJS server (click to show)
+writeFile
uses fs.writeFileSync
in server environments:
+var XLSX = require("xlsx");
+
/* output format determined by filename */
-XLSX.writeFile(workbook, 'out.xlsb');
-/* at this point, out.xlsb is a file that you can distribute */
+XLSX.writeFile(workbook, "out.xlsb");
+For Node ESM, the writeFile helper is not enabled. Instead, XLSX.write should be
+used to generate the file data as a Buffer, which fs.writeFileSync then saves:
+import { writeFileSync } from "fs";
+import { write } from "xlsx/xlsx.mjs";
+
+const buf = write(workbook, {type: "buffer", bookType: "xlsb"});
+/* buf is a Buffer */
+writeFileSync("out.xlsb", buf);
- Photoshop ExtendScript write a file (click to show)
+ Local file in a Deno application (click to show)
+writeFile
uses Deno.writeFileSync
under the hood:
+// @deno-types="https://deno.land/x/sheetjs/types/index.d.ts"
+import * as XLSX from 'https://deno.land/x/sheetjs/xlsx.mjs'
+
+XLSX.writeFile(workbook, "test.xlsx");
+Applications writing files must be invoked with the --allow-write
flag. The
+deno
demo has more examples.
+
+
+ Local file in a Photoshop or InDesign plugin (click to show)
writeFile
wraps the File
logic in Photoshop and other ExtendScript targets.
The specified path should be an absolute path:
#include "xlsx.extendscript.js"
+
/* output format determined by filename */
-XLSX.writeFile(workbook, 'out.xlsx');
+XLSX.writeFile(workbook, "out.xlsx");
/* at this point, out.xlsx is a file that you can distribute */
The extendscript
demo includes a more complex example.
- Browser add TABLE element to page (click to show)
-The sheet_to_html
utility function generates HTML code that can be added to
-any DOM element.
-var worksheet = workbook.Sheets[workbook.SheetNames[0]];
-var container = document.getElementById('tableau');
-container.innerHTML = XLSX.utils.sheet_to_html(worksheet);
-
-
- Browser upload file (ajax) (click to show)
-A complete example using XHR is included in the XHR demo, along
-with examples for fetch and wrapper libraries. This example assumes the server
-can handle Base64-encoded files (see the demo for a basic nodejs server):
-/* in this example, send a base64 string to the server */
-var wopts = { bookType:'xlsx', bookSST:false, type:'base64' };
-
-var wbout = XLSX.write(workbook,wopts);
-
-var req = new XMLHttpRequest();
-req.open("POST", "/upload", true);
-var formdata = new FormData();
-formdata.append('file', 'test.xlsx'); // <-- server expects `file` to hold name
-formdata.append('data', wbout); // <-- `data` holds the base64-encoded data
-req.send(formdata);
-
-
- Browser save file (click to show)
+ Download a file in the browser to the user machine (click to show)
XLSX.writeFile
wraps a few techniques for triggering a file save:
-
@@ -757,17 +1291,17 @@ XP and Windows 7. The shim must be included in the containing HTML page.
There is no standard way to determine if the actual file has been downloaded.
/* output format determined by filename */
-XLSX.writeFile(workbook, 'out.xlsb');
+XLSX.writeFile(workbook, "out.xlsb");
/* at this point, out.xlsb will have been downloaded */
- Browser save file (compatibility) (click to show)
+ Download a file in legacy browsers (click to show)
XLSX.writeFile
techniques work for most modern browsers as well as older IE.
For much older browsers, there are workarounds implemented by wrapper libraries.
FileSaver.js
implements saveAs
.
Note: XLSX.writeFile
will automatically call saveAs
if available.
/* bookType can be any supported output type */
-var wopts = { bookType:'xlsx', bookSST:false, type:'array' };
+var wopts = { bookType:"xlsx", bookSST:false, type:"array" };
var wbout = XLSX.write(workbook,wopts);
@@ -776,14 +1310,47 @@ Note: XLSX.writeFile
will automatically call saveAs
if
Downloadify
uses a Flash SWF button
to generate local files, suitable for environments where ActiveX is unavailable:
Downloadify.create(id,{
- /* other options are required! read the downloadify docs for more info */
- filename: "test.xlsx",
- data: function() { return XLSX.write(wb, {bookType:"xlsx", type:'base64'}); },
- append: false,
- dataType: 'base64'
+ /* other options are required! read the downloadify docs for more info */
+ filename: "test.xlsx",
+ data: function() { return XLSX.write(wb, {bookType:"xlsx", type:"base64"}); },
+ append: false,
+ dataType: "base64"
});
The oldie
demo shows an IE-compatible fallback scenario.
+
+ Browser upload file (ajax) (click to show)
+A complete example using XHR is included in the XHR demo, along
+with examples for fetch and wrapper libraries. This example assumes the server
+can handle Base64-encoded files (see the demo for a basic nodejs server):
+/* in this example, send a base64 string to the server */
+var wopts = { bookType:"xlsx", bookSST:false, type:"base64" };
+
+var wbout = XLSX.write(workbook,wopts);
+
+var req = new XMLHttpRequest();
+req.open("POST", "/upload", true);
+var formdata = new FormData();
+formdata.append("file", "test.xlsx"); // <-- server expects `file` to hold name
+formdata.append("data", wbout); // <-- `data` holds the base64-encoded data
+req.send(formdata);
+
+
+ PhantomJS (Headless Webkit) File Generation (click to show)
+The headless
demo includes a complete demo to convert HTML
+files to XLSB workbooks using PhantomJS. PhantomJS
+fs.write
supports writing files from the main process but has a different
+interface from the NodeJS fs
module:
+var XLSX = require('xlsx');
+var fs = require('fs');
+
+/* generate a binary string */
+var bin = XLSX.write(workbook, { type:"binary", bookType: "xlsx" });
+/* write to file */
+fs.write("test.xlsx", bin, "wb");
+Note: The section "Processing HTML Tables" shows how
+to generate a workbook from HTML tables in a page in "Headless WebKit".
+
The included demos cover mobile apps and other special deployments.
@@ -824,6 +1391,193 @@ Stream. They are only exposed in NodeJS.
Writing Examplesstream.pipe(conv); conv.pipe(process.stdout);
https://github.com/sheetjs/sheetaki pipes write streams to nodejs response.
+
+
+ Generating JSON and JS Data
+JSON and JS data tend to represent single worksheets. The utility functions in
+this section work with single worksheets.
+The "Common Spreadsheet Format" section describes
+the object structure in more detail. workbook.SheetNames
is an ordered list
+of the worksheet names. workbook.Sheets
is an object whose keys are sheet
+names and whose values are worksheet objects.
+The "first worksheet" is stored at workbook.Sheets[workbook.SheetNames[0]]
.
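The relationship between `SheetNames` and `Sheets` can be seen with a hand-built object. This is a sketch that only mimics the shape described above. The sheet names and cell contents are placeholders, not output from the parser.

```javascript
// Hand-built object that mimics the workbook shape; contents are placeholders
const workbook = {
  SheetNames: ["Summary", "Data"],
  Sheets: {
    // Object key order is deliberately different from SheetNames order
    Data: { "!ref": "A1:A1", A1: { t: "n", v: 42 } },
    Summary: { "!ref": "A1:A1", A1: { t: "s", v: "Totals" } }
  }
};

// The "first worksheet" is found through the ordered name list
const firstName = workbook.SheetNames[0];
const firstSheet = workbook.Sheets[firstName];

// Iterate in workbook order by walking SheetNames, not Object.keys(Sheets)
const ordered = workbook.SheetNames.map(name => ({ name, sheet: workbook.Sheets[name] }));
console.log(firstName, ordered.length);
```

Walking `SheetNames` preserves the tab order, whereas the key order of `Sheets` carries no such guarantee.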
+API
+Create an array of JS objects from a worksheet
+var jsa = XLSX.utils.sheet_to_json(worksheet, opts);
+Create an array of arrays of JS values from a worksheet
+var aoa = XLSX.utils.sheet_to_json(worksheet, {...opts, header: 1});
+The sheet_to_json
utility function walks a worksheet in row-major order,
+generating an array of objects. The second opts
argument controls a number of
+export decisions including the type of values (JS values or formatted text). The
+"JSON" section describes the argument in more detail.
+By default, sheet_to_json
scans the first row and uses the values as headers.
+With the header: 1
option, the function exports an array of arrays of values.
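The two output shapes are closely related. As a rough sketch (not the library's implementation), the hypothetical helper `rowsToObjects` below takes an array of arrays, treats the first row as headers, and builds one object per remaining row:

```javascript
// Hypothetical helper: use the first row as keys for every following row
function rowsToObjects(aoa) {
  const [header = [], ...body] = aoa;
  return body.map(row => {
    const obj = {};
    header.forEach((key, i) => {
      // Skip empty cells so that missing values do not become explicit keys
      if (row[i] !== undefined) obj[key] = row[i];
    });
    return obj;
  });
}

const records = rowsToObjects([
  ["name", "age"],
  ["Ada", 36],
  ["Grace"]
]);
console.log(records);
```

Skipping undefined cells is a choice made for this sketch. The "JSON" section referenced above is the authority on how the library treats blank cells.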
+Examples
+x-spreadsheet
is an interactive
+data grid for previewing and modifying structured data in the web browser. The
+xspreadsheet
demo includes a sample script with the
+stox
function for converting from a workbook to an x-spreadsheet data object.
+https://oss.sheetjs.com/sheetjs/x-spreadsheet is a live demo.
+
+ Previewing data in a React data grid (click to show)
+react-data-grid
is a data grid tailored for
+React. It expects two properties: rows
of data objects and columns
which
+describe the columns. For the purposes of massaging the data to fit the react
+data grid API it is easiest to start from an array of arrays.
+This demo starts by fetching a remote file and using XLSX.read
to extract:
+import { useEffect, useState } from "react";
+import DataGrid from "react-data-grid";
+import { read, utils } from "xlsx";
+
+const url = "https://oss.sheetjs.com/test_files/RkNumber.xls";
+
+export default function App() {
+ const [columns, setColumns] = useState([]);
+ const [rows, setRows] = useState([]);
+ useEffect(() => {(async () => {
+ const wb = read(await (await fetch(url)).arrayBuffer(), { WTF: 1 });
+
+ /* use sheet_to_json with header: 1 to generate an array of arrays */
+ const data = utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]], { header: 1 });
+
+ /* see react-data-grid docs to understand the shape of the expected data */
+ setColumns(data[0].map((r) => ({ key: r, name: r })));
+ setRows(data.slice(1).map((r) => r.reduce((acc, x, i) => {
+ acc[data[0][i]] = x;
+ return acc;
+ }, {})));
+ })(); }, []);
+
+ return <DataGrid columns={columns} rows={rows} />;
+}
+
+
+ Populating a database (SQL or no-SQL) (click to show)
+The database
demo includes examples of working with
+databases and query results.
+
+
+ Numerical Computations with TensorFlow.js (click to show)
+@tensorflow/tfjs
and other libraries expect data in simple
+arrays, well-suited for worksheets where each column is a data vector. That is
+the transpose of how most people use spreadsheets, where each row is a vector.
+A single Array#map
can pull individual named fields from sheet_to_json
export:
+const XLSX = require("xlsx");
+const tf = require('@tensorflow/tfjs');
+
+const key = "age"; // this is the field we want to pull
+const ages = XLSX.utils.sheet_to_json(worksheet).map(r => r[key]);
+const tf_data = tf.tensor1d(ages);
+All fields can be processed at once using a transpose of the 2D tensor generated
+with the sheet_to_json
export with header: 1
. The first row, if it contains
+header labels, should be removed with a slice:
+const XLSX = require("xlsx");
+const tf = require('@tensorflow/tfjs');
+
+/* array of arrays of the data starting on the second row */
+const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1}).slice(1);
+/* dataset in the "correct orientation" */
+const tf_dataset = tf.tensor2d(aoa).transpose();
+/* pull out each dataset with a slice */
+const tf_field0 = tf_dataset.slice([0,0], [1,tf_dataset.shape[1]]).flatten();
+const tf_field1 = tf_dataset.slice([1,0], [1,tf_dataset.shape[1]]).flatten();
+The array
demo shows a complete example.
+
+
+
+ Generating HTML Tables
+API
+Generate HTML Table from Worksheet
+var html = XLSX.utils.sheet_to_html(worksheet);
+The sheet_to_html
utility function generates HTML code based on the worksheet
+data. Each cell in the worksheet is mapped to a <TD>
element. Merged cells
+in the worksheet are serialized by setting colspan
and rowspan
attributes.
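To make the merged-cell mapping concrete, here is a minimal sketch. It assumes a merge is described by a range object with 0-indexed start (`s`) and end (`e`) positions, each holding a row `r` and column `c`. `spanAttributes` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: attributes for the top-left cell of a merged range
function spanAttributes(merge) {
  const rowspan = merge.e.r - merge.s.r + 1;
  const colspan = merge.e.c - merge.s.c + 1;
  const attrs = [];
  // Only emit an attribute when the range actually spans more than one row/column
  if (rowspan > 1) attrs.push(`rowspan="${rowspan}"`);
  if (colspan > 1) attrs.push(`colspan="${colspan}"`);
  return attrs.join(" ");
}

// A1:B3 covers rows 0..2 and columns 0..1 in 0-indexed form
const attrs = spanAttributes({ s: { r: 0, c: 0 }, e: { r: 2, c: 1 } });
const td = `<td ${attrs}>merged</td>`;
console.log(td);
```

Only the top-left cell of the range carries the attributes; the other cells covered by the merge would be omitted from the generated rows.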
+Examples
+The sheet_to_html
utility function generates HTML code that can be added to
+any DOM element by setting the innerHTML
:
+var container = document.getElementById("tavolo");
+container.innerHTML = XLSX.utils.sheet_to_html(worksheet);
+Combining with fetch
, constructing a site from a workbook is straightforward:
+
+ Vanilla JS + HTML fetch workbook and generate table previews (click to show)
+<body>
+ <style>TABLE { border-collapse: collapse; } TD { border: 1px solid; }</style>
+ <div id="tavolo"></div>
+ <script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
+ <script type="text/javascript">
+(async() => {
+ /* fetch and parse workbook -- see the fetch example for details */
+ const workbook = XLSX.read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
+
+ let output = [];
+ /* loop through the worksheet names in order */
+ workbook.SheetNames.forEach(name => {
+
+ /* generate HTML from the corresponding worksheets */
+ const worksheet = workbook.Sheets[name];
+ const html = XLSX.utils.sheet_to_html(worksheet);
+
+ /* add a header with the title name followed by the table */
+ output.push(`<H3>${name}</H3>${html}`);
+ });
+ /* write to the DOM at the end */
+ tavolo.innerHTML = output.join("\n");
+})();
+ </script>
+</body>
+
+
+ React fetch workbook and generate HTML table previews (click to show)
+It is generally recommended to use a React-friendly workflow, but it is possible
+to generate HTML and use it in React with dangerouslySetInnerHTML
:
+function Tabeller(props) {
+ /* the workbook object is the state */
+ const [workbook, setWorkbook] = React.useState(XLSX.utils.book_new());
+
+ /* fetch and update the workbook with an effect */
+ React.useEffect(() => { (async() => {
+ /* fetch and parse workbook -- see the fetch example for details */
+ const wb = XLSX.read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
+ setWorkbook(wb);
+ })(); });
+
+ return workbook.SheetNames.map(name => (<>
+ <h3>{name}</h3>
+ <div dangerouslySetInnerHTML={{
+ /* this __html mantra is needed to set the inner HTML */
+ __html: XLSX.utils.sheet_to_html(workbook.Sheets[name])
+ }} />
+ </>));
+}
+The react
demo includes more React examples.
+
+
+ VueJS fetch workbook and generate HTML table previews (click to show)
+It is generally recommended to use a VueJS-friendly workflow, but it is possible
+to generate HTML and use it in VueJS with the v-html
directive:
+import { read, utils } from 'xlsx';
+import { reactive } from 'vue';
+
+const S5SComponent = {
+ mounted() { (async() => {
+ /* fetch and parse workbook -- see the fetch example for details */
+ const workbook = read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
+ /* loop through the worksheet names in order */
+ workbook.SheetNames.forEach(name => {
+ /* generate HTML from the corresponding worksheets */
+ const html = utils.sheet_to_html(workbook.Sheets[name]);
+ /* add to state */
+ this.wb.wb.push({ name, html });
+ });
+ })(); },
+ /* this state mantra is required for array updates to work */
+ setup() { return { wb: reactive({ wb: [] }) }; },
+ template: `
+ <div v-for="ws in wb.wb" :key="ws.name">
+ <h3>{{ ws.name }}</h3>
+ <div v-html="ws.html"></div>
+ </div>`
+};
+The vuejs
demo includes more VueJS examples.
+
Interface
XLSX
is the exposed variable in the browser and the exported node variable
@@ -847,6 +1601,13 @@ If o
is omitted, the writer will use the third argument as the call
Utilities
Utilities are available in the XLSX.utils
object and are described in the
Utility Functions section:
+Constructing:
+
book_new creates an empty workbook
book_append_sheet adds a worksheet to a workbook
Importing:
Row Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM, ODS
+Column Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM
+Row and Column properties are not extracted by default when reading from a file
+and are not persisted by default when writing to a file. The option
+cellStyles: true
must be passed to the relevant read or write function.
Column Properties
The !cols
array in each worksheet, if present, is a collection of ColInfo
objects which have the following properties:
type ColInfo = {
@@ -1630,6 +2400,23 @@ objects which have the following properties:
level?: number; // 0-indexed outline / group level
MDW?: number; // Excel's "Max Digit Width" unit, always integral
};
Row Properties
+The !rows
array in each worksheet, if present, is a collection of RowInfo
+objects which have the following properties:
type RowInfo = {
+ /* visibility */
+ hidden?: boolean; // if true, the row is hidden
+
+ /* row height is specified in one of the following ways: */
+ hpx?: number; // height in screen pixels
+ hpt?: number; // height in points
+
+ level?: number; // 0-indexed outline / group level
+};
Outline / Group Levels Convention
+The Excel UI displays the base outline level as 1
and the max level as 8
.
+Following JS conventions, SheetJS uses 0-indexed outline levels wherein the base
+outline level is 0
and the max level is 7
.
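The off-by-one between the two conventions is easy to get wrong when copying values from the Excel UI. The following sketch uses two hypothetical helpers (`uiLevelToStored` and `storedLevelToUi`) to convert between the UI numbering and the 0-indexed `level` field, with the bounds taken from the paragraph above:

```javascript
// Bounds taken from the convention above: stored levels run from 0 to 7
const MAX_STORED_LEVEL = 7;

// Hypothetical helper: Excel UI level (1..8) -> stored level (0..7)
function uiLevelToStored(ui) {
  if (!Number.isInteger(ui) || ui < 1 || ui > MAX_STORED_LEVEL + 1) {
    throw new RangeError("UI outline level must be an integer from 1 to 8");
  }
  return ui - 1;
}

// Hypothetical helper: stored level (0..7) -> Excel UI level (1..8)
function storedLevelToUi(level) {
  if (!Number.isInteger(level) || level < 0 || level > MAX_STORED_LEVEL) {
    throw new RangeError("stored outline level must be an integer from 0 to 7");
  }
  return level + 1;
}

// A row shown at level 3 in the UI would carry level 2 in its properties
const rowInfo = { level: uiLevelToStored(3) };
console.log(rowInfo.level, storedLevelToUi(rowInfo.level));
```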
There are three different width types corresponding to the three different ways
@@ -1653,6 +2440,17 @@ when changing the pixel width, delete the Row Heights Excel internally stores row heights in points. The default resolution is 72 DPI
+or 96 PPI, so the pixel and point size should agree. For different resolutions
+they may not agree, so the library separates the concepts.
+Even though all of the information is made available, writers are expected to
+follow the priority order:
Column Widths
Given the constraints, it is possible to determine the MDW without actually
inspecting the font! The parsers guess the pixel width by converting from width
to pixels and back, repeating for all possible MDW and selecting the MDW that
@@ -1667,34 +2465,6 @@ follow the priority order:wch
and width
Implementation details (click to show)
+
+
+hpx pixel height if available
+hpt point height if available
The !rows
array in each worksheet, if present, is a collection of RowInfo
-objects which have the following properties:
type RowInfo = {
- /* visibility */
- hidden?: boolean; // if true, the row is hidden
-
- /* row height is specified in one of the following ways: */
- hpx?: number; // height in screen pixels
- hpt?: number; // height in points
-
- level?: number; // 0-indexed outline / group level
-};
Note: Excel UI displays the base outline level as 1
and the max level as 8
.
-The level
field stores the base outline as 0
and the max level as 7
.
-Excel internally stores row heights in points. The default resolution is 72 DPI
-or 96 PPI, so the pixel and point size should agree. For different resolutions
-they may not agree, so the library separates the concepts.
-Even though all of the information is made available, writers are expected to
-follow the priority order:
-hpx pixel height if available
-hpt point height if available
The cell.w
formatted text for each cell is produced from cell.v
and cell.z
format. If the format is not specified, the Excel General
format is used.
@@ -3648,21 +4418,21 @@ range limits will be silently truncated:
Excel 2003 SpreadsheetML range limits are governed by the version of Excel and are not enforced by the writer.
-Core Spreadsheet Formats
+XLSX and XLSM files are ZIP containers containing a series of XML files in accordance with the Open Packaging Conventions (OPC). The XLSM format, almost identical to XLSX, is used for files containing macros.
The format is standardized in ECMA-376 and later in ISO/IEC 29500. Excel does not follow the specification, and there are additional documents discussing how Excel deviates from the specification.
-BIFF 2/3 XLS are single-sheet streams of binary records. Excel 4 introduced
the concept of a workbook (XLW
files) but also had single-sheet XLS
format.
The structure is largely similar to the Lotus 1-2-3 file formats. BIFF5/8/12
@@ -3671,85 +4441,64 @@ extended the format in various ways but largely stuck to the same record format.
files in these formats, so record lengths and fields were determined by writing
in all of the supported formats and comparing files. Excel 2016 can generate
BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2.
BIFF8 exclusively uses the Compound File Binary container format, splitting some content into streams within the file. At its core, it still uses an extended version of the binary record format from older versions of BIFF.
The MS-XLS
specification covers the basics of the file format, and other
specifications expand on serialization of features like properties.
Predating XLSX, SpreadsheetML files are simple XML files. There is no official and comprehensive specification, although MS has released documentation on the format. Since Excel 2016 can generate SpreadsheetML files, mapping features is pretty straightforward.
-Introduced in parallel with XLSX, the XLSB format combines the BIFF architecture with the content separation and ZIP container of XLSX. For the most part nodes in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.
The MS-XLSB
specification covers the basics of the file format, and other
specifications expand on serialization of features like properties.
Excel CSV deviates from RFC4180 in a number of important ways. The generated CSV files should generally work in Excel although they may not work in RFC4180 compatible readers. The parser should generally understand Excel CSV. The writer proactively generates cells for formulae if values are unavailable.
Excel TXT uses tab as the delimiter and code page 1200.
-Notes:
-0x49 0x44 ("ID")
are treated as Symbolic
+Like in Excel, files starting with 0x49 0x44 ("ID")
are treated as Symbolic
Link files. Unlike Excel, if the file does not have a valid SYLK header, it
will be proactively reinterpreted as CSV. There are some files with semicolon
delimiter that align with a valid SYLK file. For the broadest compatibility,
-all cells with the value of ID
are automatically wrapped in double-quotes.
Support for other formats is generally far XLS/XLSB/XLSX support, due in large
+all cells with the value of ID
are automatically wrapped in double-quotes.
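As a rough illustration of the two SYLK-related behaviors described above, the following sketch checks the leading bytes and quotes `ID` cells when writing CSV. The names `startsWithID` and `csvField` are hypothetical and are not part of the library API. The sketch only inspects the first two bytes and does not validate the rest of the SYLK header.

```javascript
// Hypothetical helper: true when the data begins with 0x49 0x44 ("ID").
// A real reader would still need to validate the SYLK header and
// fall back to CSV when that validation fails.
function startsWithID(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x49 && bytes[1] === 0x44;
}

// Hypothetical helper: serialize one CSV field.
// A cell whose value is exactly "ID" is wrapped in double-quotes so the
// output cannot be mistaken for a SYLK file. Fields containing quotes,
// commas or line breaks are quoted as usual, with inner quotes doubled.
function csvField(value) {
  var s = String(value);
  if (s === "ID" || /[",\r\n]/.test(s)) {
    return '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}
```

With these assumptions, a sheet whose first cell is `ID` is written with a leading `"`, so the file no longer begins with the bytes `0x49 0x44`.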
Miscellaneous Workbook Formats
+Support for other formats is generally far behind XLS/XLSB/XLSX support, due in part to a lack of publicly available documentation. Test files were produced in the respective apps and compared to their XLS exports to determine structure. The main focus is data extraction.
-The Lotus formats consist of binary records similar to the BIFF structure. Lotus did release a specification decades ago covering the original WK1 format. Other features were deduced by producing files and comparing to Excel support.
Generated WK1 worksheets are compatible with Lotus 1-2-3 R2 and Excel 5.0.
Generated WK3 workbooks are compatible with Lotus 1-2-3 R9 and Excel 5.0.
-The Quattro Pro formats use binary records in the same way as BIFF and Lotus. Some of the newer formats (namely WB3 and QPW) use a CFB enclosure just like BIFF8 XLS.
-All versions of Works were limited to a single worksheet.
Works for DOS 1.x - 3.x and Works for Windows 2.x extend the Lotus WKS format with additional record types.
@@ -3760,42 +4509,33 @@ BIFF8 XLS: it uses the CFB container with a Workbook stream. Works 9 saves the exact Workbook stream for the XLR and the 97-2003 XLS export. Works 6 XLS includes two empty worksheets but the main worksheet has an identical encoding. XLR also includes a WksSSWorkBook
stream similar to Lotus FM3/FMT files.
-iWork 2013 (Numbers 3.0 / Pages 5.0 / Keynote 6.0) switched from a proprietary XML-based format to the current file format based on the iWork Archive (IWA). This format has been used up through the current release (Numbers 11.2).
The parser focuses on extracting raw data from tables. Numbers technically supports multiple tables in a logical worksheet, including custom titles. This parser will generate one worksheet per Numbers table.
-ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to SpreadsheetML. Both are detailed in the OASIS standard, but tools like LO/OO add undocumented extensions. The parsers and writers do not implement the full standard, instead focusing on parts necessary to extract and store raw data.
-UOS is a very similar format, and it comes in 2 varieties corresponding to ODS and FODS respectively. For the most part, the difference between the formats is in the names of tags and attributes.
-Miscellaneous Worksheet Formats
Many older formats supported only one worksheet:
-DBF is really a typed table format: each column can only hold one data type and each record omits type information. The parser generates a header row and inserts records starting at the second row of the worksheet. The writer makes @@ -3803,47 +4543,48 @@ files compatible with Visual FoxPro extensions.
Multi-file extensions like external memos and tables are currently unsupported, limited by the general ability to read arbitrary files in the web browser. The reader understands DBF Level 7 extensions like DATETIME.
-There is no real documentation. All knowledge was gathered by saving files in various versions of Excel to deduce the meaning of fields. Notes:
Plain formulae are stored in the RC form.
+Column widths are rounded to integral characters.
+Lotus Formatted Text (PRN)
+There is no real documentation, and in fact Excel treats PRN as an output-only file format. Nevertheless we can guess the column widths and reverse-engineer the original layout. Excel's 240 character width limitation is not enforced.
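One way to guess column widths in fixed-width text is sketched below. It is only an illustration and is not taken from the library. It assumes that a column starts wherever a character position is non-blank in at least one line and the position before it is blank in every line. The names `guessColumnStarts` and `splitFixedWidth` are hypothetical.

```javascript
// Hypothetical helper: return the offsets where columns appear to start.
// A position is "blank" when every line has a space there or is too short.
function guessColumnStarts(lines) {
  var width = lines.reduce(function (w, l) { return Math.max(w, l.length); }, 0);
  var starts = [];
  var prevBlank = true;
  for (var i = 0; i < width; ++i) {
    var blank = lines.every(function (l) {
      return i >= l.length || l.charAt(i) === " ";
    });
    if (!blank && prevBlank) starts.push(i);
    prevBlank = blank;
  }
  return starts;
}

// Hypothetical helper: split every line at the guessed offsets and trim padding.
function splitFixedWidth(lines) {
  var starts = guessColumnStarts(lines);
  return lines.map(function (l) {
    return starts.map(function (s, k) {
      var end = k + 1 < starts.length ? starts[k + 1] : l.length;
      return l.slice(s, end).trim();
    });
  });
}
```

This heuristic merges columns when a text value contains spaces that line up across every row, so a real parser would likely need additional rules.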
-There is no unified definition. Visicalc DIF differs from Lotus DIF, and both differ from Excel DIF. Where ambiguous, the parser/writer follows the expected behavior from Excel. In particular, Excel extends DIF in incompatible ways:
"0.3" -> "=""0.3""
+Since Excel automatically converts numbers-as-strings to numbers, numeric
+string constants are converted to formulae: "0.3" -> "=""0.3""
DIF technically expects numeric cells to hold the raw numeric data, but Excel
+permits formatted numbers (including dates)
+DIF technically has no support for formulae, but Excel will automatically
+convert plain formulae. Array formulae are not preserved.
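The numeric-string conversion described in the list above can be sketched as follows. The function name `difStringValue` is hypothetical, and the numeric test based on `Number` is an assumption: Excel's own rules for recognizing numbers are likely broader. The sketch returns the formula text before any DIF-level quoting is applied.

```javascript
// Hypothetical helper: choose the text to store for a string cell.
// Strings that look numeric become a formula that evaluates to the same
// text, so Excel does not silently convert them to numbers on load.
function difStringValue(s) {
  var looksNumeric = s.trim() !== "" && !isNaN(Number(s));
  return looksNumeric ? '="' + s + '"' : s;
}
```

Under these assumptions, `difStringValue("0.3")` yields the formula text `="0.3"`, while ordinary text such as `"abc"` is returned unchanged.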
+HTML
Excel HTML worksheets include special metadata encoded in styles. For example,
mso-number-format
is a localized string containing the number format. Despite
the metadata the output is valid HTML, although it does accept bare &
symbols.
The writer adds type metadata to the TD elements via the t tag. The parser
looks for those tags and overrides the default interpretation. For example, text
like <td>12345</td>
will be parsed as numbers but <td t="s">12345</td>
will
be parsed as text.
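The override can be illustrated with a small sketch that handles a single cell. This is not the library's parser: the name `parseTd` is hypothetical, the regular expressions only handle simple markup, and the returned object merely mirrors the `t` (type) and `v` (value) fields of a cell object.

```javascript
// Hypothetical helper: interpret one <td> element.
// Without a t attribute, numeric-looking text becomes a number cell.
// With t="s", the text is kept as a string regardless of its content.
function parseTd(html) {
  var m = /<td\b([^>]*)>([\s\S]*?)<\/td>/i.exec(html);
  if (!m) return null;
  var attrs = m[1];
  var text = m[2].trim();
  var t = /(?:^|\s)t="([^"]*)"/.exec(attrs);
  if (t && t[1] === "s") return { t: "s", v: text };
  if (text !== "" && !isNaN(Number(text))) return { t: "n", v: Number(text) };
  return { t: "s", v: text };
}
```

Here `parseTd('<td>12345</td>')` produces a numeric cell, while `parseTd('<td t="s">12345</td>')` keeps the text `"12345"`.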
-
Excel RTF worksheets are stored in clipboard when copying cells or ranges from a worksheet. The supported codes are a subset of the Word RTF support.
-Ethercalc is an open source web spreadsheet powered by a record format reminiscent of SYLK wrapped in a MIME multi-part message.