24 KiB
title | sidebar_position | pagination_prev |
---|---|---|
Data Import | 1 | demos/index |
import current from '/version.js'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import CodeBlock from '@theme/CodeBlock';
:::tip pass
The "Import Tutorial" is a gentle introduction to data import and analysis.
:::
Parsing Workbooks
API
Extract data from spreadsheet bytes
var workbook = XLSX.read(data, opts);
The read
method can extract data from spreadsheet bytes stored in a JS string,
"binary string", NodeJS buffer or typed array (Uint8Array
or ArrayBuffer
).
Read spreadsheet bytes from a local file and extract data
var workbook = XLSX.readFile(filename, opts);
The readFile
method attempts to read a spreadsheet file at the supplied path.
The second opts
argument is optional. "Parsing Options"
covers the supported properties and behaviors.
:::danger pass
Browsers generally do not allow reading files by specifying filename (it is a
security risk), and running XLSX.readFile
in the browser will throw an error.
Deno scripts must be invoked with --allow-read
to read from the filesystem.
:::
Examples
Here are a few common scenarios (click on each subtitle to see the code).
The demos cover special deployments in more detail.
Example: Local File
XLSX.readFile
supports reading local files in platforms like NodeJS. In other
platforms like React Native, XLSX.read
should be called with file data.
In-browser processing where users drag-and-drop files or use a file element are covered in the "User Submissions" example.
readFile
uses fs.readFileSync
under the hood:
var XLSX = require("xlsx");
var workbook = XLSX.readFile("test.xlsx");
For Node ESM, fs
must be loaded manually:
import * as fs from "fs";
import { readFile, set_fs } from "xlsx";
set_fs(fs);
const workbook = readFile("test.xlsx");
readFile
can be used in the renderer process:
/* From the renderer process */
var XLSX = require("xlsx");
var workbook = XLSX.readFile(path);
Electron APIs have changed over time. The electron
demo
shows a complete example and details the required version-specific settings.
The React Native Demo covers tested plugins.
readFile
wraps the File
logic in Photoshop and other ExtendScript targets.
The specified path should be an absolute path:
#include "xlsx.extendscript.js"
/* Read test.xlsx from the Documents folder */
var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");
For user-configurable paths, openDialog
can show a file picker:
#include "xlsx.extendscript.js"
/* Ask user to select path */
var thisFile = File.openDialog("Select a spreadsheet");
var workbook = XLSX.readFile(thisFile.absoluteURI);
The extendscript
demo includes complete
examples for Photoshop and InDesign.
readFile
uses Deno.readFileSync
under the hood:
{\ // @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts" import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs'; \n\ const workbook = XLSX.readFile("test.xlsx");
}
:::note pass
Applications reading files must be invoked with the --allow-read
flag.
:::
Bun readFileSync
output should be wrapped in a Buffer
:
import { readFileSync } from 'fs'
import { read } from './xlsx.mjs'
const workbook = read(Buffer.from(readFileSync(path)));
Example: User Submissions
This example focuses on user-submitted files through a drag-and-drop event, HTML file input element, or network request.
For modern websites targeting Chrome 76+, File#arrayBuffer
is recommended:
Assume drop_dom_element
is the DOM element that will listen for changes:
<div id="drop_dom_element">Drop files here</div>
The event property is e.dataTransfer
. The code snippet highlights the
difference between the drag-and-drop example and the file input example:
// XLSX is a global from the standalone script
async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
// highlight-next-line
const f = e.dataTransfer.files[0];
/* f is a File */
const data = await f.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);
Starting with an HTML INPUT element with type="file"
:
<input type="file" id="input_dom_element">
The event property is e.target
. The code snippet highlights the difference
between the drag-and-drop example and the file input example:
// XLSX is a global from the standalone script
async function handleFileAsync(e) {
// highlight-next-line
const file = e.target.files[0];
const data = await file.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
input_dom_element.addEventListener("change", handleFileAsync, false);
https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.
For maximal compatibility (IE10+), the FileReader
approach is recommended:
Assume drop_dom_element
is the DOM element that will listen for changes:
<div id="drop_dom_element">Drop files here</div>
The event property is e.dataTransfer
. The code snippet highlights the
difference between the drag-and-drop example and the file input example:
function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
// highlight-next-line
var f = e.dataTransfer.files[0];
/* f is a File */
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);
Starting with an HTML INPUT element with type="file"
:
<input type="file" id="input_dom_element">
The event property is e.target
. The code snippet highlights the difference
between the drag-and-drop example and the file input example:
function handleFile(e) {
// highlight-next-line
var file = e.target.files[0];
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(e.target.result);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(file);
}
input_dom_element.addEventListener("change", handleFile, false);
The oldie
demo shows an IE-compatible fallback scenario.
read
can accept a NodeJS buffer. readFile
can read files generated by a
HTTP POST request body parser like formidable
:
const XLSX = require("xlsx");
const http = require("http");
const formidable = require("formidable");
const server = http.createServer((req, res) => {
const form = new formidable.IncomingForm();
form.parse(req, (err, fields, files) => {
/* grab the first file */
const f = Object.entries(files)[0][1];
const path = f.filepath;
const workbook = XLSX.readFile(path);
/* DO SOMETHING WITH workbook HERE */
});
}).listen(process.env.PORT || 7262);
The server
demo includes more advanced examples.
Drash is a HTTP server framework for Deno. In a POST
request handler, the
body parser can pull file data into a Uint8Array
:
{\ // @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts" import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs'; /* load the codepage support library for extended support with older formats */ import * as cptable from 'https://cdn.sheetjs.com/xlsx-${current}/package/dist/cpexcel.full.mjs'; XLSX.set_cptable(cptable); \n\ import * as Drash from "https://cdn.jsdelivr.net/gh/drashland/drash@v2.8.1/mod.ts"; \n\ class SheetResource extends Drash.Resource { public paths = ["/"]; \n\ public POST(request: Drash.Request, response: Drash.Response) { // highlight-next-line const file = request.bodyParam<Drash.Types.BodyFile>("file"); if (!file) throw new Error("File is required!"); // highlight-next-line var wb = XLSX.read(file.content); var html = XLSX.utils.sheet_to_html(wb.Sheets[wb.SheetNames[0]]); return response.html(html); } } \n\ const server = new Drash.Server({ hostname: "", port: 7262, protocol: "http", resources: [ // highlight-next-line SheetResource, ], }); \n\ server.run();
}
:::note pass
Deno must be run with the --allow-net
flag to enable network requests:
deno run --allow-net test-server.ts
To test, submit a POST request to http://localhost:7262
with an attachment:
curl -X POST -F "file=@test.xlsx" http://localhost:7262/
:::
Example: Remote File
This example focuses on fetching files ("Ajax" in browser parlance) using APIs
like XMLHttpRequest
and fetch
as well as third-party libraries.
For modern websites targeting Chrome 42+, fetch
is recommended:
// XLSX is a global from the standalone script
(async() => {
const url = "https://sheetjs.com/pres.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
})();
For broader support, the XMLHttpRequest
approach is recommended:
var url = "https://sheetjs.com/pres.xlsx";
/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";
req.onload = function(e) {
var workbook = XLSX.read(req.response);
/* DO SOMETHING WITH workbook HERE */
};
req.send();
The HTTP Downloads demo includes examples using browser APIs and wrapper libraries.
https://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.
NodeJS releases starting from version 18.0 have native support for fetch:
const XLSX = require("xlsx");
const url = "https://sheetjs.com/pres.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
For broader compatibility, third-party modules are recommended.
request
requires a null
encoding to yield Buffers:
var XLSX = require("xlsx");
var request = require("request");
var url = "https://sheetjs.com/pres.xlsx";
request({url: url, encoding: null}, function(err, resp, body) {
var workbook = XLSX.read(body);
/* DO SOMETHING WITH workbook HERE */
});
axios
works the same way in browser and in NodeJS:
const XLSX = require("xlsx");
const axios = require("axios");
const url = "https://sheetjs.com/pres.xlsx";
(async() => {
const res = await axios.get(url, {responseType: "arraybuffer"});
/* res.data is a Buffer */
const workbook = XLSX.read(res.data);
/* DO SOMETHING WITH workbook HERE */
})();
Bun has native support for fetch
. Using the NodeJS package:
import * as XLSX from 'xlsx';
/* load the codepage support library for extended support with older formats */
import * as cptable from 'xlsx/dist/cpexcel.full.mjs';
XLSX.set_cptable(cptable);
const url = "https://sheetjs.com/pres.xlsx";
// highlight-next-line
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
Deno has native support for fetch
.
{\ // @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts" import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs'; /* load the codepage support library for extended support with older formats */ import * as cptable from 'https://cdn.sheetjs.com/xlsx-${current}/package/dist/cpexcel.full.mjs'; XLSX.set_cptable(cptable); \n\ const url = "https://sheetjs.com/pres.xlsx"; // highlight-next-line const data = await (await fetch(url)).arrayBuffer(); /* data is an ArrayBuffer */ const workbook = XLSX.read(data);
}
:::note pass
Deno must be run with the --allow-net
flag to enable network requests:
deno run --allow-net test-fetch.ts
:::
The net
module in the main process can make HTTP/HTTPS requests to external
resources. Responses should be manually concatenated using Buffer.concat
:
const XLSX = require("xlsx");
const { net } = require("electron");
const url = "https://sheetjs.com/pres.xlsx";
const req = net.request(url);
req.on("response", (res) => {
const bufs = []; // this array will collect all of the buffers
res.on("data", (chunk) => { bufs.push(chunk); });
res.on("end", () => {
const workbook = XLSX.read(Buffer.concat(bufs));
/* DO SOMETHING WITH workbook HERE */
});
});
req.end();
Example: Readable Streams
:::caution pass
The recommended approach is to buffer streams in memory and process once all of the data has been collected. A proper streaming parse is technically impossible.
Technical details (click to show)
XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary and XML entries. The ZIP file format stores the table of contents ("end of central directory" record) at the end of the file, so a proper parse of a ZIP file requires scanning from the end. Streams do not provide random access into the data, so the only correct approach involves buffering the entire stream.
XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format". It is a container format that can hold multiple "files" and "folders". It also has a table of contents ("directory sectors") but these can be placed anywhere in the file! The only correct approach involves buffering enough of the stream to find the full table of contents, but the added complexity has little benefit when testing against real-world files generated by various versions of Excel and other tools.
:::
When dealing with ReadableStream
, the easiest approach is to buffer the stream
and process the whole thing at the end:
// XLSX is a global from the standalone script
async function buffer_RS(stream) {
/* collect data */
const buffers = [];
const reader = stream.getReader();
for(;;) {
const res = await reader.read();
if(res.value) buffers.push(res.value);
if(res.done) break;
}
/* concat */
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));
let off = 0;
for(const u8 of buffers) {
out.set(u8, off);
off += u8.length;
}
return out;
}
const data = await buffer_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data);
When dealing with Readable Streams, the easiest approach is to buffer the stream and process the whole thing at the end:
var XLSX = require("xlsx");
function process_RS(stream, cb) {
var buffers = [];
stream.on("data", function(data) { buffers.push(data); });
stream.on("end", function() {
var buffer = Buffer.concat(buffers);
var workbook = XLSX.read(buffer);
/* DO SOMETHING WITH workbook IN THE CALLBACK */
cb(workbook);
});
}
In recent versions of NodeJS, Promises are preferred:
var XLSX = require("xlsx");
/* async_RS reads a stream and returns a Promise resolving to a workbook */
const async_RS = (stream) => new Promise((res, rej) => {
var buffers = [];
stream.on("data", function(data) { buffers.push(data); });
stream.on("end", function() {
const buf = Buffer.concat(buffers);
const wb = XLSX.read(buf);
res(wb);
});
});
In addition to the browser ReadableStream
API, Deno has a Reader
class.
For these streams, std
provides a readAll
method to collect data into a
Uint8Array
. This example reads from a file using Deno.open
and prints the
worksheet names array:
{\ // @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts" import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs'; \n\ import { readAll } from "https://deno.land/std/streams/conversion.ts"; \n\ /* Simple Deno.Reader from a file */ const file = await Deno.open("test.xlsx", {read: true}); \n\ /* \
content` will be a Uint8Array holding the full contents of the stream /
const content = await readAll(file);
\n
/ Since this is a Uint8Array, `XLSX.read` "just works" */
const wb = XLSX.read(content);
console.log(wb.SheetNames);`}
More detailed examples are covered in the included demos
Processing JSON and JS Data
JSON and JS data tend to represent single worksheets. This section will use a few utility functions to generate workbooks.
Create a new Workbook
var workbook = XLSX.utils.book_new();
The book_new
utility function creates an empty
workbook with no worksheets.
API
Create a worksheet from an array of arrays of JS values
var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);
The aoa_to_sheet
utility function walks an "array of arrays" in row-major
order, generating a worksheet object. The following snippet generates a sheet
with cell A1
set to the string A1
, cell B1
set to B1
, etc:
var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]);
"Array of Arrays Input" describes
the function and the optional opts
argument in more detail.
Create a worksheet from an array of JS objects
var worksheet = XLSX.utils.json_to_sheet(jsa, opts);
The json_to_sheet
utility function walks an array of JS objects in order,
generating a worksheet object. By default, it will generate a header row and
one row per object in the array. The optional opts
argument has settings to
control the column order and header output.
"Array of Objects Input" describes
the function and the optional opts
argument in more detail.
Examples
"Export Tutorial" contains a detailed example of fetching data from a JSON Endpoint and generating a workbook.
x-spreadsheet
is an interactive data grid for
previewing and modifying structured data in the web browser.
"TensorFlow.js" covers strategies for creating worksheets from ML library exports (datasets stored in Typed Arrays).
Records from a database query (SQL or no-SQL) (click to show)
The data
demo includes examples of working with
databases and query results.
Processing HTML Tables
API
Create a worksheet by scraping an HTML TABLE in the page
var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);
The table_to_sheet
utility function takes a DOM TABLE element and iterates
through the rows to generate a worksheet. The opts
argument is optional.
"HTML Table Input" describes the
function in more detail.
Create a workbook by scraping an HTML TABLE in the page
var workbook = XLSX.utils.table_to_book(dom_element, opts);
The table_to_book
utility function follows the same logic as table_to_sheet
.
After generating a worksheet, it creates a blank workbook and appends the
spreadsheet.
The options argument supports the same options as table_to_sheet
, with the
addition of a sheet
property to control the worksheet name. If the property
is missing or no options are specified, the default name Sheet1
is used.
Examples
The Headless Demo includes examples of server-side spreadsheet generation from HTML TABLE elements using headless Chromium ("Puppeteer") and other browsers ("Playwright")
Here are a few common scenarios (click on each subtitle to see the code):
HTML TABLE element in a webpage (click to show)
{`\
\n\
Sheet | JS |
12345 | 67 |
Multiple tables on a web page can be converted to individual worksheets:
/* create new workbook */
var workbook = XLSX.utils.book_new();
/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");
/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");
/* workbook now has 2 worksheets */
Alternatively, the HTML code can be extracted and parsed:
var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});
Chrome/Chromium Extension (click to show)
The "Chrome and Chromium" demo includes a complete example and enumerates the required permissions and other settings.
In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:
/* in the worker script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
/* pass a message like { sheetjs: true } from the extension to scrape */
if(!msg || !msg.sheetjs) return;
/* create a new workbook */
var workbook = XLSX.utils.book_new();
/* loop through each table element */
var tables = document.getElementsByTagName("table")
for(var i = 0; i < tables.length; ++i) {
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
}
/* pass back to the extension */
return cb(workbook);
});
NodeJS HTML Tables without a browser (click to show)
NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. The "Synthetic DOM" demo includes examples for NodeJS.