Tinywan/docs.sheetjs.com

2022-06-21 08:26:53 -04:00

24 KiB

Raw Blame History

sidebar_position
1

Data Import

import current from '/version.js'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Parsing Workbooks

API

Extract data from spreadsheet bytes

var workbook = XLSX.read(data, opts);

The read method can extract data from spreadsheet bytes stored in a JS string, "binary string", NodeJS buffer or typed array (Uint8Array or ArrayBuffer).

Read spreadsheet bytes from a local file and extract data

var workbook = XLSX.readFile(filename, opts);

The readFile method attempts to read a spreadsheet file at the supplied path.

The second opts argument is optional. "Parsing Options" covers the supported properties and behaviors.

:::warning

Browsers generally do not allow reading files by specifying filename (it is a security risk), and running XLSX.readFile in the browser will throw an error.

Deno scripts must be invoked with --allow-read to read from the filesystem.

:::

Examples

Here are a few common scenarios (click on each subtitle to see the code).

The demos cover special deployments in more detail.

Example: Local File

XLSX.readFile supports reading local files in platforms like NodeJS. In other platforms like React Native, XLSX.read should be called with file data.

readFile uses fs.readFileSync under the hood:

var XLSX = require("xlsx");

var workbook = XLSX.readFile("test.xlsx");

For Node ESM, fs must be loaded manually:

import * as fs from "fs";
import { readFile, set_fs } from "xlsx/xlsx.mjs";
set_fs(fs);

const workbook = readFile("test.xlsx");

readFile uses Deno.readFileSync under the hood:

{`\
// @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs';

const workbook = XLSX.readFile("test.xlsx");`}

Applications reading files must be invoked with the --allow-read flag. The deno demo has more examples

readFile can be used in the renderer process:

/* From the renderer process */
var XLSX = require("xlsx");

var workbook = XLSX.readFile(path);

Electron APIs have changed over time. The electron demo shows a complete example and details the required version-specific settings.

:::caution

React Native does not provide a way to read files from the filesystem. A separate third-party library must be used.

Since React Native internals change between releases, libraries may only work with specific versions of React Native. Project documentation should be consulted before picking a library.

:::caution

The react demo includes a sample React Native app.

The following libraries have been tested:

react-native-file-access

The base64 encoding returns strings compatible with the base64 type:

import * as XLSX from "xlsx";
import { FileSystem } from "react-native-file-access";

const b64 = await FileSystem.readFile(path, "base64");
/* b64 is a base64 string */
const workbook = XLSX.read(b64, {type: "base64"});

react-native-fs

The ascii encoding returns binary strings compatible with the binary type:

import * as XLSX from "xlsx";
import { readFile } from "react-native-fs";

const bstr = await readFile(path, "ascii");
/* bstr is a binary string */
const workbook = XLSX.read(bstr, {type: "binary"});

readFile wraps the File logic in Photoshop and other ExtendScript targets. The specified path should be an absolute path:

#include "xlsx.extendscript.js"

/* Read test.xlsx from the Documents folder */
var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");

For user-configurable paths, openDialog can show a file picker:

#include "xlsx.extendscript.js"

/* Ask user to select path */
var thisFile = File.openDialog("Select a spreadsheet");
var workbook = XLSX.readFile(thisFile.absoluteURI);

The extendscript demo includes a more complex example.

Example: User Submissions

User-submitted file in a web page ("Drag-and-Drop") (click to show)

For modern websites targeting Chrome 76+, File#arrayBuffer is recommended:

// XLSX is a global from the standalone script

async function handleDropAsync(e) {
  e.stopPropagation(); e.preventDefault();
  const f = e.dataTransfer.files[0];
  /* f is a File */
  const data = await f.arrayBuffer();
  /* data is an ArrayBuffer */
  const workbook = XLSX.read(data);

  /* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);

For maximal compatibility, the FileReader API should be used:

function handleDrop(e) {
  e.stopPropagation(); e.preventDefault();
  var f = e.dataTransfer.files[0];
  /* f is a File */
  var reader = new FileReader();
  reader.onload = function(e) {
    var data = e.target.result;
    /* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
    var workbook = XLSX.read(data);

    /* DO SOMETHING WITH workbook HERE */
  };
  reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);

https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.

User-submitted file with an HTML INPUT element (click to show)

Starting with an HTML INPUT element with type="file":

<input type="file" id="input_dom_element">

For modern websites targeting Chrome 76+, Blob#arrayBuffer is recommended:

// XLSX is a global from the standalone script

async function handleFileAsync(e) {
  const file = e.target.files[0];
  const data = await file.arrayBuffer();
  /* data is an ArrayBuffer */
  const workbook = XLSX.read(data);

  /* DO SOMETHING WITH workbook HERE */
}
input_dom_element.addEventListener("change", handleFileAsync, false);

For broader support (including IE10+), the FileReader approach is recommended:

function handleFile(e) {
  var file = e.target.files[0];
  var reader = new FileReader();
  reader.onload = function(e) {
    var data = e.target.result;
    /* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
    var workbook = XLSX.read(e.target.result);

    /* DO SOMETHING WITH workbook HERE */
  };
  reader.readAsArrayBuffer(file);
}
input_dom_element.addEventListener("change", handleFile, false);

The oldie demo shows an IE-compatible fallback scenario.

NodeJS Server File Uploads (click to show)

read can accept a NodeJS buffer. readFile can read files generated by a HTTP POST request body parser like formidable:

const XLSX = require("xlsx");
const http = require("http");
const formidable = require("formidable");

const server = http.createServer((req, res) => {
  const form = new formidable.IncomingForm();
  form.parse(req, (err, fields, files) => {
    /* grab the first file */
    const f = Object.entries(files)[0][1];
    const path = f.filepath;
    const workbook = XLSX.readFile(path);

    /* DO SOMETHING WITH workbook HERE */
  });
}).listen(process.env.PORT || 7262);

The server demo has more advanced examples.

Example: Remote File

This example focuses on fetching files ("Ajax" in browser parlance) using APIs like XMLHttpRequest and fetch as well as third-party libraries.

For modern websites targeting Chrome 42+, fetch is recommended:

// XLSX is a global from the standalone script

(async() => {
  const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
  const data = await (await fetch(url)).arrayBuffer();
  /* data is an ArrayBuffer */
  const workbook = XLSX.read(data);

  /* DO SOMETHING WITH workbook HERE */
})();

For broader support, the XMLHttpRequest approach is recommended:

var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";

/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";

req.onload = function(e) {
  var workbook = XLSX.read(req.response);

  /* DO SOMETHING WITH workbook HERE */
};

req.send();

The xhr demo includes a longer discussion and more examples.

http://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.

Node 17.5 and 18.0 have native support for fetch:

const XLSX = require("xlsx");

const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);

For broader compatibility, third-party modules are recommended.

request requires a null encoding to yield Buffers:

var XLSX = require("xlsx");
var request = require("request");

request({url: url, encoding: null}, function(err, resp, body) {
  var workbook = XLSX.read(body);

  /* DO SOMETHING WITH workbook HERE */
});

axios works the same way in browser and in NodeJS:

const XLSX = require("xlsx");
const axios = require("axios");

(async() => {
  const res = await axios.get(url, {responseType: "arraybuffer"});
  /* res.data is a Buffer */
  const workbook = XLSX.read(res.data);

  /* DO SOMETHING WITH workbook HERE */
})();

The net module in the main process can make HTTP/HTTPS requests to external resources. Responses should be manually concatenated using Buffer.concat:

const XLSX = require("xlsx");
const { net } = require("electron");

const req = net.request(url);
req.on("response", (res) => {
  const bufs = []; // this array will collect all of the buffers
  res.on("data", (chunk) => { bufs.push(chunk); });
  res.on("end", () => {
    const workbook = XLSX.read(Buffer.concat(bufs));

    /* DO SOMETHING WITH workbook HERE */
  });
});
req.end();

Example: Readable Streams

:::caution

The recommended approach is to buffer streams in memory and process once all of the data has been collected. A proper streaming parse is technically impossible.

Technical details (click to show)

XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary and XML entries. The ZIP file format stores the table of contents ("end of central directory" record) at the end of the file, so a proper parse of a ZIP file requires scanning from the end. Streams do not provide random access into the data, so the only correct approach involves buffering the entire stream.

XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format". It is a container format that can hold multiple "files" and "folders". It also has a table of contents ("directory sectors") but these can be placed anywhere in the file! The only correct approach involves buffering enough of the stream to find the full table of contents, but the added complexity has little benefit when testing against real-world files generated by various versions of Excel and other tools.

:::

When dealing with ReadableStream, the easiest approach is to buffer the stream and process the whole thing at the end:

// XLSX is a global from the standalone script

async function process_RS(stream) {
  /* collect data */
  const buffers = [];
  const reader = stream.getReader();
  for(;;) {
    const res = await reader.read();
    if(res.value) buffers.push(res.value);
    if(res.done) break;
  }

  /* concat */
  const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));

  let off = 0;
  for(const u8 of buffers) {
    out.set(u8, off);
    off += u8.length;
  }

  return out;
}

const data = await process_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data, {type: 'array'});

When dealing with Readable Streams, the easiest approach is to buffer the stream and process the whole thing at the end:

var fs = require("fs");
var XLSX = require("xlsx");

function process_RS(stream, cb) {
  var buffers = [];
  stream.on("data", function(data) { buffers.push(data); });
  stream.on("end", function() {
    var buffer = Buffer.concat(buffers);
    var workbook = XLSX.read(buffer, {type:"buffer"});

    /* DO SOMETHING WITH workbook IN THE CALLBACK */
    cb(workbook);
  });
}

More detailed examples are covered in the included demos

Processing JSON and JS Data

JSON and JS data tend to represent single worksheets. This section will use a few utility functions to generate workbooks.

Create a new Workbook

var workbook = XLSX.utils.book_new();

The book_new utility function creates an empty workbook with no worksheets.

Spreadsheet software generally require at least one worksheet and enforce the requirement in the user interface. This library enforces the requirement at write time, throwing errors if an empty workbook is passed to write functions.

API

Create a worksheet from an array of arrays of JS values

var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);

The aoa_to_sheet utility function walks an "array of arrays" in row-major order, generating a worksheet object. The following snippet generates a sheet with cell A1 set to the string A1, cell B1 set to B1, etc:

var worksheet = XLSX.utils.aoa_to_sheet([
  ["A1", "B1", "C1"],
  ["A2", "B2", "C2"],
  ["A3", "B3", "C3"]
]);

"Array of Arrays Input" describes the function and the optional opts argument in more detail.

Create a worksheet from an array of JS objects

var worksheet = XLSX.utils.json_to_sheet(jsa, opts);

The json_to_sheet utility function walks an array of JS objects in order, generating a worksheet object. By default, it will generate a header row and one row per object in the array. The optional opts argument has settings to control the column order and header output.

"Array of Objects Input" describes the function and the optional opts argument in more detail.

Examples

"Complete Example" contains a detailed example "Get Data from a JSON Endpoint and Generate a Workbook"

x-spreadsheet is an interactive data grid for previewing and modifying structured data in the web browser. The demo includes a sample script with the xtos function for converting from x-spreadsheet to a workbook. Live Demo: https://oss.sheetjs.com/sheetjs/x-spreadsheet

Records from a database query (SQL or no-SQL) (click to show)

The database demo includes examples of working with databases and query results.

Numerical Computations with TensorFlow.js (click to show)

@tensorflow/tfjs and other libraries expect data in simple arrays, well-suited for worksheets where each column is a data vector. That is the transpose of how most people use spreadsheets, where each row is a vector.

When recovering data from tfjs, the returned data points are stored in a typed array. An array of arrays can be constructed with loops. Array#unshift can prepend a title row before the conversion:

const XLSX = require("xlsx");
const tf = require('@tensorflow/tfjs');

/* suppose xs and ys are vectors (1D tensors) -> tfarr will be a typed array */
const tfdata = tf.stack([xs, ys]).transpose();
const shape = tfdata.shape;
const tfarr = tfdata.dataSync();

/* construct the array of arrays */
const aoa = [];
for(let j = 0; j < shape[0]; ++j) {
  aoa[j] = [];
  for(let i = 0; i < shape[1]; ++i) aoa[j][i] = tfarr[j * shape[1] + i];
}
/* add headers to the top */
aoa.unshift(["x", "y"]);

/* generate worksheet */
const worksheet = XLSX.utils.aoa_to_sheet(aoa);

The array demo shows a complete example.

Processing HTML Tables

API

Create a worksheet by scraping an HTML TABLE in the page

var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);

The table_to_sheet utility function takes a DOM TABLE element and iterates through the rows to generate a worksheet. The opts argument is optional. "HTML Table Input" describes the function in more detail.

Create a workbook by scraping an HTML TABLE in the page

var workbook = XLSX.utils.table_to_book(dom_element, opts);

The table_to_book utility function follows the same logic as table_to_sheet. After generating a worksheet, it creates a blank workbook and appends the spreadsheet.

The options argument supports the same options as table_to_sheet, with the addition of a sheet property to control the worksheet name. If the property is missing or no options are specified, the default name Sheet1 is used.

Examples

Here are a few common scenarios (click on each subtitle to see the code):

HTML TABLE element in a webpage (click to show)

<!-- include the standalone script and shim.  this uses the UNPKG CDN -->
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/shim.min.js"></script>
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>

<!-- example table with id attribute -->
<table id="tableau">
  <tr><td>Sheet</td><td>JS</td></tr>
  <tr><td>12345</td><td>67</td></tr>
</table>

<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
  var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));

  /* DO SOMETHING WITH workbook HERE */
</script>

Multiple tables on a web page can be converted to individual worksheets:

/* create new workbook */
var workbook = XLSX.utils.book_new();

/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");

/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");

/* workbook now has 2 worksheets */

Alternatively, the HTML code can be extracted and parsed:

var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});

Chrome/Chromium Extension (click to show)

The chrome demo shows a complete example and details the required permissions and other settings.

In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:

/* in the worker script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
  /* pass a message like { sheetjs: true } from the extension to scrape */
  if(!msg || !msg.sheetjs) return;
  /* create a new workbook */
  var workbook = XLSX.utils.book_new();
  /* loop through each table element */
  var tables = document.getElementsByTagName("table")
  for(var i = 0; i < tables.length; ++i) {
    var worksheet = XLSX.utils.table_to_sheet(tables[i]);
    XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
  }
  /* pass back to the extension */
  return cb(workbook);
});

Server-Side HTML Tables with Headless Chrome (click to show)

The headless demo includes a complete demo to convert HTML files to XLSB workbooks. The core idea is to add the script to the page, parse the table in the page context, generate a base64 workbook and send it back for further processing:

const XLSX = require("xlsx");
const { readFileSync } = require("fs"), puppeteer = require("puppeteer");

const url = `https://sheetjs.com/demos/table`;

/* get the standalone build source (node_modules/xlsx/dist/xlsx.full.min.js) */
const lib = readFileSync(require.resolve("xlsx/dist/xlsx.full.min.js"), "utf8");

(async() => {
  /* start browser and go to web page */
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, {waitUntil: "networkidle2"});

  /* inject library */
  await page.addScriptTag({content: lib});

  /* this function `s5s` will be called by the script below, receiving the Base64-encoded file */
  await page.exposeFunction("s5s", async(b64) => {
    const workbook = XLSX.read(b64, {type: "base64" });

    /* DO SOMETHING WITH workbook HERE */
  });

  /* generate XLSB file in webpage context and send back result */
  await page.addScriptTag({content: `
    /* call table_to_book on first table */
    var workbook = XLSX.utils.table_to_book(document.querySelector("TABLE"));

    /* generate XLSX file */
    var b64 = XLSX.write(workbook, {type: "base64", bookType: "xlsb"});

    /* call "s5s" hook exposed from the node process */
    window.s5s(b64);
  `});

  /* cleanup */
  await browser.close();
})();

Server-Side HTML Tables with Headless WebKit (click to show)

The headless demo includes a complete demo to convert HTML files to XLSB workbooks using PhantomJS. The core idea is to add the script to the page, parse the table in the page context, generate a binary workbook and send it back for further processing:

var XLSX = require('xlsx');
var page = require('webpage').create();

/* this code will be run in the page */
var code = [ "function(){",
  /* call table_to_book on first table */
  "var wb = XLSX.utils.table_to_book(document.body.getElementsByTagName('table')[0]);",

  /* generate XLSB file and return binary string */
  "return XLSX.write(wb, {type: 'binary', bookType: 'xlsb'});",
"}" ].join("");

page.open('https://sheetjs.com/demos/table', function() {
  /* Load the browser script from the UNPKG CDN */
  page.includeJs("https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js", function() {
    /* The code will return an XLSB file encoded as binary string */
    var bin = page.evaluateJavaScript(code);

    var workbook = XLSX.read(bin, {type: "binary"});
    /* DO SOMETHING WITH workbook HERE */

    phantom.exit();
  });
});

NodeJS HTML Tables without a browser (click to show)

NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. jsdom is a lightweight alternative:

const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string.  This example reads from test.html */
const html_str = fs.readFileSync("test.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);

24 KiB Raw Blame History

Data Import

Parsing Workbooks

API

Examples

Example: Local File

Example: User Submissions

Example: Remote File

Example: Readable Streams

Processing JSON and JS Data

API

Examples

Processing HTML Tables

API

Examples

24 KiB

Raw Blame History