docs.sheetjs.com/docz/docs/08-api/11-stream.md
2024-07-18 18:19:02 -04:00

title: Stream Export
sidebar_position: 11
hide_table_of_contents: true

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Many platforms offer methods to write files. These methods typically expect the entire file to be generated before writing. Large workbook files may exceed platform-specific size limits.

Some platforms also offer a "streaming" or "incremental" approach. Instead of writing the entire file at once, these methods can accept small chunks of data and incrementally write to the filesystem.
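
The incremental approach can be sketched with the NodeJS `fs` module alone. This is an illustration of the general idea, not SheetJS code: `makeRows`, `writeIncrementally` and the temporary file name are hypothetical names chosen for the example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const { once } = require("events");

/* hypothetical row source: yields one small text chunk per row */
function* makeRows(count) {
  for (let i = 0; i < count; ++i) yield `${i},${i * i}\n`;
}

/* write chunks one at a time, pausing whenever the internal buffer is full */
async function writeIncrementally(file, rows) {
  const out = fs.createWriteStream(file);
  for (const row of rows) {
    /* write returns false when the stream wants the caller to wait for "drain" */
    if (!out.write(row)) await once(out, "drain");
  }
  out.end();
  await once(out, "finish"); // all data has been handed to the filesystem
  return file;
}

const demoFile = path.join(os.tmpdir(), "sheetjs-incremental-demo.csv");
const done = writeIncrementally(demoFile, makeRows(5));
done.then(file => console.log(`wrote ${fs.statSync(file).size} bytes to ${file}`));
```

At no point does the script hold the complete file in memory. Each row is generated, handed to the write stream, and discarded.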

The Streaming Write demo includes live browser demos and notes for platforms that do not support SheetJS streams.

:::tip pass

This feature was expanded in version 0.20.3. It is strongly recommended to upgrade to the latest version.

:::

Streaming Basics

SheetJS streams use the NodeJS push streams API. It is strongly recommended to review the official NodeJS "Stream" documentation[1].
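
For readers new to the API, the following sketch shows the shape of a push stream: a `Readable` calls `push` for each chunk and pushes `null` to signal the end. The helper `rowsToTextStream` is hypothetical and is not how the library generates data. It only demonstrates the mechanism.

```javascript
const { Readable } = require("stream");

/* hypothetical helper: emit each row of an array of arrays as one text chunk */
function rowsToTextStream(rows) {
  let R = 0;
  return new Readable({
    read() {
      /* pushing null tells consumers that no more data will arrive */
      if (R >= rows.length) { this.push(null); return; }
      this.push(rows[R++].join(",") + "\n");
    }
  });
}

const demo = rowsToTextStream([["a", "b"], [1, 2], [3, 4]]);
const chunks = [];
/* chunks arrive as Buffers because no encoding was set on the stream */
demo.on("data", chunk => chunks.push(chunk.toString()));
demo.on("end", () => console.log(chunks.join("")));
```

Consumers can listen for `data` events as shown, or hand the stream to `pipe` and let NodeJS manage the flow.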

Historical Note

NodeJS push streams were introduced in 2012. The text streaming methods to_csv and to_html are supported in NodeJS v0.10 and later while the object streaming method to_json is supported in NodeJS v0.12 and later.

The first SheetJS streaming write function, to_csv, was introduced in 2017. It used and still uses the battle-tested NodeJS streaming API.

Years later, browser vendors opted to standardize a different stream API.

For maximal compatibility, the library uses NodeJS push streams.

NodeJS ECMAScript Module Support

In CommonJS modules, libraries can load the stream module using require. SheetJS libraries will load streaming support where applicable.

Due to ESM limitations, libraries cannot freely import the stream module.

:::danger ECMAScript Module Limitations

The original specification only supported top-level imports:

import { Readable } from 'stream';

If a module is unavailable, there is no way for scripts to gracefully fail or ignore the error.


Patches to the specification added two different solutions to the problem:

  • "dynamic imports" will throw errors that can be handled by libraries. Dynamic imports will taint APIs that do not use Promise-based methods.
    /* Readable will be undefined if stream cannot be imported */
    const Readable = await (async() => {
      try {
        return (await import("stream"))?.Readable;
      } catch(e) { /* silently ignore error */ }
    })();
  • "import maps" control module resolution, allowing library users to manually shunt unsupported modules.

These patches were released after browsers adopted ESM! A number of browsers and other platforms support top-level imports but do not support the patches.


Due to ESM woes, it is strongly recommended to use CommonJS when possible!

:::

For maximal platform support, SheetJS libraries expose a special set_readable method to provide a Readable implementation:

import { stream as SheetJStream } from 'xlsx';
import { Readable } from 'stream';

SheetJStream.set_readable(Readable);
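
The injection pattern behind a method like `set_readable` can be sketched as follows. This is a hypothetical stand-in module, not the SheetJS source: `lib` and `to_text` are invented for the illustration, and the real library internals may differ.

```javascript
const { Readable } = require("stream");

/* hypothetical library module (NOT the SheetJS implementation) */
const lib = (() => {
  let _Readable; // stays undefined until the caller injects an implementation
  return {
    /* store the caller-supplied Readable class for later use */
    set_readable(R) { _Readable = R; },
    /* build a text stream using whichever Readable was injected */
    to_text(lines) {
      if (!_Readable) throw new Error("call set_readable first");
      let i = 0;
      return new _Readable({
        read() { this.push(i < lines.length ? lines[i++] + "\n" : null); }
      });
    }
  };
})();

lib.set_readable(Readable);
lib.to_text(["hello", "world"]).pipe(process.stdout);
```

Because the module never imports `stream` itself, it loads on platforms where that import would fail. Only callers that need streaming have to supply an implementation.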

Worksheet Export

The worksheet export methods accept a SheetJS worksheet object.

CSV Export

Export worksheet data in "Comma-Separated Values" (CSV)

var csvstream = XLSX.stream.to_csv(ws, opts);

to_csv creates a NodeJS text stream. The options mirror the non-streaming sheet_to_csv method.

The following NodeJS script fetches https://docs.sheetjs.com/pres.numbers and streams CSV rows to the terminal.

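CommonJS:

const XLSX = require("xlsx");

(async() => {
  /* fetch the file (global fetch requires NodeJS 18 or later) and parse the bytes */
  var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
  var wb = XLSX.read(ab);
  /* stream CSV rows from the first worksheet to the terminal */
  var ws = wb.Sheets[wb.SheetNames[0]];
  XLSX.stream.to_csv(ws).pipe(process.stdout);
})();

ESM:

import { read, stream } from "xlsx";
import { Readable } from "stream";
/* ESM scripts must supply the Readable implementation */
stream.set_readable(Readable);

/* fetch the file and parse the bytes */
var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
var wb = read(ab);
/* stream CSV rows from the first worksheet to the terminal */
var ws = wb.Sheets[wb.SheetNames[0]];
stream.to_csv(ws).pipe(process.stdout);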
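
A text stream does not have to be piped. It can also be consumed chunk by chunk with async iteration. The sketch below uses `Readable.from` as a stand-in for a text stream so that it runs without a workbook. `measure` is a hypothetical helper, and the assumption is that any NodeJS `Readable` emitting text behaves the same way.

```javascript
const { Readable } = require("stream");

/* tally chunks and characters without keeping the full output in memory */
async function measure(textStream) {
  let chunks = 0, chars = 0;
  for await (const chunk of textStream) {
    ++chunks;
    chars += String(chunk).length; // String() handles both Buffers and strings
  }
  return { chunks, chars };
}

/* stand-in text stream (assumption: not generated by SheetJS) */
const standIn = Readable.from(["a,b\n", "1,2\n", "3,4\n"]);
const result = measure(standIn);
result.then(r => console.log(r));
```

Only one chunk is held at a time, so the same loop works for outputs that would be too large to build as a single string.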

JSON Export

Export worksheet data to "Arrays of Arrays" or "Arrays of Objects"

var jsonstream = XLSX.stream.to_json(ws, opts);

to_json creates a NodeJS object stream. The options mirror the non-streaming sheet_to_json method.

The following NodeJS script fetches https://docs.sheetjs.com/pres.numbers and streams JSON rows to the terminal. A Transform[2] stream generates text from the object streams.

CommonJS:

const XLSX = require("xlsx");
const { Transform } = require("stream");

/* this Transform stream converts JS objects to text */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };

(async() => {
  /* fetch the file and parse the bytes */
  var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
  var wb = XLSX.read(ab);
  /* stream row objects through the converter to the terminal */
  var ws = wb.Sheets[wb.SheetNames[0]];
  XLSX.stream.to_json(ws, {raw: true}).pipe(conv).pipe(process.stdout);
})();

ESM:

import { read, stream } from "xlsx";
import { Readable, Transform } from "stream";
/* ESM scripts must supply the Readable implementation */
stream.set_readable(Readable);

/* this Transform stream converts JS objects to text */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };

/* fetch the file and parse the bytes */
var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
var wb = read(ab);
/* stream row objects through the converter to the terminal */
var ws = wb.Sheets[wb.SheetNames[0]];
stream.to_json(ws, {raw: true}).pipe(conv).pipe(process.stdout);
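
Object streams can also be consumed directly, without converting rows to text. The following sketch pipes objects into an object-mode `Writable` that tallies rows. `makeRowCounter` is a hypothetical helper, and `Readable.from` stands in for an object stream so that the example runs without a workbook.

```javascript
const { Readable, Writable } = require("stream");

/* object-mode Writable that tallies rows instead of printing them */
function makeRowCounter() {
  const counter = new Writable({
    objectMode: true, // each chunk is a JS object, not a Buffer or string
    write(row, _encoding, callback) {
      counter.rows++;
      callback(); // signal that the row has been processed
    }
  });
  counter.rows = 0;
  return counter;
}

/* stand-in object stream (assumption: not generated by SheetJS) */
const objects = Readable.from([{ a: 1, b: 2 }, { a: 3, b: 4 }]);
const counter = makeRowCounter();
objects.pipe(counter).on("finish", () => console.log(`${counter.rows} rows`));
```

The same structure suits per-row work such as database inserts: replace the counter increment with the operation and invoke `callback` when it completes.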

HTML Export

Export worksheet data to HTML TABLE

var htmlstream = XLSX.stream.to_html(ws, opts);

to_html creates a NodeJS text stream. The options mirror the non-streaming sheet_to_html method.

The following NodeJS script fetches https://docs.sheetjs.com/pres.numbers and streams HTML TABLE rows to the terminal.

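CommonJS:

const XLSX = require("xlsx");

(async() => {
  /* fetch the file and parse the bytes */
  var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
  var wb = XLSX.read(ab);
  /* stream HTML TABLE rows from the first worksheet to the terminal */
  var ws = wb.Sheets[wb.SheetNames[0]];
  XLSX.stream.to_html(ws).pipe(process.stdout);
})();

ESM:

import { read, stream } from "xlsx";
import { Readable } from "stream";
/* ESM scripts must supply the Readable implementation */
stream.set_readable(Readable);

/* fetch the file and parse the bytes */
var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
var wb = read(ab);
/* stream HTML TABLE rows from the first worksheet to the terminal */
var ws = wb.Sheets[wb.SheetNames[0]];
stream.to_html(ws).pipe(process.stdout);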

Workbook Export

The workbook export methods accept a SheetJS workbook object.

XLML Export

Export workbook data to SpreadsheetML2003 XML files

var xlmlstream = XLSX.stream.to_xlml(wb, opts);

to_xlml creates a NodeJS text stream. The options mirror the non-streaming write method using the xlml book type.

The following NodeJS script fetches https://docs.sheetjs.com/pres.numbers and writes a SpreadsheetML2003 workbook to SheetJStream.xml.xls:

CommonJS:

const XLSX = require("xlsx"), fs = require("fs");

(async() => {
  /* fetch the file and parse the bytes */
  var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
  var wb = XLSX.read(ab);
  /* stream the workbook to a file */
  XLSX.stream.to_xlml(wb).pipe(fs.createWriteStream("SheetJStream.xml.xls"));
})();

ESM:

import { read, stream } from "xlsx";
import { Readable } from "stream";
import { createWriteStream } from "fs";
/* ESM scripts must supply the Readable implementation */
stream.set_readable(Readable);

/* fetch the file and parse the bytes */
var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
var wb = read(ab);
/* stream the workbook to a file */
stream.to_xlml(wb).pipe(createWriteStream("SheetJStream.xml.xls"));
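
When writing to a file, scripts often need to know when the write has finished or failed. One possible approach, sketched below, wraps the NodeJS `pipeline` function in a Promise. `saveStream` and the temporary file name are hypothetical, and `Readable.from` stands in for a text stream so that the example runs without a workbook.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const { Readable, pipeline } = require("stream");

/* pipe a text stream to a file; settles once the file is flushed or a stream fails */
function saveStream(textStream, file) {
  return new Promise((resolve, reject) => {
    pipeline(textStream, fs.createWriteStream(file), err => {
      if (err) reject(err); else resolve(file);
    });
  });
}

/* stand-in text stream (assumption: not generated by SheetJS) */
const standIn = Readable.from(["first chunk\n", "second chunk\n"]);
const target = path.join(os.tmpdir(), "sheetjs-pipeline-demo.txt");
const saved = saveStream(standIn, target);
saved.then(file => console.log(`finished writing ${file}`));
```

Unlike a bare `pipe` call, `pipeline` forwards errors from either stream to the callback and cleans up both streams on failure.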

  1. See "Stream" in the NodeJS documentation.

  2. See Transform in the NodeJS documentation.