---
title: Large Datasets
---

For maximal compatibility, the library reads entire files at once and generates files at once. Browsers and other JS engines enforce tight memory limits. In these cases, the library offers strategies to optimize for memory or space by using platform-specific APIs.

## Dense Mode

The `dense` option (supported in `read`, `readFile` and `aoa_to_sheet`) creates worksheet objects that use arrays of arrays under the hood:

```js
var dense_wb = XLSX.read(ab, {dense: true});

var dense_sheet = XLSX.utils.aoa_to_sheet(aoa, {dense: true});
```
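To make the two storage layouts concrete, the following sketch models them with plain JavaScript objects and does not call the library. The cell objects use the library's `t` (type) and `v` (value) keys. The `address`, `get_sparse` and `get_dense` helpers are hypothetical names for illustration, and the rows are kept in a standalone array because where a dense worksheet stores its rows may depend on the library version:

```js
/* sparse layout: one large object whose keys are A1-style cell addresses */
var sparse_sheet = {
  "!ref": "A1:B2",
  A1: { t: "s", v: "Name" },  B1: { t: "s", v: "Index" },
  A2: { t: "s", v: "Bill" },  B2: { t: "n", v: 42 }
};

/* dense layout (modeled): array of rows, each row an array of cells */
var dense_rows = [
  [ { t: "s", v: "Name" }, { t: "s", v: "Index" } ],
  [ { t: "s", v: "Bill" }, { t: "n", v: 42 } ]
];

/* hypothetical helper: zero-based row/column to A1-style address (columns A-Z only) */
function address(r, c) { return String.fromCharCode(65 + c) + (r + 1); }

/* sparse lookup builds a string key for every access */
function get_sparse(sheet, r, c) { return sheet[address(r, c)]; }

/* dense lookup indexes directly; missing rows or cells yield undefined */
function get_dense(rows, r, c) { return (rows[r] || [])[c]; }

/* both lookups reach the same cell (row 2, column B) */
console.log(get_sparse(sparse_sheet, 1, 1).v, get_dense(dense_rows, 1, 1).v);
```

In this model the sparse lookup pays for a string key on every access, while the dense lookup is two array index operations.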
<details>
  <summary><b>Historical Note</b> (click to show)</summary>

The earliest versions of the library aimed for IE6+ compatibility. In early testing, both in Chrome 26 and in IE6, the most efficient worksheet storage for small sheets was a large object whose keys were cell addresses.

Over time, V8 (the engine behind Chrome and NodeJS) evolved in a way that made the array of arrays approach more efficient but reduced the performance of the large object approach.

In the interest of preserving backwards compatibility, the library opts to make the array of arrays approach available behind a special `dense` option.

</details>
The various API functions will seamlessly handle dense and sparse worksheets.

## Streaming Write

The streaming write functions are available in the `XLSX.stream` object. They take the same arguments as the normal write functions:

- `XLSX.stream.to_csv` is the streaming version of `XLSX.utils.sheet_to_csv`.
- `XLSX.stream.to_html` is the streaming version of `XLSX.utils.sheet_to_html`.
- `XLSX.stream.to_json` is the streaming version of `XLSX.utils.sheet_to_json`.

"Stream" refers to the NodeJS push streams API.
<details>
  <summary><b>Historical Note</b> (click to show)</summary>

NodeJS push streams were introduced in 2012. The first streaming write function, `to_csv`, was introduced in April 2017. It used and still uses the same NodeJS streaming API.

Years later, browser vendors are settling on a different stream API.

For maximal compatibility, the library uses NodeJS push streams.

</details>
### NodeJS

:::note

In a CommonJS context, NodeJS Streams and `fs` immediately work with SheetJS:

```js
const XLSX = require("xlsx"); // "just works"
```

In NodeJS ESM, the dependency must be loaded manually:

```js
import * as XLSX from 'xlsx';
import { Readable } from 'stream';

XLSX.stream.set_readable(Readable); // manually load stream helpers
```

Additionally, for file-related operations in NodeJS ESM, `fs` must be loaded:

```js
import * as XLSX from 'xlsx';
import * as fs from 'fs';

XLSX.set_fs(fs); // manually load fs helpers
```

**It is strongly encouraged to use CommonJS in NodeJS whenever possible.**

:::

This example reads the workbook file passed as an argument to the script, pulls the first worksheet, converts it to CSV and writes the result to `out.csv`:

```js
var XLSX = require("xlsx");
var fs = require("fs"); // needed for createWriteStream below

var workbook = XLSX.readFile(process.argv[2]);
var worksheet = workbook.Sheets[workbook.SheetNames[0]];
// highlight-next-line
var stream = XLSX.stream.to_csv(worksheet);

var output_file_name = "out.csv";
// highlight-next-line
stream.pipe(fs.createWriteStream(output_file_name));
```

`stream.to_json` uses Object-mode streams. A `Transform` stream can be used to generate a normal stream for streaming to a file or the screen:

```js
var XLSX = require("xlsx");
var { Transform } = require("stream"); // Transform is part of the NodeJS stream module

var workbook = XLSX.readFile(process.argv[2], {dense: true});
var worksheet = workbook.Sheets[workbook.SheetNames[0]];
/* to_json returns an object-mode stream */
// highlight-next-line
var stream = XLSX.stream.to_json(worksheet, {raw:true});

/* this Transform stream converts JS objects to text and prints to screen */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };
conv.pipe(process.stdout);

// highlight-next-line
stream.pipe(conv);
```

### Browser
<details>
  <summary><b>Live Demo</b> (click to show)</summary>

The following live demo fetches and parses a file in a Web Worker. The `to_csv` streaming function generates CSV rows, which are passed back to the main thread for further processing.

:::note

For Chromium browsers, the File System Access API provides a modern worker-only approach. [The Web Workers demo](/docs/demos/worker#streaming-write) includes a live example of CSV streaming write.

:::

The demo has a URL input box. Feel free to change the URL. For example:

- `https://raw.githubusercontent.com/SheetJS/test_files/master/large_strings.xls` is an XLS file over 50 MB
- `https://raw.githubusercontent.com/SheetJS/libreoffice_test-files/master/calc/xlsx-import/perf/8-by-300000-cells.xlsx` is an XLSX file with 300000 rows (approximately 20 MB)

```jsx live
function SheetJSFetchCSVStreamWorker() {
  const [__html, setHTML] = React.useState("");
  const [state, setState] = React.useState("");
  const [cnt, setCnt] = React.useState(0);
  const [url, setUrl] = React.useState("https://oss.sheetjs.com/test_files/large_strings.xlsx");

  /* assumption: the worker script from the "Web Workers" section is served as worker.js */
  const start = () => {
    const worker = new Worker("worker.js");
    worker.onmessage = (e) => {
      if(e.data.error) setState("error: " + e.data.error);
      else if(e.data.state) setState(e.data.state);
      else setCnt(c => c + 1); // one message per CSV row
    };
    setCnt(0);
    worker.postMessage({url});
  };

  return ( <>
    <b>URL: </b><input type="text" value={url} onChange={(e) => setUrl(e.target.value)} size="80"/>
    <button onClick={start}>Click to Start</button>
    <br/><b>State: </b>{state}
    <br/><b>Number of rows: </b>{cnt}
  </> );
}
```

</details>

NodeJS streaming APIs are not available in the browser. The following function supplies a pseudo stream object compatible with the `to_csv` function:

```js
function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
  XLSX.stream.set_readable(() => ({
    __done: false,
    // this function will be assigned by the SheetJS stream methods
    _read: function() { this.__done = true; },
    // this function is called by the stream methods
    push: function(d) {
      if(!this.__done) cb(d);
      if(d == null) this.__done = true;
    },
    resume: function pump() {
      for(var i = 0; i < batch && !this.__done; ++i) this._read();
      if(!this.__done) setTimeout(pump.bind(this), 0);
    }
  }));
  return XLSX.stream.to_csv(ws, opts);
}

// assuming `workbook` is a workbook, stream the first sheet
const ws = workbook.Sheets[workbook.SheetNames[0]];
const strm = sheet_to_csv_cb(ws, (csv)=>{ if(csv != null) console.log(csv); });
strm.resume();
```

#### Web Workers

For processing large files in the browser, it is strongly encouraged to use Web Workers. The [Worker demo](/docs/demos/worker#streaming-write) includes examples using the File System Access API.

Typically, the file and stream processing occurs in the Web Worker.
CSV rows can be sent back to the main thread in the callback:

```js title="worker.js"
/* load standalone script from CDN */
importScripts("https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js");

function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
  XLSX.stream.set_readable(() => ({
    __done: false,
    // this function will be assigned by the SheetJS stream methods
    _read: function() { this.__done = true; },
    // this function is called by the stream methods
    push: function(d) {
      if(!this.__done) cb(d);
      if(d == null) this.__done = true;
    },
    resume: function pump() {
      for(var i = 0; i < batch && !this.__done; ++i) this._read();
      if(!this.__done) setTimeout(pump.bind(this), 0);
    }
  }));
  return XLSX.stream.to_csv(ws, opts);
}

/* this callback will run once the main context sends a message */
self.addEventListener('message', async(e) => {
  try {
    postMessage({state: "fetching " + e.data.url});
    /* Fetch file */
    const res = await fetch(e.data.url);
    const ab = await res.arrayBuffer();

    /* Parse file */
    postMessage({state: "parsing"});
    const wb = XLSX.read(ab, {dense: true});
    const ws = wb.Sheets[wb.SheetNames[0]];

    /* Generate CSV rows */
    postMessage({state: "csv"});
    const strm = sheet_to_csv_cb(ws, (csv) => {
      if(csv != null) postMessage({csv});
      else postMessage({state: "done"});
    });
    strm.resume();
  } catch(e) {
    /* Pass the error message back */
    postMessage({error: String(e.message || e) });
  }
}, false);
```

The main thread will receive messages with CSV rows for further processing:

```js
worker.onmessage = function(e) {
  if(e.data.error) { console.error(e.data.error); /* show an error message */ }
  else if(e.data.state) { console.info(e.data.state); /* current state */ }
  else {
    /* e.data.csv is the row generated by the stream */
    console.log(e.data.csv);
  }
};
```

### Deno

Deno does not support NodeJS streams in normal execution, so a wrapper is used.
This example fetches and prints CSV rows:

```ts title="sheet2csv.ts"
// @deno-types="https://cdn.sheetjs.com/xlsx-latest/package/types/index.d.ts"
import { stream, Sheet2CSVOpts, WorkSheet } from 'https://cdn.sheetjs.com/xlsx-latest/package/xlsx.mjs';

interface Resumable { resume:()=>void; };
/* Generate row strings from a worksheet */
function sheet_to_csv_cb(ws: WorkSheet, cb:(d:string|null)=>void, opts: Sheet2CSVOpts = {}, batch = 1000): Resumable {
  stream.set_readable(() => ({
    __done: false,
    // this function will be assigned by the SheetJS stream methods
    _read: function() { this.__done = true; },
    // this function is called by the stream methods
    push: function(d: any) {
      if(!this.__done) cb(d);
      if(d == null) this.__done = true;
    },
    resume: function pump() {
      for(var i = 0; i < batch && !this.__done; ++i) this._read();
      if(!this.__done) setTimeout(pump.bind(this), 0);
    }
  }));
  return stream.to_csv(ws, opts) as Resumable;
}

/* Callback invoked on each row (string) and at the end (null) */
const csv_cb = (d:string|null) => {
  if(d == null) return;
  /* The strings include line endings, so raw write ops should be used */
  Deno.stdout.write(new TextEncoder().encode(d));
};

/* Fetch https://sheetjs.com/pres.numbers, parse, and get first worksheet */
import { read } from 'https://cdn.sheetjs.com/xlsx-latest/package/xlsx.mjs';
const ab = await (await fetch("https://sheetjs.com/pres.numbers")).arrayBuffer();
const wb = read(ab, { dense: true });
const ws = wb.Sheets[wb.SheetNames[0]];

/* Create and start CSV stream */
sheet_to_csv_cb(ws, csv_cb).resume();
```
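The wrapper relies on a small contract that is visible in its comments: the stream writer assigns `_read` on the object returned by the factory and calls `push` with each row, then with `null` at the end. The following sketch exercises that contract in plain JavaScript without the library. `mock_row_writer` is a hypothetical stand-in and is not the SheetJS implementation, and `make_pseudo_stream` is an illustrative name for the object literal used in the wrapper:

```js
/* hypothetical stand-in for a stream writer (NOT the SheetJS implementation) */
function mock_row_writer(rows, make_stream) {
  const stream = make_stream();
  let next = 0;
  /* the writer assigns _read: each call pushes one row, or null when finished */
  stream._read = function() {
    if(next < rows.length) stream.push(rows[next++] + "\n");
    else stream.push(null);
  };
  return stream;
}

/* pseudo stream factory with the same shape as the wrapper's object literal */
function make_pseudo_stream(cb, batch) {
  return {
    __done: false,
    _read: function() { this.__done = true; }, // placeholder until a writer assigns _read
    push: function(d) {
      if(!this.__done) cb(d);
      if(d == null) this.__done = true;
    },
    resume: function pump() {
      /* request up to `batch` rows, then yield to the event loop if more remain */
      for(var i = 0; i < batch && !this.__done; ++i) this._read();
      if(!this.__done) setTimeout(pump.bind(this), 0);
    }
  };
}

/* collect everything the callback receives, including the final null */
const received = [];
const strm = mock_row_writer(
  ["a,b", "1,2", "3,4"],
  () => make_pseudo_stream((d) => received.push(d), 1000)
);
strm.resume();
```

Because the batch size exceeds the row count here, all three rows and the final `null` are delivered during the single `resume` call. With a smaller batch, the remaining rows would arrive in later `setTimeout` ticks.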