271 lines
8.9 KiB
Markdown
271 lines
8.9 KiB
Markdown
|
### Usage
|
||
|
|
||
|
Most scenarios involving spreadsheets and data can be broken into 5 parts:
|
||
|
|
||
|
1) **Acquire Data**: Data may be stored anywhere: local or remote files,
|
||
|
databases, HTML TABLE, or even generated programmatically in the web browser.
|
||
|
|
||
|
2) **Extract Data**: For spreadsheet files, this involves parsing raw bytes to
|
||
|
read the cell data. For general JS data, this involves reshaping the data.
|
||
|
|
||
|
3) **Process Data**: From generating summary statistics to cleaning data
|
||
|
records, this step is the heart of the problem.
|
||
|
|
||
|
4) **Package Data**: This can involve making a new spreadsheet or serializing
|
||
|
with `JSON.stringify` or writing XML or simply flattening data for UI tools.
|
||
|
|
||
|
5) **Release Data**: Spreadsheet files can be uploaded to a server or written
|
||
|
locally. Data can be presented to users in an HTML TABLE or data grid.
|
||
|
|
||
|
A common problem involves generating a valid spreadsheet export from data stored
|
||
|
in an HTML table. In this example, an HTML TABLE on the page will be scraped,
|
||
|
a row will be added to the bottom with the date of the report, and a new file
|
||
|
will be generated and downloaded locally. `XLSX.writeFile` takes care of
|
||
|
packaging the data and attempting a local download:
|
||
|
|
||
|
```js
|
||
|
// Acquire Data (reference to the HTML table)
|
||
|
var table_elt = document.getElementById("my-table-id");
|
||
|
|
||
|
// Extract Data (create a workbook object from the table)
|
||
|
var workbook = XLSX.utils.table_to_book(table_elt);
|
||
|
|
||
|
// Process Data (add a new row)
|
||
|
var ws = workbook.Sheets["Sheet1"];
|
||
|
XLSX.utils.sheet_add_aoa(ws, [["Created "+new Date().toISOString()]], {origin:-1});
|
||
|
|
||
|
// Package and Release Data (`writeFile` tries to write and save an XLSB file)
|
||
|
XLSX.writeFile(workbook, "Report.xlsb");
|
||
|
```
|
||
|
|
||
|
This library tries to simplify steps 2 and 4 with functions to extract useful
|
||
|
data from spreadsheet files (`read` / `readFile`) and generate new spreadsheet
|
||
|
files from data (`write` / `writeFile`). Additional utility functions like
|
||
|
`table_to_book` work with other common data sources like HTML tables.
|
||
|
|
||
|
This documentation and various demo projects cover a number of common scenarios
|
||
|
and approaches for steps 1 and 5.
|
||
|
|
||
|
Utility functions help with step 3.
|
||
|
|
||
|
["Acquiring and Extracting Data"](#acquiring-and-extracting-data) describes
|
||
|
solutions for common data import scenarios.
|
||
|
|
||
|
["Packaging and Releasing Data"](#packaging-and-releasing-data) describes
|
||
|
solutions for common data export scenarios.
|
||
|
|
||
|
["Processing Data"](#packaging-and-releasing-data) describes solutions for
|
||
|
common workbook processing and manipulation scenarios.
|
||
|
|
||
|
["Utility Functions"](#utility-functions) details utility functions for
|
||
|
translating JSON Arrays and other common JS structures into worksheet objects.
|
||
|
|
||
|
### The Zen of SheetJS
|
||
|
|
||
|
_Data processing should fit in any workflow_
|
||
|
|
||
|
The library does not impose a separate lifecycle. It fits nicely in websites
|
||
|
and apps built using any framework. The plain JS data objects play nice with
|
||
|
Web Workers and future APIs.
|
||
|
|
||
|
_JavaScript is a powerful language for data processing_
|
||
|
|
||
|
The ["Common Spreadsheet Format"](#common-spreadsheet-format) is a simple object
|
||
|
representation of the core concepts of a workbook. The various functions in the
|
||
|
library provide low-level tools for working with the object.
|
||
|
|
||
|
For friendly JS processing, there are utility functions for converting parts of
|
||
|
a worksheet to/from an Array of Arrays. The following example combines powerful
|
||
|
JS Array methods with a network request library to download data, select the
|
||
|
information we want and create a workbook file:
|
||
|
|
||
|
<details>
|
||
|
<summary><b>Get Data from a JSON Endpoint and Generate a Workbook</b> (click to show)</summary>
|
||
|
|
||
|
The goal is to generate a XLSB workbook of US President names and birthdays.
|
||
|
|
||
|
**Acquire Data**
|
||
|
|
||
|
_Raw Data_
|
||
|
|
||
|
<https://theunitedstates.io/congress-legislators/executive.json> has the desired
|
||
|
data. For example, John Adams:
|
||
|
|
||
|
```js
|
||
|
{
|
||
|
"id": { /* (data omitted) */ },
|
||
|
"name": {
|
||
|
"first": "John", // <-- first name
|
||
|
"last": "Adams" // <-- last name
|
||
|
},
|
||
|
"bio": {
|
||
|
"birthday": "1735-10-19", // <-- birthday
|
||
|
"gender": "M"
|
||
|
},
|
||
|
"terms": [
|
||
|
{ "type": "viceprez", /* (other fields omitted) */ },
|
||
|
{ "type": "viceprez", /* (other fields omitted) */ },
|
||
|
{ "type": "prez", /* (other fields omitted) */ } // <-- look for "prez"
|
||
|
]
|
||
|
}
|
||
|
```
|
||
|
|
||
|
_Filtering for Presidents_
|
||
|
|
||
|
The dataset includes Aaron Burr, a Vice President who was never President!
|
||
|
|
||
|
`Array#filter` creates a new array with the desired rows. A President served
|
||
|
at least one term with `type` set to `"prez"`. To test if a particular row has
|
||
|
at least one `"prez"` term, `Array#some` is another native JS function. The
|
||
|
complete filter would be:
|
||
|
|
||
|
```js
|
||
|
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
|
||
|
```
|
||
|
|
||
|
_Lining up the data_
|
||
|
|
||
|
For this example, the name will be the first name combined with the last name
|
||
|
(`row.name.first + " " + row.name.last`) and the birthday will be the subfield
|
||
|
`row.bio.birthday`. Using `Array#map`, the dataset can be massaged in one call:
|
||
|
|
||
|
```js
|
||
|
const rows = prez.map(row => ({
|
||
|
name: row.name.first + " " + row.name.last,
|
||
|
birthday: row.bio.birthday
|
||
|
}));
|
||
|
```
|
||
|
|
||
|
The result is an array of "simple" objects with no nesting:
|
||
|
|
||
|
```js
|
||
|
[
|
||
|
{ name: "George Washington", birthday: "1732-02-22" },
|
||
|
{ name: "John Adams", birthday: "1735-10-19" },
|
||
|
// ... one row per President
|
||
|
]
|
||
|
```
|
||
|
|
||
|
**Extract Data**
|
||
|
|
||
|
With the cleaned dataset, `XLSX.utils.json_to_sheet` generates a worksheet:
|
||
|
|
||
|
```js
|
||
|
const worksheet = XLSX.utils.json_to_sheet(rows);
|
||
|
```
|
||
|
|
||
|
`XLSX.utils.book_new` creates a new workbook and `XLSX.utils.book_append_sheet`
|
||
|
appends a worksheet to the workbook. The new worksheet will be called "Dates":
|
||
|
|
||
|
```js
|
||
|
const workbook = XLSX.utils.book_new();
|
||
|
XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
|
||
|
```
|
||
|
|
||
|
**Process Data**
|
||
|
|
||
|
_Fixing headers_
|
||
|
|
||
|
By default, `json_to_sheet` creates a worksheet with a header row. In this case,
|
||
|
the headers come from the JS object keys: "name" and "birthday".
|
||
|
|
||
|
The headers are in cells A1 and B1. `XLSX.utils.sheet_add_aoa` can write text
|
||
|
values to the existing worksheet starting at cell A1:
|
||
|
|
||
|
```js
|
||
|
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
|
||
|
```
|
||
|
|
||
|
_Fixing Column Widths_
|
||
|
|
||
|
Some of the names are longer than the default column width. Column widths are
|
||
|
set by [setting the `"!cols"` worksheet property](#row-and-column-properties).
|
||
|
|
||
|
The following line sets the width of column A to approximately 10 characters:
|
||
|
|
||
|
```js
|
||
|
worksheet["!cols"] = [ { wch: 10 } ]; // set column A width to 10 characters
|
||
|
```
|
||
|
|
||
|
One `Array#reduce` call over `rows` can calculate the maximum width:
|
||
|
|
||
|
```js
|
||
|
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
|
||
|
worksheet["!cols"] = [ { wch: max_width } ];
|
||
|
```
|
||
|
|
||
|
Note: If the starting point was a file or HTML table, `XLSX.utils.sheet_to_json`
|
||
|
will generate an array of JS objects.
|
||
|
|
||
|
**Package and Release Data**
|
||
|
|
||
|
`XLSX.writeFile` creates a spreadsheet file and tries to write it to the system.
|
||
|
In the browser, it will try to prompt the user to download the file. In NodeJS,
|
||
|
it will write to the local directory.
|
||
|
|
||
|
```js
|
||
|
XLSX.writeFile(workbook, "Presidents.xlsx");
|
||
|
```
|
||
|
|
||
|
**Complete Example**
|
||
|
|
||
|
```js
|
||
|
// Uncomment the next line for use in NodeJS:
|
||
|
// const XLSX = require("xlsx"), axios = require("axios");
|
||
|
|
||
|
(async() => {
|
||
|
/* fetch JSON data and parse */
|
||
|
const url = "https://theunitedstates.io/congress-legislators/executive.json";
|
||
|
const raw_data = (await axios(url, {responseType: "json"})).data;
|
||
|
|
||
|
/* filter for the Presidents */
|
||
|
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
|
||
|
|
||
|
/* flatten objects */
|
||
|
const rows = prez.map(row => ({
|
||
|
name: row.name.first + " " + row.name.last,
|
||
|
birthday: row.bio.birthday
|
||
|
}));
|
||
|
|
||
|
/* generate worksheet and workbook */
|
||
|
const worksheet = XLSX.utils.json_to_sheet(rows);
|
||
|
const workbook = XLSX.utils.book_new();
|
||
|
XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
|
||
|
|
||
|
/* fix headers */
|
||
|
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
|
||
|
|
||
|
/* calculate column width */
|
||
|
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
|
||
|
worksheet["!cols"] = [ { wch: max_width } ];
|
||
|
|
||
|
/* create an XLSX file and try to save to Presidents.xlsx */
|
||
|
XLSX.writeFile(workbook, "Presidents.xlsx");
|
||
|
})();
|
||
|
```
|
||
|
|
||
|
For use in the web browser, assuming the snippet is saved to `snippet.js`,
|
||
|
script tags should be used to include the `axios` and `xlsx` standalone builds:
|
||
|
|
||
|
```html
|
||
|
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>
|
||
|
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
|
||
|
<script src="snippet.js"></script>
|
||
|
```
|
||
|
|
||
|
|
||
|
</details>
|
||
|
|
||
|
_File formats are implementation details_
|
||
|
|
||
|
The parser covers a wide gamut of common spreadsheet file formats to ensure that
|
||
|
"HTML-saved-as-XLS" files work as well as actual XLS or XLSX files.
|
||
|
|
||
|
The writer supports a number of common output formats for broad compatibility
|
||
|
with the data ecosystem.
|
||
|
|
||
|
To the greatest extent possible, data processing code should not have to worry
|
||
|
about the specific file formats involved.
|
||
|
|
||
|
|