# Databases

"Database" is a catch-all term referring to traditional RDBMS as well as K/V
stores, document databases, and other "NoSQL" storages. There are many external
database systems as well as browser APIs like WebSQL and `localStorage`.

This demo discusses general strategies and provides examples for a variety of
database systems. The examples are merely intended to demonstrate very basic
functionality.

## Structured Tables

Database tables are a common import and export target for spreadsheets. One
common representation of a database table is an array of JS objects whose keys
are column headers and whose values are the underlying data values. For example,

| Name         | Index |
| :----------- | ----: |
| Barack Obama |    44 |
| Donald Trump |    45 |

is naturally represented as an array of objects:

```js
[
  { Name: "Barack Obama", Index: 44 },
  { Name: "Donald Trump", Index: 45 }
]
```

The `sheet_to_json` and `json_to_sheet` helper functions work with objects of
similar shape, converting to and from worksheet objects. The corresponding
worksheet would include a header row for the labels:

```
XXX|      A       |   B   |
---+--------------+-------+
 1 | Name         | Index |
 2 | Barack Obama |    44 |
 3 | Donald Trump |    45 |
```
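
To make the correspondence concrete, here is a minimal plain-JS sketch of the layout change. The object keys become the header row and each object becomes one data row. The `objects_to_rows` name is hypothetical. The sketch only illustrates the shape and is not how `json_to_sheet` is implemented, since the real helper also builds typed cell objects and accepts options.

```javascript
/* Hypothetical helper: convert an array of objects into a header row (keys in
   first-seen order) followed by one row of values per object. */
function objects_to_rows(data) {
  const header = [];
  for (const row of data)
    for (const key of Object.keys(row))
      if (!header.includes(key)) header.push(key);
  /* keys missing from an object become null so every row has the same length */
  const rows = data.map(row => header.map(key =>
    Object.prototype.hasOwnProperty.call(row, key) ? row[key] : null));
  return [header, ...rows];
}

const presidents = [
  { Name: "Barack Obama", Index: 44 },
  { Name: "Donald Trump", Index: 45 }
];
const table = objects_to_rows(presidents);
// table is [["Name", "Index"], ["Barack Obama", 44], ["Donald Trump", 45]]
```

In SheetJS terms the result is an "array of arrays", which is the layout accepted by `XLSX.utils.aoa_to_sheet`.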

## Building Schemas from Worksheets

The `sheet_to_json` helper function generates arrays of JS objects that can be
scanned to determine the column "types", and there are third-party connectors
that can push arrays of JS objects to database tables.

The [`sexql`](http://sheetjs.com/sexql) browser demo uses WebSQL, which is
limited to the SQLite fundamental types. Its schema builder scans the first row
to find headers:

```js
/* excerpt from the body of the schema builder; `ws` is the worksheet */
if(!ws || !ws['!ref']) return;
var range = XLSX.utils.decode_range(ws['!ref']);
/* compare row and column indices (comparing the address objects is meaningless) */
if(!range || !range.s || !range.e || range.s.r > range.e.r || range.s.c > range.e.c) return;
var R = range.s.r, C = range.s.c;

/* first row holds the headers; empty header cells fall back to the column label */
var names = new Array(range.e.c-range.s.c+1);
for(C = range.s.c; C<= range.e.c; ++C){
  var addr = XLSX.utils.encode_cell({c:C,r:R});
  names[C-range.s.c] = ws[addr] ? ws[addr].v : XLSX.utils.encode_col(C);
}
```

After finding the headers, a deduplication step ensures that data is not lost.
Duplicate headers will be suffixed with `_1`, `_2`, etc.

```js
for(var i = 0; i < names.length; ++i) if(names.indexOf(names[i]) < i)
  for(var j = 0; j < names.length; ++j) {
    /* try `name_1`, `name_2`, ... until an unused header is found */
    var _name = names[i] + "_" + (j+1);
    if(names.indexOf(_name) > -1) continue;
    names[i] = _name;
    break; /* stop at the first free suffix */
  }
```

A column-major walk helps determine the data type. For SQLite the only relevant
data types are `REAL` and `TEXT`. If a string, date or error appears in any
value of a column, or if the column mixes value types, the column is marked as
`TEXT`. Columns containing only numbers or only Booleans are marked as `REAL`:

```js
var types = new Array(range.e.c-range.s.c+1);
for(C = range.s.c; C<= range.e.c; ++C) {
  var seen = {}, _type = "";
  /* record the type (`t`) of every data cell; missing cells count as "z" */
  for(R = range.s.r+1; R<= range.e.r; ++R)
    seen[(ws[XLSX.utils.encode_cell({c:C,r:R})]||{t:"z"}).t] = true;
  if(seen.s || seen.str) _type = "TEXT";
  /* `!!` turns missing types into false (0) instead of producing NaN */
  else if(!!seen.n + !!seen.b + !!seen.d + !!seen.e > 1) _type = "TEXT";
  else switch(true) {
    case seen.b:
    case seen.n: _type = "REAL"; break;
    case seen.e: _type = "TEXT"; break;
    case seen.d: _type = "TEXT"; break;
  }
  /* columns with no data default to TEXT */
  types[C-range.s.c] = _type || "TEXT";
}
```
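
The same rules can also be applied directly to the array of objects returned by `sheet_to_json`, as mentioned at the start of this section. The following is a minimal sketch. It assumes values are already JS strings, numbers, Booleans or `Date` objects, and it does not model error cells. `infer_sqlite_types` is a hypothetical name.

```javascript
/* Hypothetical helper: classify each column of an array of row objects as
   REAL or TEXT, mirroring the cell-based walk above. */
function infer_sqlite_types(rows, headers) {
  const types = {};
  for (const h of headers) {
    const seen = new Set();
    for (const row of rows) {
      const v = row[h];
      if (v === undefined || v === null) continue; /* blank cell */
      if (v instanceof Date) seen.add("d");
      else if (typeof v === "number") seen.add("n");
      else if (typeof v === "boolean") seen.add("b");
      else seen.add("s"); /* strings and anything else */
    }
    /* only numbers or only Booleans => REAL; mixed, text, dates or empty => TEXT */
    types[h] = seen.size === 1 && (seen.has("n") || seen.has("b")) ? "REAL" : "TEXT";
  }
  return types;
}

const sample = [
  { Name: "Barack Obama", Index: 44, Active: false },
  { Name: "Donald Trump", Index: 45 }
];
const types = infer_sqlite_types(sample, ["Name", "Index", "Active"]);
// types is { Name: "TEXT", Index: "REAL", Active: "REAL" }
```

The headers are passed explicitly because, by default, `sheet_to_json` omits the key for a blank cell (unless the `defval` option is set). Reading keys from the first object could therefore miss columns.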

The included `SheetJSSQL.js` script demonstrates SQL statement generation.
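
The statements produced by `SheetJSSQL.js` are not reproduced here. As a rough illustration, assume the headers and column types have been computed as above. A `CREATE TABLE` statement and one `INSERT` per row can then be generated with careful quoting. The `generate_sql` name is hypothetical. The quoting follows standard SQLite syntax: identifiers go in double quotes and string literals in single quotes, with embedded quote characters doubled.

```javascript
/* Hypothetical helper: build SQLite statements for one table. `types` maps
   each header to "REAL" or "TEXT". */
function generate_sql(table, headers, types, rows) {
  /* identifiers: double quotes, embedded double quotes doubled */
  const ident = s => '"' + String(s).replace(/"/g, '""') + '"';
  /* literals: NULL for blanks, finite numbers as-is, Booleans as 1/0,
     dates as ISO text, everything else as a single-quoted string */
  const literal = v => {
    if (v === undefined || v === null) return "NULL";
    if (typeof v === "number") return Number.isFinite(v) ? String(v) : "NULL";
    if (typeof v === "boolean") return v ? "1" : "0";
    if (v instanceof Date) v = v.toISOString();
    return "'" + String(v).replace(/'/g, "''") + "'";
  };
  const cols = headers.map(ident).join(", ");
  const stmts = ["CREATE TABLE " + ident(table) + " (" +
    headers.map(h => ident(h) + " " + types[h]).join(", ") + ");"];
  for (const row of rows)
    stmts.push("INSERT INTO " + ident(table) + " (" + cols + ") VALUES (" +
      headers.map(h => literal(row[h])).join(", ") + ");");
  return stmts;
}

const sql = generate_sql("presidents", ["Name", "Index"],
  { Name: "TEXT", Index: "REAL" },
  [{ Name: "Barack Obama", Index: 44 }, { Name: "Donald Trump", Index: 45 }]);
// sql[0] is 'CREATE TABLE "presidents" ("Name" TEXT, "Index" REAL);'
```

When a driver is available, binding values through prepared statements (for example `Statement#run` in `better-sqlite3`) avoids manual literal quoting entirely.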

## Objects, K/V and "Schema-less" Databases

So-called "Schema-less" databases allow for arbitrary keys and values within the
entries in the database. K/V stores and Objects add additional restrictions.

There is no natural way to translate arbitrarily shaped schemas to worksheets
in a workbook. One common trick is to dedicate one worksheet to holding named
keys. For example, consider the JS object:

```json
{
  "title": "SheetDB",
  "metadata": {
    "author": "SheetJS",
    "code": 7262
  },
  "data": [
    { "Name": "Barack Obama", "Index": 44 },
    { "Name": "Donald Trump", "Index": 45 }
  ]
}
```

A dedicated worksheet should store the one-off named values:

```
XXX|        A        |    B    |
---+-----------------+---------+
 1 | path            | value   |
 2 | title           | SheetDB |
 3 | metadata.author | SheetJS |
 4 | metadata.code   |    7262 |
```

The included `ObjUtils.js` script demonstrates object-workbook conversion:

```js
/* assumes SheetJS is available as `XLSX` (in Node: `var XLSX = require("xlsx");`) */

/* set obj.a.b.c = value for the dotted path "a.b.c", creating objects as needed */
function deepset(obj, path, value) {
  if(path.indexOf(".") == -1) return obj[path] = value;
  var parts = path.split(".");
  if(!obj[parts[0]]) obj[parts[0]] = {};
  return deepset(obj[parts[0]], parts.slice(1).join("."), value);
}

function workbook_to_object(wb) {
  var out = {};

  /* assign one-off keys */
  var ws = wb.Sheets["_keys"]; if(ws) {
    var data = XLSX.utils.sheet_to_json(ws, {raw:true});
    data.forEach(function(r) { deepset(out, r.path, r.value); });
  }

  /* assign arrays from worksheet tables */
  wb.SheetNames.forEach(function(n) {
    if(n == "_keys") return;
    out[n] = XLSX.utils.sheet_to_json(wb.Sheets[n], {raw:true});
  });

  return out;
}

/* collect {path, value} pairs for every non-array leaf value */
function walk(obj, key, arr) {
  if(Array.isArray(obj)) return;
  /* `typeof null` is "object", so null must be treated as a leaf explicitly */
  if(obj === null || typeof obj != "object") { arr.push({path:key, value:obj}); return; }
  Object.keys(obj).forEach(function(k) { walk(obj[k], key?key+"."+k:k, arr); });
}

function object_to_workbook(obj) {
  var wb = XLSX.utils.book_new();

  /* keyed entries */
  var base = []; walk(obj, "", base);
  var ws = XLSX.utils.json_to_sheet(base, {header:["path", "value"]});
  XLSX.utils.book_append_sheet(wb, ws, "_keys");

  /* top-level arrays each get their own worksheet */
  Object.keys(obj).forEach(function(k) {
    if(!Array.isArray(obj[k])) return;
    XLSX.utils.book_append_sheet(wb, XLSX.utils.json_to_sheet(obj[k]), k);
  });

  return wb;
}
```

## Browser APIs

#### WebSQL

WebSQL is an SQL-based in-browser database that has been available in Chrome
and Safari. In practice, it is powered by SQLite, and most simple
SQLite-compatible queries work as-is in WebSQL. The W3C stopped work on the
WebSQL specification in 2010, and browser vendors have since been removing
support.

The public demo <http://sheetjs.com/sexql> generates a database from a workbook.

#### LocalStorage and SessionStorage

The Storage API, encompassing `localStorage` and `sessionStorage`, describes
simple key-value stores that only support string keys and values. Objects can
be stored as JSON, using `JSON.stringify` when setting keys and `JSON.parse`
when getting them.
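
A minimal sketch of that pattern follows. Since `localStorage` only exists in browsers, the sketch uses a small `Map`-backed stand-in implementing the `getItem` / `setItem` subset of the Storage interface. `MemoryStorage`, `set_object` and `get_object` are hypothetical names. In a browser, `localStorage` can be passed in place of the stand-in.

```javascript
/* Hypothetical stand-in for localStorage / sessionStorage: like the real
   Storage API, it converts keys and values to strings. */
class MemoryStorage {
  constructor() { this.map = new Map(); }
  getItem(k) { k = String(k); return this.map.has(k) ? this.map.get(k) : null; }
  setItem(k, v) { this.map.set(String(k), String(v)); }
}

/* store any JSON-serializable value under a key */
function set_object(storage, key, obj) {
  storage.setItem(key, JSON.stringify(obj));
}

/* read a value stored by set_object; missing keys yield null */
function get_object(storage, key) {
  const text = storage.getItem(key);
  return text === null ? null : JSON.parse(text);
}

const store = new MemoryStorage();
store.setItem("code", 7262);
// store.getItem("code") returns the string "7262", not the number 7262
set_object(store, "metadata", { author: "SheetJS", code: 7262 });
// get_object(store, "metadata") returns { author: "SheetJS", code: 7262 }
```

JSON does not preserve every JS type. For example, a `Date` comes back from `JSON.parse` as an ISO string.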

`SheetJSStorage.js` extends the `Storage` prototype with a `load` function to
populate the db based on an object and a `dump` function to generate a workbook
from the data in the storage. `LocalStorage.html` tests `localStorage`.
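
The code in `SheetJSStorage.js` is not reproduced here. The following is a minimal sketch of the same idea. It assumes each top-level key of the object is stored as one JSON-encoded entry, and `dump` reads every entry back as a `{ key, value }` row that could be passed to `json_to_sheet`. The function names and the row layout are hypothetical. Plain functions that accept any object with `setItem`, `getItem`, `key` and `length` are used instead of extending `Storage.prototype`, so the sketch runs outside a browser.

```javascript
/* Hypothetical load: store each top-level property as a JSON string */
function storage_load(storage, obj) {
  for (const k of Object.keys(obj)) storage.setItem(k, JSON.stringify(obj[k]));
}

/* Hypothetical dump: one { key, value } row per stored entry; values stay as
   JSON text so nested objects fit in a single cell */
function storage_dump(storage) {
  const rows = [];
  for (let i = 0; i < storage.length; ++i) {
    const k = storage.key(i);
    rows.push({ key: k, value: storage.getItem(k) });
  }
  return rows;
}

/* Minimal Map-backed stand-in with the Storage members used above */
const fake = {
  map: new Map(),
  get length() { return this.map.size; },
  key(i) { return [...this.map.keys()][i] ?? null; },
  getItem(k) { return this.map.has(k) ? this.map.get(k) : null; },
  setItem(k, v) { this.map.set(String(k), String(v)); }
};

storage_load(fake, { title: "SheetDB", metadata: { author: "SheetJS", code: 7262 } });
const rows = storage_dump(fake);
// rows is [ { key: "title", value: '"SheetDB"' },
//           { key: "metadata", value: '{"author":"SheetJS","code":7262}' } ]
```

The stand-in returns keys in insertion order. With a real `Storage` object the order of `key(i)` is left to the browser, so the dumped rows should not be assumed to follow insertion order.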

#### IndexedDB

IndexedDB is a more complex storage solution, but the `localForage` wrapper
supplies a Promise-based interface mimicking the `Storage` API.

`SheetJSForage.js` extends the `localforage` object with a `load` function to
populate the db based on an object and a `dump` function to generate a workbook
from the data in the storage. `LocalForage.html` forces IndexedDB mode.
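
With `localForage`, a dump can be written against the Promise-based `keys` and `getItem` methods. The sketch below is not the code in `SheetJSForage.js`, and `forage_dump` is a hypothetical name. Because `localForage` needs a browser, a tiny stand-in object with the same two methods lets the example run in Node. Unlike `Storage`, `localForage` serializes values itself, so no `JSON.stringify` step is needed.

```javascript
/* Hypothetical dump: read every key and value through the Promise API */
async function forage_dump(forage) {
  const keys = await forage.keys();
  /* fetch values in parallel; Promise.all keeps them in key order */
  const values = await Promise.all(keys.map(k => forage.getItem(k)));
  return keys.map((k, i) => ({ key: k, value: values[i] }));
}

/* Stand-in exposing the two localForage methods used above */
const data = new Map([["title", "SheetDB"], ["code", 7262]]);
const fake_forage = {
  keys: () => Promise.resolve([...data.keys()]),
  getItem: k => Promise.resolve(data.has(k) ? data.get(k) : null)
};

const dumped = forage_dump(fake_forage);
// dumped resolves to [ { key: "title", value: "SheetDB" }, { key: "code", value: 7262 } ]
```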

## External Database Demos

### SQL Databases

There are Node.js connector libraries for all of the popular relational
database systems. They have facilities for connecting to a database, executing
queries, and obtaining results as arrays of JS objects that can be passed to
`json_to_sheet`. The main differences surround API shape and supported data
types.

#### SQLite

[The `better-sqlite3` module](https://www.npmjs.com/package/better-sqlite3)
provides a very simple API for working with SQLite databases. `Statement#all`
runs a prepared statement and returns an array of JS objects.

`SQLiteTest.js` generates a simple two-table SQLite database (`SheetJS1.db`),
exports to XLSX (`sqlite.xlsx`), imports the new XLSX file to a new database
(`SheetJS2.db`) and verifies the tables are preserved.

#### MySQL / MariaDB

[The `mysql2` module](https://www.npmjs.com/package/mysql2) supplies a callback
API as well as a Promise wrapper. `Connection#query` runs a statement. With the
Promise wrapper, the result is an array whose first element is an array of JS
objects.

`MySQLTest.js` connects to the MySQL instance running on `localhost`, builds two
tables in the `sheetjs` database, exports to XLSX, imports the new XLSX file to
the `sheetj5` database and verifies the tables are preserved.

#### PostgreSQL

[The `pg` module](https://www.npmjs.com/package/pg) supplies a Promise wrapper.
As with `mysql2`, `Client#query` runs a statement, but the result is an object
whose `rows` property is an array of JS objects.
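
Since the two drivers deliver rows in different shapes, a small adapter can hide the difference before the rows are passed to `json_to_sheet`. This is a sketch based on the shapes described above: the `mysql2` Promise wrapper yields an array whose first element holds the rows, and `pg` yields an object with a `rows` property. `result_rows` is a hypothetical name. The sample results are hand-built objects, not real driver output.

```javascript
/* Hypothetical adapter: extract the array of row objects from a query result */
function result_rows(result) {
  /* mysql2 Promise wrapper: [rows, fields] */
  if (Array.isArray(result) && Array.isArray(result[0])) return result[0];
  /* pg: an object whose `rows` property holds the row objects */
  if (result && Array.isArray(result.rows)) return result.rows;
  throw new TypeError("unrecognized query result shape");
}

const rows = [{ Name: "Barack Obama", Index: 44 }];
const mysql_like = [rows, []];           /* second element: field metadata */
const pg_like = { rows, rowCount: 1 };
// result_rows(mysql_like) and result_rows(pg_like) both return `rows`
```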

`PgSQLTest.js` connects to the PostgreSQL server on `localhost`, builds two
tables in the `sheetjs` database, exports to XLSX, imports the new XLSX file to
the `sheetj5` database and verifies the tables are preserved.

### Key/Value Stores

#### Redis

Redis is a powerful data structure server that can store simple strings, sets,
sorted sets, hashes and lists. One simple database representation stores the
strings in a special worksheet (`_strs`), the manifest in another worksheet
(`_manifest`), and each object in its own worksheet (`obj##`).
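
The exact layout used by `SheetJSRedis.js` is not reproduced here. The sketch below shows one way the described split could work. It assumes the Redis data has already been read into a plain object mapping each key to `{ type, value }`, where strings are strings, hashes are objects, lists and sets are arrays, and sorted sets are arrays of `[member, score]` pairs. The function name, the manifest columns and the unpadded `obj0`, `obj1`, ... sheet names are all hypothetical. Each sheet is produced as an array of arrays, the layout accepted by `XLSX.utils.aoa_to_sheet`.

```javascript
/* Hypothetical layout: strings go to `_strs`, every other key gets its own
   `obj#` sheet, and `_manifest` records where each key was stored. */
function redis_to_sheets(db) {
  const sheets = { _strs: [["key", "value"]], _manifest: [["key", "type", "sheet"]] };
  let n = 0;
  for (const [key, { type, value }] of Object.entries(db)) {
    if (type === "string") {
      sheets._strs.push([key, value]);
      sheets._manifest.push([key, type, "_strs"]);
      continue;
    }
    let rows;
    if (type === "hash") rows = Object.entries(value);          /* [field, value] */
    else if (type === "zset") rows = value.map(([m, s]) => [m, s]); /* [member, score] */
    else rows = value.map(v => [v]);                               /* list or set */
    const name = "obj" + n++;
    sheets[name] = rows;
    sheets._manifest.push([key, type, name]);
  }
  return sheets;
}

const sheets = redis_to_sheets({
  "user:1:name": { type: "string", value: "SheetJS" },
  "user:1": { type: "hash", value: { name: "SheetJS", visits: "10" } },
  "scores": { type: "zset", value: [["alice", 10], ["bob", 12]] },
  "queue": { type: "list", value: ["a", "b"] }
});
// sheets.obj0 is [["name", "SheetJS"], ["visits", "10"]]
```

Keeping the key and type in `_manifest` is what allows an importer to rebuild each key with the right Redis command.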

`RedisTest.js` connects to a local Redis server, populates data based on the
official Redis tutorial, exports to XLSX, flushes the server, imports the new
XLSX file and verifies the data round-tripped correctly. `SheetJSRedis.js`
includes the implementation details.

#### LowDB

LowDB is a small schemaless database powered by `lodash`. The `_.get` and
`_.set` helper functions make storing metadata a breeze. The included
`SheetJSLowDB.js` script demonstrates a simple adapter that can load and dump
data.