# Databases
"Database" is a catch-all term referring to traditional RDBMS as well as K/V
stores, document databases, and other "NoSQL" storage systems. There are many
external database systems as well as browser APIs like WebSQL and `localStorage`.
This demo discusses general strategies and provides examples for a variety of
database systems. The examples are merely intended to demonstrate very basic
functionality.
## Structured Tables
Database tables are a common import and export target for spreadsheets. One
common representation of a database table is an array of JS objects whose keys
are column headers and whose values are the underlying data values. For example,

| Name         | Index |
| :----------- | ----: |
| Barack Obama |    44 |
| Donald Trump |    45 |

is naturally represented as an array of objects:
```js
[
  { Name: "Barack Obama", Index: 44 },
  { Name: "Donald Trump", Index: 45 }
]
```
The `sheet_to_json` and `json_to_sheet` helper functions work with objects of
similar shape, converting to and from worksheet objects. The corresponding
worksheet would include a header row for the labels:
```
XXX| A | B |
---+--------------+-------+
1 | Name | Index |
2 | Barack Obama | 44 |
3 | Donald Trump | 45 |
```
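To make the correspondence concrete, the following sketch flattens an array of objects into a header row followed by data rows, matching the layout pictured above. It is plain JS rather than the library's implementation, and `objectsToRows` is a hypothetical helper name; in practice `json_to_sheet` performs this step.

```js
/* hypothetical helper: array of objects -> array of arrays with a header row */
function objectsToRows(objects) {
  /* collect column headers in first-seen order */
  var headers = [];
  objects.forEach(function(obj) {
    Object.keys(obj).forEach(function(k) {
      if(headers.indexOf(k) == -1) headers.push(k);
    });
  });
  /* each data row lists the values in header order; missing keys become null */
  var body = objects.map(function(obj) {
    return headers.map(function(h) { return h in obj ? obj[h] : null; });
  });
  return [headers].concat(body);
}

var rows = objectsToRows([
  { Name: "Barack Obama", Index: 44 },
  { Name: "Donald Trump", Index: 45 }
]);
/* rows[0] is the header row; rows[1] and rows[2] are the data rows */
```

The resulting array of arrays has the shape accepted by `XLSX.utils.aoa_to_sheet`.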
## Building Schemas from Worksheets
The `sheet_to_json` helper function generates arrays of JS objects that can be
scanned to determine the column "types", and there are third-party connectors
that can push arrays of JS objects to database tables.
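As a rough sketch of the scanning step, the hypothetical `inferColumnTypes` helper below walks an array of objects and guesses a type for each key. It assumes only the SQLite-style `REAL` and `TEXT` types matter and is not taken from the demo scripts.

```js
/* hypothetical helper: guess a SQL column type for each key */
function inferColumnTypes(rows) {
  var types = {};
  rows.forEach(function(row) {
    Object.keys(row).forEach(function(key) {
      var v = row[key];
      if(v == null) return; /* blank values do not affect the type */
      var t = (typeof v == "number" || typeof v == "boolean") ? "REAL" : "TEXT";
      /* once a column has seen a non-numeric value it stays TEXT */
      if(types[key] != "TEXT") types[key] = t;
    });
  });
  return types;
}
```

Keys whose values are all blank are omitted from the result, so a caller would need to pick a default type for them.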
The [`sexql`](http://sheetjs.com/sexql) browser demo uses WebSQL, which is
limited to the SQLite fundamental types.
<details>
<summary><b>Implementation details</b> (click to show)</summary>
The `sexql` schema builder scans the first row to find headers:
```js
if(!ws || !ws['!ref']) return;
var range = XLSX.utils.decode_range(ws['!ref']);
/* bail out on malformed or inverted ranges */
if(!range || !range.s || !range.e || range.s.r > range.e.r || range.s.c > range.e.c) return;
var R = range.s.r, C = range.s.c;
var names = new Array(range.e.c-range.s.c+1);
for(C = range.s.c; C<= range.e.c; ++C){
  var addr = XLSX.utils.encode_cell({c:C,r:R});
  /* use the cell value as the header, falling back to the column letter */
  names[C-range.s.c] = ws[addr] ? ws[addr].v : XLSX.utils.encode_col(C);
}
```
After finding the headers, a deduplication step ensures that data is not lost.
Duplicate headers will be suffixed with `_1`, `_2`, etc.
```js
for(var i = 0; i < names.length; ++i) if(names.indexOf(names[i]) < i)
  for(var j = 0; j < names.length; ++j) {
    var _name = names[i] + "_" + (j+1);
    if(names.indexOf(_name) > -1) continue;
    names[i] = _name;
    break; /* stop at the first unused suffix */
  }
```
A column-major walk helps determine the data type. For SQLite the only relevant
data types are `REAL` and `TEXT`. If a string or date or error is seen in any
value of a column, the column is marked as `TEXT`:
```js
var types = new Array(range.e.c-range.s.c+1);
for(C = range.s.c; C<= range.e.c; ++C) {
  /* record every cell type seen in the column (blank cells count as "z") */
  var seen = {}, _type = "";
  for(R = range.s.r+1; R<= range.e.r; ++R)
    seen[(ws[XLSX.utils.encode_cell({c:C,r:R})]||{t:"z"}).t] = true;
  if(seen.s || seen.str) _type = "TEXT";
  /* coerce to booleans so unseen types count as 0 instead of producing NaN */
  else if(!!seen.n + !!seen.b + !!seen.d + !!seen.e > 1) _type = "TEXT";
  else switch(true) {
    case seen.b:
    case seen.n: _type = "REAL"; break;
    case seen.e: _type = "TEXT"; break;
    case seen.d: _type = "TEXT"; break;
  }
  types[C-range.s.c] = _type || "TEXT";
}
```
</details>
The included `SheetJSSQL.js` script demonstrates SQL statement generation.
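The statements produced by `SheetJSSQL.js` are not reproduced here. As a minimal sketch of the idea, assuming the column names and types have already been determined, the hypothetical `buildStatements` helper below assembles a `CREATE TABLE` statement and a parameterized `INSERT` statement. The helper names and the double-quote identifier quoting are illustrative assumptions.

```js
/* hypothetical helper: quote an identifier by doubling embedded double quotes */
function quoteIdent(name) {
  return '"' + String(name).replace(/"/g, '""') + '"';
}

/* hypothetical helper: build a CREATE TABLE and a parameterized INSERT */
function buildStatements(table, names, types) {
  var cols = names.map(function(n, i) { return quoteIdent(n) + " " + types[i]; });
  var create = "CREATE TABLE " + quoteIdent(table) + " (" + cols.join(", ") + ");";
  /* one placeholder per column; values are bound when the statement runs */
  var marks = names.map(function() { return "?"; });
  var insert = "INSERT INTO " + quoteIdent(table) +
    " (" + names.map(quoteIdent).join(", ") + ")" +
    " VALUES (" + marks.join(", ") + ");";
  return { create: create, insert: insert };
}

var stmts = buildStatements("Presidents", ["Name", "Index"], ["TEXT", "REAL"]);
/* stmts.create: CREATE TABLE "Presidents" ("Name" TEXT, "Index" REAL); */
/* stmts.insert: INSERT INTO "Presidents" ("Name", "Index") VALUES (?, ?); */
```

Placeholders keep cell values out of the SQL text, so the values from each row can be bound by the connector rather than escaped by hand.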
## Objects, K/V and "Schema-less" Databases
So-called "Schema-less" databases allow for arbitrary keys and values within the
entries in the database. K/V stores and Objects add additional restrictions.
There is no natural way to translate arbitrarily shaped schemas to worksheets
in a workbook. One common trick is to dedicate one worksheet to holding named
keys. For example, consider the JS object:
```json
{
  "title": "SheetDB",
  "metadata": {
    "author": "SheetJS",
    "code": 7262
  },
  "data": [
    { "Name": "Barack Obama", "Index": 44 },
    { "Name": "Donald Trump", "Index": 45 }
  ]
}
```
A dedicated worksheet should store the one-off named values:
```
XXX| A | B |
---+-----------------+---------+
1 | Path | Value |
2 | title | SheetDB |
3 | metadata.author | SheetJS |
4 | metadata.code | 7262 |
```
The included `ObjUtils.js` script demonstrates object-workbook conversion:
<details>
<summary><b>Implementation details</b> (click to show)</summary>
```js
function deepset(obj, path, value) {
  if(path.indexOf(".") == -1) return obj[path] = value;
  var parts = path.split(".");
  if(!obj[parts[0]]) obj[parts[0]] = {};
  return deepset(obj[parts[0]], parts.slice(1).join("."), value);
}

function workbook_to_object(wb) {
  var out = {};
  /* assign one-off keys */
  var ws = wb.Sheets["_keys"]; if(ws) {
    var data = XLSX.utils.sheet_to_json(ws, {raw:true});
    data.forEach(function(r) { deepset(out, r.path, r.value); });
  }
  /* assign arrays from worksheet tables */
  wb.SheetNames.forEach(function(n) {
    if(n == "_keys") return;
    out[n] = XLSX.utils.sheet_to_json(wb.Sheets[n], {raw:true});
  });
  return out;
}

function walk(obj, key, arr) {
  if(Array.isArray(obj)) return;
  if(typeof obj != "object") { arr.push({path:key, value:obj}); return; }
  Object.keys(obj).forEach(function(k) { walk(obj[k], key?key+"."+k:k, arr); });
}

function object_to_workbook(obj) {
  var wb = XLSX.utils.book_new();
  /* keyed entries */
  var base = []; walk(obj, "", base);
  var ws = XLSX.utils.json_to_sheet(base, {header:["path", "value"]});
  XLSX.utils.book_append_sheet(wb, ws, "_keys");
  /* arrays */
  Object.keys(obj).forEach(function(k) {
    if(!Array.isArray(obj[k])) return;
    XLSX.utils.book_append_sheet(wb, XLSX.utils.json_to_sheet(obj[k]), k);
  });
  return wb;
}
```
</details>
## Browser APIs
#### WebSQL
WebSQL is a SQL-based in-browser database that shipped in Chrome and Safari and
has since been deprecated. In practice, it is powered by SQLite, and most simple
SQLite-compatible queries work as-is in WebSQL.
The public demo <http://sheetjs.com/sexql> generates a database from a workbook.
#### LocalStorage and SessionStorage
The Storage API, encompassing `localStorage` and `sessionStorage`, describes
simple key-value stores that only support string values and keys. Objects can be
stored as JSON using `JSON.stringify` and `JSON.parse` to set and get keys.
`SheetJSStorage.js` extends the `Storage` prototype with a `load` function to
populate the db based on an object and a `dump` function to generate a workbook
from the data in the storage. `LocalStorage.html` tests `localStorage`.
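The JSON round trip described above can be sketched with two small helpers. These are hypothetical names and are not the `load` and `dump` functions from `SheetJSStorage.js`; the `storage` argument is assumed to be `localStorage`, `sessionStorage`, or any object exposing `setItem` and `getItem`.

```js
/* hypothetical helper: serialize a value and store it under a string key */
function setObject(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

/* hypothetical helper: read a key and parse the stored JSON */
function getObject(storage, key) {
  var raw = storage.getItem(key);
  /* getItem returns null for missing keys */
  return raw == null ? undefined : JSON.parse(raw);
}
```

In a browser, `setObject(localStorage, "data", rows)` followed by `getObject(localStorage, "data")` would return a copy of `rows`. Values that JSON cannot represent, such as `Date` objects, come back as strings.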
#### IndexedDB
IndexedDB is a more complex storage solution, but the `localForage` wrapper
supplies a Promise-based interface mimicking the `Storage` API.
`SheetJSForage.js` extends the `localforage` object with a `load` function to
populate the db based on an object and a `dump` function to generate a workbook
from the data in the storage. `LocalForage.html` forces IndexedDB mode.
## External Database Demos
### SQL Databases
There are Node.js connector libraries for all of the popular RDBMS. They
have facilities for connecting to a database, executing queries, and obtaining
results as arrays of JS objects that can be passed to `json_to_sheet`. The main
differences surround API shape and supported data types.
#### SQLite
[The `better-sqlite3` module](https://www.npmjs.com/package/better-sqlite3)
provides a very simple API for working with SQLite databases. `Statement#all`
runs a prepared statement and returns an array of JS objects.
`SQLiteTest.js` generates a simple two-table SQLite database (`SheetJS1.db`),
exports to XLSX (`sqlite.xlsx`), imports the new XLSX file to a new database
(`SheetJS2.db`) and verifies the tables are preserved.
#### MySQL / MariaDB
[The `mysql2` module](https://www.npmjs.com/package/mysql2) supplies a callback
API as well as a Promise wrapper. `Connection#query` runs a statement and
returns an array whose first element is an array of JS objects.
`MySQLTest.js` connects to the MySQL instance running on `localhost`, builds two
tables in the `sheetjs` database, exports to XLSX, imports the new XLSX file to
the `sheetj5` database and verifies the tables are preserved.
#### PostgreSQL
[The `pg` module](https://www.npmjs.com/package/pg) supplies a Promise wrapper.
As with `mysql2`, `Client#query` runs a statement and returns a result object.
The `rows` key of the object is an array of JS objects.
`PgSQLTest.js` connects to the PostgreSQL server on `localhost`, builds two
tables in the `sheetjs` database, exports to XLSX, imports the new XLSX file to
the `sheetj5` database and verifies the tables are preserved.
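The three connectors above hand back rows in slightly different shapes. As a sketch based only on the shapes described in this section, the hypothetical `rowsFromResult` helper below extracts the array of row objects. The `kind` labels are made up for illustration.

```js
/* hypothetical helper: extract the array of row objects from a query result */
function rowsFromResult(kind, result) {
  switch(kind) {
    /* better-sqlite3: Statement#all returns the array of objects directly */
    case "sqlite": return result;
    /* mysql2: the first element of the resolved array holds the rows */
    case "mysql": return result[0];
    /* pg: the `rows` key of the result object holds the rows */
    case "pg": return result.rows;
    default: throw new Error("unknown connector: " + kind);
  }
}
```

In each case the returned array can be passed to `XLSX.utils.json_to_sheet`.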
#### Knex Query Builder
[The `knex` module](https://www.npmjs.com/package/knex) builds SQL queries. The
exact same code can be used against Oracle Database, MSSQL, and other engines.
`KnexTest.js` uses the `sqlite3` connector and follows the same procedure as the
SQLite test. The included `SheetJSKnex.js` script converts between the query
builder and the common spreadsheet format.
### Key/Value Stores
#### Redis
Redis is a powerful data structure server that can store simple strings, sets,
sorted sets, hashes and lists. One simple database representation stores the
strings in a special worksheet (`_strs`), the manifest in another worksheet
(`_manifest`), and each object in its own worksheet (`obj##`).
`RedisTest.js` connects to a local Redis server, populates data based on the
official Redis tutorial, exports to XLSX, flushes the server, imports the new
XLSX file and verifies the data round-tripped correctly. `SheetJSRedis.js`
includes the implementation details.
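The layout described above can be illustrated without a Redis server. The sketch below splits an in-memory dump into named tables, each an array of arrays. The `dump` shape, the column headers, and the `dumpToTables` name are assumptions made for illustration and may differ from what `SheetJSRedis.js` actually writes.

```js
/* hypothetical helper: split a dump into named tables (arrays of arrays) */
/* `dump` maps each key to { type: "string" | "list" | "set" | "hash", value: ... } */
function dumpToTables(dump) {
  var tables = {
    _strs: [["key", "value"]],
    _manifest: [["key", "type", "sheet"]]
  };
  var count = 0;
  Object.keys(dump).forEach(function(key) {
    var entry = dump[key];
    /* simple strings share one table */
    if(entry.type == "string") { tables._strs.push([key, entry.value]); return; }
    /* every other object gets its own table, recorded in the manifest */
    var name = "obj" + (count++);
    tables._manifest.push([key, entry.type, name]);
    /* hashes become field/value rows; lists and sets are assumed to be arrays */
    tables[name] = entry.type == "hash" ?
      Object.keys(entry.value).map(function(f) { return [f, entry.value[f]]; }) :
      entry.value.map(function(m) { return [m]; });
  });
  return tables;
}
```

Each table could then be converted with `XLSX.utils.aoa_to_sheet` and appended to a workbook under its name.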
#### LowDB
LowDB is a small schemaless database powered by `lodash`. `_.get` and `_.set`
helper functions make storing metadata a breeze. The included `SheetJSLowDB.js`
script demonstrates a simple adapter that can load and dump data.
### Document Databases
Since document databases are capable of holding more complex objects, they can
actually hold the underlying worksheet objects! In some cases, where arrays are
supported, they can even hold the workbook object.
#### MongoDB
MongoDB is a popular document-oriented database engine. `MongoDBTest.js` uses
MongoDB to hold a simple workbook and export to XLSX.
`MongoDBCRUD.js` follows the SQL examples using an idiomatic collection
structure. Exporting and importing collections are straightforward:
<details>
<summary><b>Example code</b> (click to show)</summary>
```js
/* generate a worksheet from a collection */
const docs = await db.collection('coll').find({}).toArray();
docs.forEach((x) => delete x._id);
const ws = XLSX.utils.json_to_sheet(docs);
/* import data from a worksheet to a collection */
const rows = XLSX.utils.sheet_to_json(ws);
await db.collection('coll').insertMany(rows, {ordered: true});
```
</details>
#### Firebase
[`firebase-server`](https://www.npmjs.com/package/firebase-server) is a simple
mock Firebase server used in the tests, but the same code works in an external
Firebase deployment when plugging in the database connection info.
`FirebaseDemo.html` and `FirebaseTest.js` demonstrate a whole-workbook process.
The entire workbook object is persisted, a few cells are changed, and the stored
data is dumped and exported to XLSX.