sheetjs_sheetjs/demos/database/README.md

301 lines
9.9 KiB
Markdown
Raw Normal View History

2018-01-27 00:52:46 +00:00
# Databases
"Database" is a catch-all term referring to traditional RDBMS as well as K/V
stores, document databases, and other "NoSQL" storages. There are many external
database systems as well as browser APIs like WebSQL and `localStorage`
This demo discusses general strategies and provides examples for a variety of
database systems. The examples are merely intended to demonstrate very basic
functionality.
## Structured Tables
Database tables are a common import and export target for spreadsheets. One
common representation of a database table is an array of JS objects whose keys
are column headers and whose values are the underlying data values. For example,
| Name | Index |
| :----------- | ----: |
| Barack Obama | 44 |
| Donald Trump | 45 |
is naturally represented as an array of objects
```js
[
{ Name: "Barack Obama", Index: 44 },
{ Name: "Donald Trump", Index: 45 }
]
```
The `sheet_to_json` and `json_to_sheet` helper functions work with objects of
similar shape, converting to and from worksheet objects. The corresponding
worksheet would include a header row for the labels:
```
XXX| A | B |
---+--------------+-------+
1 | Name | Index |
2 | Barack Obama | 44 |
3 | Donald Trump | 45 |
```
## Building Schemas from Worksheets
The `sheet_to_json` helper function generates arrays of JS objects that can be
scanned to determine the column "types", and there are third-party connectors
that can push arrays of JS objects to database tables.
The [`sexql`](http://sheetjs.com/sexql) browser demo uses WebSQL, which is
limited to the SQLite fundamental types.
<details>
<summary><b>Implementation details</b> (click to show)</summary>
The `sexql` schema builder scans the first row to find headers:
2018-01-27 00:52:46 +00:00
```js
if(!ws || !ws['!ref']) return;
var range = XLSX.utils.decode_range(ws['!ref']);
if(!range || !range.s || !range.e || range.s > range.e) return;
var R = range.s.r, C = range.s.c;
var names = new Array(range.e.c-range.s.c+1);
for(C = range.s.c; C<= range.e.c; ++C){
var addr = XLSX.utils.encode_cell({c:C,r:R});
names[C-range.s.c] = ws[addr] ? ws[addr].v : XLSX.utils.encode_col(C);
}
```
After finding the headers, a deduplication step ensures that data is not lost.
Duplicate headers will be suffixed with `_1`, `_2`, etc.
```js
for(var i = 0; i < names.length; ++i) if(names.indexOf(names[i]) < i)
for(var j = 0; j < names.length; ++j) {
var _name = names[i] + "_" + (j+1);
if(names.indexOf(_name) > -1) continue;
names[i] = _name;
}
```
A column-major walk helps determine the data type. For SQLite the only relevant
data types are `REAL` and `TEXT`. If a string or date or error is seen in any
value of a column, the column is marked as `TEXT`:
```js
var types = new Array(range.e.c-range.s.c+1);
for(C = range.s.c; C<= range.e.c; ++C) {
var seen = {}, _type = "";
for(R = range.s.r+1; R<= range.e.r; ++R)
seen[(ws[XLSX.utils.encode_cell({c:C,r:R})]||{t:"z"}).t] = true;
if(seen.s || seen.str) _type = "TEXT";
else if(seen.n + seen.b + seen.d + seen.e > 1) _type = "TEXT";
else switch(true) {
case seen.b:
case seen.n: _type = "REAL"; break;
case seen.e: _type = "TEXT"; break;
case seen.d: _type = "TEXT"; break;
}
types[C-range.s.c] = _type || "TEXT";
}
```
</details>
2018-01-27 00:52:46 +00:00
The included `SheetJSSQL.js` script demonstrates SQL statement generation.
2018-01-27 00:52:46 +00:00
## Objects, K/V and "Schema-less" Databases
So-called "Schema-less" databases allow for arbitrary keys and values within the
entries in the database. K/V stores and Objects add additional restrictions.
There is no natural way to translate arbitrarily shaped schemas to worksheets
in a workbook. One common trick is to dedicate one worksheet to holding named
keys. For example, considering the JS object:
```json
{
"title": "SheetDB",
"metadata": {
"author": "SheetJS",
"code": 7262
},
"data": [
{ "Name": "Barack Obama", "Index": 44 },
{ "Name": "Donald Trump", "Index": 45 },
]
}
```
A dedicated worksheet should store the one-off named values:
```
XXX| A | B |
---+-----------------+---------+
1 | Path | Value |
2 | title | SheetDB |
3 | metadata.author | SheetJS |
4 | metadata.code | 7262 |
```
The included `ObjUtils.js` script demonstrates object-workbook conversion:
<details>
<summary><b>Implementation details</b> (click to show)</summary>
2018-01-27 00:52:46 +00:00
```js
function deepset(obj, path, value) {
if(path.indexOf(".") == -1) return obj[path] = value;
var parts = path.split(".");
if(!obj[parts[0]]) obj[parts[0]] = {};
return deepset(obj[parts[0]], parts.slice(1).join("."), value);
}
function workbook_to_object(wb) {
var out = {};
/* assign one-off keys */
var ws = wb.Sheets["_keys"]; if(ws) {
var data = XLSX.utils.sheet_to_json(ws, {raw:true});
data.forEach(function(r) { deepset(out, r.path, r.value); });
}
/* assign arrays from worksheet tables */
wb.SheetNames.forEach(function(n) {
if(n == "_keys") return;
out[n] = XLSX.utils.sheet_to_json(wb.Sheets[n], {raw:true});
});
return out;
}
function walk(obj, key, arr) {
if(Array.isArray(obj)) return;
if(typeof obj != "object") { arr.push({path:key, value:obj}); return; }
Object.keys(obj).forEach(function(k) { walk(obj[k], key?key+"."+k:k, arr); });
}
function object_to_workbook(obj) {
var wb = XLSX.utils.book_new();
/* keyed entries */
var base = []; walk(obj, "", base);
var ws = XLSX.utils.json_to_sheet(base, {header:["path", "value"]});
XLSX.utils.book_append_sheet(wb, ws, "_keys");
/* arrays */
Object.keys(obj).forEach(function(k) {
if(!Array.isArray(obj[k])) return;
XLSX.utils.book_append_sheet(wb, XLSX.utils.json_to_sheet(obj[k]), k);
});
return wb;
}
```
</details>
2018-01-27 00:52:46 +00:00
## Browser APIs
#### WebSQL
WebSQL is a popular SQL-based in-browser database available on Chrome / Safari.
In practice, it is powered by SQLite, and most simple SQLite-compatible queries
work as-is in WebSQL.
The public demo <http://sheetjs.com/sexql> generates a database from workbook.
#### LocalStorage and SessionStorage
The Storage API, encompassing `localStorage` and `sessionStorage`, describes
simple key-value stores that only support string values and keys. Objects can be
stored as JSON using `JSON.stringify` and `JSON.parse` to set and get keys.
`SheetJSStorage.js` extends the `Storage` prototype with a `load` function to
populate the db based on an object and a `dump` function to generate a workbook
from the data in the storage. `LocalStorage.html` tests `localStorage`.
#### IndexedDB
IndexedDB is a more complex storage solution, but the `localForage` wrapper
supplies a Promise-based interface mimicking the `Storage` API.
`SheetJSForage.js` extends the `localforage` object with a `load` function to
populate the db based on an object and a `dump` function to generate a workbook
from the data in the storage. `LocalForage.html` forces IndexedDB mode.
## External Database Demos
### SQL Databases
There are nodejs connector libraries for all of the popular RDBMS systems. They
have facilities for connecting to a database, executing queries, and obtaining
results as arrays of JS objects that can be passed to `json_to_sheet`. The main
differences surround API shape and supported data types.
#### SQLite
[The `better-sqlite3` module](https://www.npmjs.com/package/better-sqlite3)
provides a very simple API for working with SQLite databases. `Statement#all`
runs a prepared statement and returns an array of JS objects.
2018-01-27 00:52:46 +00:00
`SQLiteTest.js` generates a simple two-table SQLite database (`SheetJS1.db`),
exports to XLSX (`sqlite.xlsx`), imports the new XLSX file to a new database
(`SheetJS2.db`) and verifies the tables are preserved.
#### MySQL / MariaDB
[The `mysql2` module](https://www.npmjs.com/package/mysql2) supplies a callback
API as well as a Promise wrapper. `Connection#query` runs a statement and
returns an array whose first element is an array of JS objects.
`MySQLTest.js` connects to the MySQL instance running on `localhost`, builds two
tables in the `sheetjs` database, exports to XLSX, imports the new XLSX file to
the `sheetj5` database and verifies the tables are preserved.
#### PostgreSQL
[The `pg` module](https://www.npmjs.com/package/pg) supplies a Promise wrapper.
Like with `mysql2`, `Client#query` runs a statement and returns a result object.
The `rows` key of the object is an array of JS objects.
`PgSQLTest.js` connects to the PostgreSQL server on `localhost`, builds two
tables in the `sheetjs` database, exports to XLSX, imports the new XLSX file to
the `sheetj5` database and verifies the tables are preserved.
#### Knex Query Builder
[The `knex` module](https://www.npmjs.com/package/knex) builds SQL queries. The
same exact code can be used against Oracle Database, MSSQL, and other engines.
`KnexTest.js` uses the `sqlite3` connector and follows the same procedure as the
SQLite test. The included `SheetJSKnex.js` script converts between the query
builder and the common spreadsheet format.
2018-01-27 00:52:46 +00:00
### Key/Value Stores
#### Redis
Redis is a powerful data structure server that can store simple strings, sets,
sorted sets, hashes and lists. One simple database representation stores the
strings in a special worksheet (`_strs`), the manifest in another worksheet
(`_manifest`), and each object in its own worksheet (`obj##`).
`RedisTest.js` connects to a local Redis server, populates data based on the
official Redis tutorial, exports to XLSX, flushes the server, imports the new
XLSX file and verifies the data round-tripped correctly. `SheetJSRedis.js`
includes the implementation details.
2018-01-27 00:52:46 +00:00
#### LowDB
LowDB is a small schemaless database powered by `lodash`. `_.get` and `_.set`
helper functions make storing metadata a breeze. The included `SheetJSLowDB.js`
script demonstrates a simple adapter that can load and dump data.
[![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/js-xlsx?pixel)](https://github.com/SheetJS/js-xlsx)