sheetjs_sheetjs/demos/database/README.md
SheetJS 5de62a947f version bump 0.12.5: ancillary utility update
- add BOM to `stream.to_csv` (fixes #1024 h/t @hr5959)
- `utils.format_cell` type (h/t @victorj2307)
- duktape niggles
- demo cleanup
2018-03-12 22:51:54 -04:00

9.9 KiB

Databases

"Database" is a catch-all term referring to traditional RDBMS as well as K/V stores, document databases, and other "NoSQL" storages. There are many external database systems as well as browser APIs like WebSQL and localStorage

This demo discusses general strategies and provides examples for a variety of database systems. The examples are merely intended to demonstrate very basic functionality.

Structured Tables

Database tables are a common import and export target for spreadsheets. One common representation of a database table is an array of JS objects whose keys are column headers and whose values are the underlying data values. For example,

Name Index
Barack Obama 44
Donald Trump 45

is naturally represented as an array of objects

[
  { Name: "Barack Obama", Index: 44 },
  { Name: "Donald Trump", Index: 45 }
]

The sheet_to_json and json_to_sheet helper functions work with objects of similar shape, converting to and from worksheet objects. The corresponding worksheet would include a header row for the labels:

XXX|      A       |   B   |
---+--------------+-------+
 1 | Name         | Index |
 2 | Barack Obama |    44 |
 3 | Donald Trump |    45 |

Building Schemas from Worksheets

The sheet_to_json helper function generates arrays of JS objects that can be scanned to determine the column "types", and there are third-party connectors that can push arrays of JS objects to database tables.

The sexql browser demo uses WebSQL, which is limited to the SQLite fundamental types.

Implementation details (click to show)

The sexql schema builder scans the first row to find headers:

  if(!ws || !ws['!ref']) return;
  var range = XLSX.utils.decode_range(ws['!ref']);
  if(!range || !range.s || !range.e || range.s > range.e) return;
  var R = range.s.r, C = range.s.c;

  var names = new Array(range.e.c-range.s.c+1);
  for(C = range.s.c; C<= range.e.c; ++C){
    var addr = XLSX.utils.encode_cell({c:C,r:R});
    names[C-range.s.c] = ws[addr] ? ws[addr].v : XLSX.utils.encode_col(C);
  }

After finding the headers, a deduplication step ensures that data is not lost. Duplicate headers will be suffixed with _1, _2, etc.

  for(var i = 0; i < names.length; ++i) if(names.indexOf(names[i]) < i)
    for(var j = 0; j < names.length; ++j) {
      var _name = names[i] + "_" + (j+1);
      if(names.indexOf(_name) > -1) continue;
      names[i] = _name;
    }

A column-major walk helps determine the data type. For SQLite the only relevant data types are REAL and TEXT. If a string or date or error is seen in any value of a column, the column is marked as TEXT:

  var types = new Array(range.e.c-range.s.c+1);
  for(C = range.s.c; C<= range.e.c; ++C) {
    var seen = {}, _type = "";
    for(R = range.s.r+1; R<= range.e.r; ++R)
      seen[(ws[XLSX.utils.encode_cell({c:C,r:R})]||{t:"z"}).t] = true;
    if(seen.s || seen.str) _type = "TEXT";
    else if(seen.n + seen.b + seen.d + seen.e > 1) _type = "TEXT";
    else switch(true) {
      case seen.b:
      case seen.n: _type = "REAL"; break;
      case seen.e: _type = "TEXT"; break;
      case seen.d: _type = "TEXT"; break;
    }
    types[C-range.s.c] = _type || "TEXT";
  }

The included SheetJSSQL.js script demonstrates SQL statement generation.

Objects, K/V and "Schema-less" Databases

So-called "Schema-less" databases allow for arbitrary keys and values within the entries in the database. K/V stores and Objects add additional restrictions.

There is no natural way to translate arbitrarily shaped schemas to worksheets in a workbook. One common trick is to dedicate one worksheet to holding named keys. For example, considering the JS object:

{
  "title": "SheetDB",
  "metadata": {
    "author": "SheetJS",
    "code": 7262
  },
  "data": [
    { "Name": "Barack Obama", "Index": 44 },
    { "Name": "Donald Trump", "Index": 45 },
  ]
}

A dedicated worksheet should store the one-off named values:

XXX|        A        |    B    |
---+-----------------+---------+
 1 | Path            | Value   |
 2 | title           | SheetDB |
 3 | metadata.author | SheetJS |
 4 | metadata.code   |    7262 |

The included ObjUtils.js script demonstrates object-workbook conversion:

Implementation details (click to show)
function deepset(obj, path, value) {
  if(path.indexOf(".") == -1) return obj[path] = value;
  var parts = path.split(".");
  if(!obj[parts[0]]) obj[parts[0]] = {};
  return deepset(obj[parts[0]], parts.slice(1).join("."), value);
}
function workbook_to_object(wb) {
  var out = {};

  /* assign one-off keys */
  var ws = wb.Sheets["_keys"]; if(ws) {
    var data = XLSX.utils.sheet_to_json(ws, {raw:true});
    data.forEach(function(r) { deepset(out, r.path, r.value); });
  }

  /* assign arrays from worksheet tables */
  wb.SheetNames.forEach(function(n) {
    if(n == "_keys") return;
    out[n] = XLSX.utils.sheet_to_json(wb.Sheets[n], {raw:true});
  });

  return out;
}

function walk(obj, key, arr) {
  if(Array.isArray(obj)) return;
  if(typeof obj != "object") { arr.push({path:key, value:obj}); return; }
  Object.keys(obj).forEach(function(k) { walk(obj[k], key?key+"."+k:k, arr); });
}
function object_to_workbook(obj) {
  var wb = XLSX.utils.book_new();

  /* keyed entries */
  var base = []; walk(obj, "", base);
  var ws = XLSX.utils.json_to_sheet(base, {header:["path", "value"]});
  XLSX.utils.book_append_sheet(wb, ws, "_keys");

  /* arrays */
  Object.keys(obj).forEach(function(k) {
    if(!Array.isArray(obj[k])) return;
    XLSX.utils.book_append_sheet(wb, XLSX.utils.json_to_sheet(obj[k]), k);
  });

  return wb;
}

Browser APIs

WebSQL

WebSQL is a popular SQL-based in-browser database available on Chrome / Safari. In practice, it is powered by SQLite, and most simple SQLite-compatible queries work as-is in WebSQL.

The public demo http://sheetjs.com/sexql generates a database from workbook.

LocalStorage and SessionStorage

The Storage API, encompassing localStorage and sessionStorage, describes simple key-value stores that only support string values and keys. Objects can be stored as JSON using JSON.stringify and JSON.parse to set and get keys.

SheetJSStorage.js extends the Storage prototype with a load function to populate the db based on an object and a dump function to generate a workbook from the data in the storage. LocalStorage.html tests localStorage.

IndexedDB

IndexedDB is a more complex storage solution, but the localForage wrapper supplies a Promise-based interface mimicking the Storage API.

SheetJSForage.js extends the localforage object with a load function to populate the db based on an object and a dump function to generate a workbook from the data in the storage. LocalForage.html forces IndexedDB mode.

External Database Demos

SQL Databases

There are nodejs connector libraries for all of the popular RDBMS systems. They have facilities for connecting to a database, executing queries, and obtaining results as arrays of JS objects that can be passed to json_to_sheet. The main differences surround API shape and supported data types.

SQLite

The better-sqlite3 module provides a very simple API for working with SQLite databases. Statement#all runs a prepared statement and returns an array of JS objects.

SQLiteTest.js generates a simple two-table SQLite database (SheetJS1.db), exports to XLSX (sqlite.xlsx), imports the new XLSX file to a new database (SheetJS2.db) and verifies the tables are preserved.

MySQL / MariaDB

The mysql2 module supplies a callback API as well as a Promise wrapper. Connection#query runs a statement and returns an array whose first element is an array of JS objects.

MySQLTest.js connects to the MySQL instance running on localhost, builds two tables in the sheetjs database, exports to XLSX, imports the new XLSX file to the sheetj5 database and verifies the tables are preserved.

PostgreSQL

The pg module supplies a Promise wrapper. Like with mysql2, Client#query runs a statement and returns a result object. The rows key of the object is an array of JS objects.

PgSQLTest.js connects to the PostgreSQL server on localhost, builds two tables in the sheetjs database, exports to XLSX, imports the new XLSX file to the sheetj5 database and verifies the tables are preserved.

Knex Query Builder

The knex module builds SQL queries. The same exact code can be used against Oracle Database, MSSQL, and other engines.

KnexTest.js uses the sqlite3 connector and follows the same procedure as the SQLite test. The included SheetJSKnex.js script converts between the query builder and the common spreadsheet format.

Key/Value Stores

Redis

Redis is a powerful data structure server that can store simple strings, sets, sorted sets, hashes and lists. One simple database representation stores the strings in a special worksheet (_strs), the manifest in another worksheet (_manifest), and each object in its own worksheet (obj##).

RedisTest.js connects to a local Redis server, populates data based on the official Redis tutorial, exports to XLSX, flushes the server, imports the new XLSX file and verifies the data round-tripped correctly. SheetJSRedis.js includes the implementation details.

LowDB

LowDB is a small schemaless database powered by lodash. _.get and _.set helper functions make storing metadata a breeze. The included SheetJSLowDB.js script demonstrates a simple adapter that can load and dump data.

Analytics