docs.sheetjs.com/docz/docs/03-demos/23-data/16-postgresql.md
Asad 7315b4211c feat(pg): add Ubuntu/Debian PostgreSQL installation guide & enhanced SheetJSPG.js type deduc
- Include Ubuntu/Debian installation instructions
- Improve PostgreSQL table creation with better type detection
- Enhance date format detection and parsing

Breaking Changes:
- Replace `aoo_to_pg_table` to `sheet_to_pg_table` & now it takes worksheet object instead of array of arrays
- Changed type detection algorithm may produce column types
2024-11-20 19:45:34 -05:00

15 KiB

title sidebar_label pagination_prev pagination_next sidebar_custom_props
Sheets with PostgreSQL PostgreSQL demos/cli/index demos/local/index
sql
true

import current from '/version.js'; import CodeBlock from '@theme/CodeBlock';

PostgreSQL (colloquially referenced as "Postgres") is an open source object-relational database system.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

This demo uses SheetJS to exchange data between spreadsheets and PostgreSQL databases. We'll explore how to save tables from a database to spreadsheets and how to add data from spreadsheets into a database.

:::caution pass

It is strongly recommended to use PostgreSQL with a query builder or ORM.

While it is possible to generate SQL statements directly, there are many subtle details and pitfalls. Battle-tested solutions generally provide mitigations against SQL injection and other vulnerabilities.

:::

:::note Tested Deployments

This demo was tested in the following environments:

Postgres Connector Library Date
16.2.1 pg (8.11.4) 2024-03-31
15.6 pg (8.11.4) 2024-03-31
14.11 pg (8.11.4) 2024-03-31

:::

Integration Details

The SheetJS NodeJS module can be loaded in NodeJS scripts that connect to PostgreSQL databases.

This demo uses the pg connector module1, but the same mechanics apply to other PostgreSQL libraries.

Exporting Data

Client#query returns a Promise that resolves to a result set. The rows property of the result is an array of objects.

The SheetJS json_to_sheet method2 can generate a worksheet object3 from the array of objects:

const table_name = "Tabeller1"; // name of table

/* fetch all data from specified table */
const res = await client.query(`SELECT * FROM ${table_name}`);

/* generate a SheetJS worksheet object from the data */
const worksheet = XLSX.utils.json_to_sheet(res.rows);

A workbook object can be built from the worksheet using utility functions4. The workbook can be exported using the SheetJS writeFile method5:

/* create a new workbook and add the worksheet */
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, worksheet, "Sheet1");

/* export workbook to XLSX */
XLSX.writeFile(wb, "SheetJSPGExport.xlsx");

Importing Data

The SheetJS sheet_to_json function6 takes a worksheet object and generates an array of objects.

Queries must be manually generated from the objects. Assuming the field names in the object match the column headers, a loop can generate INSERT queries.

:::danger pass

PostgreSQL does not allow parameterized queries with variable column names

INSERT INTO table_name (?) VALUES (?);
-- ---------------------^ variable column names are not valid

Queries are generated manually. To help prevent SQL injection vulnerabilities, the pg-format7 module escapes identifiers and fields.

:::

/* generate an array of arrays from the worksheet */
const aoo = XLSX.utils.sheet_to_json(ws);

const table_name = "Blatte1"; // name of table

/* loop through the data rows */
for(let row of aoo) {

  /* generate format helper strings */
  const ent = Object.entries(row);
  const Istr = Array.from({length: entries.length}, ()=>"%I").join(", ");
  const Lstr = Array.from({length: entries.length}, ()=>"%L").join(", ");

  /* generate INSERT statement */
  let query = format.withArray(
    `INSERT INTO %I (${Istr}) VALUES(${Lstr})`,
    [ table_name, ...ent.map(x => x[0]), ...ent.map(x => x[1]) ]
  );

  /* execute INSERT statement */
  await client.query(query);
}

Creating a Table

The worksheet can be scanned to determine column names and types. With the names and types, a CREATE TABLE query can be written.

Implementation Details (click to show)

The sheet_to_pg_table function:

  • scans worksheet cells to determine column names and types
  • drops and creates a new table with the determined column names and types
  • loads the entire dataset into the new table
/* create table and load data given a worksheet and a PostgreSQL client */
async function sheet_to_pg_table(client, worksheet, tableName) {
    if (!worksheet['!ref']) return;
    
    const range = XLSX.utils.decode_range(worksheet['!ref']);
    
    /* Extract headers from first row, clean names for PostgreSQL */
    const headers = [];
    for (let col = range.s.c; col <= range.e.c; col++) {
        const cellAddress = XLSX.utils.encode_cell({ r: range.s.r, c: col });
        const cell = worksheet[cellAddress];
        const headerValue = cell ? String(cell.v).replace(/[^a-zA-Z0-9_]/g, '_') : `column_${col + 1}`;
        headers.push(headerValue.toLowerCase());
    }

    /* Group cell values by column for type deduction */
    const columnValues = headers.map(() => []);
    for (let row = range.s.r + 1; row <= range.e.r; row++) {
        for (let col = range.s.c; col <= range.e.c; col++) {
            const cellAddress = XLSX.utils.encode_cell({ r: row, c: col });
            const cell = worksheet[cellAddress];
            columnValues[col].push(cell);
        }
    }

    /* Deduce PostgreSQL type for each column */
    const types = {};
    headers.forEach((header, idx) => {
        types[header] = deduceType(columnValues[idx]);
    });

    /* Delete table if it exists in the DB */
    await client.query(format('DROP TABLE IF EXISTS %I', tableName));
    
    /* Create table */
    const createTableSQL = format(
        'CREATE TABLE %I (%s)',
        tableName,
        headers.map(header => format('%I %s', header, types[header])).join(', ')
    );
    await client.query(createTableSQL);

    /* Insert data row by row */
    for (let row = range.s.r + 1; row <= range.e.r; row++) {
        const values = headers.map((header, col) => {
            const cellAddress = XLSX.utils.encode_cell({ r: row, c: col });
            const cell = worksheet[cellAddress];
            return parseValue(cell, types[header]);
        });

        const insertSQL = format(
            'INSERT INTO %I (%s) VALUES (%s)',
            tableName,
            headers.map(h => format('%I', h)).join(', '),
            values.map(() => '%L').join(', ')
        );
        await client.query(format(insertSQL, ...values));
    }
}

function deduceType(cells) {
    if (!cells || cells.length === 0) return 'text';

    const nonEmptyCells = cells.filter(cell => cell && cell.v != null);
    if (nonEmptyCells.length === 0) return 'text';

    // Check for dates by looking at both cell type and formatted value
    const isDateCell = cell => cell?.t === 'd' || (cell?.t === 'n' && cell.w && /\d{4}-\d{2}-\d{2}|\d{1,2}\/\d{1,2}\/\d{4}|\d{2}-[A-Za-z]{3}-\d{4}|[A-Za-z]{3}-\d{2}|\d{1,2}-[A-Za-z]{3}/.test(cell.w));

    if (nonEmptyCells.some(isDateCell)) { return 'date'; }

    const allBooleans = nonEmptyCells.every(cell => cell.t === 'b');
    if (allBooleans) { return 'boolean'; }

    const allNumbers = nonEmptyCells.every(cell => cell.t === 'n' || (cell.t === 's' && !isNaN(cell.v.replace(/[,$\s%()]/g, ''))));

    if (allNumbers) {
        const numbers = nonEmptyCells.map(cell => {
            if (cell.t === 'n') return cell.v;
            return parseFloat(cell.v.replace(/[,$\s%()]/g, ''));
        });

        const needsPrecision = numbers.some(num => {
            const str = num.toString();
            return str.includes('e') ||
                   (str.includes('.') && str.split('.')[1].length > 6) ||
                   Math.abs(num) > 1e15;
        });

        return needsPrecision ? 'numeric' : 'double precision';
    }
    return 'text';
}

function parseValue(cell, type) {
    if (!cell || cell.v == null) return null;
    
    switch (type) {
        case 'date':
            if (cell.t === 'd') { return cell.v.toISOString().split('T')[0]; }
            if (cell.t === 'n') {
                const date = new Date((cell.v - 25569) * 86400 * 1000);
                return date.toISOString().split('T')[0];
            }
            return null;

        case 'numeric':
        case 'double precision':
            if (cell.t === 'n') return cell.v;
            if (cell.t === 's') {
                const cleaned = cell.v.replace(/[,$\s%()]/g, '');
                if (!isNaN(cleaned)) return parseFloat(cleaned);
            }
            return null;

        case 'boolean':
            return cell.t === 'b' ? cell.v : null;

        default:
            return String(cell.v);
    }
}

Complete Example

  1. Install and start the PostgreSQL server.
Installation Notes (click to show)

On macOS, install the postgresql formula with Homebrew:

brew install postgresql@16

The last few lines of the installer explain how to start the database:

Or, if you don't want/need a background service you can just run:
// highlight-next-line
  LC_ALL="C" /usr/local/opt/postgresql@16/bin/postgres -D /usr/local/var/postgresql@16

On Linux, install postgresql by running the following script:

echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
wget -qO - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

sudo apt update
sudo apt install postgresql-16
sudo systemctl start postgresql

Run the command to start a local database instance.

  1. Drop any existing database with the name SheetJSPG:
dropdb SheetJSPG

# Ubuntu/Debian
sudo -i -u postgres dropdb SheetJSPG

:::info pass

If the server is running elsewhere, or if the username is different from the current user, command-line flags can override the defaults.

Option Explanation
-h HOSTNAME Name of the server
-p PORT specifies the port number
-U USERNAME specifies the username

:::

  1. Create an empty SheetJSPG database using the createdb command:
createdb SheetJSPG

# Ubuntu/Debian
sudo -i -u postgres createdb SheetJSPG

:::note pass

createdb supports the same -h, -p, and -U flags as dropdb.

:::

Connector Test

  1. Create a project folder:
mkdir sheetjs-pg
cd sheetjs-pg
npm init -y
  1. Install the pg connector module:
npm i --save pg@8.11.4
  1. Save the following example codeblock to PGTest.js:
const pg = require("pg");
const client = new pg.Client({
  database:"SheetJSPG",
// highlight-start
  host: "127.0.0.1", // localhost
  port: 5432,
  //user: "",
  //password: ""
// highlight-end
});
(async() => {

await client.connect();
const res = await client.query('SELECT $1::text as message', ['Hello world!']);
console.log(res.rows[0].message); // Hello world!
await client.end();

})();
  1. Edit the new PGTest.js script and modify the highlighted lines from the codeblock to reflect the database deployment settings.

The settings in the codeblock match the default configuration for macOS Homebrew PostgreSQL server. For other deployments:

  • If the server is not running on your computer, set host and port to the correct host name and port number.

  • If the server expects a different username and password, uncomment the user and password lines and replace the values with the username and password.

  • For Ubuntu/Debian PostgreSQL installations, the default user is postgres. The password must be set during installation or using sudo -u postgres psql followed by \password postgres in the psql prompt.

  1. Run the script:
node PGTest.js

It should print Hello world!

:::caution pass

If the output is not Hello world! or if there is an error, please report the issue to the pg connector project for further diagnosis.

:::

Add SheetJS

  1. Install dependencies:

{\ npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz pg-format@1.0.4}

  1. Download SheetJSPG.js:
curl -LO https://docs.sheetjs.com/postgresql/SheetJSPG.js

This script will:

  • read and parse the test file pres.numbers
  • create a connection to the SheetJSPG database on a local PostgreSQL server
  • load data from the first worksheet into a table with name Presidents
  • disconnect and reconnect to the database
  • dump data from the table Presidents
  • export the dataset to SheetJSPG.xlsx
  1. Edit the SheetJSPG.js script.

The script defines an opts object:

const XLSX = require("xlsx");
const opts = {
  database:"SheetJSPG",
// highlight-start
  host: "127.0.0.1", // localhost
  port: 5432,
  //user: "",
  //password: ""
// highlight-end
};

The settings in the codeblock match the default configuration for macOS Homebrew PostgreSQL server. For other deployments:

  • If the server is not running on your computer, set host and port to the correct host name and port number.

  • If the server expects a different username and password, uncomment the user and password lines and replace the values with the username and password.

  • For Ubuntu/Debian PostgreSQL installations, the default user is postgres. The password must be set during installation or using sudo -u postgres psql followed by \password postgres in the psql prompt.

  1. Fetch the example file pres.numbers:
curl -L -O https://docs.sheetjs.com/pres.numbers
  1. Run the script:
node SheetJSPG.js
  1. Verify the result:
  • SheetJSPGExport.xlsx can be opened in a spreadsheet app or tested in the terminal
npx xlsx-cli SheetJSPGExport.xlsx
  • The database server can be queried using the psql command line tool.

If the server is running locally, the command will be:

psql SheetJSPG -c 'SELECT * FROM "Presidents";'

:::note pass

psql supports the same -h, -p, and -U flags as dropdb and createdb.

:::


  1. See the official pg website for more info. ↩︎

  2. See json_to_sheet in "Utilities" ↩︎

  3. See "Sheet Objects" in "SheetJS Data Model" for more details. ↩︎

  4. See "Workbook Helpers" in "Utilities" for details on book_new and book_append_sheet. ↩︎

  5. See writeFile in "Writing Files" ↩︎

  6. See sheet_to_json in "Utilities" ↩︎

  7. The pg-format package is available on the public NPM registry. Even though the project is marked as deprecated, the official pg website still recommends pg-format ↩︎