docs.sheetjs.com/docz/docs/03-demos/06-content.md
2023-01-09 19:57:01 -05:00

27 KiB
Raw Blame History

title
Content and Site Generation

import current from '/version.js';

With the advent of server-side frameworks and content management systems, it is possible to build sites whose source of truth is a spreadsheet! This demo explores a number of approaches.

Lume

The official Sheets plugin uses SheetJS to load data from spreadsheets.

Lume Demo

:::note

This was tested against lume v1.14.2 on 2022 December 27.

:::

Complete Example (click to show)
  1. Create a stock site:
mkdir -p sheetjs-lume
cd sheetjs-lume
deno run -Ar https://deno.land/x/lume/init.ts

When prompted, enter the following options:

  • Use TypeScript for the configuration file: press Enter (use default N)
  • Do you want to use plugins: type sheets and press Enter

The project will be configured and modules will be installed.

  1. Download https://sheetjs.com/pres.numbers and place in a _data folder:
mkdir -p _data
curl -L -o _data/pres.numbers https://sheetjs.com/pres.numbers
  1. Create a index.njk file that references the file. Since the file is pres.numbers, the parameter name is pres:
<h2>Presidents</h2>
<table><thead><th>Name</th><th>Index</th></thead>
  <tbody>
  {% for row in pres %}{% if (loop.index >= 1) %}
    <tr>
      <td>{{ row.Name }}</td>
      <td>{{ row.Index }}</td>
    </tr>
  {% endif %}{% endfor %}
  </tbody>
</table>
  1. Run the development server:
deno task lume --serve

To verify it works, access http://localhost:3000 from your web browser. Adding a new row and saving pres.numbers should refresh the data

  1. Stop the server (press CTRL+C in the terminal window) and run
deno task lume

This will create a static site in the _site folder, which can be served with:

npx http-server _site

Accessing the page http://localhost:8080 will show the page contents.

GatsbyJS

gatsby-transformer-excel generates nodes for each data row of each worksheet. The official documentation includes examples and more detailed usage instructions.

:::note

gatsby-transformer-excel is maintained by the Gatsby core team and all bugs should be directed to the main Gatsby project. If it is determined to be a bug in the parsing logic, issues should then be raised with the SheetJS project.

:::

GraphQL details (click to show)

gatsby-transformer-excel generates nodes for each data row of each worksheet. Under the hood, it uses sheet_to_json to generate row objects using the headers in the first row as keys.

pres.xlsx

Assuming the file name is pres.xlsx and the data is stored in "Sheet1", the following nodes will be created:

[
  { Name: "Bill Clinton", Index: 42, type: "PresXlsxSheet1" },
  { Name: "GeorgeW Bush", Index: 43, type: "PresXlsxSheet1" },
  { Name: "Barack Obama", Index: 44, type: "PresXlsxSheet1" },
  { Name: "Donald Trump", Index: 45, type: "PresXlsxSheet1" },
  { Name: "Joseph Biden", Index: 46, type: "PresXlsxSheet1" },
]

The type is a proper casing of the file name concatenated with the sheet name.

The following query pulls the Name and Index fields from each row:

{
  allPresXlsxSheet1 { # "all" followed by type
    edges {
      node { # each line in this block should be a field in the data
        Name
        Index
      }
    }
  }
}

:::caution

gatsby-transformer-excel uses an older version of the library. It can be overridden through a package.json override in the latest versions of NodeJS:

{`\
{
  "overrides": {
    "xlsx": "https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz"
  }
}`}

:::

GatsbyJS Demo

Complete Example (click to show)

:::note

This demo was tested on 2022 November 11 against create-gatsby@3.0.0. The generated project used gatsby@5.0.0 and react@18.2.0.

:::

  1. Run npm init gatsby -- -y sheetjs-gatsby to create the template site.

  2. Follow the on-screen instructions for starting the local development server:

cd sheetjs-gatsby
npm run develop

Open a web browser to the displayed URL (typically http://localhost:8000/)

  1. Edit package.json and add the highlighted lines in the JSON object:
{`\
{
  // highlight-start
  "overrides": {
    "xlsx": "https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz"
  },
  // highlight-end
  "name": "sheetjs-gatsby",
  "version": "1.0.0",
`}
  1. Install the library and plugins:
{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz
npm i --save gatsby-transformer-excel gatsby-source-filesystem
`}
  1. Edit gatsby-config.js and add the following lines to the plugins array:
  plugins: [
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `data`,
        path: `${__dirname}/src/data/`,
      },
    },
    `gatsby-transformer-excel`,
  ],

Stop and restart the development server process (npm run develop).

  1. Make a src/data directory, download https://sheetjs.com/pres.xlsx, and move the downloaded file into the new folder:
mkdir -p src/data
curl -L -o src/data/pres.xlsx https://sheetjs.com/pres.xlsx
  1. To verify, open the GraphiQL editor at http://localhost:8000/___graphql.

There is an editor in the left pane. Paste the following query into the editor:

{
  allPresXlsxSheet1 {
    edges {
      node {
        Name
        Index
      }
    }
  }
}

Press the Execute Query button and data should show up in the right pane:

GraphiQL Screenshot

  1. Create a new file src/pages/pres.js that uses the query and displays the result:
import { graphql } from "gatsby"
import * as React from "react"

export const query = graphql`query {
  allPresXlsxSheet1 {
    edges {
      node {
        Name
        Index
      }
    }
  }
}`;

const PageComponent = ({data}) => {
  return ( <pre>{JSON.stringify(data, 2, 2)}</pre>);
};
export default PageComponent;

After saving the file, access http://localhost:8000/pres. The displayed JSON is the data that the component receives:

{
  "allPresXlsxSheet1": {
    "edges": [
      {
        "node": {
          "Name": "Bill Clinton",
          "Index": 42
        }
      },
  // ....
  1. Change PageComponent to display a table based on the data:
import { graphql } from "gatsby"
import * as React from "react"

export const query = graphql`query {
  allPresXlsxSheet1 {
    edges {
      node {
        Name
        Index
      }
    }
  }
}`;

// highlight-start
const PageComponent = ({data}) => {
  const rows = data.allPresXlsxSheet1.edges.map(r => r.node);
  return ( <table>
    <thead><tr><th>Name</th><th>Index</th></tr></thead>
    <tbody>{rows.map(row => ( <tr>
      <td>{row.Name}</td>
      <td>{row.Index}</td>
    </tr>))}</tbody>
  </table> );
};
// highlight-end

export default PageComponent;

Going back to the browser, http://localhost:8000/pres will show a table:

Data in Table

  1. Open the file src/data/pres.xlsx in Excel or LibreOffice or Numbers. Add a new row at the end of the file:

New Row in File

Save the file and notice that the table has refreshed with the new data:

Updated Table

  1. Stop the development server and run npm run build. Once the build is finished, the display will confirm that the /pres route is static:
Pages

┌ src/pages/404.js
│ ├   /404/
│ └   /404.html
├ src/pages/index.js
│ └   /
└ src/pages/pres.js
  └   /pres/

  ╭────────────────────────────────────────────────────────────────╮
  │                                                                │
  │   (SSG) Generated at build time                                │
  │ D (DSG) Deferred static generation - page generated at runtime │
  │ ∞ (SSR) Server-side renders at runtime (uses getServerData)    │
  │ λ (Function) Gatsby function                                   │
  │                                                                │
  ╰────────────────────────────────────────────────────────────────╯

The built page will be placed in public/pres/index.html. Open the page with a text editor and search for "SheetJS" to verify raw HTML was generated:

<tr><td>SheetJS Dev</td><td>47</td></tr>

Bundling Data

Bundlers can run JS code and process assets during development and during site builds. Custom plugins can extract data from spreadsheets.

ViteJS

:::note

This demo covers static asset imports. For processing files in the browser, the "Bundlers" demo includes an example.

:::

ViteJS supports static asset imports, but the default raw loader interprets data as UTF-8 strings. This corrupts binary formats like XLSX and XLS, but a custom loader can override the default behavior.

For a pure static site, a plugin can load data into an array of row objects. The SheetJS work is performed in the plugin. The library is not loaded in the page!

import { readFileSync } from 'fs';
import { read, utils } from 'xlsx';
import { defineConfig } from 'vite';

export default defineConfig({
  assetsInclude: ['**/*.xlsx'], // xlsx file should be treated as assets

  plugins: [
    { // this plugin handles ?sheetjs tags
      name: "vite-sheet",
      transform(code, id) {
        if(!id.match(/\?sheetjs$/)) return;
        var wb = read(readFileSync(id.replace(/\?sheetjs$/, "")));
        var data = utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]]);
        return `export default JSON.parse('${JSON.stringify(data)}')`;
      }
    }
  ]
});

This loader uses the query sheetjs:

import data from './data.xlsx?sheetjs';

document.querySelector('#app').innerHTML = `<div><pre>
${data.map(row => JSON.stringify(row)).join("\n")}
</pre></div>`;
Base64 plugin (click to show)

This loader pulls in data as a Base64 string that can be read with XLSX.read. While this approach works, it is not recommended since it loads the library in the front-end site.

import { readFileSync } from 'fs';
import { defineConfig } from 'vite';

export default defineConfig({
  assetsInclude: ['**/*.xlsx'], // mark that xlsx file should be treated as assets

  plugins: [
    { // this plugin handles ?b64 tags
      name: "vite-b64-plugin",
      transform(code, id) {
        if(!id.match(/\?b64$/)) return;
        var path = id.replace(/\?b64/, "");
        var data = readFileSync(path, "base64");
        return `export default '${data}'`;
      }
    }
  ]
});

When importing using the b64 query, the raw Base64 string will be exposed. This can be read directly with XLSX.read in JS code:

import { read, utils } from "xlsx";

/* reference workbook */
import b64 from './data.xlsx?b64';
/* parse workbook and export first sheet to CSV */
const wb = await read(b64);
const wsname = wb.SheetNames[0];
const csv = utils.sheet_to_csv(wb.Sheets[wsname]);

document.querySelector('#app').innerHTML = `<div><pre>
<b>${wsname}</b>
${csv}
</pre></div>`;

Site Generators

NextJS

:::note

This was tested against next v13.1.1 on 2022 December 28.

:::

:::info

At a high level, there are two ways to pull spreadsheet data into NextJS apps: loading an asset module or performing the file read operations from the NextJS lifecycle. At the time of writing, NextJS does not offer an out-of-the-box asset module solution, so this demo focuses on raw operations. NextJS does not watch the spreadsheets, so next dev hot reloading will not work!

:::

The general strategy with NextJS apps is to generate HTML snippets or data from the lifecycle functions and reference them in the template.

HTML output can be generated using XLSX.utils.sheet_to_html and inserted into the document using the dangerouslySetInnerHTML attribute:

export default function Index({html, type}) { return (
  // ...
// highlight-next-line
  <div dangerouslySetInnerHTML={{ __html: html }} />
  // ...
); }

:::warning Reading and writing files during the build process

fs cannot be statically imported from the top level in NextJS pages. The dynamic import must happen within a lifecycle function. For example:

/* it is safe to import the library from the top level */
import { readFile, utils, set_fs } from 'xlsx';
/* it is not safe to import 'fs' from the top level ! */
// import * as fs from 'fs'; // this will fail
import { join } from 'path';
import { cwd } from 'process';

export async function getServerSideProps() {
// highlight-next-line
  set_fs(await import("fs")); // dynamically import 'fs' when needed
  const wb = readFile(join(cwd(), "public", "sheetjs.xlsx")); // works
  // ...
}

:::

:::caution Next 13+ and SWC

Next 13 switched to the SWC minifier. There are known issues with the minifier. Until those issues are resolved, SWC should be disabled in next.config.js:

module.exports = {
// highlight-next-line
  swcMinify: false
};

:::

Demo

Complete Example (click to show)
  1. Disable NextJS telemetry:
npx next@13.1.1 telemetry disable

Confirm it is disabled by running

npx next@13.1.1 telemetry status
  1. Set up folder structure. At the end, a pages folder with a sheets subfolder must be created. On Linux or MacOS or WSL:
mkdir -p pages/sheets/
  1. Download the test file and place in the project root. On Linux or MacOS or WSL:
curl -LO https://docs.sheetjs.com/next/sheetjs.xlsx
  1. Install dependencies:
npm i --save https://cdn.sheetjs.com/xlsx-latest/xlsx-latest.tgz next@13.1.1
  1. Download test scripts:

Download and place the following scripts in the pages subfolder:

Download [id].js and place in the pages/sheets subfolder.

:::caution Percent-Encoding in the script name

The [id].js script must have the literal square brackets in the name. If your browser saved the file to %5Bid%5D.js. rename the file.

:::

On Linux or MacOS or WSL:

cd pages
curl -LO https://docs.sheetjs.com/next/index.js
curl -LO https://docs.sheetjs.com/next/getServerSideProps.js
curl -LO https://docs.sheetjs.com/next/getStaticPaths.js
curl -LO https://docs.sheetjs.com/next/getStaticProps.js
cd sheets
curl -LOg 'https://docs.sheetjs.com/next/[id].js'
cd ../..
  1. Test the deployment:
npx next@13.1.1

Open a web browser and access:

The individual worksheets are available at

  1. Stop the server and run a production build:
npx next@13.1.1 build

The final output will show a list of the routes and types:

Route (pages)                              Size     First Load JS
┌ ○ /                                      541 B          77.4 kB
├ ○ /404                                   181 B          73.7 kB
├ λ /getServerSideProps                    594 B          77.4 kB
├ ● /getStaticPaths                        2.56 kB        79.4 kB
├ ● /getStaticProps                        591 B          77.4 kB
└ ● /sheets/[id] (447 ms)                  569 B          77.4 kB
    ├ /sheets/0
    ├ /sheets/1
    └ /sheets/2

As explained in the summary, the /getStaticPaths and /getStaticProps routes are completely static. 3 /sheets/# pages were generated, corresponding to 3 worksheets in the file. /getServerSideProps is server-rendered.

  1. Try to build a static site:
npx next@13.1.1 export

:::note The static export will fail!

A static page cannot be generated at this point because /getServerSideProps is still server-rendered.

:::

  1. Remove pages/getServerSideProps.js and rebuild with npx next@13.1.1 build

Inspecting the output, there should be no lines with the λ symbol:

Route (pages)                              Size     First Load JS
┌ ○ /                                      541 B          77.4 kB
├ ○ /404                                   181 B          73.7 kB
├ ● /getStaticPaths                        2.56 kB        79.4 kB
├ ● /getStaticProps                        591 B          77.4 kB
└ ● /sheets/[id] (459 ms)                  569 B          77.4 kB
    ├ /sheets/0
    ├ /sheets/1
    └ /sheets/2
  1. Generate the static site:
npx next@13.1.1 export

The static site will be written to the out subfolder, which can be hosted with

npx http-server out

The command will start a local HTTP server on port 8080.

"Static Site Generation" using getStaticProps

When using getStaticProps, the file will be read once during build time.

import { readFile, set_fs, utils } from 'xlsx';

export async function getStaticProps() {
  /* read file */
  set_fs(await import("fs"));
  const wb = readFile(path_to_file)

  /* generate and return the html from the first worksheet */
  const html = utils.sheet_to_html(wb.Sheets[wb.SheetNames[0]]);
  return { props: { html } };
};

"Static Site Generation with Dynamic Routes" using getStaticPaths

Typically a static site with dynamic routes has an endpoint /sheets/[id] that implements both getStaticPaths and getStaticProps.

  • getStaticPaths should return an array of worksheet indices:
export async function getStaticPaths() {
  /* read file */
  set_fs(await import("fs"));
  const wb = readFile(path);

  /* generate an array of objects that will be used for generating pages */
  const paths = wb.SheetNames.map((name, idx) => ({ params: { id: idx.toString() } }));
  return { paths, fallback: false };
};

:::note

For a pure static site, fallback must be set to false!

:::

  • getStaticProps will generate the actual HTML for each page:
export async function getStaticProps(ctx) {
  /* read file */
  set_fs(await import("fs"));
  const wb = readFile(path);

  /* get the corresponding worksheet and generate HTML */
  const ws = wb.Sheets[wb.SheetNames[ctx.params.id]]; // id from getStaticPaths
  const html = utils.sheet_to_html(ws);
  return { props: { html } };
};

"Server-Side Rendering" using getServerSideProps

:::caution Do not use on a static site

These routes require a NodeJS dynamic server. Static page generation will fail!

getStaticProps and getStaticPaths support static site generation (SSG).

getServerSideProps is suited for NodeJS hosted deployments where the workbook changes frequently and a static site is undesirable.

:::

When using getServerSideProps, the file will be read on each request.

import { readFile, set_fs, utils } from 'xlsx';

export async function getServerSideProps() {
  /* read file */
  set_fs(await import("fs"));
  const wb = readFile(path_to_file);

  /* generate and return the html from the first worksheet */
  const html = utils.sheet_to_html(wb.Sheets[wb.SheetNames[0]]);
  return { props: { html } };
};

NuxtJS

@nuxt/content is a file-based CMS for Nuxt, enabling static-site generation and on-demand server rendering powered by spreadsheets.

:::note

This demo was tested on 2022 November 18 against Nuxt Content v1.15.1.

:::

Nuxt Content Demo

Complete Example (click to show)

:::note

The project was generated using create-nuxt-app v4.0.0. The generated project used Nuxt v2.15.8 and Nuxt Content v1.15.1.

:::

  1. Create a stock app:
npx create-nuxt-app@4.0.0 SheetJSNuxt

When prompted, enter the following options:

  • Project name: press Enter (use default SheetJSNuxt)
  • Programming language: press Down Arrow (TypeScript selected) then Enter
  • Package manager: select Npm and press Enter
  • UI framework: select None and press Enter
  • Nuxt.js modules: scroll to Content, select with Space, then press Enter
  • Linting tools: press Enter (do not select any Linting tools)
  • Testing framework: select None and press Enter
  • Rendering mode: select Universal (SSR / SSG) and press Enter
  • Deployment target: select Static (Static/Jamstack hosting) and press Enter
  • Development tools: press Enter (do not select any Development tools)
  • What is your GitHub username?: press Enter
  • Version control system: select None

The project will be configured and modules will be installed.

  1. Install the SheetJS library and start the server:
cd SheetJSNuxt
npm i --save https://cdn.sheetjs.com/xlsx-latest/xlsx-latest.tgz
npm run dev

When the build finishes, the terminal will display a URL like:

 Listening on: http://localhost:64688/

The server is listening on that URL. Open the link in a web browser.

  1. Download https://sheetjs.com/pres.xlsx and move to the content folder.
curl -L -o content/pres.xlsx https://sheetjs.com/pres.xlsx
  1. Modify nuxt.config.js as follows:
  • Add the following to the top of the script:
import { readFile, utils } from 'xlsx';

// This will be called when the files change
const parseSheet = (file, { path }) => {
  // `path` is a path that can be read with `XLSX.readFile`
  const wb = readFile(path);
  const o = wb.SheetNames.map(name => ({ name, data: utils.sheet_to_json(wb.Sheets[name])}));
  return { data: o };
}
  • Look for the exported object. There should be a content property:
  // Content module configuration: https://go.nuxtjs.dev/config-content
  content: {},

Replace the property with the following definition:

  // content.extendParser allows us to hook into the parsing step
  content: {
    extendParser: {
      // the keys are the extensions that will be matched.  The "." is required
      ".numbers": parseSheet,
      ".xlsx": parseSheet,
      ".xls": parseSheet,
      // can add other extensions like ".fods" as desired
    }
  },

(If the property is missing, add it to the end of the exported object)

  1. Replace pages/index.vue with the following:
<!-- sheetjs (C) 2013-present  SheetJS -- https://sheetjs.com -->
<template><div>
  <div v-for="item in data.data" v-bind:key="item.name">
    <h2>{{ item.name }}</h2>
    <table><thead><tr><th>Name</th><th>Index</th></tr></thead><tbody>
      <tr v-for="row in item.data" v-bind:key="row.Index">
        <td>{{ row.Name }}</td>
        <td>{{ row.Index }}</td>
      </tr>
    </tbody></table>
  </div>
</div></template>

<script>
export default {
  async asyncData ({$content}) {
    return {
      data: await $content('pres').fetch()
    };
  }
};
</script>

The browser should refresh to show the contents of the spreadsheet. If it does not, click Refresh manually or open a new browser window.

Nuxt Demo end of step 5

  1. To verify that hot loading works, open pres.xlsx from the content folder in Excel. Add a new row to the bottom and save the file:

Adding a new line to pres.xlsx

The server terminal window should show a line like:

 Updated ./content/pres.xlsx                                       @nuxt/content 05:43:37

The page should automatically refresh with the new content:

Nuxt Demo end of step 6

  1. Stop the server (press CTRL+C in the terminal window) and run
npm run generate

This will create a static site in the dist folder, which can be served with:

npx http-server dist

Accessing the page http://localhost:8080 will show the page contents. Verifying the static nature is trivial: make another change in Excel and save. The page will not change.

nuxt.config.js configuration

Through an override in nuxt.config.js, Nuxt Content will use custom parsers. Differences from a stock create-nuxt-app config are shown below:

import { readFile, utils } from 'xlsx';

// This will be called when the files change
const parseSheet = (file, { path }) => {
  // `path` is a path that can be read with `XLSX.readFile`
  const wb = readFile(path);
  const o = wb.SheetNames.map(name => ({ name, data: utils.sheet_to_json(wb.Sheets[name])}));
  return { data: o };
}

export default {
// ...

  // content.extendParser allows us to hook into the parsing step
  content: {
    extendParser: {
      // the keys are the extensions that will be matched.  The "." is required
      ".numbers": parseSheet,
      ".xlsx": parseSheet,
      ".xls": parseSheet,
      // can add other extensions like ".fods" as desired
    }
  },

// ...
}

Template Use

When a spreadsheet is placed in the content folder, Nuxt will find it. The data can be referenced in a view with asyncData. The name should not include the extension, so "sheetjs.numbers" would be referenced as "sheetjs":

  async asyncData ({$content}) {
    return {
      // $content('sheetjs') will match files with extensions in nuxt.config.js
      data: await $content('sheetjs').fetch()
    };
  }

In the template, data.data is an array of objects. Each object has a name property for the worksheet name and a data array of row objects. This maps neatly with nested v-for:

  <!-- loop over the worksheets -->
  <div v-for="item in data.data" v-bind:key="item.name">
    <table>
      <!-- loop over the rows of each worksheet -->
      <tr v-for="row in item.data" v-bind:key="row.Index">
        <!-- here `row` is a row object generated from sheet_to_json -->
        <td>{{ row.Name }}</td>
        <td>{{ row.Index }}</td>
      </tr>
    </table>
  </div>