SheetJS 2023-02-11 23:20:11 -05:00
parent 30e51a2244
commit 4a2314409e
3 changed files with 34 additions and 33 deletions

@@ -1,40 +1,38 @@
 ---
-title: Data in Version Control
-pagination_prev: demos/hosting/index
+title: GitHub
+pagination_prev: demos/ml
 pagination_next: solutions/input
 ---
-Git is a popular system for organizing a historical record of source code and
-changes. Git can also store and track binary data artifacts, but data tools
-are more effective in processing data stored in plain text formats like CSV.
 Many official data releases by governments and organizations include XLSX or
-XLS files. SheetJS trivializes the conversion to CSV. For example, in NodeJS:
+XLS files. Unfortunately some data sources do not retain older versions.
-```js
-const XLSX = require("xlsx");
+Git is a popular system for organizing a historical record of source code and
+changes. Git can also store and track binary data artifacts.
-(async() => {
-  /* Download Data */
-  const f = await fetch("https://docs.sheetjs.com/pres.xlsx");
-  const data = await f.arrayBuffer();
+GitHub is a popular host for Git repositories. GitHub's "Flat Data" project
+explores storing and comparing versions of structured CSV and JSON data. The
+official "Excel to CSV" example uses SheetJS to generate CSV data from files:
-  /* Parse workbook */
-  // highlight-next-line
-  const wb = XLSX.read(data);
-  /* Convert first worksheet to CSV */
-  const ws = wb.Sheets[wb.SheetNames[0]];
-  // highlight-next-line
-  const csv = XLSX.utils.sheet_to_csv(ws);
-  console.log(csv);
-})();
+```mermaid
+sequenceDiagram
+  autonumber
+  participant R as GH Repo
+  participant A as GH Action
+  participant S as Data Source
+  loop Regular Interval (cron)
+    A->>R: clone repo
+    R->>A: old repo
+    A->>S: fetch file
+    S->>A: spreadsheet
+    Note over A: SheetJS<br/>convert to CSV
+    alt Data changed
+      Note over A: commit new data
+      A->>R: push new commit
+    end
+  end
 ```
-GitHub's "Flat Data" project explores storing and comparing versions of
-structured CSV and JSON data. The official "Excel to CSV" example uses SheetJS
-under the hood to generate CSV data from an XLSX file.
 This demo covers implementation details elided in the official write-up.
 ## Flat Data
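The old page text notes that data tools work better with plain-text CSV than with binary spreadsheets. The new sequence diagram pushes a commit only when the data changed. To illustrate why CSV snapshots are easy to compare, here is a minimal TypeScript sketch that reports which rows were added or removed between two CSV versions. `diffCsvRows` and `RowDiff` are hypothetical names and are not part of SheetJS or Flat Data. The sketch assumes one record per line, so quoted fields containing line breaks are not handled.

```typescript
// Hypothetical helper (not part of SheetJS or Flat Data).
// Assumes one record per line: quoted fields with embedded newlines are not handled.
interface RowDiff {
  added: string[];   // rows present only in the new snapshot
  removed: string[]; // rows present only in the old snapshot
}

function diffCsvRows(oldCsv: string, newCsv: string): RowDiff {
  // Split on LF or CRLF and ignore blank lines (e.g. a trailing newline)
  const toRows = (csv: string): string[] =>
    csv.split(/\r?\n/).filter((line) => line.length > 0);

  // Count each old row so duplicate rows are matched one-to-one
  const remaining = new Map<string, number>();
  for (const row of toRows(oldCsv)) {
    remaining.set(row, (remaining.get(row) ?? 0) + 1);
  }

  // A new row that cannot be matched against an old row was added
  const added: string[] = [];
  for (const row of toRows(newCsv)) {
    const n = remaining.get(row) ?? 0;
    if (n > 0) remaining.set(row, n - 1);
    else added.push(row);
  }

  // Any old rows left unmatched were removed
  const removed: string[] = [];
  remaining.forEach((n, row) => {
    for (let i = 0; i < n; ++i) removed.push(row);
  });

  return { added, removed };
}

const before = "id,value\n1,a\n2,b\n";
const after = "id,value\n2,b\n3,c\n";
console.log(diffCsvRows(before, after)); // added: ["3,c"], removed: ["1,a"]
```

Rows are matched by exact text with multiplicity. A row that was only reordered is not reported. A modified row appears once in `removed` and once in `added`.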
@@ -49,7 +47,7 @@ As a project from the company, the entire lifecycle uses GitHub offerings:
 :::caution
-A GitHub account is required. At the time of writing (2022 November 08), free
+A GitHub account is required. At the time of writing (2023 February 11), free
 GitHub accounts have no Actions usage limits for public repositories.
 Using private GitHub repositories is not recommended because the Flat Viewer
@@ -79,14 +77,16 @@ The `githubocto/flat` action can be added as a step in a workflow:
 postprocess: ./postprocess.ts
 ```
-The `http_url` will be fetched and saved to `downloaded_filename` in the repo.
+This action performs the following steps:
+1) `http_url` will be fetched and saved to `downloaded_filename` in the repo.
 This can be approximated with the following command:
 ```bash
 curl -L -o data.xlsx https://docs.sheetjs.com/pres.xlsx
 ```
-After saving, the `postprocess` script will be run. When a `.ts` file is the
+2) After saving, the `postprocess` script will be run. When a `.ts` file is the
 script, it will run the script in the Deno runtime. The `postprocess` script is
 expected to read the downloaded file and create or overwrite files in the repo.
 This can be approximated with the following command:
@@ -95,7 +95,7 @@ This can be approximated with the following command:
 deno run -A ./postprocess.ts data.xlsx
 ```
-The action will then compare the contents of the repo, creating a new commit if
+3) The action will compare the contents of the repo, creating a new commit if
 the source data or artifacts from the `postprocess` script changed.
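Step 3 creates a commit only when the downloaded file or the `postprocess` output differs from what is already in the repo. The TypeScript sketch below shows one way to express that check by comparing file contents from two runs. `Snapshot`, `changedPaths` and `needsCommit` are hypothetical names. The `githubocto/flat` action does not expose this API and may rely on Git itself to detect changes.

```typescript
// Hypothetical sketch of step 3; the real action may detect changes differently.
// A snapshot maps a repo-relative path to that file's contents.
type Snapshot = Map<string, Uint8Array>;

// Byte-for-byte equality check
function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; ++i) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Paths that were created, modified, or deleted between two snapshots
function changedPaths(before: Snapshot, after: Snapshot): string[] {
  const changed: string[] = [];
  after.forEach((bytes, path) => {
    const prev = before.get(path);
    if (prev === undefined || !sameBytes(prev, bytes)) changed.push(path);
  });
  before.forEach((_bytes, path) => {
    if (!after.has(path)) changed.push(path);
  });
  return changed.sort();
}

// Only commit when the download or a postprocess artifact changed
function needsCommit(before: Snapshot, after: Snapshot): boolean {
  return changedPaths(before, after).length > 0;
}

const enc = new TextEncoder();
const lastRun: Snapshot = new Map([
  ["data.xlsx", enc.encode("old bytes")],
  ["data.csv", enc.encode("id,value\n1,a\n")],
]);
const thisRun: Snapshot = new Map([
  ["data.xlsx", enc.encode("old bytes")],
  ["data.csv", enc.encode("id,value\n1,a\n")],
]);
console.log(needsCommit(lastRun, thisRun)); // false: nothing changed
```

Comparing raw bytes keeps the check format-agnostic. The same logic applies to the downloaded spreadsheet and to the generated CSV.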
@@ -153,7 +153,7 @@ Deno.writeFileSync(out_file, new TextEncoder().encode(csv));
 :::note
-This was tested on 2022 November 08 using the GitHub UI.
+This was tested on 2023 February 11 using the GitHub UI.
 :::

@@ -81,7 +81,7 @@ run in the web browser, demos will include interactive examples.
 ### File Hosting Services
 - [`Dropbox`](/docs/demos/hosting/dropbox)
-- [`Git`](/docs/demos/git)
+- [`GitHub`](/docs/demos/hosting/github)
 ### Platforms and Integrations

@@ -172,6 +172,7 @@ const config = {
 { from: '/docs/getting-started/demos/', to: '/docs/demos/' },
 { from: '/docs/getting-started/demos/excel', to: '/docs/demos/' },
 { from: '/docs/demos/content', to: '/docs/demos/static/' },
+{ from: '/docs/demos/git', to: '/docs/demos/hosting/github/' },
 /* frontend */
 { from: '/docs/demos/angular', to: '/docs/demos/frontend/angular/' },
 { from: '/docs/demos/react', to: '/docs/demos/frontend/react/' },