---
title: Data Processing with Duktape
sidebar_label: C + Duktape
description: Process structured data in C programs. Seamlessly integrate spreadsheets into your program by pairing Duktape and SheetJS. Supercharge programs with modern data tools.
pagination_prev: demos/bigdata/index
pagination_next: solutions/input
---

import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

[Duktape](https://duktape.org) is an embeddable JS engine written in C. It has
been ported to a number of exotic architectures and operating systems.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

The ["Complete Example"](#complete-example) section includes a complete
command-line tool for reading data from spreadsheets and exporting to Excel XLSB
workbooks.

The ["Bindings"](#bindings) section covers bindings for other ecosystems.

## Integration Details

### Initialize Duktape

Duktape does not provide a `global` variable. It can be created in one line:
```c
/* initialize */
duk_context *ctx = duk_create_heap_default();
/* duktape does not expose a standard "global" by default */
// highlight-next-line
duk_eval_string_noresult(ctx, "var global = (function(){ return this; }).call(null);");
```
### Load SheetJS Scripts
The [SheetJS Standalone scripts](/docs/getting-started/installation/standalone)
can be parsed and evaluated in a Duktape context.

The shim and main libraries can be loaded by reading the scripts from the file
system and evaluating in the Duktape context:
```c
/* simple wrapper to read and evaluate an entire script file */
static duk_int_t eval_file(duk_context *ctx, const char *filename) {
  size_t len;
  /* read script from filesystem */
  FILE *f = fopen(filename, "rb");
  if(!f) { duk_push_undefined(ctx); perror("fopen"); return 1; }
  long fsize; { fseek(f, 0, SEEK_END); fsize = ftell(f); fseek(f, 0, SEEK_SET); }
  char *buf = (char *)malloc(fsize * sizeof(char));
  /* check the allocation before reading into the buffer */
  if(!buf) { fclose(f); duk_push_undefined(ctx); perror("malloc"); return 1; }
  len = fread((void *) buf, 1, fsize, f);
  fclose(f);
  // highlight-start
  /* load script into the context (Duktape copies the string data) */
  duk_push_lstring(ctx, (const char *)buf, (duk_size_t)len);
  /* eval script */
  duk_int_t retval = duk_peval(ctx);
  /* cleanup: pop the result (or error) left by duk_peval */
  duk_pop(ctx);
  // highlight-end
  /* the C copy of the script is no longer needed */
  free(buf);
  return retval;
}
// ...
duk_int_t res = 0;
if((res = eval_file(ctx, "shim.min.js")) != 0) { /* error handler */ }
if((res = eval_file(ctx, "xlsx.full.min.js")) != 0) { /* error handler */ }
```
To confirm the library is loaded, `XLSX.version` can be inspected:
```c
/* get version string */
duk_eval_string(ctx, "XLSX.version");
printf("SheetJS library version %s\n", duk_get_string(ctx, -1));
duk_pop(ctx);
```
### Reading Files
Duktape supports `Buffer` natively, but the buffer should be sliced before it is
processed. Assuming `buf` is a C byte array with length `len`, this snippet
parses the data:
```c
/* load C char array and save to a Buffer */
duk_push_external_buffer(ctx);
duk_config_buffer(ctx, -1, buf, len);
duk_put_global_string(ctx, "buf");
/* parse with SheetJS */
duk_eval_string_noresult(ctx, "workbook = XLSX.read(buf.slice(0, buf.length), {type:'buffer'});");
```
`workbook` will be a variable in the JS environment that can be inspected using
the various SheetJS API functions.
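
The parsing snippet assumes that `buf` and `len` are already populated. As a minimal sketch, a file can be loaded into a heap buffer using only the C standard library. The `read_file` helper name and its `NULL`-on-failure convention are assumptions for illustration and are not taken from the demo source:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* hypothetical helper: read an entire file into a malloc'd buffer.
 * On success, returns the buffer and stores the byte count in *len.
 * On failure, returns NULL. The caller must free() the result. */
char *read_file(const char *filename, size_t *len) {
  assert(filename != NULL && len != NULL);
  FILE *f = fopen(filename, "rb");
  if(!f) return NULL;

  /* determine the file size */
  if(fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long fsize = ftell(f);
  if(fsize < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

  /* allocate at least one byte so an empty file is not mistaken for a failure */
  char *buf = (char *)malloc(fsize > 0 ? (size_t)fsize : 1);
  if(!buf) { fclose(f); return NULL; }

  /* read the whole file and verify that every byte arrived */
  size_t nread = fread(buf, 1, (size_t)fsize, f);
  fclose(f);
  if(nread != (size_t)fsize) { free(buf); return NULL; }

  *len = nread;
  return buf;
}
```

The returned pointer and length can be passed to `duk_config_buffer`. External buffers are not copied by Duktape, so the memory should stay allocated for as long as the JS side may reference it.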
### Writing Files
`duk_get_buffer_data` can pull `Buffer` object data into the C code:
```c
/* write with SheetJS using type: "array" */
duk_eval_string(ctx, "XLSX.write(workbook, {type:'array', bookType:'xlsx'})");
/* pull result back to C */
duk_size_t sz;
char *buf = (char *)duk_get_buffer_data(ctx, -1, &sz);
/* discard result in duktape (only after the bytes in `buf` have been written or copied) */
duk_pop(ctx);
```
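The resulting `buf`, with length `sz`, can be written to a file with `fwrite`.
The pointer refers to memory owned by Duktape, so the data should be written or
copied before the value is popped from the stack.
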
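
A minimal sketch of the write step, using only the C standard library. The `write_file` helper name and its return convention (0 on success, 1 on failure) are assumptions for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* hypothetical helper: write `len` bytes from `buf` to `filename`.
 * Returns 0 on success and 1 on failure. */
int write_file(const char *filename, const void *buf, size_t len) {
  assert(filename != NULL && (buf != NULL || len == 0));
  /* "wb" avoids newline translation on Windows */
  FILE *f = fopen(filename, "wb");
  if(!f) return 1;
  size_t nwritten = len ? fwrite(buf, 1, len, f) : 0;
  /* fclose flushes buffered data, so its result is checked as well */
  int close_failed = (fclose(f) != 0);
  return (nwritten != len || close_failed) ? 1 : 0;
}
```

With the earlier snippet, a call such as `write_file("out.xlsx", buf, sz)` would go before `duk_pop(ctx)`. The file name `out.xlsx` is a placeholder.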
## Complete Example

:::note Tested Deployments

This demo was tested in the following deployments:

| Architecture | Version | Date       |
|:-------------|:--------|:-----------|
| `darwin-x64` | `2.7.0` | 2023-12-05 |
| `darwin-arm` | `2.7.0` | 2023-10-18 |
| `win10-x64`  | `2.7.0` | 2023-10-27 |
| `win11-arm`  | `2.7.0` | 2023-12-01 |
| `linux-x64`  | `2.7.0` | 2023-12-07 |
| `linux-arm`  | `2.7.0` | 2023-12-01 |

:::

This program parses a file and prints CSV data from the first worksheet. It also
generates an XLSB file and writes it to the filesystem.

The [flow diagram is displayed after the example steps](#flow-diagram).

:::info pass
On Windows, the Visual Studio "Native Tools Command Prompt" must be used.
:::
0) Create a project folder:
```bash
mkdir sheetjs-duk
cd sheetjs-duk
```
1) Download and extract Duktape:
<Tabs groupId="os">
<TabItem value="unix" label="Linux/MacOS">
</TabItem>
<TabItem value="win" label="Windows">
:::caution pass
The Windows built-in `tar` does not support `xz` archives.
**The commands must be run within WSL `bash`.**
After the `mv` command, exit WSL.
:::
</TabItem>
</Tabs>
```bash
curl -LO https://duktape.org/duktape-2.7.0.tar.xz
tar -xJf duktape-2.7.0.tar.xz
mv duktape-2.7.0/src/*.{c,h} .
```

2) Download the SheetJS Standalone script, shim script and test file. Move all
three files to the project directory:

<ul>
<li><a href={`https://cdn.sheetjs.com/xlsx-${current}/package/dist/shim.min.js`}>shim.min.js</a></li>
<li><a href={`https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js`}>xlsx.full.min.js</a></li>
<li><a href="https://sheetjs.com/pres.numbers">pres.numbers</a></li>
</ul>

<Tabs groupId="os">
<TabItem value="unix" label="Linux/MacOS">
</TabItem>
<TabItem value="win" label="Windows">
:::caution pass
If the `curl` command fails, run the commands within WSL `bash`.
:::
</TabItem>
</Tabs>

<CodeBlock language="bash">{`\
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/shim.min.js
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js
curl -LO https://sheetjs.com/pres.numbers`}
</CodeBlock>

3) Download [`sheetjs.duk.c`](pathname:///duk/sheetjs.duk.c):

```bash
curl -LO https://docs.sheetjs.com/duk/sheetjs.duk.c
```

4) Compile the standalone `sheetjs.duk` binary:

<Tabs groupId="os">
<TabItem value="unix" label="Linux/MacOS">
```bash
gcc -std=c99 -Wall -osheetjs.duk sheetjs.duk.c duktape.c -lm
```
:::note pass
GCC may generate a warning:
```
duk_js_compiler.c:5628:13: warning: variable 'num_stmts' set but not used [-Wunused-but-set-variable]
duk_int_t num_stmts;
^
```
This warning can be ignored.
:::
</TabItem>
<TabItem value="win" label="Windows">
```powershell
cl sheetjs.duk.c duktape.c /I .\
```
</TabItem>
</Tabs>

5) Run the demo:

<Tabs groupId="os">
<TabItem value="unix" label="Linux/MacOS">
```bash
./sheetjs.duk pres.numbers
```
</TabItem>
<TabItem value="win" label="Windows">
```bash
.\sheetjs.duk.exe pres.numbers
```
</TabItem>
</Tabs>
If the program succeeded, the CSV contents will be printed to console and the
file `sheetjsw.xlsb` will be created. That file can be opened with Excel.
### Flow Diagram
```mermaid
sequenceDiagram
participant F as Filesystem
participant C as C Code
participant D as Duktape
activate C
opt
Note over F,D: ~ Prepare Duktape ~
C->>+D: Initialize
deactivate C
D->>-C: Done
activate C
C->>F: Need SheetJS
F->>C: SheetJS Code
C->>+D: Load SheetJS Code
deactivate C
D->>-C: Loaded
activate C
C->>+D: Execute Code
deactivate C
Note over D: Eval SheetJS Code
D->>-C: Done
activate C
Note over D: XLSX<br/>ready to rock
end
opt
Note over F,D: ~ Parse File ~
C->>F: Read Spreadsheet
F->>C: Spreadsheet File
C->>+D: Load Data
deactivate C
D->>-C: Loaded
activate C
C->>+D: eval `var workbook = XLSX.read(...)`
deactivate C
Note over D: Parse File
D->>-C: Done
activate C
Note over D: `workbook`<br/>can be used later
end
opt
Note over F,D: ~ Print CSV to screen ~
C->>+D: eval `XLSX.utils.sheet_to_csv(...)`
deactivate C
Note over D: Generate CSV
D->>-C: CSV Data
activate C
Note over C: Print to standard output
end
opt
Note over F,D: ~ Write XLSB File ~
C->>+D: eval `XLSX.write(...)`
deactivate C
Note over D: Generate File
D->>-C: done
activate C
C->>+D: get file bytes
deactivate C
D->>-C: binary data
activate C
C->>F: Write File
end
deactivate C
```
## Bindings
Bindings exist for many languages. As these bindings require "native" code, they
may not work on every platform.
### Perl

The Perl binding for Duktape is available as `JavaScript::Duktape` on CPAN.

The Perl binding does not expose raw `Buffer` operations, so Base64 strings are
used to pass file data in and out of the engine.

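
The same string-based transport is also possible from a C host, since `XLSX.read` accepts `type: "base64"` as in the Perl script in this section. The following is a sketch of a standard Base64 encoder in C. The `b64_encode` name is hypothetical and the function is not part of Duktape or SheetJS:

```c
#include <assert.h>
#include <stdlib.h>

/* hypothetical helper: encode `len` bytes as a NUL-terminated Base64 string.
 * Returns a malloc'd string (caller frees) or NULL if allocation fails. */
char *b64_encode(const unsigned char *data, size_t len) {
  static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  assert(data != NULL || len == 0);
  /* every group of up to 3 input bytes becomes 4 output characters */
  size_t outlen = 4 * ((len + 2) / 3);
  char *out = (char *)malloc(outlen + 1);
  if(!out) return NULL;

  size_t i = 0, o = 0;
  while(i + 2 < len) { /* full 3-byte groups */
    unsigned long v = ((unsigned long)data[i] << 16)
                    | ((unsigned long)data[i + 1] << 8)
                    | (unsigned long)data[i + 2];
    out[o++] = tbl[(v >> 18) & 63];
    out[o++] = tbl[(v >> 12) & 63];
    out[o++] = tbl[(v >> 6) & 63];
    out[o++] = tbl[v & 63];
    i += 3;
  }
  if(len - i == 1) { /* one trailing byte: two characters plus "==" */
    unsigned long v = (unsigned long)data[i] << 16;
    out[o++] = tbl[(v >> 18) & 63];
    out[o++] = tbl[(v >> 12) & 63];
    out[o++] = '=';
    out[o++] = '=';
  } else if(len - i == 2) { /* two trailing bytes: three characters plus "=" */
    unsigned long v = ((unsigned long)data[i] << 16)
                    | ((unsigned long)data[i + 1] << 8);
    out[o++] = tbl[(v >> 18) & 63];
    out[o++] = tbl[(v >> 12) & 63];
    out[o++] = tbl[(v >> 6) & 63];
    out[o++] = '=';
  }
  out[o] = '\0';
  return out;
}
```

Base64 output is roughly one third larger than the raw bytes, so the binary `Buffer` route is likely the better fit when the host language exposes it.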
#### Perl Demo

:::note Tested Deployments

This demo was tested in the following deployments:

| Architecture | Version | Date       |
|:-------------|:--------|:-----------|
| `darwin-x64` | `2.5.0` | 2023-10-26 |
| `linux-arm`  | `2.5.0` | 2023-12-01 |

:::

0) Ensure `perl` and `cpan` are installed and available on the system path.

1) Install the `JavaScript::Duktape` library:

```bash
cpan install JavaScript::Duktape
```
:::note pass
On some systems, the command must be run as the root user:
```bash
sudo cpan install JavaScript::Duktape
```
:::
2) Save the following codeblock to `SheetJSDuk.pl`:

```perl title="SheetJSDuk.pl"
# usage: perl SheetJSDuk.pl path/to/file
use JavaScript::Duktape;
use File::Slurp;
use MIME::Base64 qw( encode_base64 decode_base64 );

# Initialize
my $js = JavaScript::Duktape->new( max_memory => 256 * 1024 * 1024 );
$js->eval("var global = (function(){ return this; }).call(null);");

# Load the ExtendScript build
my $src = read_file('xlsx.extendscript.js', { binmode => ':raw' });
$src =~ s/^\xEF\xBB\xBF//;
my $XLSX = $js->eval($src);

# Print version number
$js->set('log' => sub { print $_[0], "\n"; });
$js->eval("log('SheetJS library version ' + XLSX.version);");

# Parse File
my $raw_data = encode_base64(read_file($ARGV[0], { binmode => ':raw' }), "");
$js->set("b64", $raw_data);
$js->eval(qq{
  global.wb = XLSX.read(b64, {type: "base64", WTF:1});
  global.ws = wb.Sheets[wb.SheetNames[0]];
  void 0;
});

# Print first worksheet CSV
$js->eval('log(XLSX.utils.sheet_to_csv(global.ws))');

# Write XLSB file
my $xlsb = $js->eval("XLSX.write(global.wb, {type:'base64', bookType:'xlsb'})");
write_file("SheetJSDuk.xlsb", decode_base64($xlsb));
```
3) Download the SheetJS ExtendScript build and test file:
<CodeBlock language="bash">{`\
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.extendscript.js
curl -LO https://sheetjs.com/pres.xlsx`}
</CodeBlock>
4) Run the script:
```bash
perl SheetJSDuk.pl pres.xlsx
```
If the script succeeded, the data in the test file will be printed in CSV rows.
The script will also export `SheetJSDuk.xlsb`.
:::note pass
In the latest Linux ARM64 test, the command failed due to missing `File::Slurp`:
```
Can't locate File/Slurp.pm in @INC (you may need to install the File::Slurp module)
```
The fix is to install `File::Slurp` with `cpan`:
```bash
sudo cpan install File::Slurp
```
:::