docs.sheetjs.com/docz/docs/03-demos/42-engines/01-duktape.md
2024-01-03 01:47:00 -05:00


title: Data Processing with Duktape
sidebar_label: C + Duktape
description: Process structured data in C programs. Seamlessly integrate spreadsheets into your program by pairing Duktape and SheetJS. Supercharge programs with modern data tools.
pagination_prev: demos/bigdata/index
pagination_next: solutions/input

import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

Duktape is an embeddable JS engine written in C. It has been ported to a number of exotic architectures and operating systems.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

The "Complete Example" section includes a complete command-line tool for reading data from spreadsheets and exporting to Excel XLSB workbooks.

The "Bindings" section covers bindings for other ecosystems.

Integration Details

Initialize Duktape

Duktape does not expose a global variable by default. After creating the heap, it can be defined with a one-line script:

/* initialize */
duk_context *ctx = duk_create_heap_default();

/* duktape does not expose a standard "global" by default */
// highlight-next-line
duk_eval_string_noresult(ctx, "var global = (function(){ return this; }).call(null);");

Load SheetJS Scripts

The SheetJS Standalone scripts can be parsed and evaluated in a Duktape context.

The shim and main libraries can be loaded by reading the scripts from the file system and evaluating in the Duktape context:

/* simple wrapper to read the entire script file */
static duk_int_t eval_file(duk_context *ctx, const char *filename) {
  size_t len;
  /* read script from filesystem */
  FILE *f = fopen(filename, "rb");
  if(!f) { duk_push_undefined(ctx); perror("fopen"); return 1; }
  long fsize; { fseek(f, 0, SEEK_END); fsize = ftell(f); fseek(f, 0, SEEK_SET); }
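  char *buf = (char *)malloc(fsize * sizeof(char));
  /* check the allocation before reading into the buffer */
  if(!buf) { fclose(f); duk_push_undefined(ctx); perror("malloc"); return 1; }
  len = fread((void *) buf, 1, fsize, f);
  fclose(f);

  // highlight-start
  /* load script into the context */
  duk_push_lstring(ctx, (const char *)buf, (duk_size_t)len);
  /* duk_push_lstring copies the bytes, so the C buffer can be released */
  free(buf);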
  /* eval script */
  duk_int_t retval = duk_peval(ctx);
  /* cleanup */
  duk_pop(ctx);
  // highlight-end
  return retval;
}

// ...
  duk_int_t res = 0;

  if((res = eval_file(ctx, "shim.min.js")) != 0) { /* error handler */ }
  if((res = eval_file(ctx, "xlsx.full.min.js")) != 0) { /* error handler */ }

To confirm the library is loaded, XLSX.version can be inspected:

  /* get version string */
  duk_eval_string(ctx, "XLSX.version");
  printf("SheetJS library version %s\n", duk_get_string(ctx, -1));
  duk_pop(ctx);

Reading Files

Duktape supports Buffer natively, but the buffer should be sliced before processing. Assuming buf is a C byte array with length len, this snippet parses the data:

/* load C char array and save to a Buffer */
duk_push_external_buffer(ctx);
duk_config_buffer(ctx, -1, buf, len);
duk_put_global_string(ctx, "buf");

/* parse with SheetJS */
duk_eval_string_noresult(ctx, "workbook = XLSX.read(buf.slice(0, buf.length), {type:'buffer'});");

workbook will be a variable in the JS environment that can be inspected using the various SheetJS API functions.
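The snippet above assumes that buf and len already exist. As a minimal sketch of one way to obtain them, the following helper reads a whole spreadsheet file into memory using only the C standard library. The name read_all is hypothetical and is not part of Duktape or SheetJS.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Duktape or SheetJS): read an entire file
 * into a malloc'd buffer. Returns NULL on failure. On success, the number of
 * bytes read is stored in *len and the caller must free() the result. */
char *read_all(const char *filename, size_t *len) {
  FILE *f = fopen(filename, "rb");
  if(!f) return NULL;

  /* determine the file size by seeking to the end */
  if(fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long fsize = ftell(f);
  if(fsize < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

  /* allocate at least one byte so an empty file still yields a valid pointer */
  char *buf = (char *)malloc(fsize > 0 ? (size_t)fsize : 1);
  if(!buf) { fclose(f); return NULL; }

  *len = fread(buf, 1, (size_t)fsize, f);
  fclose(f);
  return buf;
}
```

The returned pointer and length can be passed to duk_config_buffer. Duktape does not copy or manage the memory behind an external buffer, so the allocation should be released with free only after the script no longer needs it.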

Writing Files

duk_get_buffer_data can pull Buffer object data into the C code:

/* write with SheetJS using type: "array" */
duk_eval_string(ctx, "XLSX.write(workbook, {type:'array', bookType:'xlsx'})");

/* pull result back to C */
duk_size_t sz;
char *buf = (char *)duk_get_buffer_data(ctx, -1, &sz);

/* discard result in duktape */
duk_pop(ctx);

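The resulting buf can be written to a file with fwrite. Duktape only guarantees that the pointer stays valid while the buffer value is reachable, so the bytes should be written or copied before the result is popped.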
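The fwrite step can be sketched with a small standard-library helper. The name write_all is hypothetical. It takes the pointer and size obtained from duk_get_buffer_data and reports failure if the file cannot be opened, written in full, or closed.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Duktape or SheetJS): write sz bytes from
 * buf to filename. Returns 0 on success and -1 on failure. */
int write_all(const char *filename, const void *buf, size_t sz) {
  FILE *f = fopen(filename, "wb");
  if(!f) return -1;

  size_t written = fwrite(buf, 1, sz, f);

  /* fclose flushes buffered data, so its result is checked as well */
  if(fclose(f) != 0) return -1;
  return written == sz ? 0 : -1;
}
```

Calling write_all while the result is still on the Duktape value stack avoids relying on the pointer after the buffer has been popped.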

Complete Example

:::note Tested Deployments

This demo was tested in the following deployments:

| Architecture | Version | Date       |
|:-------------|:--------|:-----------|
| darwin-x64   | 2.7.0   | 2023-12-05 |
| darwin-arm   | 2.7.0   | 2023-10-18 |
| win10-x64    | 2.7.0   | 2023-10-27 |
| win11-arm    | 2.7.0   | 2023-12-01 |
| linux-x64    | 2.7.0   | 2023-12-07 |
| linux-arm    | 2.7.0   | 2023-12-01 |

:::

This program parses a file and prints CSV data from the first worksheet. It also generates an XLSB file and writes to the filesystem.

The flow diagram is displayed after the example steps.

:::info pass

On Windows, the Visual Studio "Native Tools Command Prompt" must be used.

:::

  1. Create a project folder:
mkdir sheetjs-duk
cd sheetjs-duk
  1. Download and extract Duktape:

:::caution pass

The Windows built-in tar does not support xz archives.

The commands must be run within WSL bash.

After the mv command, exit WSL.

:::

curl -LO https://duktape.org/duktape-2.7.0.tar.xz
tar -xJf duktape-2.7.0.tar.xz
mv duktape-2.7.0/src/*.{c,h} .
  1. Download the SheetJS Standalone script, shim script and test file. Move all three files to the project directory:

:::caution pass

If the curl command fails, run the commands within WSL bash.

:::

curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/shim.min.js
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js
curl -LO https://sheetjs.com/pres.numbers

  1. Download sheetjs.duk.c:
curl -LO https://docs.sheetjs.com/duk/sheetjs.duk.c
  1. Compile the standalone sheetjs.duk binary. On Linux and macOS, use gcc:
gcc -std=c99 -Wall -osheetjs.duk sheetjs.duk.c duktape.c -lm

:::note pass

GCC may generate a warning:

duk_js_compiler.c:5628:13: warning: variable 'num_stmts' set but not used [-Wunused-but-set-variable]
                duk_int_t num_stmts;
                          ^

This warning can be ignored.

:::

On Windows, use cl:

cl sheetjs.duk.c duktape.c /I .\
  1. Run the demo:
On Linux and macOS:

./sheetjs.duk pres.numbers

On Windows:

.\sheetjs.duk.exe pres.numbers

If the program succeeded, the CSV contents will be printed to the console and the file sheetjsw.xlsb will be created. That file can be opened with Excel.

Flow Diagram

sequenceDiagram
  participant F as Filesystem
  participant C as C Code
  participant D as Duktape
  activate C
  opt
    Note over F,D: ~ Prepare Duktape ~
    C->>+D: Initialize
    deactivate C
    D->>-C: Done
    activate C
    C->>F: Need SheetJS
    F->>C: SheetJS Code
    C->>+D: Load SheetJS Code
    deactivate C
    D->>-C: Loaded
    activate C
    C->>+D: Execute Code
    deactivate C
    Note over D: Eval SheetJS Code
    D->>-C: Done
    activate C
    Note over D: XLSX<br/>ready to rock
  end
  opt
    Note over F,D: ~ Parse File ~
    C->>F: Read Spreadsheet
    F->>C: Spreadsheet File
    C->>+D: Load Data
    deactivate C
    D->>-C: Loaded
    activate C
    C->>+D: eval `var workbook = XLSX.read(...)`
    deactivate C
    Note over D: Parse File
    D->>-C: Done
    activate C
    Note over D: `workbook`<br/>can be used later
  end
  opt
    Note over F,D: ~ Print CSV to screen ~
    C->>+D: eval `XLSX.utils.sheet_to_csv(...)`
    deactivate C
    Note over D: Generate CSV
    D->>-C: CSV Data
    activate C
    Note over C: Print to standard output
  end
  opt
    Note over F,D: ~ Write XLSB File ~
    C->>+D: eval `XLSX.write(...)`
    deactivate C
    Note over D: Generate File
    D->>-C: done
    activate C
    C->>+D: get file bytes
    deactivate C
    D->>-C: binary data
    activate C
    C->>F: Write File
  end
  deactivate C

Bindings

Bindings exist for many languages. As these bindings require "native" code, they may not work on every platform.

Perl

The Perl binding for Duktape is available as JavaScript::Duktape on CPAN.

The Perl binding does not have raw Buffer operations, so data is passed in and out of the engine as Base64 strings.

Perl Demo

:::note Tested Deployments

This demo was tested in the following deployments:

| Architecture | Version | Date       |
|:-------------|:--------|:-----------|
| darwin-x64   | 2.5.0   | 2023-10-26 |
| linux-arm    | 2.5.0   | 2023-12-01 |

:::

  1. Ensure perl and cpan are installed and available on the system path.

  2. Install the JavaScript::Duktape library:

cpan install JavaScript::Duktape

:::note pass

On some systems, the command must be run as the root user:

sudo cpan install JavaScript::Duktape

:::

  1. Save the following codeblock to SheetJSDuk.pl:
# usage: perl SheetJSDuk.pl path/to/file
use JavaScript::Duktape;
use File::Slurp;
use MIME::Base64 qw( encode_base64 decode_base64 );

# Initialize
my $js = JavaScript::Duktape->new( max_memory => 256 * 1024 * 1024 );
$js->eval("var global = (function(){ return this; }).call(null);");

# Load the ExtendScript build
my $src = read_file('xlsx.extendscript.js', { binmode => ':raw' });
$src =~ s/^\xEF\xBB\xBF//;
my $XLSX = $js->eval($src);

# Print version number
$js->set('log' => sub { print $_[0], "\n"; });
$js->eval("log('SheetJS library version ' + XLSX.version);");

# Parse File
my $raw_data = encode_base64(read_file($ARGV[0], { binmode => ':raw' }), "");
$js->set("b64", $raw_data);
$js->eval(qq{
  global.wb = XLSX.read(b64, {type: "base64", WTF:1});
  global.ws = wb.Sheets[wb.SheetNames[0]];
  void 0;
});

# Print first worksheet CSV
$js->eval('log(XLSX.utils.sheet_to_csv(global.ws))');

# Write XLSB file
my $xlsb = $js->eval("XLSX.write(global.wb, {type:'base64', bookType:'xlsb'})");
write_file("SheetJSDuk.xlsb", decode_base64($xlsb));
  1. Download the SheetJS ExtendScript build and test file:

curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.extendscript.js
curl -LO https://sheetjs.com/pres.xlsx

  1. Run the script:
perl SheetJSDuk.pl pres.xlsx

If the script succeeded, the data in the test file will be printed in CSV rows. The script will also export SheetJSDuk.xlsb.

:::note pass

In the latest Linux ARM64 test, the command failed due to missing File::Slurp:

Can't locate File/Slurp.pm in @INC (you may need to install the File::Slurp module)

The fix is to install File::Slurp with cpan:

sudo cpan install File::Slurp

:::