---
title: Blazing Fast Data Processing with V8
sidebar_label: C++ + V8
description: Process structured data in C++ or Rust programs. Seamlessly integrate spreadsheets by pairing V8 and SheetJS. Modernize workflows while preserving Excel compatibility.
pagination_prev: demos/bigdata/index
pagination_next: solutions/input
---

import current from '/version.js'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import CodeBlock from '@theme/CodeBlock';

V8 is an embeddable JavaScript engine written in C++. It powers Chromium and Chrome, NodeJS and Deno, Adobe UXP and other platforms.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

This demo uses V8 and SheetJS to read and write spreadsheets. We'll explore how to load SheetJS in a V8 context and process spreadsheets and structured data from C++ and Rust programs.

The "Complete Example" creates a C++ command-line tool for reading spreadsheet files and generating new workbooks. "Bindings" covers V8 engine bindings for other programming languages.

Integration Details

The SheetJS Standalone scripts can be parsed and evaluated in a V8 context.

Initialize V8

The official V8 hello-world example covers initialization and cleanup. For the purposes of this demo, the key variables are noted below:

v8::Isolate* isolate = v8::Isolate::New(create_params);
v8::Local<v8::Context> context = v8::Context::New(isolate);

The following helper function evaluates C strings as JS code:

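/* `sz` is the byte length of `code`; the default -1 tells V8 to treat `code` as NUL-terminated */
v8::Local<v8::Value> eval_code(v8::Isolate *isolate, v8::Local<v8::Context> context, const char* code, size_t sz = -1) {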
  v8::Local<v8::String> source = v8::String::NewFromUtf8(isolate, code, v8::NewStringType::kNormal, sz).ToLocalChecked();
  v8::Local<v8::Script> script = v8::Script::Compile(context, source).ToLocalChecked();
  return script->Run(context).ToLocalChecked();
}

Load SheetJS Scripts

The main library can be loaded by reading the scripts from the file system and evaluating in the V8 context:

/* simple wrapper to read the entire script file */
static char *read_file(const char *filename, size_t *sz) {
  FILE *f = fopen(filename, "rb");
  if(!f) return NULL;
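  long fsize; { fseek(f, 0, SEEK_END); fsize = ftell(f); fseek(f, 0, SEEK_SET); }
  if(fsize < 0) { fclose(f); return NULL; } /* ftell failed */
  char *buf = (char *)malloc(fsize * sizeof(char));
  if(!buf) { fclose(f); return NULL; } /* allocation failed */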
  *sz = fread((void *) buf, 1, fsize, f);
  fclose(f);
  return buf;
}

// ...
  size_t sz; char *file = read_file("xlsx.full.min.js", &sz);
  v8::Local<v8::Value> result = eval_code(isolate, context, file, sz);
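
The C-style `read_file` helper returns a `malloc`-allocated buffer that the caller is responsible for freeing. As a minimal sketch of an alternative, assuming a C++17 toolchain, the script can be read into a `std::string` so that the memory is released automatically. The name `read_file_cpp` is hypothetical and is not part of the demo source:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of the demo source): read an entire file
// into `out`. Returns false, leaving `out` untouched, if the file cannot
// be opened.
static bool read_file_cpp(const std::string &filename, std::string &out) {
  std::ifstream f(filename, std::ios::binary);
  if (!f) return false;
  // The iterator pair copies every byte, including embedded NUL bytes.
  out.assign(std::istreambuf_iterator<char>(f),
             std::istreambuf_iterator<char>());
  return true;
}
```

With a `std::string script` filled by this helper, the bytes and length can be passed to `eval_code` through `script.data()` and `script.size()`.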

To confirm the library is loaded, XLSX.version can be inspected:

  /* get version string */
  v8::Local<v8::Value> result = eval_code(isolate, context, "XLSX.version");
  v8::String::Utf8Value vers(isolate, result);
  printf("SheetJS library version %s\n", *vers);

Reading Files

V8 supports ArrayBuffer natively. Assuming buf is a C byte array, with length len, this snippet stores the data as an ArrayBuffer in global scope:

/* load C char array and save to an ArrayBuffer */
std::unique_ptr<v8::BackingStore> back = v8::ArrayBuffer::NewBackingStore(isolate, len);
memcpy(back->Data(), buf, len);
v8::Local<v8::ArrayBuffer> ab = v8::ArrayBuffer::New(isolate, std::move(back));
v8::Maybe<bool> res = context->Global()->Set(context, v8::String::NewFromUtf8Literal(isolate, "buf"), ab);

/* parse with SheetJS */
v8::Local<v8::Value> result = eval_code(isolate, context, "globalThis.wb = XLSX.read(buf)");

wb will be a variable in the JS environment that can be inspected using the various SheetJS API functions.

Writing Files

The underlying memory from an ArrayBuffer can be recovered:

/* write with SheetJS using type: "array" */
v8::Local<v8::Value> result = eval_code(isolate, context, "XLSX.write(wb, {type:'array', bookType:'xlsb'})");

/* pull result back to C++ */
v8::Local<v8::ArrayBuffer> ab = v8::Local<v8::ArrayBuffer>::Cast(result);
size_t sz = ab->ByteLength();
char *buf = static_cast<char *>(ab->Data()); /* Data() returns void*, so a cast is required in C++ */

The resulting buf can be written to file with fwrite.
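
As a rough sketch of that last step, the helper below writes a byte buffer to disk with `fwrite` and reports whether every byte was written. The name `write_file` is hypothetical and is not part of the demo source:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not part of the demo source): write `sz` bytes from
// `buf` to `filename`. Returns true only if every byte was written and the
// file was closed without error.
static bool write_file(const char *filename, const void *buf, std::size_t sz) {
  std::FILE *f = std::fopen(filename, "wb");
  if (!f) return false;
  std::size_t written = std::fwrite(buf, 1, sz, f);
  // fclose is evaluated first, so the file is always closed.
  return (std::fclose(f) == 0) && (written == sz);
}
```

Assuming `buf` and `sz` come from the snippet above, a call such as `write_file("sheetjsw.xlsb", buf, sz)` would save the workbook. The write should happen while the `ArrayBuffer` handle is still alive, since `buf` points into memory owned by V8.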

Complete Example

:::note Tested Deployments

This demo was tested in the following deployments:

| V8 Version | Platform   | OS Version    | Compiler       | Date       |
|:-----------|:-----------|:--------------|:---------------|:-----------|
| 12.4.253   | darwin-x64 | macOS 14.4    | clang 15.0.0   | 2024-03-15 |
| 12.1.283   | darwin-arm | macOS 14.1.2  | clang 15.0.0   | 2023-12-01 |
| 12.5.48    | win10-x64  | Windows 10    | CL 19.39.33523 | 2024-03-24 |
| 12.5.48    | linux-x64  | HoloOS 3.5.17 | gcc 13.1.1     | 2024-03-21 |
| 11.8.82    | linux-arm  | Debian 12     | gcc 12.2.0     | 2023-12-01 |

:::

This program parses a file and prints CSV data from the first worksheet. It also generates an XLSB file and writes to the filesystem.

:::caution pass

When the demo was last tested, there were errors in the official V8 embed guide. The correct instructions are included below.

:::

:::caution pass

The build process is long and will test your patience.

:::

Preparation

  1. On macOS and Linux, prepare /usr/local/lib:
mkdir -p /usr/local/lib
cd /usr/local/lib

:::note pass

If this step throws a permission error, run:

sudo mkdir -p /usr/local/lib
sudo chmod 777 /usr/local/lib

:::

  1. On Windows, follow the official "Visual Studio" installation steps.

:::info pass

Using the installer tool, the "Desktop development with C++" workload must be installed. In the sidebar, verify the following components are checked:

  • "C++ ATL for latest ... build tools" (v143 when last tested)
  • "C++ MFC for latest ... build tools" (v143 when last tested)

In the "Individual components" tab, search for "Windows 11 SDK" and verify that "Windows 11 SDK (10.0.22621.0)" is checked.

Click "Modify" and allow the installer to finish.

The SDK debugging tools must be installed after the SDK is installed.

  1. Using the Search bar, search "Apps & features".

  2. When the settings panel opens, scroll down to "Windows Software Development Kit - Windows 10.0.22621" and click "Modify".

  3. In the new window, select "Change" and click "Next".

  4. Check "Debugging Tools for Windows" and click "Change".

:::

On Windows, the following git settings should be changed:

git config --global core.autocrlf false
git config --global core.filemode false
git config --global branch.autosetuprebase always
  1. Download and install depot_tools:
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

:::note pass

If this step throws a permission error, run:

sudo mkdir -p /usr/local/lib
sudo chmod 777 /usr/local/lib

:::

On Windows, the depot_tools bundle is a ZIP file that should be downloaded and extracted.

The demo was last tested on an exFAT-formatted drive (mounted at E:\).

After extracting, verify that the depot_tools folder is not read-only.

  1. Add the path to the PATH environment variable. On macOS and Linux:
export PATH="/usr/local/lib/depot_tools:$PATH"

At this point, it is strongly recommended to add the line to a shell startup script such as .bashrc or .zshrc.

:::caution pass

The Windows commands that follow are for cmd use. Do not run them in PowerShell!

It is strongly recommended to use the "Developer Command Prompt" from Visual Studio as it prepares the console to run build tools.

:::

set DEPOT_TOOLS_WIN_TOOLCHAIN=0
set PATH=E:\depot_tools;%PATH%

In addition, the vs2022_install variable must be set to the Visual Studio folder. For example, using the "Community Edition", the assignment should be

set vs2022_install="C:\Program Files\Microsoft Visual Studio\2022\Community"

These environment variables can be persisted in the Control Panel.

  1. Run gclient once to update depot_tools:
gclient

:::caution pass

gclient may throw errors related to git and permissions issues:

fatal: detected dubious ownership in repository at 'E:/depot_tools'
'E:/depot_tools' is on a file system that does not record ownership
To add an exception for this directory, call:

        git config --global --add safe.directory E:/depot_tools

These issues are related to the exFAT file system. They were resolved by running the recommended commands and re-running gclient.

:::

:::caution pass

There were errors pertaining to gitconfig:

error: could not write config file E:/depot_tools/bootstrap-2@3_8_10_chromium_26_bin/git/etc/gitconfig: File exists

This can happen if the depot_tools folder is read-only. The workaround is to unset the read-only flag for the E:\depot_tools folder.

:::

Clone V8

  1. Create a base directory:
mkdir -p ~/dev/v8
cd ~/dev/v8
fetch v8
cd v8

Note that the actual repo will be placed in ~/dev/v8/v8.

On Windows, run the equivalent commands from the E:\ drive:

cd E:\
mkdir v8
cd v8
fetch v8
cd v8

:::caution pass

On exFAT, every cloned repo elicited the same git permissions error. fetch will fail with a clear remedy message such as

        git config --global --add safe.directory E:/v8/v8

Run the command then run gclient sync, repeating each time the command fails.

:::

:::caution pass

There were occasional git conflict errors:

v8/tools/clang (ERROR)
----------------------------------------
[0:00:01] Started.
...
error: Your local changes to the following files would be overwritten by checkout:
        plugins/FindBadRawPtrPatterns.cpp
...
Please commit your changes or stash them before you switch branches.
Aborting
error: could not detach HEAD
----------------------------------------
Error: 28> Unrecognized error, please merge or rebase manually.
28> cd E:\v8\v8\tools\clang && git rebase --onto 65ceb79efbc9d1dec9b1a0f4bc0b8d010b9d7a66 refs/remotes/origin/main

The recommended fix is to delete the referenced folder and re-run gclient sync.

:::

  1. Check out the desired version. The following command pulls 12.5.48:
git checkout tags/12.5.48 -b sample

:::caution pass

The official documentation recommends:

git checkout refs/tags/12.5.48 -b sample -t

This command failed in local testing:

E:\v8\v8>git checkout refs/tags/12.5.48 -b sample -t
fatal: cannot set up tracking information; starting point 'refs/tags/12.5.48' is not a branch

:::

Build V8

  1. Build the static library. The v8gen.py target depends on the architecture.

For x64 builds:

tools/dev/v8gen.py x64.release.sample
ninja -C out.gn/x64.release.sample v8_monolith

For arm64 builds:

tools/dev/v8gen.py arm64.release.sample
ninja -C out.gn/arm64.release.sample v8_monolith

:::note pass

In some Linux x64 tests using GCC 12, there were build errors that stemmed from warnings. The error messages included the tag -Werror:

../../src/compiler/turboshaft/wasm-gc-type-reducer.cc:212:18: error: 'back_insert_iterator' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
  212 |                  std::back_insert_iterator(snapshots), [this](Block* pred) {
      |                  ^
../../build/linux/debian_bullseye_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_iterator.h:596:11: note: add a deduction guide to suppress this warning
  596 |     class back_insert_iterator
      |           ^
1 error generated.

This was resolved by manually editing out.gn/x64.release.sample/args.gn. The option treat_warnings_as_errors should be set to false:

treat_warnings_as_errors = false

:::

On arm64 Linux with gcc, generate the build configuration and edit it before building:

tools/dev/v8gen.py arm64.release.sample

Append the following lines to out.gn/arm64.release.sample/args.gn:

is_clang = false
treat_warnings_as_errors = false

Run the build:

ninja -C out.gn/arm64.release.sample v8_monolith

On Windows:

python3 tools\dev\v8gen.py -vv x64.release.sample
ninja -C out.gn\x64.release.sample v8_monolith

:::caution pass

In local testing, the build sometimes failed with a dbghelp.dll error:

 Exception: dbghelp.dll not found in "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\dbghelp.dll"

This issue was fixed by removing and reinstalling "Debugging Tools for Windows" from the Control Panel as described in the Visual Studio installation step.

:::

:::caution pass

In local testing, the ninja build failed with C++ deprecation errors:

../..\src/wasm/wasm-code-manager.h(670,28): error: 'atomic_load<v8::base::OwnedVector<const unsigned char>>' is deprecated: warning STL4029: std::atomic_*() overloads for shared_ptr are deprecated in C++20. The shared_ptr specialization of std::atomic provides superior functionality. You can define _SILENCE_CXX20_OLD_SHARED_PTR_ATOMIC_SUPPORT_DEPRECATION_WARNING or _SILENCE_ALL_CXX20_DEPRECATION_WARNINGS to suppress this warning. [-Werror,-Wdeprecated-declarations]
  670 |     auto wire_bytes = std::atomic_load(&wire_bytes_);
      |                            ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\memory(3794,1): note: 'atomic_load<v8::base::OwnedVector<const unsigned char>>' has been explicitly marked deprecated here
 3794 | _CXX20_DEPRECATE_OLD_SHARED_PTR_ATOMIC_SUPPORT _NODISCARD shared_ptr<_Ty> atomic_load(
      | ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\yvals_core.h(1317,7): note: expanded from macro '_CXX20_DEPRECATE_OLD_SHARED_PTR_ATOMIC_SUPPORT'
 1317 |     [[deprecated("warning STL4029: "                                                                \
      |       ^
2 errors generated.

The workaround is to append a line to out.gn\x64.release.sample\args.gn:

treat_warnings_as_errors = false

After adding the line, run the ninja command again:

ninja -C out.gn\x64.release.sample v8_monolith

:::

  1. Ensure the sample hello-world compiles and runs:
g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
    -ldl -Lout.gn/x64.release.sample/obj/ -pthread \
    -std=c++17 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

:::info pass

In older V8 versions, the flags -lv8_libbase -lv8_libplatform were required.

Linking against libv8_libbase or libv8_libplatform in V8 version 12.4.253 elicited linker errors:

ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

:::

For arm64 builds:

g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
    -lv8_libbase -lv8_libplatform -ldl -Lout.gn/arm64.release.sample/obj/ -pthread \
    -std=c++17 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

For x64 builds where the additional libraries are required:

g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
    -lv8_libbase -lv8_libplatform -ldl -Lout.gn/x64.release.sample/obj/ -pthread \
    -std=c++17 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

On Windows:

cl /I. /Iinclude samples/hello-world.cc /GR- v8_monolith.lib Advapi32.lib Winmm.lib Dbghelp.lib /std:c++17 /DV8_COMPRESS_POINTERS=1 /DV8_ENABLE_SANDBOX /link /out:hello_world.exe /LIBPATH:out.gn\x64.release.sample\obj\
.\hello_world.exe

Prepare Project

  1. Make a new project folder:
cd ~/dev
mkdir -p sheetjs-v8
cd sheetjs-v8

On Windows:

cd E:\
mkdir sheetjs-v8
cd sheetjs-v8
  1. Copy the sample source:
cp ~/dev/v8/v8/samples/hello-world.cc .

On Windows:

copy E:\v8\v8\samples\hello-world.cc .\
  1. Create symbolic links to the include headers and obj library folders.

For x64 builds:

ln -s ~/dev/v8/v8/include
ln -s ~/dev/v8/v8/out.gn/x64.release.sample/obj

For arm64 builds:

ln -s ~/dev/v8/v8/include
ln -s ~/dev/v8/v8/out.gn/arm64.release.sample/obj

On Windows, exFAT does not support symbolic links. Skip this step and move on to building the example.
  1. Build and run the hello-world example from this folder:
g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
    -lv8_libbase -lv8_libplatform -ldl -Lobj/ -pthread -std=c++17 \
    -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

:::caution pass

In some V8 versions, the command failed in the linker stage:

ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

The build succeeds after removing libv8_libbase and libv8_libplatform:

g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
    -ldl -Lobj/ -pthread -std=c++17 \
    -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

:::

On Windows:

cl /MT /I..\v8\v8\ /I..\v8\v8\include hello-world.cc /GR- v8_monolith.lib Advapi32.lib Winmm.lib Dbghelp.lib /std:c++17 /DV8_COMPRESS_POINTERS=1 /DV8_ENABLE_SANDBOX /link /out:hello_world.exe /LIBPATH:..\v8\v8\out.gn\x64.release.sample\obj\
.\hello_world.exe

Add SheetJS

  1. Download the SheetJS Standalone script and test file. Save both files in the project directory. In the first URL, ${current} stands for the SheetJS version number:

curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.numbers

  1. Download sheetjs.v8.cc:
curl -LO https://docs.sheetjs.com/v8/sheetjs.v8.cc
  1. Compile the standalone sheetjs.v8 binary:
g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
    -lv8_libbase -lv8_libplatform -ldl -Lobj/ -pthread -std=c++17 \
    -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX

:::caution pass

In some V8 versions, the command failed in the linker stage:

ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

The build succeeds after removing libv8_libbase and libv8_libplatform:

g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
    -ldl -Lobj/ -pthread -std=c++17 \
    -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX

:::

On Windows:

cl /MT /I..\v8\v8\ /I..\v8\v8\include sheetjs.v8.cc /GR- v8_monolith.lib Advapi32.lib Winmm.lib Dbghelp.lib /std:c++17 /DV8_COMPRESS_POINTERS=1 /DV8_ENABLE_SANDBOX /link /out:sheetjs.v8.exe /LIBPATH:..\v8\v8\out.gn\x64.release.sample\obj\
  1. Run the demo:
./sheetjs.v8 pres.numbers

On Windows:

.\sheetjs.v8.exe pres.numbers

If the program succeeded, the CSV contents will be printed to console and the file sheetjsw.xlsb will be created. That file can be opened with Excel.
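
If Excel is not available on the build machine, a quick sanity check is still possible. XLSB workbooks are stored in a ZIP container, so the generated file should begin with the ZIP local file header signature. The sketch below only inspects the first four bytes and does not prove that the workbook is valid. The function name `looks_like_zip` is hypothetical:

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical check (not part of the demo source): returns true if the file
// starts with the ZIP local file header signature "PK\x03\x04".
static bool looks_like_zip(const char *filename) {
  std::FILE *f = std::fopen(filename, "rb");
  if (!f) return false;
  unsigned char sig[4] = {0, 0, 0, 0};
  std::size_t n = std::fread(sig, 1, sizeof(sig), f);
  std::fclose(f);
  static const unsigned char zip_sig[4] = {0x50, 0x4B, 0x03, 0x04};
  // Files shorter than four bytes cannot carry the signature.
  return n == sizeof(sig) && std::memcmp(sig, zip_sig, sizeof(sig)) == 0;
}
```

Calling `looks_like_zip("sheetjsw.xlsb")` after running the demo should return true if the export completed.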

Bindings

Bindings exist for many languages. As these bindings require "native" code, they may not work on every platform.

Rust

The v8 crate provides binary builds and straightforward bindings. The Rust code is similar to the C++ code.

Pulling data from an ArrayBuffer back into Rust involves an unsafe operation:

/* assuming JS code returns an ArrayBuffer, copy result to a Vec<u8> */
fn eval_code_ab(scope: &mut v8::HandleScope, code: &str) -> Vec<u8> {
  let source = v8::String::new(scope, &code).unwrap();
  let script = v8::Script::compile(scope, source, None).unwrap();
  let result: v8::Local<v8::ArrayBuffer> = script.run(scope).unwrap().try_into().unwrap();
  /* In C++, `Data` returns a pointer. Collecting data into Vec<u8> is unsafe */
  unsafe { return std::slice::from_raw_parts_mut(
    result.data().unwrap().cast::<u8>().as_ptr(),
    result.byte_length()
  ).to_vec(); }
}

:::note Tested Deployments

This demo was last tested in the following deployments:

| Architecture | V8 Crate | Date       |
|:-------------|:---------|:-----------|
| darwin-x64   | 0.89.0   | 2024-04-04 |
| darwin-arm   | 0.82.0   | 2023-12-01 |
| win10-x64    | 0.89.0   | 2024-03-24 |
| linux-x64    | 0.91.0   | 2024-04-25 |
| linux-arm    | 0.82.0   | 2023-12-01 |

:::

  1. Create a new project:
cargo new sheetjs-rustyv8
cd sheetjs-rustyv8
cargo run
  1. Add the v8 crate:
cargo add v8
cargo run
  1. Download the SheetJS Standalone script and test file. Save both files in the project directory. In the first URL, ${current} stands for the SheetJS version number:

curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.numbers

  1. Download main.rs and replace src/main.rs:
curl -L -o src/main.rs https://docs.sheetjs.com/v8/main.rs
  1. Build and run the app:
cargo run pres.numbers

If the program succeeded, the CSV contents will be printed to console and the file sheetjsw.xlsb will be created. That file can be opened with Excel.