---
title: Data Processing with JE
sidebar_label: Perl + JE
pagination_prev: demos/bigdata/index
pagination_next: solutions/input
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';

:::danger pass

In a production application, it is strongly recommended to use a binding for a
C engine like [`JavaScript::Duktape`](/docs/demos/engines/duktape#perl).

:::

[`JE`](https://metacpan.org/pod/JE) is a pure-Perl JavaScript engine.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses JE and SheetJS to pull data from a spreadsheet and print CSV
rows. We'll explore how to load SheetJS in a JE context and process spreadsheets
from Perl scripts.

The ["Complete Example"](#complete-example) section includes a complete script
for reading data from XLS files, printing CSV rows, and writing FODS workbooks.

## Integration Details

The [SheetJS ExtendScript build](/docs/getting-started/installation/extendscript)
can be parsed and evaluated in a JE context.

The engine deviates from the ES3 specification. Some of the affected behavior
can be fixed by modifying built-in prototypes:

<details>
<summary><b>Required shim to support JE</b> (click to show)</summary>

The following features are implemented:

- simple string `charCodeAt`
- Number `charCodeAt` (to work around string `split` bug)
- String `match` (to work around a bug when there are no matches)

```js title="Required shim to support JE"
/* String#charCodeAt is missing */
/* lookup table: the character at index i has character code i (0 to 255) */
var string = "";
for(var i = 0; i < 256; ++i) string += String.fromCharCode(i);
String.prototype.charCodeAt = function(n) {
  var result = string.indexOf(this.charAt(n));
  /* characters outside the table cannot be resolved */
  if(result == -1) throw this.charAt(n);
  return result;
};

/* workaround for String split bug */
/* a single digit Number maps to the code of its digit character ('0' is 48) */
Number.prototype.charCodeAt = function(n) { return this + 48; };

/* String#match bug with empty results */
String.prototype.old_match = String.prototype.match;
String.prototype.match = function(p) {
  var result = this.old_match(p);
  /* return null instead of an empty array when there are no matches */
  return (Array.isArray(result) && result.length == 0) ? null : result;
};
```

</details>
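
To see what the shim computes, the following sketch re-expresses two of the
patches as standalone functions that can be run in a standard engine such as
Node.js. The names `shimCharCodeAt` and `shimMatch` are hypothetical and are not
part of the shim; the logic mirrors the shim code. Since the lookup table has
256 entries, the replacement `charCodeAt` only resolves character codes below
256 and throws the character otherwise.

```js
/* lookup table of the first 256 characters, built the same way as the shim */
var table = "";
for(var i = 0; i < 256; ++i) table += String.fromCharCode(i);

/* hypothetical standalone version of the shimmed String#charCodeAt */
function shimCharCodeAt(str, n) {
  var result = table.indexOf(str.charAt(n));
  if(result == -1) throw str.charAt(n);
  return result;
}

/* hypothetical standalone version of the String#match wrapper */
function shimMatch(str, pattern) {
  var result = str.match(pattern);
  return (Array.isArray(result) && result.length == 0) ? null : result;
}

console.log(shimCharCodeAt("A", 0));      // 65
console.log(shimCharCodeAt("\u00ff", 0)); // 255
console.log(shimMatch("abc", /z/));       // null
```

In a standard engine, `shimMatch` returns the same values as the built-in
`match`, since a failed match already yields `null`. The wrapper only changes
results in an engine that returns an empty array.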

When loading the ExtendScript build, the UTF-8 byte order mark (BOM) at the
start of the script must be removed:

```perl
use JE;
use File::Slurp;

## Initialize the JE engine
my $je = JE->new;

## Load SheetJS source
my $src = read_file('xlsx.extendscript.js', { binmode => ':raw' });
$src =~ s/^\xEF\xBB\xBF//; ## remove UTF8 BOM
my $XLSX = $je->eval($src);
```
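
The three bytes in the regular expression are the UTF-8 encoding of U+FEFF, the
byte order mark. The following sketch, written in JavaScript for a standard
engine that provides `TextEncoder` (for example Node.js), shows the encoding and
applies the same prefix removal to a byte string. The helper name `stripBOM` is
hypothetical.

```js
// U+FEFF encodes to the bytes EF BB BF in UTF-8
var bom = new TextEncoder().encode("\uFEFF");
console.log(Array.from(bom).join(" ")); // 239 187 191 (0xEF 0xBB 0xBF)

// Hypothetical helper: remove a leading BOM from a "binary string"
// (one character per byte, like a Perl scalar read in raw mode)
function stripBOM(src) {
  return src.replace(/^\xEF\xBB\xBF/, "");
}

console.log(stripBOM("\xEF\xBB\xBFvar XLSX = {};")); // var XLSX = {};
```

The pattern is anchored to the start of the input, so the same byte sequence
appearing later in the script is left untouched.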

### Reading Files

File data should be read in raw mode, encoded as a Base64 string, and passed to
a JS function that calls `XLSX.read` with the `base64` type:

```perl
use File::Slurp;
use MIME::Base64 qw( encode_base64 );

## Set up conversion method
$je->eval(<<'EOF');
function sheetjsparse(data) { try {
  return XLSX.read(String(data), {type: "base64", WTF:1});
} catch(e) { return String(e); } }
EOF

## Read file and encode as Base64 (the "" argument disables line breaks)
my $raw_data = encode_base64(read_file($ARGV[0], { binmode => ':raw' }), "");

## Call method with data
my $return_val = $je->method(sheetjsparse => $raw_data);
```
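
The value returned by `sheetjsparse` is either a workbook object or, if parsing
failed, an error message string. As a sketch of a possible next step, a second
function in the same style could convert the first worksheet to CSV. The name
`sheetjscsv` is hypothetical and is not taken from the demo script;
`XLSX.utils.sheet_to_csv`, `SheetNames` and `Sheets` are part of the SheetJS
API. The code assumes that the `XLSX` global has been loaded and sticks to ES3
syntax so that it could be evaluated in the same way as the function above.

```js
/* Hypothetical helper: CSV text of the first worksheet, or an error string */
function sheetjscsv(wb) { try {
  /* sheetjsparse returns a string when parsing fails; pass it through */
  if(typeof wb == "string") return wb;
  /* look up the first worksheet by name */
  var ws = wb.Sheets[wb.SheetNames[0]];
  return XLSX.utils.sheet_to_csv(ws);
} catch(e) { return String(e); } }
```

Following the calling pattern shown above, a Perl script would presumably invoke
it with `$je->method(sheetjscsv => $return_val)`. This call has not been tested
against JE.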

### Writing Files

Due to bugs in data interchange, it is strongly recommended to generate a string
in a simple text-based format like `.fods`:

```perl
use File::Slurp;

## Set up conversion method
$je->eval(<<'EOF');
function sheetjswrite(wb) { try {
  return XLSX.write(wb, { WTF:1, bookType: "fods", type: "string" });
} catch(e) { return String(e); } }
EOF

## Generate file
## $workbook is a SheetJS workbook object, such as a value from sheetjsparse
my $fods = $je->method(sheetjswrite => $workbook);

## Write to filesystem
write_file("SheetJE.fods", $fods);
```

## Complete Example

:::note Tested Deployments

This demo was tested in the following deployments:

| Architecture | Version | Date       |
|:-------------|:--------|:-----------|
| `darwin-x64` | `0.066` | 2024-12-17 |
| `darwin-arm` | `0.066` | 2024-05-25 |
| `linux-x64`  | `0.066` | 2024-06-29 |
| `linux-arm`  | `0.066` | 2024-05-25 |

:::

1) Install `JE` and `File::Slurp` through CPAN:

```bash
cpan install JE File::Slurp
```

:::note pass

Some test runs failed with permissions errors:

```
mkdir /Library/Perl/5.30/File: Permission denied at /System/Library/Perl/5.30/ExtUtils/Install.pm line 489.
```

In that case, the install command should be run through `sudo`:

```bash
sudo cpan install JE File::Slurp
```

:::

2) Download the [SheetJS ExtendScript build](/docs/getting-started/installation/extendscript):

<CodeBlock language="bash">{`\
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.extendscript.js`}
</CodeBlock>

3) Download the demo [`SheetJE.pl`](pathname:///perl/SheetJE.pl):

```bash
curl -LO https://docs.sheetjs.com/perl/SheetJE.pl
```

4) Download the [test file](pathname:///cd.xls) and run the script:

```bash
curl -LO https://docs.sheetjs.com/cd.xls
perl SheetJE.pl cd.xls
```

After a short wait, the contents will be displayed in CSV form. The script will
also generate the spreadsheet `SheetJE.fods`, which can be opened in LibreOffice.