---
title: Amazon Web Services
pagination_prev: demos/local/index
pagination_next: demos/extensions/index
---
import current from '/version.js';

AWS is a Cloud Services platform which includes traditional virtual machine
support, "Serverless Functions", cloud storage and much more.

:::caution

AWS iterates quickly and there is no guarantee that the referenced services
will be available in the future.

:::

This demo focuses on two key offerings: cloud storage ("S3") and the
"Serverless Function" platform ("Lambda").

The [NodeJS Module](/docs/getting-started/installation/nodejs) can be shipped in
a bundled Lambda function.

:::note

This was tested on 2023 April 24.

:::
## AWS Lambda Functions

In this demo, the "Function URL" (automatic API Gateway management) features
are used. Older deployments required special "Binary Media Types" to handle
formats like XLSX. At the time of testing, the configuration was not required.

### Reading Data

In the Lambda handler method, the `event.body` attribute is a Base64-encoded
string. The `busboy` body parser can accept a decoded body.

<details open><summary><b>Code Sample</b> (click to hide)</summary>

This example takes the first uploaded file submitted with the key `upload`,
parses the file and returns the CSV content of the first worksheet.
```js
const XLSX = require('xlsx');
var Busboy = require('busboy');

exports.handler = function(event, context, callback) {
  /* set up busboy */
  var ctype = event.headers['Content-Type']||event.headers['content-type'];
  var bb = Busboy({headers:{'content-type':ctype}});

  /* busboy is evented; accumulate the fields and files manually */
  var fields = {}, files = {};
  bb.on('error', function(err) { callback(null, { body: err.message }); });
  bb.on('field', function(fieldname, val) {fields[fieldname] = val });
  // highlight-start
  bb.on('file', function(fieldname, file, filename) {
    /* concatenate the individual data buffers */
    var buffers = [];
    file.on('data', function(data) { buffers.push(data); });
    file.on('end', function() { files[fieldname] = [Buffer.concat(buffers), filename]; });
  });
  // highlight-end

  /* on the finish event, all of the fields and files are ready */
  bb.on('finish', function() {
    /* grab the first file */
    var f = files["upload"];
    /* stop here if no file was submitted; otherwise f[0] below would throw */
    if(!f) return callback(new Error("Must submit a file for processing!"));

    /* f[0] is a buffer */
    // highlight-next-line
    var wb = XLSX.read(f[0]);

    /* grab first worksheet and convert to CSV */
    var ws = wb.Sheets[wb.SheetNames[0]];
    callback(null, { statusCode: 200, body: XLSX.utils.sheet_to_csv(ws) });
  });

  /* start the processing */
  // highlight-next-line
  bb.end(Buffer.from(event.body, "base64"));
};
```
</details>
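The handler above decodes `event.body` unconditionally. Function URL events also carry an `isBase64Encoded` flag, so a slightly more defensive handler could check the flag before decoding. The following is a minimal sketch under that assumption; `decodeBody` is a hypothetical helper name and is not part of SheetJS, `busboy` or the AWS SDK:

```js
/* hypothetical helper: return the request body as a NodeJS Buffer */
function decodeBody(event) {
  var body = event.body || "";
  /* assumption: the event reports whether the body was Base64-encoded */
  return Buffer.from(body, event.isBase64Encoded ? "base64" : "utf8");
}

/* usage with a mock event (not a real Lambda invocation) */
var mock = {
  isBase64Encoded: true,
  body: Buffer.from("a,b\n1,2").toString("base64")
};
console.log(decodeBody(mock).toString());
```

In the handler, the result could be passed to `bb.end` in place of `Buffer.from(event.body, "base64")`.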
### Writing Data

For safely transmitting binary data, the `base64` type should be used. Setting
the `isBase64Encoded` property in the Lambda callback response marks the body
as Base64-encoded, so the client receives the decoded binary file.

<details open><summary><b>Code Sample</b> (click to hide)</summary>

This example generates a sample workbook and writes it to an XLSX file.
```js
var XLSX = require('xlsx');

exports.handler = function(event, context, callback) {
  /* make workbook */
  var wb = XLSX.read("S,h,e,e,t,J,S\n5,4,3,3,7,9,5", {type: "binary"});

  /* write to XLSX file in Base64 encoding */
  // highlight-next-line
  var body = XLSX.write(wb, { type: "base64", bookType: "xlsx" });

  /* mark as attached file */
  var headers = { "Content-Disposition": 'attachment; filename="SheetJSLambda.xlsx"'};

  /* Send back data */
  callback(null, {
    statusCode: 200,
    // highlight-next-line
    isBase64Encoded: true,
    body: body,
    headers: headers
  });
};
```
</details>
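When the workbook is generated as a Buffer (`type: "buffer"`) rather than a Base64 string, the response fields can be assembled by a small helper. This is a sketch and not part of the demo: `xlsxResponse` is a hypothetical name, and the `Content-Type` header is an addition that the sample above does not set.

```js
/* hypothetical helper: wrap a Buffer in a Lambda response object */
function xlsxResponse(buf, filename) {
  return {
    statusCode: 200,
    /* the body is Base64 text and is flagged as such */
    isBase64Encoded: true,
    body: buf.toString("base64"),
    headers: {
      /* assumption: standard MIME type for XLSX files */
      "Content-Type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
      /* mark as attached file */
      "Content-Disposition": 'attachment; filename="' + filename + '"'
    }
  };
}

/* usage with placeholder bytes instead of a real workbook */
var res = xlsxResponse(Buffer.from("PK"), "SheetJSLambda.xlsx");
console.log(res.headers["Content-Disposition"]);
```

In a handler, the first argument would be the result of `XLSX.write(wb, { type: "buffer", bookType: "xlsx" })`.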
### Demo

<details open><summary><b>Complete Example</b> (click to hide)</summary>

0) Review the quick start for JavaScript on AWS

1) Create a new folder and download [`index.js`](pathname:///aws/index.js):

```bash
mkdir -p SheetJSLambda
cd SheetJSLambda
curl -LO https://docs.sheetjs.com/aws/index.js
```

2) Install dependencies in the project directory:

<pre><code parentName="pre" {...{"className": "language-bash"}}>{`\
mkdir -p node_modules
npm install https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz busboy`}
</code></pre>

3) Create a .zip package of the contents of the folder:

```bash
yes | zip -c ../SheetJSLambda.zip -r .
```
4) In the web interface for AWS Lambda, create a new Function with the options:

- Select "Author from scratch" (default choice when last verified)
- "Function Name": SheetJSLambda
- "Runtime": "Node.js" (select the version in the "Latest supported" block)
- Advanced Settings:
  + Check "Enable function URL"
  + Auth type: NONE
  + Check "Configure cross-origin resource sharing (CORS)"

5) In the interface, click "Upload from" and select ".zip file". Click the
"Upload" button in the modal, select SheetJSLambda.zip, and click "Save".

When the demo was last tested, the ZIP was small enough that the Lambda code
editor could load the package.
6) Enable external access to the function.

Under Configuration > Function URL, click "Edit" and ensure that Auth type is
set to NONE. If it is not, select NONE and click Save.

Under Configuration > Permissions, scroll down to "Resource-based policy".
If no policy statements are defined, select "Add Permission" with the options:

- Select "Function URL" at the top
- Auth type: NONE
- Ensure that Statement ID is set to `FunctionURLAllowPublicAccess`
- Ensure that Principal is set to `*`
- Ensure that Action is set to `lambda:InvokeFunctionUrl`

Click "Save" and a new Policy statement should be created.

7) Find the Function URL (it is in the "Function Overview" section).

Open that URL in a web browser. The browser will try to download
`SheetJSLambda.xlsx`. Save and open the file to confirm it is valid.

To test parsing, download <https://sheetjs.com/pres.numbers> and make a POST
request to the public function URL (change `FUNCTION_URL` in the command):

```bash
curl -X POST -F "upload=@pres.numbers" FUNCTION_URL
```

The result should be a CSV output of the first sheet.
</details>
## S3 Storage

The main module for S3 and all AWS services is `aws-sdk`.

### Reading Data

The `S3#getObject` method returns an object with a `createReadStream` method.
The Buffer chunks emitted by the stream can be concatenated and passed to
`XLSX.read`.

<details open><summary><b>Demo</b> (click to hide)</summary>

This sample fetches a buffer from S3 and parses the workbook.

1) Save the following script to `SheetJSReadFromS3.js`:
```js title="SheetJSReadFromS3.js"
var XLSX = require("xlsx");
var AWS = require('aws-sdk');

/* replace these constants */
var accessKeyId = "<REPLACE WITH ACCESS KEY ID>";
var secretAccessKey = "<REPLACE WITH SECRET ACCESS KEY>";
var Bucket = "<REPLACE WITH BUCKET NAME>";
var Key = "pres.numbers";

/* Get stream */
var s3 = new AWS.S3({
  apiVersion: '2006-03-01',
  credentials: {
    accessKeyId: accessKeyId,
    secretAccessKey: secretAccessKey
  }
});
var f = s3.getObject({ Bucket: Bucket, Key: Key }).createReadStream();

/* collect data */
var bufs = [];
f.on('data', function(data) { bufs.push(data); });
f.on('end', function() {
  /* concatenate and parse */
  var wb = XLSX.read(Buffer.concat(bufs));
  console.log(XLSX.utils.sheet_to_csv(wb.Sheets[wb.SheetNames[0]]));
});
```
2) Create a new bucket (or get the name of an existing bucket).

3) Download <https://sheetjs.com/pres.numbers>

In the S3 site, open the bucket and click "Upload". In the Upload page, click
and drag the `pres.numbers` file into the browser window and click "Upload".

4) Edit `SheetJSReadFromS3.js` and replace the marked constants:

- `accessKeyId`: access key for the AWS account
- `secretAccessKey`: secret access key for the AWS account
- `Bucket`: name of the bucket

5) Run the script:

```bash
node SheetJSReadFromS3.js
```

The program will display the data in CSV format.
</details>
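The collection step in the script can be wrapped in a Promise, which also gives a place to handle stream errors (the script above does not listen for `error` events). The following is a minimal sketch using only NodeJS built-ins; `streamToBuffer` is a hypothetical helper, and the in-memory stream stands in for the S3 read stream:

```js
var Readable = require("stream").Readable;

/* hypothetical helper: collect every chunk of a readable stream */
function streamToBuffer(stream) {
  return new Promise(function(resolve, reject) {
    var bufs = [];
    stream.on("data", function(data) { bufs.push(Buffer.from(data)); });
    stream.on("error", reject);
    stream.on("end", function() { resolve(Buffer.concat(bufs)); });
  });
}

/* usage: an in-memory stream replaces s3.getObject(...).createReadStream() */
var demo = Readable.from([Buffer.from("S,h,e\n"), Buffer.from("5,4,3")]);
streamToBuffer(demo).then(function(buf) { console.log(buf.toString()); });
```

With the S3 stream `f` from the script, the resolved Buffer could be passed to `XLSX.read` in place of the `data` and `end` handlers.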
### Writing Data

`S3#upload` directly accepts a Buffer.

<details open><summary><b>Demo</b> (click to hide)</summary>

This sample creates a simple workbook, generates a NodeJS buffer, and uploads
the buffer to S3.

1) Save the following script to `SheetJSWriteToS3.js`:
```js title="SheetJSWriteToS3.js"
var XLSX = require("xlsx");
var AWS = require('aws-sdk');

/* replace these constants */
var accessKeyId = "<REPLACE WITH ACCESS KEY ID>";
var secretAccessKey = "<REPLACE WITH SECRET ACCESS KEY>";
var Bucket = "<REPLACE WITH BUCKET NAME>";
var Key = "test.xlsx";

/* Create a simple workbook and write XLSX to buffer */
var ws = XLSX.utils.aoa_to_sheet(["SheetJS".split(""), [5,4,3,3,7,9,5]]);
var wb = XLSX.utils.book_new(); XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
var Body = XLSX.write(wb, {type: "buffer", bookType: "xlsx"});

/* upload buffer */
var s3 = new AWS.S3({
  apiVersion: '2006-03-01',
  credentials: {
    accessKeyId: accessKeyId,
    secretAccessKey: secretAccessKey
  }
});
s3.upload({ Bucket: Bucket, Key: Key, Body: Body }, function(err, data) {
  if(err) throw err;
  console.log("Uploaded to " + data.Location);
});
```
2) Create a new bucket (or get the name of an existing bucket).

3) Edit `SheetJSWriteToS3.js` and replace the marked constants:

- `accessKeyId`: access key for the AWS account
- `secretAccessKey`: secret access key for the AWS account
- `Bucket`: name of the bucket

4) Run the script:

```bash
node SheetJSWriteToS3.js
```

5) In the S3 site, select the bucket and download the object named `test.xlsx`.
Open the file in a spreadsheet editor.
</details>
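Both S3 scripts embed the access keys in the source file. To keep secrets out of the script, the keys could be read from environment variables instead. This is a sketch under the assumption that the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variables are set; `credentialsFromEnv` is a hypothetical helper and is not part of `aws-sdk`:

```js
/* hypothetical helper: read S3 credentials from an environment object */
function credentialsFromEnv(env) {
  var accessKeyId = env.AWS_ACCESS_KEY_ID;
  var secretAccessKey = env.AWS_SECRET_ACCESS_KEY;
  /* fail early instead of sending a request without credentials */
  if(!accessKeyId || !secretAccessKey) throw new Error("Missing AWS credentials in environment");
  return { accessKeyId: accessKeyId, secretAccessKey: secretAccessKey };
}

/* usage with a mock environment; a real script would pass process.env */
var creds = credentialsFromEnv({ AWS_ACCESS_KEY_ID: "id", AWS_SECRET_ACCESS_KEY: "secret" });
console.log(Object.keys(creds).join(","));
```

The returned object has the same shape as the `credentials` option passed to `new AWS.S3` in the scripts above.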