Puppeteer is a Node library commonly used for browser automation. In this shot, we will cover how to run Puppeteer on AWS Lambda, a serverless computing platform, and how to save screenshots in an S3 bucket.
We will be using the Serverless framework and Lambda layers.
We assume that you have a configured environment with AWS CLI installed, proper AWS credentials, and Serverless.
Let’s begin by setting up the directory and initializing a Node.js project:
mkdir -p puppeteer-lambda/src
cd puppeteer-lambda
npm init -y
Next, we need to install Serverless as a dev dependency:
npm install --save-dev serverless
At this point, you may be tempted to install Puppeteer and use it directly. However, the puppeteer package is quite large, and you may run into problems running it on AWS Lambda.
A better method is to use chrome-aws-lambda. This module ships with a Chromium binary built for the Lambda environment, which makes it suitable for our project.
Let’s install the chrome-aws-lambda module:
npm install --save-dev chrome-aws-lambda
Notice that we installed chrome-aws-lambda as a dev dependency. This is because we will be using a Lambda layer that comes pre-installed with this module.
Now we need to configure the serverless.yml file. We have provided a sample file with comments; most of the parameters are the same as in the official documentation.
# serverless.yml
service: lambdaScreenshot

custom:
  # change this name to something unique
  s3Bucket: screenshot-files

provider:
  name: aws
  region: us-east-1
  versionFunctions: false
  # here we put the layers we want to use
  layers:
    # Google Chrome for AWS Lambda as a layer
    # Make sure you use the latest version depending on the region
    # https://github.com/shelfio/chrome-aws-lambda-layer
    - arn:aws:lambda:${self:provider.region}:764866452798:layer:chrome-aws-lambda:10
  # function parameters
  runtime: nodejs12.x
  memorySize: 2048 # recommended
  timeout: 30
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:PutObject
        - s3:PutObjectAcl
      Resource: arn:aws:s3:::${self:custom.s3Bucket}/*

functions:
  capture:
    handler: src/capture.handler
    environment:
      S3_REGION: ${self:provider.region}
      S3_BUCKET: ${self:custom.s3Bucket}

resources:
  Resources:
    # Bucket where the screenshots are stored
    screenshotsBucket:
      Type: AWS::S3::Bucket
      DeletionPolicy: Delete
      Properties:
        BucketName: ${self:custom.s3Bucket}
        AccessControl: Private
    # Grant public read-only access to the bucket
    screenshotsBucketPolicy:
      Type: AWS::S3::BucketPolicy
      Properties:
        PolicyDocument:
          Statement:
            - Effect: Allow
              Action:
                - s3:GetObject
              Principal: "*"
              Resource: arn:aws:s3:::${self:custom.s3Bucket}/*
        Bucket:
          Ref: screenshotsBucket
Finally, we can write the function that will run on AWS Lambda.
In the code given below, we use Puppeteer to spin up a headless Chromium instance, go to a page, and take a screenshot. Next, we upload the screenshot to the S3 bucket for storage. Finally, we return the URL of the uploaded screenshot file.
// src/capture.js
// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");
// aws-sdk (v2) is preinstalled in the nodejs12.x Lambda runtime used here
const S3Client = require("aws-sdk/clients/s3");

// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });

// The function to run
exports.handler = async (event) => {
  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    defaultViewport: chromeLambda.defaultViewport,
    executablePath: await chromeLambda.executablePath,
    headless: chromeLambda.headless
  });

  try {
    // Open a page and navigate to the url
    const page = await browser.newPage();
    await page.goto(event.url);

    // take a screenshot
    const buffer = await page.screenshot();

    // upload the image using the current timestamp as filename
    const result = await s3
      .upload({
        Bucket: process.env.S3_BUCKET,
        Key: `${Date.now()}.png`,
        Body: buffer,
        ContentType: "image/png",
        ACL: "public-read"
      })
      .promise();

    // return the uploaded image url
    return { url: result.Location };
  } finally {
    // close the browser so Chromium does not linger in a reused container
    await browser.close();
  }
};
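The handler passes `event.url` straight to `page.goto`, so a missing or malformed URL only surfaces as a navigation error. A minimal sketch of a guard that could run first is shown below. `parseTargetUrl` is a hypothetical helper name, not part of chrome-aws-lambda or Puppeteer, and limiting input to http/https is an assumption about what the function should accept.

```javascript
// Hypothetical helper (not part of the original handler): validate event.url
// before handing it to page.goto().
function parseTargetUrl(event) {
  if (!event || typeof event.url !== "string") {
    throw new Error("event.url must be a string");
  }

  let parsed;
  try {
    // The WHATWG URL class is available as a global in Node.js 10 and later
    parsed = new URL(event.url);
  } catch (err) {
    throw new Error(`event.url is not a valid URL: ${event.url}`);
  }

  // Assumption: only web pages should be captured, so reject file:, data:, etc.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }

  // Return the normalized form of the URL
  return parsed.href;
}
```

With this helper in place, the navigation step could become `await page.goto(parseTargetUrl(event));`.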
Lastly, we can deploy the function to AWS using the Serverless command:
sls deploy
To test the function, open it in the AWS Lambda console, choose “Configure test events”, and enter a test event containing the URL to capture:
{
"url": "https://example.com/"
}
Click “Create” and then click “Test”. You should see your test running and, after a few seconds, you will have the URL to your screenshot.
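The same test can also be run from a script instead of the console. The sketch below assumes the default Serverless naming scheme (service, stage, and function joined by hyphens) and the default `dev` stage, which would make the deployed function name `lambdaScreenshot-dev-capture`; check the output of `sls deploy` for the actual name. It also assumes that the `aws-sdk` v2 package is installed locally and that AWS credentials are configured. `buildInvokeParams` and `invokeCapture` are illustrative names.

```javascript
// Build the parameters for Lambda.invoke() (aws-sdk v2).
// The default functionName is an assumption based on Serverless's default naming.
function buildInvokeParams(url, functionName = "lambdaScreenshot-dev-capture") {
  return {
    FunctionName: functionName,
    InvocationType: "RequestResponse", // wait for the handler's return value
    Payload: JSON.stringify({ url })
  };
}

async function invokeCapture(url) {
  // Required lazily so buildInvokeParams can be used without aws-sdk installed
  const Lambda = require("aws-sdk/clients/lambda");
  const lambda = new Lambda({ region: "us-east-1" });

  const response = await lambda.invoke(buildInvokeParams(url)).promise();

  // Payload holds the JSON-encoded value returned by the handler
  return JSON.parse(response.Payload.toString());
}

// Example:
// invokeCapture("https://example.com/").then((res) => console.log(res.url));
```

Keeping the parameter construction separate from the network call makes it possible to check the payload shape without touching AWS.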