Puppeteer on AWS Lambda

Puppeteer is a Node library commonly used for browser automation tasks. In this shot, we will cover how to run Puppeteer on AWS Lambda, a serverless computing platform, and, as an example, how to save screenshots in AWS S3, Amazon’s storage service.

Setup

In this shot we will be using the Serverless framework and Lambda layers.

We assume that you have a configured environment with AWS CLI installed, proper AWS credentials, and Serverless.

Let’s begin by setting up the directory and initializing a Node.js project:

mkdir -p puppeteer-lambda/src
cd puppeteer-lambda
npm init -y

Next, we need to install Serverless as a dev dependency:

npm install --save-dev serverless

You may be tempted to install the puppeteer package and use it directly. However, puppeteer downloads a full Chromium build when it is installed, so the package is quite large, and you may run into Lambda’s deployment size limits.

A better method is to use chrome-aws-lambda. This module comes with a Chromium binary built for the Lambda environment, making it suitable for our project.

Let’s install the chrome-aws-lambda module:

npm install --save-dev chrome-aws-lambda

Notice that we installed chrome-aws-lambda as a dev dependency. This is because we will be using a Lambda layer that already contains this module, so it does not need to be bundled with our function. The local copy is only used during development.

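Configuring Serverless file

Create a serverless.yml file in the project root with the following configuration:

# serverless.yml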
service: lambdaScreenshot
custom:
  # change this name to something unique
  s3Bucket: screenshot-files
provider:
  name: aws
  region: us-east-1
  versionFunctions: false
  # here we put the layers we want to use
  layers:
    # Google Chrome for AWS Lambda as a layer
    # Make sure you use the latest version depending on the region
    # https://github.com/shelfio/chrome-aws-lambda-layer
    - arn:aws:lambda:${self:provider.region}:764866452798:layer:chrome-aws-lambda:10
  # function parameters
  runtime: nodejs12.x
  memorySize: 2048 # recommended
  timeout: 30
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:PutObject
        - s3:PutObjectAcl
      Resource: arn:aws:s3:::${self:custom.s3Bucket}/*
functions:
  capture:
    handler: src/capture.handler
    environment:
      S3_REGION: ${self:provider.region}
      S3_BUCKET: ${self:custom.s3Bucket}
resources:
  Resources:
    # Bucket where the screenshots are stored
    screenshotsBucket:
      Type: AWS::S3::Bucket
      DeletionPolicy: Delete
      Properties:
        BucketName: ${self:custom.s3Bucket}
        AccessControl: Private
    # Grant public read-only access to the bucket
    screenshotsBucketPolicy:
      Type: AWS::S3::BucketPolicy
      Properties:
        PolicyDocument:
          Statement:
            - Effect: Allow
              Action:
                - s3:GetObject
              Principal: "*"
              Resource: arn:aws:s3:::${self:custom.s3Bucket}/*
        Bucket:
          Ref: screenshotsBucket

Writing the function

Create the handler in src/capture.js:

// src/capture.js
// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");
// aws-sdk (v2) is preinstalled in the nodejs12.x Lambda runtime configured above
const S3Client = require("aws-sdk/clients/s3");
// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });
// The function to run
exports.handler = async (event) => {
  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    defaultViewport: chromeLambda.defaultViewport,
    executablePath: await chromeLambda.executablePath,
    headless: chromeLambda.headless
  });
  let buffer;
  try {
    // open a page and navigate to the url
    const page = await browser.newPage();
    await page.goto(event.url);
    // take a screenshot
    buffer = await page.screenshot();
  } finally {
    // always close the browser, even if navigation or the screenshot fails
    await browser.close();
  }
  // upload the image using the current timestamp as filename
  const result = await s3
    .upload({
      Bucket: process.env.S3_BUCKET,
      Key: `${Date.now()}.png`,
      Body: buffer,
      ContentType: "image/png",
      ACL: "public-read"
    })
    .promise();
  // return the uploaded image url
  return { url: result.Location };
};
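
The handler passes event.url straight to page.goto, so a missing or malformed URL only fails once the browser is already running. As a minimal sketch, we could validate the input first. The helper name parseTargetUrl is hypothetical and not part of the original function, and restricting input to http and https is an assumption about what we want to allow.

```javascript
// Hypothetical helper (not in the original handler): validates event.url
// and returns a normalized http(s) URL string, or throws on unusable input.
function parseTargetUrl(event) {
  if (!event || typeof event.url !== "string") {
    throw new Error("event.url must be a string");
  }
  // The URL constructor throws a TypeError on malformed input
  const parsed = new URL(event.url);
  // Assumption: only web pages should be captured, so reject file:, data:, etc.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}
```

Inside the handler, this could be called before launching the browser, with the result passed to page.goto in place of event.url.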

Deploy and test

Lastly, we can deploy the function to AWS using the Serverless command:

sls deploy

To test the function, open it in the AWS Lambda console, choose “Configure test events”, and enter a test event containing the URL to capture:

{
  "url": "https://example.com/"
}

Click “Create” and then click “Test”. You should see your test running and, after a few seconds, you will have the URL to your screenshot.
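
If you prefer to test from your own machine instead of the console, the function can also be invoked with the AWS SDK. The following is a sketch under a few assumptions: the file name invoke.js and both helper names are hypothetical, aws-sdk (v2) is installed locally, the region matches the one in serverless.yml, and the function was deployed to the default "dev" stage, so Serverless named it lambdaScreenshot-dev-capture.

```javascript
// invoke.js: hypothetical helper script, not part of the original project

// Serverless names deployed functions <service>-<stage>-<function>.
// Assumption: the default "dev" stage was used at deploy time.
function buildInvokeParams(url, stage = "dev") {
  return {
    FunctionName: `lambdaScreenshot-${stage}-capture`,
    Payload: JSON.stringify({ url })
  };
}

// Turn the raw invoke response into the handler's return value,
// or throw if Lambda reports that the function failed.
function parseInvokeResponse(response) {
  const payload = JSON.parse(String(response.Payload));
  if (response.FunctionError) {
    throw new Error(`Lambda failed: ${payload.errorMessage}`);
  }
  return payload;
}

async function main(url) {
  // Loaded here so the helpers above can be used without the SDK installed
  const Lambda = require("aws-sdk/clients/lambda");
  const lambda = new Lambda({ region: "us-east-1" });
  const response = await lambda.invoke(buildInvokeParams(url)).promise();
  console.log(parseInvokeResponse(response).url);
}

// Run only when a URL is passed on the command line
if (process.argv[2]) {
  main(process.argv[2]).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Running node invoke.js https://example.com/ should then print the screenshot URL returned by the handler.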
