Trusted answers to developer questions

Related Tags

aws

Puppeteer on AWS Lambda

Nouman Abbasi

Puppeteer is a Node library commonly used for browser automation tasks. In this shot, we will cover how to run Puppeteer on AWS Lambda, a serverless computing platform, and, as an example, how to save screenshots in AWS S3, Amazon’s storage service.


Setup

In this shot we will be using the Serverless framework and Lambda layers.

We assume that you have a configured environment with AWS CLI installed, proper AWS credentials, and Serverless.

Let’s begin by setting up the directory and initializing a Node.js project:

mkdir -p puppeteer-lambda/src
cd puppeteer-lambda
npm init -y

Next, we need to install Serverless as a dev dependency:

npm install --save-dev serverless

At this point, you may want to install puppeteer and use it directly. However, the puppeteer package is quite large, and you may run into problems running it on AWS Lambda.

A better method is to use chrome-aws-lambda. This module comes with a Chromium binary built for the Lambda environment, making it suitable for our project.

Let’s install the chrome-aws-lambda module:

npm install --save-dev chrome-aws-lambda

Notice that we installed chrome-aws-lambda as a dev dependency. This is because we will be using a Lambda layer that already contains this module, so it does not need to be bundled with the function code.

Configuring the Serverless file

Now we need to configure the serverless.yml file. We have provided a sample file with comments. Most of the parameters are the same as in the official documentation.

# serverless.yml

service: lambdaScreenshot

custom:
  # change this name to something unique
  s3Bucket: screenshot-files

provider:
  name: aws
  region: us-east-1
  versionFunctions: false
  # here we put the layers we want to use
  layers:
    # Google Chrome for AWS Lambda as a layer
    # Make sure you use the latest version depending on the region
    # https://github.com/shelfio/chrome-aws-lambda-layer
    - arn:aws:lambda:${self:provider.region}:764866452798:layer:chrome-aws-lambda:10
  # function parameters
  runtime: nodejs12.x
  memorySize: 2048 # recommended
  timeout: 30
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:PutObject
        - s3:PutObjectAcl
      Resource: arn:aws:s3:::${self:custom.s3Bucket}/*

functions:
  capture:
    handler: src/capture.handler
    environment:
      S3_REGION: ${self:provider.region}
      S3_BUCKET: ${self:custom.s3Bucket}

resources:
  Resources:
    # Bucket where the screenshots are stored
    screenshotsBucket:
      Type: AWS::S3::Bucket
      DeletionPolicy: Delete
      Properties:
        BucketName: ${self:custom.s3Bucket}
        AccessControl: Private
    # Grant public read-only access to the bucket
    screenshotsBucketPolicy:
      Type: AWS::S3::BucketPolicy
      Properties:
        PolicyDocument:
          Statement:
            - Effect: Allow
              Action:
                - s3:GetObject
              Principal: "*"
              Resource: arn:aws:s3:::${self:custom.s3Bucket}/*
        Bucket:
          Ref: screenshotsBucket

Writing the function

Finally, we can write the function that will run on AWS Lambda.

In the code given below, we use Puppeteer to spin up a headless Chromium instance, go to a page, and take a screenshot. Next, we use the S3 client to upload the screenshot to the S3 bucket for storage. Finally, we return the URL of the uploaded screenshot file.

// src/capture.js

// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");

// aws-sdk (v2) is preinstalled in the AWS Lambda Node.js runtimes up to nodejs16.x,
// which includes the nodejs12.x runtime set in serverless.yml
const S3Client = require("aws-sdk/clients/s3");

// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });


// The function to run
exports.handler = async (event) => {

  // launch a headless browser using the settings provided by chrome-aws-lambda
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    defaultViewport: chromeLambda.defaultViewport,
    executablePath: await chromeLambda.executablePath,
    headless: chromeLambda.headless
  });

  // Open a page and navigate to the url
  const page = await browser.newPage();
  await page.goto(event.url);

  // take a screenshot, then close the browser to free its resources
  const buffer = await page.screenshot();
  await browser.close();

  // upload the image using the current timestamp as filename
  const result = await s3
    .upload({
      Bucket: process.env.S3_BUCKET,
      Key: `${Date.now()}.png`,
      Body: buffer,
      ContentType: "image/png",
      ACL: "public-read"
    })
    .promise();

  // return the uploaded image url
  return { url: result.Location };
};
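
If page.goto or page.screenshot throws in the handler above, the Chromium process can be left running in a Lambda container that is later reused. One possible way to guard against this, shown here as a sketch rather than as part of the original function, is a small wrapper that always closes the browser. The name withBrowser and its two parameters are hypothetical: launch is any async function that returns a browser, and work is the code that uses it.

```javascript
// Hypothetical helper: run `work(browser)` and always close the browser,
// even when `work` throws.
async function withBrowser(launch, work) {
  const browser = await launch();
  try {
    return await work(browser);
  } finally {
    // runs on success and on failure
    await browser.close();
  }
}

// Possible usage inside the handler (assumes chromeLambda from src/capture.js):
//
// const buffer = await withBrowser(
//   async () => chromeLambda.puppeteer.launch({
//     args: chromeLambda.args,
//     executablePath: await chromeLambda.executablePath
//   }),
//   async (browser) => {
//     const page = await browser.newPage();
//     await page.goto(event.url);
//     return page.screenshot();
//   }
// );
```

Because the launch step is passed in as a function, the wrapper itself does not depend on chrome-aws-lambda and can be tried out with a stand-in object that only has a close method.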

Deploy and test

Lastly, we can deploy the function to AWS using the Serverless command:

sls deploy

To test the function, open it in the AWS Lambda console, choose “Configure test events”, and enter a test event containing the URL to capture:

{
  "url": "https://example.com/"
}

Click “Create” and then click “Test”. You should see your test running and, after a few seconds, you will have the URL to your screenshot.
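
The handler passes event.url straight to page.goto, so a missing or malformed value only shows up as a navigation error. As a hedged sketch, assuming you want to accept only absolute http or https URLs, the event could be checked before the browser is launched. The helper name parseTargetUrl is hypothetical; the URL class it relies on is a global in Node.js.

```javascript
// Hypothetical helper: accept only absolute http(s) URLs from the event.
function parseTargetUrl(event) {
  if (!event || typeof event.url !== "string") {
    throw new Error('event must look like { "url": "https://..." }');
  }

  // The URL constructor throws a TypeError when the input is not a valid URL
  const parsed = new URL(event.url);

  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }

  return parsed.href;
}
```

With the test event shown earlier, parseTargetUrl(event) returns the URL unchanged, while an event without a url field or with a file: URL is rejected before any browser work starts.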

Copyright ©2022 Educative, Inc. All rights reserved