Serverless Framework Custom Docker Containers For AWS Lambda

Why use custom Docker images in AWS Lambda?

  • AWS Lambda provides managed runtimes for programming languages such as Python, Java, and Ruby.
  • At the time of writing, AWS Lambda supports the following runtimes.
  • Python runtimes
    • python3.9
    • python3.8
    • python3.7
    • python3.6
  • Node.js runtimes
    • nodejs16.x
    • nodejs14.x
    • nodejs12.x
  • Ruby runtimes
    • ruby2.7
  • Java runtimes
    • java11
    • java8.al2
    • java8
  • Go runtimes
    • go1.x
  • .NET runtimes
    • dotnet6
    • dotnetcore3.1
  • If a function needs both Python and Java at run time, no single runtime above covers it, because AWS does not currently provide one that includes both. In that case we need to package the Lambda function as a custom Docker container image.

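Pre-requisites

  • Docker installed on your computer
  • Serverless Framework (the sls command) installed on your computer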

Using a custom Docker container with AWS Lambda & Serverless Framework

  • We will use the Serverless Framework to deploy an AWS Lambda function that runs in a custom Docker container.
  • Let's say our use case is extracting tables from a PDF using tabula-py.
  • For that we need both Python and Java in the environment, because tabula-py is a wrapper around tabula-java.
  • Let's create the Serverless configuration and a Dockerfile to solve this problem.
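
Since tabula-py wraps tabula-java, it needs a java executable available next to Python. A minimal sketch of checking for that from Python is shown here; the helper name java_version_line is hypothetical and only the standard library is used.

```python
import shutil
import subprocess


def java_version_line(executable="java"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(java_version_line() or "java not found on PATH")
```

Running this inside the built image (or locally) gives a quick signal of whether the Java side of the environment is in place.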

Directory structure

.
├── pdf-table-extract
│   ├── Dockerfile
│   ├── main.py
│   └── requirements.txt
└── serverless.yml

Let's define these files

serverless.yml

service: custom-service

provider:
  name: aws
  ecr:
    # In this section you can define images that will be built locally and uploaded to ECR
    images:
      extract_pdf_tables:
        path: ./pdf-table-extract

functions:
  pdf_tables_to_json:
    image:
      name: extract_pdf_tables

pdf-table-extract/Dockerfile

FROM public.ecr.aws/lambda/python:3.9-x86_64
RUN yum install -y java-17-amazon-corretto
COPY requirements.txt ${LAMBDA_TASK_ROOT}/requirements.txt
RUN pip3 install -r ${LAMBDA_TASK_ROOT}/requirements.txt
COPY . ${LAMBDA_TASK_ROOT}
CMD [ "main.lambda_handler" ]

pdf-table-extract/requirements.txt

certifi==2022.5.18.1
charset-normalizer==2.0.12
distro==1.7.0
idna==3.3
numpy==1.22.4
pandas==1.4.2
python-dateutil==2.8.2
pytz==2022.1
requests==2.27.1
six==1.16.0
tabula-py==2.3.0
urllib3==1.26.9

pdf-table-extract/main.py

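import tabula


def lambda_handler(event, context):
    # Column names to apply; this assumes the extracted table has two columns
    cols = ["col1", "col2"]
    url = "https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf"
    # read_pdf returns a list of DataFrames, one per table it finds
    pdf_df = tabula.read_pdf(url)
    df = pdf_df[0]
    df.columns = cols
    # Replace missing values (NaN) with empty strings
    df = df.fillna("")
    return {"data": df.to_dict("records")}


if __name__ == "__main__":
    # Run the handler locally with an empty event and context
    print(lambda_handler({}, {}))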
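
The handler above hard-codes the PDF URL. One possible variation, sketched here under assumptions, reads the URL from the invocation event instead. The "pdf_url" event key and the helper name pdf_url_from_event are hypothetical and not part of the original code; tabula is imported inside the handler so the helper can be tried without Java installed.

```python
DEFAULT_PDF_URL = (
    "https://github.com/tabulapdf/tabula-java/raw/master/"
    "src/test/resources/technology/tabula/arabic.pdf"
)


def pdf_url_from_event(event, default=DEFAULT_PDF_URL):
    """Pick the PDF URL from the event, falling back to the sample PDF."""
    # "pdf_url" is an assumed key name chosen for this sketch
    if isinstance(event, dict) and event.get("pdf_url"):
        return event["pdf_url"]
    return default


def lambda_handler(event, context):
    import tabula  # imported lazily; needs tabula-py and Java at call time

    tables = tabula.read_pdf(pdf_url_from_event(event))
    if not tables:
        return {"data": []}
    # Keep the column names tabula detected instead of forcing two columns
    df = tables[0].fillna("")
    return {"data": df.to_dict("records")}
```

With this shape, a test event such as {"pdf_url": "https://example.com/report.pdf"} (a placeholder URL) would select a different document without rebuilding the image.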

Now let's deploy it to AWS Lambda with the command below.

sls deploy

This deploys the Lambda function pdf_tables_to_json to AWS. To verify it, log in to the AWS console, go to Lambda functions, search for pdf_tables_to_json, and test it.
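
By default the Serverless Framework names a deployed function <service>-<stage>-<function>, with dev as the default stage, so the function above would typically show up as custom-service-dev-pdf_tables_to_json. As a rough sketch, it could also be invoked from Python rather than from the console. This assumes boto3 is installed and AWS credentials and a region are configured; the helper names are hypothetical.

```python
import json


def deployed_function_name(service, function, stage="dev"):
    """Build the default Serverless Framework name for a deployed function."""
    return f"{service}-{stage}-{function}"


def invoke(function_name, payload=None):
    """Invoke the Lambda function synchronously and return the decoded response."""
    import boto3  # assumed to be installed; imported lazily so the name helper works without it

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(payload or {}).encode("utf-8"),
    )
    # The response payload is a stream holding the handler's JSON return value
    return json.loads(response["Payload"].read())
```

For example, invoke(deployed_function_name("custom-service", "pdf_tables_to_json")) would call the function deployed above, assuming the default stage was used.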

Note: The deployment will fail with an error if Docker and the Serverless Framework are not installed on your computer.
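
A small pre-flight check can surface missing tools before sls deploy runs. The sketch below only looks for the docker and sls executables on the PATH; the function name missing_tools is hypothetical, and it does not check whether the Docker daemon is actually running.

```python
import shutil


def missing_tools(tools=("docker", "sls")):
    """Return the names of the required command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("docker and sls were found on PATH")
```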
