Why use custom Docker images in AWS Lambda?¶
- AWS Lambda provides managed runtimes for programming languages such as python, java, ruby, etc.
- Below is the list of runtimes that AWS Lambda supported at the time of writing.
- Python runtimes
  - python3.9
  - python3.8
  - python3.7
  - python3.6
- Node.js runtimes
  - nodejs16.x
  - nodejs14.x
  - nodejs12.x
- Ruby runtimes
  - ruby2.7
- Java runtimes
  - java11
  - java8.al2
  - java8
- Go runtimes
  - go1.x
- .NET runtimes
  - dotnet6
  - dotnetcore3.1
- If we need an environment that includes both python and java, we have to use a custom runtime (a container image) for the AWS Lambda function, because none of the managed runtimes listed above bundles both languages.
Pre-requisites¶
Using a custom Docker container runtime with AWS Lambda and the Serverless Framework¶
- We will use the Serverless Framework to deploy an AWS Lambda function that runs in a custom Docker container.
- Let's say we have a use case where we want to extract tables from a PDF using tabula-py.
- To do that we need both Python and Java in the same environment, because tabula-py is a wrapper around tabula-java.
- Let's create the Serverless configuration and a Dockerfile to solve this problem.
Directory structure¶
.
├── pdf-table-extract
│   ├── Dockerfile
│   ├── main.py
│   └── requirements.txt
└── serverless.yml
Let's define these files
serverless.yml
service: custom-service
provider:
  name: aws
  ecr:
    # In this section you can define images that will be built locally and uploaded to ECR
    images:
      extract_pdf_tables:
        path: ./pdf-table-extract
functions:
  pdf_tables_to_json:
    image:
      name: extract_pdf_tables
pdf-table-extract/Dockerfile
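# Start from the AWS-provided Python 3.9 base image for Lambda
FROM public.ecr.aws/lambda/python:3.9-x86_64
# Install a Java runtime, which tabula-py needs in order to run tabula-java
RUN yum install -y java-17-amazon-corretto
# Install the Python dependencies before copying the code so this layer can be cached
COPY requirements.txt ${LAMBDA_TASK_ROOT}/requirements.txt
RUN pip3 install -r ${LAMBDA_TASK_ROOT}/requirements.txt
# Copy the function code into the task root
COPY . ${LAMBDA_TASK_ROOT}
# Handler in the form <module>.<function>
CMD [ "main.lambda_handler" ]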
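Because the image has to provide both Python and Java, it can be worth confirming that a `java` executable is actually on the `PATH` inside the container before relying on tabula-py. A minimal sketch using only the standard library; the helper name `java_version` is hypothetical and not part of tabula-py:

```python
import shutil
import subprocess
from typing import Optional


def java_version() -> Optional[str]:
    """Return the first line printed by `java -version`, or None if Java is not found."""
    java_path = shutil.which("java")
    if java_path is None:
        return None
    # `java -version` writes its output to stderr rather than stdout.
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True, check=False
    )
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None
```

Calling `java_version()` at the top of the handler, or once in a throwaway container started from the built image, would show whether the `yum install` step took effect.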
pdf-table-extract/requirements.txt
certifi==2022.5.18.1
charset-normalizer==2.0.12
distro==1.7.0
idna==3.3
numpy==1.22.4
pandas==1.4.2
python-dateutil==2.8.2
pytz==2022.1
requests==2.27.1
six==1.16.0
tabula-py==2.3.0
urllib3==1.26.9
pdf-table-extract/main.py
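import tabula


def lambda_handler(event, context):
    # Column names to assign to the extracted table
    cols = ["col1", "col2"]
    url = "https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf"
    # read_pdf returns a list of DataFrames, one per detected table
    pdf_df = tabula.read_pdf(url)
    df = pdf_df[0]
    df.columns = cols
    # Replace missing values with empty strings before converting to records
    df.fillna('', inplace=True)
    return {"data": df.to_dict("records")}


if __name__ == "__main__":
    # Allows a quick local run outside Lambda
    print(lambda_handler({}, {}))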
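The handler above always reads the same hard-coded PDF. One possible variation, sketched here under assumptions, is to let the caller pass the PDF location in the invocation event. The `pdf_url` event key and the `resolve_pdf_url` helper are hypothetical names chosen for illustration; they are not part of Lambda or tabula-py.

```python
DEFAULT_PDF_URL = (
    "https://github.com/tabulapdf/tabula-java/raw/master/"
    "src/test/resources/technology/tabula/arabic.pdf"
)


def resolve_pdf_url(event, default=DEFAULT_PDF_URL):
    """Pick the PDF URL from the event, falling back to a default."""
    # `pdf_url` is a hypothetical event key, not a Lambda convention.
    url = (event or {}).get("pdf_url") or default
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"unsupported PDF location: {url!r}")
    return url
```

Inside `lambda_handler`, `tabula.read_pdf(resolve_pdf_url(event))` could then replace the hard-coded `url`.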
Now let's deploy it to AWS Lambda with the command below.
sls deploy
This deploys the Lambda function pdf_tables_to_json to AWS. To verify it, log in to the AWS console, go to Lambda functions, search for the function pdf_tables_to_json, and test it.
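Besides the console, the function can also be invoked from Python. A minimal sketch, assuming boto3 is installed, AWS credentials are configured, and the deployed function name follows the Serverless default of service, stage, and function key (assumed here to be `custom-service-dev-pdf_tables_to_json`). The helper names are hypothetical.

```python
import json


def build_payload(event: dict) -> bytes:
    """Serialize a test event as the JSON bytes sent to the function."""
    return json.dumps(event).encode("utf-8")


def parse_response(raw: bytes) -> list:
    """Extract the list of table rows from the handler's JSON response."""
    return json.loads(raw.decode("utf-8")).get("data", [])


def invoke(function_name: str, event: dict) -> list:
    # Imported here so the helpers above can be used without boto3 installed.
    import boto3

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name, Payload=build_payload(event)
    )
    return parse_response(response["Payload"].read())
```

Calling `invoke("custom-service-dev-pdf_tables_to_json", {})` would return the rows under the `data` key of the handler's response, or an empty list if the response has no such key.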
Note: The deployment will fail if Docker and the Serverless Framework are not installed on your computer.
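The check described in the note can be automated before running the deploy. A small sketch that looks for the `docker` and `sls` executables on the `PATH`; the helper name `missing_tools` is hypothetical:

```python
import shutil


def missing_tools(tools=("docker", "sls")):
    """Return the names of required command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list means both executables were found. It does not confirm that the Docker daemon is running.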