Deploying Mozilla DeepSpeech models to AWS Lambda using Serverless


In this tutorial, we will be deploying DeepSpeech to AWS Lambda using the Serverless framework. One caveat of this approach is that the language model is too large to meet AWS Lambda’s size requirements. If your goal is recognition of a small vocabulary, one option would be to create a smaller language model. For this tutorial, we will be deploying the pre-trained model without a language model. If you need a larger LM you’ll probably have to look at other approaches to serving DeepSpeech models.

This tutorial will go through everything step-by-step, but if you prefer just seeing the code it’s here on Github.

Let’s get started!

Setting up Serverless

Next, we need to install Node.js. On Ubuntu, I did this using the following commands:

$ sudo apt-get install nodejs$ sudo apt-get install npm

Now we can install serverless by running the following command (you might have to run as sudo):

npm install -g serverless

You can then double-check that it is installed by running:

$ serverless --version

Next, give your Serverless account access to your AWS account by following the instructions here:

Finally, make sure you’re logged in to serverless on your computer by running

$ serverless login

Installing Deepspeech

$ conda create -n deepspeech_lambda python=3.6

and then activate the environment:

$ conda activate deepspeech_lambda

Now let’s install DeepSpeech by running

$ pip install deepspeech

This should install version 0.5.1 as of the writing of this tutorial. Also, install scipy by running:

$ pip install scipy

Creating a Serverless Project

$ mkdir deepspeech_lambda && cd deepspeech_lambda

and create the serverless project in the directory by running:

$ serverless create --template aws-python

The previous command sets up a file as well as a serverless.yml config file. You can take a look at them to see how they work and what the different options are for the config file.

Now let’s rename the to as well as the handler function:

import jsondef inferHandler(event, context):
body = {
“message”: “Go Serverless v1.0! Your function executed successfully!”,
“input”: event
response = {
“statusCode”: 200,
“body”: json.dumps(body)
return response

We also need to update the serverless.yml

service: deepspeech-lambda-demo
app: deepspeechlambda
org: lukasgrasse
name: aws
runtime: python3.6
stage: dev
region: us-east-1
handler: infer.inferHandler
timeout: 30
- http:
path: infer
method: post

Set the org name to your serverless org name, and the app name to the serverless app name that you set up in the serverless dashboard.

Now we are ready to test deploying our serverless app to AWS Lambda:

$ serverless deploy -v

When the deployment is complete it should display an info message containing the endpoint, which should look like: https://<some id>

You can test the endpoint by running:

$ curl -X POST https://<some id>

and verify that it returns a JSON object containing “message”: “Go Serverless v1.0! Your function executed successfully”.

Adding DeepSpeech to the Serverless Project

First, save the python dependencies into a requirements.txt file by running

pip freeze > requirements.txt

and add the serverless plugin that sets up the python dependencies:

serverless plugin install -n serverless-python-requirements

We also need to add this block to our serverless.yml file:

dockerizePip: true
slim: true
zip: true

This custom block makes serverless zip up the dependencies and slims down any extras that aren’t needed. This is important because the DeepSpeech model takes up most of the 250 Mb upload limit. You also are going to need docker installed for local testing using the dockerizePip option.

Now, create a model folder and copy the output_graph.pbmm and alphabet.txt files into the folder. The final directory structure should look like:

├──├── model│   ├── alphabet.txt│   └── output_graph.pbmm├── package-lock.json├── package.json├── requirements.txt└── serverless.yml

Updating the Handler Function

import unzip_requirements
except ImportError:
from deepspeech import Model, printVersions
import json
import base64
import io
import numpy as np
import scipy
ds = Model('model/output_graph.pbmm' , N_FEATURES, N_CONTEXT, 'model/alphabet.txt' , BEAM_WIDTH)def inferHandler(event, context):
body = json.loads(event['body'])
content = base64.b64decode(body['content'])bytes = io.BytesIO(content)samplerate, data = = ds.stt(data, samplerate)

response = {
"statusCode": 200,
"body": recognized_text

return response

Now, we can deploy the new function by running:

$ serverless deploy -v

And we should be good to go! We can test that it’s working by posting a wav file from the terminal using curl:

(echo -n '{"content": "'; base64 test.wav; echo '"}') | curl -H "Content-Type: application/json" -d @- https://<some id>


In a future tutorial, I will also demonstrate how to create a custom language model that’s small enough to meet lambda’s storage requirements.

Get in Contact




CTO and Co-Founder of Reverb Robotics Inc | Machine Learning and AI Consultant | Ph.D. Student in Neuroscience @ U of L.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Lukas Grasse

CTO and Co-Founder of Reverb Robotics Inc | Machine Learning and AI Consultant | Ph.D. Student in Neuroscience @ U of L.