Deploying Mozilla DeepSpeech models to AWS Lambda using Serverless

Lukas Grasse
Sep 9, 2019



In recent years, end-to-end neural networks have become a common approach to speech recognition. Mozilla’s open-source DeepSpeech is a popular implementation of such a system. It comes with a pre-trained model, has Python and JavaScript bindings, and can also run on ARM processors.

In this tutorial, we will deploy DeepSpeech to AWS Lambda using the Serverless framework. One caveat of this approach is that the pre-trained language model is too large to fit within AWS Lambda’s deployment size limits. If your goal is to recognize a small vocabulary, one option is to build a smaller language model. For this tutorial, we will deploy the pre-trained acoustic model without a language model. If you need a larger language model, you’ll probably have to look at other approaches to serving DeepSpeech models.

This tutorial goes through everything step by step, but if you prefer to just see the code, it’s here on GitHub.

Let’s get started!

Setting up Serverless

The first step is to sign up for accounts with AWS and Serverless if you don’t have accounts with them already.

Next, we need to install Node.js. On Ubuntu, I did this using the following commands:

$ sudo apt-get install nodejs
$ sudo apt-get install npm

Now we can install Serverless by running the following command (you might have to run it with sudo):

$ npm install -g serverless

You can then double-check that it is installed by running:

$ serverless --version

Next, give your Serverless account access to your AWS account by following the instructions here:

Finally, make sure you’re logged in to Serverless on your computer by running:

$ serverless login

Installing DeepSpeech

I usually use Anaconda to manage my Python environments. If you also use Anaconda, you can create a new Python environment for the project by running the following command:

$ conda create -n deepspeech_lambda python=3.6

and then activate the environment:

$ conda activate deepspeech_lambda

Now let’s install DeepSpeech by running:

$ pip install deepspeech

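As of this writing, this installs version 0.5.1. The handler code later in this tutorial uses the 0.5.x API, which changed in later releases, so if pip picks up a newer version you can pin it with pip install deepspeech==0.5.1. Also, install SciPy, which the handler uses to read WAV files: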

$ pip install scipy

Creating a Serverless Project

Next, let’s create a directory for our project:

$ mkdir deepspeech_lambda && cd deepspeech_lambda

and create the Serverless project in the directory by running:

$ serverless create --template aws-python

The previous command sets up a handler.py file as well as a serverless.yml config file. Take a look at both to see how they work and which options the config file supports.

Now let’s rename handler.py to infer.py, and rename the handler function to inferHandler:

import json

def inferHandler(event, context):
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response
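The stub builds its API Gateway response by hand. With the Lambda proxy integration that Serverless uses by default for http events, the response needs an integer statusCode and a string body, which is why the stub calls json.dumps. A small helper can keep that shape in one place. This is a sketch: make_response is a hypothetical name, not part of Serverless or the template, and the Content-Type header is an optional addition.

```python
import json


def make_response(status_code, payload):
    """Return a dict in the shape API Gateway's Lambda proxy integration expects.

    statusCode must be an int and body must be a string, so dicts and lists
    are serialized with json.dumps while plain strings are passed through.
    """
    if isinstance(payload, str):
        body, content_type = payload, "text/plain"
    else:
        body, content_type = json.dumps(payload), "application/json"
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": content_type},  # optional header
        "body": body,
    }


# Usage: the stub could end with make_response(200, {"message": "...", "input": event})
resp = make_response(200, {"message": "ok"})
print(resp["body"])  # {"message": "ok"}
```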

We also need to update serverless.yml:

service: deepspeech-lambda-demo
app: deepspeechlambda
org: lukasgrasse

provider:
  name: aws
  runtime: python3.6
  stage: dev
  region: us-east-1

functions:
  infer:
    handler: infer.inferHandler
    timeout: 30
    events:
      - http:
          path: infer
          method: post

Set org to your Serverless org name, and app to the app name you set up in the Serverless dashboard.

Now we are ready to test deploying our serverless app to AWS Lambda:

$ serverless deploy -v

When the deployment is complete, it should display an info message containing the endpoint, which should look like: https://<some id>

You can test the endpoint by running:

$ curl -X POST https://<some id>

and verify that it returns a JSON object containing “message”: “Go Serverless v1.0! Your function executed successfully!”.

Adding DeepSpeech to the Serverless Project

The next step is to add DeepSpeech to our inferHandler.

First, save the Python dependencies into a requirements.txt file by running:

$ pip freeze > requirements.txt

and add the Serverless plugin that packages the Python dependencies:

$ serverless plugin install -n serverless-python-requirements

We also need to add this block to our serverless.yml file:

custom:
  pythonRequirements:
    dockerizePip: true
    slim: true
    zip: true

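This custom block makes Serverless zip up the dependencies and slim them down by stripping files that aren’t needed at runtime. With zip: true, the dependencies are shipped as a single archive and extracted at runtime by the unzip_requirements import at the top of the handler. This is important because the DeepSpeech model takes up most of Lambda’s 250 MB limit on the unzipped deployment package. You will also need Docker installed, because the dockerizePip option builds the dependencies inside a Docker container that mimics the Lambda environment.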

Now, create a model folder and copy the output_graph.pbmm and alphabet.txt files from the pre-trained model release into it. The final directory structure should look like this:

├── infer.py
├── model
│   ├── alphabet.txt
│   └── output_graph.pbmm
├── package-lock.json
├── package.json
├── requirements.txt
└── serverless.yml
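Since the model takes up most of the 250 MB budget, it can help to check how much room is left before deploying. The sketch below sums the sizes of all files under the given folders (the model folder by default). Treat it as a rough estimate under stated assumptions: the script name check_size.py is hypothetical, the limit is treated as 250 MiB, and it measures files as they sit on disk, so it does not account for how the plugin zips or slims the dependencies.

```python
import os
import sys

# Assumption: Lambda's unzipped deployment package limit, treated as 250 MiB
LAMBDA_UNZIPPED_LIMIT = 250 * 1024 * 1024


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total


def remaining_budget(paths, limit=LAMBDA_UNZIPPED_LIMIT):
    """Bytes left under the limit after counting every folder in paths."""
    return limit - sum(directory_size(p) for p in paths)


if __name__ == "__main__":
    # e.g. python check_size.py model
    paths = sys.argv[1:] or ["model"]
    left = remaining_budget(paths)
    print(f"{left / 1024 / 1024:.1f} MB left for code and dependencies")
```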

Updating the Handler Function

Here is the updated handler function, which takes base64-encoded WAV data and returns the recognized text:

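try:
    import unzip_requirements  # extracts the zipped dependencies (zip: true)
except ImportError:
    pass

import base64
import io
import json

from deepspeech import Model
from scipy.io import wavfile

# DeepSpeech 0.5.x client defaults; N_FEATURES and N_CONTEXT must match the trained graph
BEAM_WIDTH = 500
N_FEATURES = 26  # number of MFCC features
N_CONTEXT = 9    # size of the context window

# Loaded once per container, outside the handler, so warm invocations reuse it
ds = Model('model/output_graph.pbmm', N_FEATURES, N_CONTEXT,
           'model/alphabet.txt', BEAM_WIDTH)

def inferHandler(event, context):
    body = json.loads(event['body'])
    content = base64.b64decode(body['content'])
    # Expects 16-bit mono PCM WAV at 16 kHz, the rate the pre-trained model uses
    samplerate, data = wavfile.read(io.BytesIO(content))
    recognized_text = ds.stt(data, samplerate)

    return {
        "statusCode": 200,
        "body": recognized_text
    }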
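The handler assumes the uploaded file is 16-bit mono PCM audio at 16 kHz, the format the pre-trained model was trained on, and it does not resample anything else. A check with Python’s built-in wave module can reject other formats before they reach ds.stt. This is a sketch: check_wav is a hypothetical helper, and the expected values are assumptions to adjust if you use a different model.

```python
import io
import wave

# Assumption: the pre-trained model expects 16-bit mono PCM at 16 kHz
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


def check_wav(content):
    """Return a list of problems with WAV bytes; an empty list means it looks usable."""
    try:
        with wave.open(io.BytesIO(content), "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            width = wav.getsampwidth()
    except (wave.Error, EOFError) as err:
        return [f"not a readable PCM WAV file: {err}"]

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE}")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems
```

Inside inferHandler, you could call check_wav(content) right after base64 decoding and return a 400 response when it reports any problems.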

Now, we can deploy the new function by running:

$ serverless deploy -v

And we should be good to go! We can test that it’s working by posting a WAV file from the terminal using curl:

$ (echo -n '{"content": "'; base64 test.wav; echo '"}') | curl -H "Content-Type: application/json" -d @- https://<some id>
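If you would rather test from Python than from the shell, the same request can be built with the standard library’s urllib. This is a minimal sketch: build_request and transcribe are hypothetical helpers, the URL placeholder stands for the endpoint printed by serverless deploy, and the 35-second client timeout is an arbitrary choice a little above the function’s 30-second timeout.

```python
import base64
import json
import urllib.request


def build_request(url, wav_bytes):
    """Build a POST request whose JSON body matches what inferHandler expects."""
    payload = {"content": base64.b64encode(wav_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(url, path):
    """Send a local WAV file to the deployed endpoint and return the response text."""
    with open(path, "rb") as f:
        request = build_request(url, f.read())
    with urllib.request.urlopen(request, timeout=35) as resp:
        return resp.read().decode("utf-8")


# Usage (replace the placeholder with your endpoint):
# print(transcribe("https://<some id>", "test.wav"))
```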


That pretty much sums up how to get Mozilla DeepSpeech running on AWS Lambda. If you plan to use this in production, it is a good idea to add error handling and to set up a proper production deployment with Serverless.
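As a starting point for that error handling, the sketch below wraps the transcription step so malformed requests get a 400 and failures inside the model get a 500 instead of an unhandled exception. It is an illustration under assumptions: make_handler is a hypothetical factory, transcribe is injected so the example runs without DeepSpeech (in infer.py it would read the WAV and call ds.stt), and it returns JSON bodies rather than the plain text the handler above returns.

```python
import base64
import json


def make_handler(transcribe):
    """Wrap transcribe(wav_bytes) -> str in an API Gateway handler with basic error handling."""

    def handler(event, context):
        try:
            body = json.loads(event.get("body") or "")
            content = base64.b64decode(body["content"], validate=True)
        # json.JSONDecodeError and binascii.Error are both ValueError subclasses;
        # KeyError/TypeError cover a missing "content" field or a non-object body
        except (ValueError, KeyError, TypeError) as err:
            return {"statusCode": 400,
                    "body": json.dumps({"error": f"bad request: {err}"})}
        try:
            text = transcribe(content)
        except Exception as err:  # anything failing inside the model is a server error
            return {"statusCode": 500, "body": json.dumps({"error": str(err)})}
        return {"statusCode": 200, "body": json.dumps({"text": text})}

    return handler
```

Returning str(err) in the 500 body is convenient while debugging, but in production you may prefer to log the error and return a generic message.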

In a future tutorial, I will also demonstrate how to create a custom language model that’s small enough to meet Lambda’s size limits.

Get in Contact

I am also a consultant who specializes in Speech Recognition, Machine Learning and AI. I would be glad to help you! You can find my contact info at




Lukas Grasse

CTO and Co-Founder of Reverb Robotics Inc | Machine Learning and AI Consultant | Ph.D. Student in Neuroscience @ U of L.