Deploying Mozilla DeepSpeech models to AWS Lambda using Serverless


In recent years end-to-end neural networks have become a common approach for speech recognition. Mozilla’s open-source DeepSpeech is a popular implementation of such a system . It comes with a pretrained model, has Python and Javascript bindings, and can also run on ARM processors.

In this tutorial, we will be deploying DeepSpeech to AWS Lambda using the Serverless framework. One caveat of this approach is that the language model is too large to meet AWS Lambda’s size requirements. If your goal is recognition of a small vocabulary, one option would be to create a smaller language model. For this tutorial, we will be deploying the pre-trained model without a language model. If you need a larger LM you’ll probably have to look at other approaches to serving DeepSpeech models.

This tutorial will go through everything step-by-step, but if you prefer just seeing the code it’s here on Github.

Let’s get started!

Setting up Serverless

The first step is to sign up for accounts with AWS and Serverless if you don’t have accounts with them already.

Next, we need to install Node.js. On Ubuntu, I did this using the following commands:

Now we can install serverless by running the following command (you might have to run as sudo):

You can then double-check that it is installed by running:

Next, give your Serverless account access to your AWS account by following the instructions here:

Finally, make sure you’re logged in to serverless on your computer by running

Installing Deepspeech

I usually use Anaconda to manage my python environments. If you also use Anaconda you can create a new python environment for the project by running the following command:

and then activate the environment:

Now let’s install DeepSpeech by running

This should install version 0.5.1 as of the writing of this tutorial. Also, install scipy by running:

Creating a Serverless Project

Next, let’s create a directory for our project:

and create the serverless project in the directory by running:

The previous command sets up a file as well as a serverless.yml config file. You can take a look at them to see how they work and what the different options are for the config file.

Now let’s rename the to as well as the handler function:

We also need to update the serverless.yml

Set the org name to your serverless org name, and the app name to the serverless app name that you set up in the serverless dashboard.

Now we are ready to test deploying our serverless app to AWS Lambda:

When the deployment is complete it should display an info message containing the endpoint, which should look like: https://<some id>

You can test the endpoint by running:

and verify that it returns a JSON object containing “message”: “Go Serverless v1.0! Your function executed successfully”.

Adding DeepSpeech to the Serverless Project

The next step is to add DeepSpeech to our inferHandler.

First, save the python dependencies into a requirements.txt file by running

and add the serverless plugin that sets up the python dependencies:

We also need to add this block to our serverless.yml file:

This custom block makes serverless zip up the dependencies and slims down any extras that aren’t needed. This is important because the DeepSpeech model takes up most of the 250 Mb upload limit. You also are going to need docker installed for local testing using the dockerizePip option.

Now, create a model folder and copy the output_graph.pbmm and alphabet.txt files into the folder. The final directory structure should look like:

Updating the Handler Function

Here is the updated code for a handler function that takes base64 encoded wav data, and returns the recognized text:

Now, we can deploy the new function by running:

And we should be good to go! We can test that it’s working by posting a wav file from the terminal using curl:


That pretty much sums up how to get Mozilla Deep Speech running on AWS Lambda. If you are planning to use this in production it is probably a good idea to add some error handling as well as a proper production deployment with serverless.

In a future tutorial, I will also demonstrate how to create a custom language model that’s small enough to meet lambda’s storage requirements.

Get in Contact

I am also a consultant who specializes in Speech Recognition, Machine Learning and AI. I would be glad to help you! You can find my contact info at




Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Lukas Grasse

Lukas Grasse

CTO and Co-Founder of Reverb Robotics Inc | Machine Learning and AI Consultant | Ph.D. Student in Neuroscience @ U of L.