Deploying Mozilla DeepSpeech models to AWS Lambda using Serverless
Intro
In recent years, end-to-end neural networks have become a common approach to speech recognition. Mozilla’s open-source DeepSpeech is a popular implementation of such a system. It comes with a pretrained model, has Python and JavaScript bindings, and can also run on ARM processors.
In this tutorial, we will deploy DeepSpeech to AWS Lambda using the Serverless framework. One caveat of this approach is that the language model is too large to meet AWS Lambda’s size limits. If your goal is to recognize a small vocabulary, one option is to create a smaller language model. For this tutorial, we will deploy the pretrained acoustic model without a language model. If you need a large language model, you will probably have to look at other approaches to serving DeepSpeech models.
This tutorial goes through everything step by step, but if you prefer to jump straight to the code, it’s here on GitHub.
Let’s get started!
Setting up Serverless
The first step is to sign up for accounts with AWS and Serverless if you don’t have accounts with them already.
Next, we need to install Node.js. On Ubuntu, I did this using the following commands:
$ sudo apt-get install nodejs
$ sudo apt-get install npm
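You can confirm that both are installed by checking their versions:
$ node --version
$ npm --version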
Now we can install serverless by running the following command (you might have to run as sudo):
$ npm install -g serverless
You can then double-check that it is installed by running:
$ serverless --version
Next, give your Serverless account access to your AWS account by following the instructions here: https://serverless.com/framework/docs/providers/aws/guide/installation/
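If you prefer doing this from the command line, the Serverless CLI can also store an AWS key pair for you; the key and secret below are placeholders for your own IAM user’s credentials:
$ serverless config credentials --provider aws --key <your-access-key-id> --secret <your-secret-access-key>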
Finally, make sure you’re logged in to Serverless on your machine by running:
$ serverless login
Installing DeepSpeech
I usually use Anaconda to manage my Python environments. If you also use Anaconda, you can create a new Python environment for the project by running the following command:
$ conda create -n deepspeech_lambda python=3.6
and then activate the environment:
$ conda activate deepspeech_lambda
Now let’s install DeepSpeech by running
$ pip install deepspeech
As of this writing, this installs DeepSpeech version 0.5.1. Also install SciPy by running:
$ pip install scipy
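Before touching Lambda, it can be worth sanity-checking the installation locally. The snippet below is a minimal sketch that loads the pretrained acoustic model and transcribes a WAV file; it assumes you have already extracted the 0.5.1 model files into a model folder and have a 16 kHz mono test.wav on hand (both paths are placeholders):
from deepspeech import Model
import scipy.io.wavfile

BEAM_WIDTH = 500
N_FEATURES = 26
N_CONTEXT = 9

# Load the pretrained acoustic model (no language model)
ds = Model('model/output_graph.pbmm', N_FEATURES, N_CONTEXT, 'model/alphabet.txt', BEAM_WIDTH)

# Read a 16 kHz mono WAV file and run speech-to-text on it
samplerate, data = scipy.io.wavfile.read('test.wav')
print(ds.stt(data, samplerate))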
Creating a Serverless Project
Next, let’s create a directory for our project:
$ mkdir deepspeech_lambda && cd deepspeech_lambda
and create the serverless project in the directory by running:
$ serverless create --template aws-python
The previous command sets up a handler.py file as well as a serverless.yml config file. You can take a look at them to see how they work and what the different options are for the config file.
Now rename handler.py to infer.py, and rename the handler function to inferHandler:
import json


def inferHandler(event, context):
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response
We also need to update serverless.yml:
service: deepspeech-lambda-demo
app: deepspeechlambda
org: lukasgrasse

provider:
  name: aws
  runtime: python3.6
  stage: dev
  region: us-east-1

functions:
  infer:
    handler: infer.inferHandler
    timeout: 30
    events:
      - http:
          path: infer
          method: post
Set the org to your Serverless org name, and the app to the Serverless app name that you set up in the Serverless dashboard.
Now we are ready to test deploying our serverless app to AWS Lambda:
$ serverless deploy -v
When the deployment is complete it should display an info message containing the endpoint, which should look like: https://<some id>.execute-api.us-east-1.amazonaws.com/dev/infer
You can test the endpoint by running:
$ curl -X POST https://<some id>.execute-api.us-east-1.amazonaws.com/dev/infer
and verify that it returns a JSON object containing "message": "Go Serverless v1.0! Your function executed successfully!".
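You can also invoke the function directly, without going through the HTTP endpoint, using the Serverless CLI as a quick sanity check that the handler itself runs:
$ serverless invoke local --function infer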
Adding DeepSpeech to the Serverless Project
The next step is to add DeepSpeech to our inferHandler.
First, save the Python dependencies into a requirements.txt file by running:
$ pip freeze > requirements.txt
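Note that pip freeze from a conda environment can include packages the function doesn’t actually need, which inflates the deployment package. If you run into size problems, one option is to trim requirements.txt down to just what the handler imports; a trimmed file might look something like this (pin versions to whatever you have installed):
deepspeech==0.5.1
numpy
scipy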
and add the Serverless plugin that handles packaging the Python dependencies:
$ serverless plugin install -n serverless-python-requirements
We also need to add this block to our serverless.yml file:
custom:
  pythonRequirements:
    dockerizePip: true
    slim: true
    zip: true
This custom block tells the plugin to zip up the dependencies and to slim out any extras that aren’t needed. This is important because the DeepSpeech model takes up most of the 250 MB upload limit. You will also need Docker installed, since the dockerizePip option builds the Python dependencies inside a Docker container that matches the Lambda environment.
Now, create a model folder and copy the output_graph.pbmm and alphabet.txt files into the folder. The final directory structure should look like:
├── infer.py
├── model
│   ├── alphabet.txt
│   └── output_graph.pbmm
├── package-lock.json
├── package.json
├── requirements.txt
└── serverless.yml
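If you don’t have the model files yet, they are published with the DeepSpeech releases on GitHub. The commands below are a sketch assuming the 0.5.1 release archive and its usual layout; adjust the URL and paths for the version you are using:
$ mkdir model
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/deepspeech-0.5.1-models.tar.gz
$ tar -xzf deepspeech-0.5.1-models.tar.gz
$ cp deepspeech-0.5.1-models/output_graph.pbmm deepspeech-0.5.1-models/alphabet.txt model/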
Updating the Handler Function
Here is the updated code for a handler function that takes base64-encoded WAV data and returns the recognized text:
try:
    import unzip_requirements
except ImportError:
    pass

from deepspeech import Model, printVersions
import json
import base64
import io
import numpy as np
import scipy
import scipy.io.wavfile

SAMPLE_RATE = 16000
BEAM_WIDTH = 500
N_FEATURES = 26
N_CONTEXT = 9

# Load the pretrained acoustic model once, outside the handler,
# so it is reused across warm invocations
ds = Model('model/output_graph.pbmm', N_FEATURES, N_CONTEXT, 'model/alphabet.txt', BEAM_WIDTH)


def inferHandler(event, context):
    # The request body contains the base64-encoded WAV data
    body = json.loads(event['body'])
    content = base64.b64decode(body['content'])

    # Decode the WAV bytes into a numpy array of samples
    wav_bytes = io.BytesIO(content)
    samplerate, data = scipy.io.wavfile.read(wav_bytes)

    # Run speech-to-text on the audio
    recognized_text = ds.stt(data, samplerate)

    response = {
        "statusCode": 200,
        "body": recognized_text
    }

    return response
Now, we can deploy the new function by running:
$ serverless deploy -v
And we should be good to go! We can test that it’s working by posting a wav file from the terminal using curl:
(echo -n '{"content": "'; base64 test.wav; echo '"}') | curl -H "Content-Type: application/json" -d @- https://<some id>.execute-api.us-east-1.amazonaws.com/dev/infer
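If you’d rather call the endpoint from Python, here is a minimal sketch using the requests library; the endpoint URL and test.wav path are placeholders for your own values:
import base64
import requests

# Read the WAV file and base64-encode it, mirroring the curl command above
with open('test.wav', 'rb') as f:
    content = base64.b64encode(f.read()).decode('utf-8')

# Post the JSON payload to the API Gateway endpoint
resp = requests.post(
    'https://<some id>.execute-api.us-east-1.amazonaws.com/dev/infer',
    json={'content': content},
)

# The response body is the recognized text
print(resp.text)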
Conclusion
That pretty much sums up how to get Mozilla DeepSpeech running on AWS Lambda. If you are planning to use this in production, it is probably a good idea to add some error handling, as well as to set up a proper production deployment stage with Serverless.
In a future tutorial, I will also demonstrate how to create a custom language model that is small enough to meet Lambda’s size limits.
Get in Contact
I am also a consultant who specializes in Speech Recognition, Machine Learning and AI. I would be glad to help you! You can find my contact info at https://lukasgrasse.com