TUTORIAL: Deploying DeepSpeech to AWS Lambda

Hi Everyone!

Recently I have been playing with running DeepSpeech on AWS Lambda, and I put together a quick tutorial with instructions on how to do so: https://medium.com/@lukasgrasse/deploying-mozilla-deepspeech-models-to-aws-lambda-using-serverless-b5405ccd546b. Hopefully this will be useful for anyone also trying to do this.
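Roughly, the idea is a small Python handler that loads the model once at import time and runs inference on the audio in the request. Here's a sketch of the general shape (the paths, the request format, and the `Model()` arguments below are illustrative and depend on which DeepSpeech release you bundle, so follow the post for the exact details):

```python
# handler.py -- a minimal sketch of a DeepSpeech Lambda handler.
# The model path, the event format, and the Model() constructor arguments
# are illustrative; the constructor signature changes between DeepSpeech
# releases, so check the version you actually package.

import base64
import json

import numpy as np
from deepspeech import Model

MODEL_PATH = "model/output_graph.pbmm"  # bundled with the deployment package

# Load the model at module scope so a warm container reuses it
# instead of paying the load cost on every invocation.
ds_model = Model(MODEL_PATH)


def transcribe(event, context):
    # Expecting 16 kHz, 16-bit mono PCM, base64-encoded in the request body.
    audio_bytes = base64.b64decode(event["body"])
    audio = np.frombuffer(audio_bytes, dtype=np.int16)

    text = ds_model.stt(audio)

    return {
        "statusCode": 200,
        "body": json.dumps({"transcript": text}),
    }
```

Loading the model at module scope matters because Lambda reuses the same container for warm invocations, so you only pay the load cost on a cold start.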

Cheers,
Lukas


Thanks! I’m having a look at your blog post 🙂

FTR, we’ve got people hitting issues when using conda, glad to see it works for you.

@lukasgrasse
Hi, friend. Thanks for your tutorial.
Vincent

Fascinating post, Lukas, appreciate it. I have a couple of questions if you don’t mind.

  1. If you have an output_graph.pbmm (for a custom language model) that is 181 MB in size, do you think it would still fit within the 250 MB overall limit you mentioned?

  2. Sorry if this is a general question about serverless, but how does start-up time work in this scenario? When hosting our models via the traditional Flask/Docker route, we find that a good few seconds are needed to load the model into memory and then a few more to do the actual transcription, so obviously we try to load it only once and then reuse it. Is that something that would happen ‘automatically’ in a serverless environment, or are you looking at a model-load-time cost on every inference (every transcription)? (There’s a small sketch of what I mean at the end of this post.)

  3. How does serverless relate to GPUs? Can you use a serverless approach but specify an Elastic GPU on the back end?

Sorry for all the newbie questions! I’d appreciate any thoughts you have on this, and thanks again for writing it up!

–

PS: I see that you can run things to keep your serverless Lambdas ‘warm’, but I’m wondering whether that ‘warmness’ actually carries over to the extent of keeping the TensorFlow weights loaded in memory somewhere…
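To make question 2 concrete, this is roughly the pattern we rely on today with Flask/Docker (simplified, the names are just illustrative): pay the model-load cost once when the process starts, then reuse the loaded model for every request. What I’m asking is whether module-level initialisation in a Lambda handler gives the same behaviour across warm invocations, or whether each invocation pays that cost again.

```python
# app.py -- roughly our current Flask/Docker pattern (simplified, names
# illustrative). The model is loaded once at startup and reused per request.

import numpy as np
from deepspeech import Model
from flask import Flask, jsonify, request

app = Flask(__name__)

# Paid once, when the process starts.
ds_model = Model("model/output_graph.pbmm")


@app.route("/transcribe", methods=["POST"])
def transcribe():
    # 16 kHz, 16-bit mono PCM in the request body.
    audio = np.frombuffer(request.data, dtype=np.int16)
    return jsonify({"transcript": ds_model.stt(audio)})
```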


Has anyone else given this a try? Keen to hear about “start-up time”, e.g. if you’re the first person to query this Lambda API, what’s the response time like?