Scaling DeepSpeech to deal with many concurrent requests

Scaling seems to be the topic of the week: this is the third post about it in 24 hours 🙂 Why don't we collect the ideas in this thread? Check the other two here and here.

@Paul_Raine, @Dsa, @aphorist13 please continue to post here, so we have all ideas on concurrency and scaling in one place.

@utunga I guess you have a bigger installation running; any ideas on handling high load? And @lissyx or @reuben, do you have any input on how to scale DeepSpeech?

For starters, we have a couple of smaller virtual CPU servers that simply pull jobs from a self-made balancer. Our workload is not time-critical, so it is fine for us if a job takes a couple of minutes.
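A rough sketch of that pull-based setup, assuming the balancer can be modelled as a shared job queue: each worker takes the next job whenever it is free, so faster or idle servers naturally pick up more work. Here threads stand in for the separate servers, and `transcribe` is a hypothetical placeholder for the real DeepSpeech call; in practice the workers would poll the balancer over the network.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    audio_path: str


def transcribe(audio_path: str) -> str:
    """Hypothetical placeholder for running DeepSpeech on one server."""
    return f"transcript of {audio_path}"


def worker(name: str, jobs: queue.Queue, results: dict, lock: threading.Lock) -> None:
    # Each "server" keeps pulling jobs until it receives the None sentinel.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        text = transcribe(job.audio_path)
        with lock:
            results[job.job_id] = (name, text)
        jobs.task_done()


def run(audio_paths, num_workers=2):
    jobs = queue.Queue()  # stands in for the self-made balancer
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(f"server-{i}", jobs, results, lock))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for i, path in enumerate(audio_paths):
        jobs.put(Job(i, path))
    for _ in threads:
        jobs.put(None)  # one stop signal per worker
    for t in threads:
        t.join()
    return results
```

Pulling (rather than the balancer pushing jobs to servers) means the balancer does not need to track how busy each server is.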

Summary for future reference:

  • The underlying libdeepspeech.so can be used concurrently on CPU via the native bindings, but you have to manage the processes yourself.
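One way to "manage the processes yourself" is a process pool in which every worker loads its own model once at start-up and then serves many requests. This is a minimal sketch, not an official recipe: `load_model` is a hypothetical stand-in for constructing the real model (for example via the Python bindings' `deepspeech.Model`), `_transcribe` is a placeholder for inference, and it assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp
import os

_model = None  # per-process model handle, set once by _init_worker


def load_model(model_path):
    """Hypothetical stand-in for loading a DeepSpeech model in this process."""
    return {"path": model_path, "pid": os.getpid()}


def _init_worker(model_path):
    # Runs once in each worker process, so the model is loaded per process,
    # not per request.
    global _model
    _model = load_model(model_path)


def _transcribe(audio_path):
    # Placeholder for running inference with this process's own model.
    return (os.getpid(), f"transcript of {audio_path} using {_model['path']}")


def transcribe_all(audio_paths, model_path, processes=2):
    ctx = mp.get_context("fork")  # assumption: POSIX host
    with ctx.Pool(processes=processes, initializer=_init_worker,
                  initargs=(model_path,)) as pool:
        # map() preserves input order in its results.
        return pool.map(_transcribe, audio_paths)
```

Sizing `processes` to roughly the number of CPU cores is a reasonable starting point, since each process runs inference independently.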