Hey all,
We are currently using DeepSpeech plus KenLM to offer listen-and-repeat and read-aloud activities to English language learners.
The problem we're having is the time DeepSpeech takes to return a result when there are multiple concurrent requests (say, 20 or more students all using the system at the same time).
As our tool gains popularity, we will have even more concurrent requests, possibly even hundreds or thousands at a time.
So we need to ensure that DeepSpeech is returning a transcript result ideally within 3 to 5 seconds.
The speech segments are short: about 15 to 30 seconds each.
And we pre-generate custom scorers with KenLM, so scorer building is not a bottleneck here.
We’d like to know if anyone else has scaled up DeepSpeech to comfortably handle tens or hundreds of concurrent requests.
Importantly, whatever solution we adopt mustn’t be prohibitively expensive.
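For what it's worth, here is the back-of-envelope sizing we've been doing, in case it helps frame the questions below. It only assumes that inference cost scales with clip length via a real-time factor (RTF); the RTF values we plug in (e.g. 0.1) are illustrative guesses, not measured DeepSpeech figures, so you'd want to benchmark your own hardware first:

```python
import math

def min_workers(concurrent, clip_s, budget_s, rtf):
    """Minimum number of parallel decoder instances so that `concurrent`
    simultaneous requests all finish within `budget_s` seconds.

    `rtf` is the real-time factor: seconds of inference per second of audio
    (an assumed, hardware-dependent number -- measure it yourself).
    Returns None when even a single request misses the budget, i.e. when
    no amount of horizontal scaling can meet the latency target.
    """
    service_s = clip_s * rtf              # compute time for one clip
    if service_s > budget_s:
        return None                       # parallelism alone can't fix this
    # With w workers, the last of `concurrent` requests waits through
    # ceil(concurrent / w) service periods, so we need
    # ceil(concurrent / w) * service_s <= budget_s.
    return math.ceil(concurrent / math.floor(budget_s / service_s))

# 20 students, 30 s clips, 5 s budget, assumed RTF of 0.1:
print(min_workers(20, 30, 5, 0.1))   # -> 20 instances
# Same load at an assumed RTF of 1.0 (CPU-ish): a single 30 s clip
# already takes 30 s of compute, so the 5 s budget is unreachable:
print(min_workers(20, 30, 5, 1.0))   # -> None
```

The second case is what worries us: if per-clip inference is slower than the latency budget, the fix has to come from faster per-request decoding (GPU, shorter clips, or streaming audio to the decoder during recording) rather than from adding instances.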
Here are some questions we are considering:
- Should we run multiple instances of DeepSpeech behind a load balancer?
- Should we run DeepSpeech behind a Node server, or behind Nginx with a PHP bridge (using PHP's exec function to trigger recognition per request)?
- Is there a significant speed difference between these two approaches?
- Should we run DeepSpeech on a compatible GPU?
- How much memory / what size CPU is optimal?
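To make the first question concrete, this is roughly the shape we have in mind: a fixed pool of decoder instances, each handling one request at a time, with excess requests queuing instead of contending for the same model. The sketch below stubs out the actual recognition call (the real one would be something like `deepspeech.Model(...).stt(audio)` via the Python bindings); the pool sizes and names are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe(audio):
    # Stub standing in for one DeepSpeech recognition call.
    # In the real service this would decode an audio buffer with a
    # pre-loaded model (loading the model per request would dominate
    # the latency, which is our worry about an exec-per-request bridge).
    return f"transcript of {audio}"

# One pool slot per model instance keeps each decoder single-tenant,
# which is effectively what a load balancer over N processes gives you.
N_INSTANCES = 4  # placeholder; size with your measured per-clip time
pool = ThreadPoolExecutor(max_workers=N_INSTANCES)

def handle_request(audio):
    # Requests beyond N_INSTANCES wait in the executor's queue rather
    # than all hitting one overloaded model at once.
    return pool.submit(transcribe, audio)

futures = [handle_request(f"clip-{i}") for i in range(8)]
results = [f.result() for f in futures]
```

Whether this lives in one process with threads or as separate processes behind Nginx/HAProxy presumably matters less than keeping the models resident in memory between requests.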
Any advice on the above, or any other information about how to effectively scale DeepSpeech (on a budget), would be gratefully received.