Capacity needs for real-time speech-to-text

Hello!

I would like to ask your opinion on the following:

We have 10 ongoing phone calls (8000 Hz, mono, average call length 5 min) at all times. If we want to do near-real-time speech-to-text, what kind of capacity do we need? How many CPUs does it take to handle this load? Does a GPU make a big difference in decoding? Can DeepSpeech handle streaming data (WAVs)?

We are planning to build a real-time assistant and need to estimate the cost of running it (AWS, Google … ). The plan is:

  1. convert the phone call to 16 kHz mono
  2. stream the converted audio to the model …
  3. save the results to SQL … or something
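A minimal sketch of those three steps, assuming the DeepSpeech 0.9.x Python API (`Model`, `stt`), `sox` on the PATH, and SQLite standing in for "SQL … or something"; the `resample_to_16k` helper, file names, and table schema are placeholders:

```python
import sqlite3
import subprocess

import numpy as np
import deepspeech  # pip install deepspeech

MODEL_PATH = "deepspeech-0.9.3-models.pbmm"  # placeholder model file


def resample_to_16k(in_path: str) -> np.ndarray:
    """Step 1: use sox to turn 8 kHz mono telephony audio into 16 kHz,
    16-bit signed PCM, returned as an int16 numpy array."""
    raw = subprocess.check_output([
        "sox", in_path,
        "--type", "raw", "--rate", "16000", "--channels", "1",
        "--bits", "16", "--encoding", "signed-integer", "-",
    ])
    return np.frombuffer(raw, dtype=np.int16)


def transcribe_call(wav_path: str) -> None:
    model = deepspeech.Model(MODEL_PATH)

    audio = resample_to_16k(wav_path)   # step 1
    text = model.stt(audio)             # step 2 (whole-file, not yet streaming)

    # Step 3: a throwaway results table; a real system would key rows
    # by call id and chunk offset.
    db = sqlite3.connect("transcripts.db")
    db.execute("CREATE TABLE IF NOT EXISTS results (source TEXT, text TEXT)")
    db.execute("INSERT INTO results VALUES (?, ?)", (wav_path, text))
    db.commit()


if __name__ == "__main__":
    transcribe_call("call_8khz.wav")    # placeholder input
```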

All guesses are welcome, or even better, experience from this kind of case.

Thanks in advance!

DeepSpeech already converts audio to the correct format using SoX (see line 42 of native_client/python/client.py).
The memory-mapped model can save resources and can be generated easily with the tooling fetched by the taskcluster.py script.
As far as benchmarks go, I’ve run without a GPU on an ultrabook (i7 vPro chip) and inference time was ~0.6 seconds for samples ~5 seconds in length. This assumes the model is already loaded in memory behind a REST API.
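For a rough capacity estimate from that one data point (a back-of-the-envelope calculation, not a measurement of your workload):

```python
# Back-of-the-envelope sizing from the ~0.6 s per ~5 s benchmark above.
rtf = 0.6 / 5.0              # real-time factor on one CPU, no GPU
concurrent_calls = 10
cores_of_compute = concurrent_calls * rtf
print(f"RTF ~{rtf:.2f}, ~{cores_of_compute:.1f} cores of steady decode work")
# -> RTF ~0.12, ~1.2 cores; leave headroom for resampling, bursts,
#    and per-chunk latency (throughput alone isn't the whole story).
```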

Hope that helps.

Edit: I just saw you wanted streaming, sorry. Maybe the streaming examples in the DeepSpeech repo would help. I don’t think they convert on the fly.

Hello, and thanks for the quick reply.

I think the “streaming part” can be done a different way: split the 5 min of audio into 10-second pieces, run speech-to-text on each small WAV, write the results somewhere, then take the next 10-second piece … Referring to your numbers, each 10-second piece might take 1-2 seconds to process, which is fast enough.
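One way to realize that scheme without losing decoder context at chunk boundaries is DeepSpeech's streaming API; a sketch assuming the 0.9.x Python bindings (`createStream`, `feedAudioContent`, `intermediateDecode`), with the model path as a placeholder:

```python
import numpy as np
import deepspeech

SAMPLE_RATE = 16000    # after the 8 kHz -> 16 kHz conversion
CHUNK_SECONDS = 10     # the 10-second pieces proposed above

model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")  # placeholder path
stream = model.createStream()


def feed_piece(pcm16: np.ndarray) -> str:
    """Feed one 10 s piece of int16 PCM and return the running transcript,
    which can be written out after each piece."""
    stream.feedAudioContent(pcm16)
    return stream.intermediateDecode()


# When the call ends, flush the decoder for the final transcript:
# final_text = stream.finishStream()
```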

My devil’s advocate response is that you’d want to consider voice-activity-detection-based chunks instead of fixed-length chunks of audio.
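For illustration, here is what VAD-based chunking could look like with the `webrtcvad` package (my choice of library, not something from this thread); note it only accepts 10/20/30 ms frames of 16-bit mono PCM at 8/16/32/48 kHz:

```python
import webrtcvad  # pip install webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30                                      # webrtcvad: 10/20/30 ms
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit mono samples


def speech_segments(pcm: bytes, aggressiveness: int = 2):
    """Yield runs of consecutive voiced frames as byte strings, so chunks
    end at pauses instead of at arbitrary 10-second boundaries."""
    vad = webrtcvad.Vad(aggressiveness)
    voiced = bytearray()
    for off in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[off:off + FRAME_BYTES]
        if vad.is_speech(frame, SAMPLE_RATE):
            voiced.extend(frame)
        elif voiced:
            yield bytes(voiced)
            voiced.clear()
    if voiced:
        yield bytes(voiced)
```

Each yielded segment can then be sent through `stt()` as above, so pieces end at pauses rather than mid-word.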
