Recommended configurations for hosting a DeepSpeech model on a Virtual Private Server (DigitalOcean)

Hey everyone, I am looking to host a trained DeepSpeech model on a virtual server.

1.) For real-time recognition of an audio stream coming from a microphone, are there any recommendations on server configuration?
2.) What steps can I take to avoid latency issues, and is there anything else I should consider?
3.) Any guidance or links on building and deploying it to a virtual server would also be very helpful. Thanks in advance.

Please read the docs before you post. Here is a start for the server: generally you are fine with 1-2 CPUs and 3-4 GB RAM on Ubuntu 18.04; Ubuntu 20.04 ships newer Python versions that weren't supported by DeepSpeech until recently.


Got an idea for the configuration, thanks. I do have some follow-up questions:
1.) Let's say I have a speech stream lasting an hour and I am cutting the audio into 10-second chunks, passing one chunk to model-1 and the next 10 seconds to model-2 while model-1 completes inference, so I get a continuous output. To implement this, should I go with threading or multiprocessing?
2.) Any code snippets for the above would help. Also, what are my options with / without a GPU?

DeepSpeech is not yet able to handle multiprocessing well. Search the forum; there was a similar post today. Your approach sounds good, but cutting the input well will be your major problem. Let us know if you find a good solution, and follow the other post.
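To illustrate the chunked idea, here is a minimal multiprocessing sketch, not a tested implementation. It assumes the 0.9.x Python bindings, a placeholder model path, and dummy silence chunks standing in for a real capture source; each worker process loads its own Model instance, since a model cannot be shared across processes.

```python
# Hypothetical sketch: two worker processes, each with its own DeepSpeech
# model, consuming 10-second chunks from a shared queue.
import multiprocessing as mp

import numpy as np
from deepspeech import Model

MODEL_PATH = "deepspeech-0.9.3-models.pbmm"  # placeholder; adjust to your model
SAMPLE_RATE = 16000                          # DeepSpeech expects 16 kHz mono
CHUNK_SECONDS = 10

def worker(chunk_queue, result_queue):
    # Each process must load its own Model; it cannot be pickled or shared.
    model = Model(MODEL_PATH)
    while True:
        item = chunk_queue.get()
        if item is None:                     # sentinel: no more audio
            break
        index, audio = item                  # audio: np.int16 array at 16 kHz
        result_queue.put((index, model.stt(audio)))

if __name__ == "__main__":
    chunks = mp.Queue()
    results = mp.Queue()
    workers = [mp.Process(target=worker, args=(chunks, results))
               for _ in range(2)]
    for w in workers:
        w.start()

    # Dummy silence chunks; replace with real mic/stream capture.
    n_chunks = 6
    for i in range(n_chunks):
        chunks.put((i, np.zeros(SAMPLE_RATE * CHUNK_SECONDS, dtype=np.int16)))
    for _ in workers:
        chunks.put(None)

    # Reassemble transcripts in order using the chunk index.
    collected = sorted(results.get() for _ in range(n_chunks))
    print(" ".join(text for _, text in collected))

    for w in workers:
        w.join()
```

Keep in mind that on a 1-2 CPU VPS the two workers will mostly compete for the same core, so measure before assuming two models buy you anything.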

Sure, I will let you know if I find a solution. I will also keep this post open in case someone else has taken a different approach and can guide me to my goal.

I probably did not fully understand your use case - maybe have a look at streaming server implementations:


Note: both repos are a bit outdated and currently do not support the latest DeepSpeech versions.
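As a point of reference for what such servers do internally, DeepSpeech's own streaming API lets you feed audio incrementally instead of cutting it into chunks yourself. A minimal sketch, again assuming the 0.9.x Python bindings and a placeholder model path, with dummy silence in place of real frames:

```python
# Minimal sketch of DeepSpeech's streaming API (0.9.x Python bindings).
import numpy as np
from deepspeech import Model

MODEL_PATH = "deepspeech-0.9.3-models.pbmm"  # placeholder; adjust to your model
SAMPLE_RATE = 16000                          # DeepSpeech expects 16 kHz mono

model = Model(MODEL_PATH)
stream = model.createStream()

# Feed short frames as they arrive; here 20 ms of silence stands in for
# audio captured from a mic or received over a socket.
for _ in range(50):
    frame = np.zeros(SAMPLE_RATE // 50, dtype=np.int16)
    stream.feedAudioContent(frame)
    partial = stream.intermediateDecode()    # partial transcript so far

print(stream.finishStream())                 # final transcript, closes stream
```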


Will take a look, this might give me an idea. :+1: