Hey everyone, I am looking to host a trained DeepSpeech model on a virtual server:
1.) For real-time recognition of an audio stream from a microphone, any recommendations on server configuration?
2.) Some steps to avoid latency issues, and anything else I should consider.
3.) Any guidance / links on building and deploying it to a virtual server would also be very helpful. Thanks in advance.
Please read the docs before you post. Here is a starting point for the server: generally you are fine with 1-2 CPUs and 3-4 GB RAM on Ubuntu 18; Ubuntu 20 ships newer Python versions which weren't supported until recently.
Got an idea for the configuration, thanks. I do have some follow-up questions:
1.) Let's say I have a speech stream of an hour and I'm cutting the audio into 10-second chunks, passing one chunk to model-1 and then the next 10 seconds to model-2 while model-1 completes inference, so I get a continuous output. To implement this, should I go with threading (multiprocessing)?
2.) Any code snippets for the above would help (a rough sketch of what I have in mind is below), and what are my options with / without a GPU?
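Roughly what I have in mind, as a minimal sketch only: the model path is a placeholder, I'm assuming DeepSpeech 0.7+ (where `Model` takes just the model path), and 16 kHz mono int16 audio.

```python
# Sketch: alternate fixed 10 s chunks between two model instances.
# Assumptions: DeepSpeech 0.7+, 16 kHz mono int16 audio, placeholder paths.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from deepspeech import Model

SAMPLE_RATE = 16000
CHUNK_SECONDS = 10

# Two independent model instances, so one can run inference
# while the other picks up the next chunk.
models = [Model("deepspeech-0.9.3-models.pbmm") for _ in range(2)]

def transcribe(indexed_chunk):
    idx, chunk = indexed_chunk
    # Even chunks go to model-1, odd chunks to model-2.
    return models[idx % 2].stt(chunk)

def chunks(audio):
    """Split a full int16 recording into fixed 10 s windows."""
    step = SAMPLE_RATE * CHUNK_SECONDS
    for i in range(0, len(audio), step):
        yield audio[i:i + step]

# Placeholder input: a raw 16 kHz mono int16 recording.
audio = np.frombuffer(open("hour_long.raw", "rb").read(), dtype=np.int16)

# Two worker threads, one per model; map() keeps the chunks in order.
with ThreadPoolExecutor(max_workers=2) as pool:
    texts = pool.map(transcribe, enumerate(chunks(audio)))

print(" ".join(texts))
```

The obvious weakness is that fixed 10-second cuts can split a word in half at the boundary, which is exactly the cutting problem I'd need to solve.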
DeepSpeech is not yet able to handle multiprocessing well. Search the forum, there was a similar post today. Your approach sounds good; cutting the input well will be your major problem. Let us know if you find a good solution, and follow the other post.
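For the cutting problem, note that DeepSpeech also has a streaming API that feeds audio to a single model incrementally, so you don't have to cut at fixed boundaries at all. A minimal sketch, assuming DeepSpeech 0.7+ and a hypothetical `mic_buffers()` generator that yields 16 kHz mono int16 buffers from your microphone:

```python
# Sketch of DeepSpeech's streaming API (assuming 0.7+): feed audio
# incrementally to one model instead of hard-cutting 10 s files.
from deepspeech import Model

model = Model("deepspeech-0.9.3-models.pbmm")  # placeholder path
stream = model.createStream()

# mic_buffers() is hypothetical: it should yield small np.int16
# buffers (16 kHz mono) as they arrive from the microphone.
for buf in mic_buffers():
    stream.feedAudioContent(buf)
    print(stream.intermediateDecode())  # partial transcript so far

print(stream.finishStream())  # final transcript, closes the stream
```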
Sure, will let you know if I find a solution. I will also keep this post open, so if someone else has done it with a different approach they can guide me toward my goal.