When I use the pre-trained model, it outputs the following:
Loaded model in 1.678s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 5.425s.
Running inference.
Inference took 21.084s for 3.990s audio file.
Is there a way to load the model, and ideally the language model as well, only once, before submitting audio files? For example, when 10 users use the service one after the other, the model currently gets loaded 10 times.
I am building a transcription service on top of the DeepSpeech library, and loading everything only once could be a significant performance improvement.
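At the moment the service calls the deepspeech client in a subprocess for each request, roughly like this hypothetical sketch (paths and argument order are illustrative, not my actual code):

```python
# Hypothetical sketch of the current per-request pattern: every request
# spawns the deepspeech CLI in a subprocess, so the acoustic model and
# the language model are reloaded on each call. Paths are illustrative.
import subprocess

def transcribe(wav_path):
    return subprocess.check_output([
        'deepspeech', 'models/output_graph.pb', wav_path,
        'models/alphabet.txt', 'models/lm.binary', 'models/trie',
    ]).decode('utf-8')
```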
Regards,
Niklas
lissyx
Share your code with us, because that’s trivial to do, and it is already what we do in native_client/client.cc for the multi-WAV use case (multiple WAV files in one directory).
Just for clarification: you mean I should use the command-line client instead of the Python package and change the native-client “client.py” script so that it loads the model only once?
lissyx
No, I’m saying you should not call subprocess; instead, import deepspeech directly in your codebase. That way you can control loading of the model separately from inference. You can look at client.py to see the code to use.
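A minimal sketch of that load-once pattern, loosely following native_client/python/client.py from the 0.2.x releases (the constructor arguments, constants, and enableDecoderWithLM() signature below match that era’s example client and may differ in other versions; model and alphabet paths are illustrative):

```python
# Minimal load-once sketch, loosely following client.py from the
# DeepSpeech 0.2.x releases. Constants and signatures may differ in
# other versions; the model/alphabet paths are illustrative.
import scipy.io.wavfile as wav
from deepspeech import Model

N_FEATURES = 26   # MFCC features per frame (as in the example client)
N_CONTEXT = 9     # context frames on each side
BEAM_WIDTH = 500
LM_WEIGHT = 1.75
WORD_COUNT_WEIGHT = 1.00
VALID_WORD_COUNT_WEIGHT = 1.00

# Load the acoustic model and the language model once, at service startup.
ds = Model('models/output_graph.pb', N_FEATURES, N_CONTEXT,
           'models/alphabet.txt', BEAM_WIDTH)
ds.enableDecoderWithLM('models/alphabet.txt', 'models/lm.binary',
                       'models/trie', LM_WEIGHT, WORD_COUNT_WEIGHT,
                       VALID_WORD_COUNT_WEIGHT)

def transcribe(wav_path):
    # Only this part runs per request; the models stay resident in memory.
    fs, audio = wav.read(wav_path)  # expects 16-bit mono PCM at 16 kHz
    return ds.stt(audio, fs)

# e.g. ten users submitting files one after the other:
for path in ('user01.wav', 'user02.wav'):
    print(transcribe(path))
```

With this structure, the roughly 7 s of model and language-model loading from the log above is paid once per process instead of once per request.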
DeepSpeech is installed on my machine, and I also upgraded it today. For example, when I open a Python shell, I can run “import deepspeech” without any issues.
Is that what you mean?
lissyx
Yes, but it’s still very blurry… Like, which version of DeepSpeech is installed? And how did you install it?
Thanks for your patience. I have version 0.2.0 and I installed it via pip.
lissyx
Can you ensure it’s available when you try the client.py code? The error suggests it’s not available at that moment. Again, a proper description of your STR (steps to reproduce) would save everyone a lot of time.
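One quick, hypothetical check: run the following with the exact same interpreter that executes client.py; it prints where the package is loaded from, or raises ImportError if it is missing in that environment:

```python
# Run with the same Python interpreter that executes client.py.
import deepspeech
print(deepspeech.__file__)  # shows which installation gets picked up
```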