Is there a way to load the model only once?

Hello everybody,

When I use the pre-trained model, it outputs the following:

Loaded model in 1.678s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 5.425s.
Running inference.
Inference took 21.084s for 3.990s audio file.

Is there a way to load the model, and maybe also the language model, only once, before submitting audio files? For example, when 10 users want to use the service one after the other, the model also gets loaded 10 times.
I am building a transcription service based on the DeepSpeech library, and loading the model only once could lead to a significant performance improvement.

Regards,
Niklas

Share your code with us, because that’s trivial to do, and it is already what we do in native_client/client.cc for the multi-WAV use case (multiple WAV files in one directory).

ds-srv in Rust also does it: https://gitlab.com/deepspeech/ds-srv/blob/5ba4bf0bf338aeb7c2b186cb2f6d65fd72eff4c0/src/inference.rs#L56-74 https://gitlab.com/deepspeech/ds-srv/blob/5ba4bf0bf338aeb7c2b186cb2f6d65fd72eff4c0/src/inference.rs#L210

That seems very slow; what’s your hardware?

A 2014 MacBook Pro with a 2 GHz Intel Core i7, 8 GB RAM, and an Intel Iris Pro 1536 MB.

Thanks for your quick answer, Lissyx.
Unfortunately, I can’t share the code with you, except a few lines.

How can I change the following command (in Python) so that it loads the model only once?

subprocess.Popen("deepspeech models/output_graph.pb {} models/alphabet.txt 
                                   "models/lm.binary "
                     "models/trie > {}".format(audio_name, result_name), shell=True).wait()

Well, if you directly use the binary or the CLI Python tool through forks, there’s no help we can provide.

Why don’t you just look at client.py and get inspired, instead of forking like that?

You are also using the non-mmap-able file format, which means lots of memory allocations. We have a warning in place for that.
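
For context, the mmap-able format is produced from the protobuf graph with TensorFlow’s convert_graphdef_memmapped_format tool; roughly (paths assumed to match the ones used earlier in this thread):

convert_graphdef_memmapped_format --in_graph=models/output_graph.pb --out_graph=models/output_graph.pbmm

The resulting output_graph.pbmm loads through the same API, but is mapped into memory instead of being fully allocated on load.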

Just for clarification: you mean I should use the command-line client instead of the Python package and change the native-client “client.py” script so that it loads the model only once?

No, I’m saying you should not call subprocess and should just directly import deepspeech in your codebase. This way you can control loading the model and running inference separately. And you can just look at client.py to see the code to use 🙂
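
For illustration, here is a minimal sketch of that pattern, modeled on native_client/python/client.py as of 0.2.0. The constants and the Model/enableDecoderWithLM signatures are taken from that script and have changed across releases, so check the client.py matching your installed version; transcribe() and the WAV file names are made up for the example:

import wave

import numpy as np
from deepspeech import Model

# Constants as used in client.py for 0.2.0.
N_FEATURES = 26
N_CONTEXT = 9
BEAM_WIDTH = 500
LM_WEIGHT = 1.75
WORD_COUNT_WEIGHT = 1.00
VALID_WORD_COUNT_WEIGHT = 1.00

# Load the acoustic model and the language model once, at startup.
ds = Model('models/output_graph.pb', N_FEATURES, N_CONTEXT,
           'models/alphabet.txt', BEAM_WIDTH)
ds.enableDecoderWithLM('models/alphabet.txt', 'models/lm.binary', 'models/trie',
                       LM_WEIGHT, WORD_COUNT_WEIGHT, VALID_WORD_COUNT_WEIGHT)

def transcribe(audio_path):
    # Only this part is paid per request; the model stays loaded.
    fin = wave.open(audio_path, 'rb')
    fs = fin.getframerate()
    audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
    fin.close()
    return ds.stt(audio, fs)

# Hypothetical file names, standing in for the per-user requests.
for audio_name in ('user1.wav', 'user2.wav'):
    print(transcribe(audio_name))

The point is that Model() and enableDecoderWithLM() run once, while ds.stt() runs per file.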

Thanks for your quick answers!
I am having a little issue: where do these functions come from? “from deepspeech import Model, printVersions”

I’m not sure I understand your question … But you should look inside native_client/python/ …

Yes, I did. But I can’t import these functions like you do in the script, neither when I just run the script nor when I try to import them in the terminal.

It would help if you could be a bit more verbose and share the error … Everything you need is in native_client/python/ …

Sorry if I am being imprecise.

For example, when I run “DeepSpeech/native_client/python/client.py”, it outputs:

Traceback (most recent call last):
  File "client.py", line 12, in <module>
    from deepspeech import Model, printVersions
ImportError: cannot import name 'Model'

Can you document what you do? Like virtualenv setup, pip steps, etc.?

DeepSpeech is installed on my machine, and I also upgraded it today. For example, when I open a Python shell, I can do “import deepspeech” without any issues.

Is that what you mean?

Yes, but it’s still very blurry … Like, which version of DeepSpeech is installed? And how did you install it?

This is heavily tested code; it works for sure.

Thanks for your patience. I have version 0.2.0 and I installed it via pip.

Can you ensure it’s available when you try the client.py code? The error suggests it’s not available at this moment. Again, a proper description of your steps to reproduce (STR) would save everyone a lot of time.
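
One quick check, as a minimal sketch (nothing DeepSpeech-specific): print where the module is imported from, since a stale install or a local file or directory named deepspeech shadowing the pip package would produce exactly this kind of ImportError:

import deepspeech
# If this points somewhere other than your (virtual) environment's
# site-packages, Python is not picking up the pip-installed 0.2.0 package.
print(deepspeech.__file__)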