Reusing a TensorFlow model for multiple inferences, without binaries

Hey y’all,

First off thanks @lissyx for organizing a great meeting - I’ll reply more in the other thread.

Secondly, bit of a long shot but thought I’d ask.

I’m currently working on proving some pronunciation ideas in pure Python, meaning that for inference purposes I’m writing a Python program, importing various libraries, and then doing inference.

Obviously it is easy to get up and running in ‘one_shot_infer’ mode, but I am trying to write an API that keeps the process in memory and - ideally - I would like to create the model and load the weights into TensorFlow/GPU one time only, and then apply different audio data sets.

This is quite doable, I’m sure, but I’m currently struggling to find the exact point where I can slice between loading the model and loading the data for inference.

To be more specific: working off of as a template, if you look at line 49 where it calls create_model, this occurs after the step where it loads the audio file.

And it’s not just the order in the file: it actually takes the batch_x and seq_len parameters from the iterator, which is created from the audio file, so it currently seems to need the audio file before it creates the model.

But surely it is possible to create the actual tensorflow graph and load the weights one time only and then apply different data to it? That’s what happens in the binary version, right?

I guess what I’m trying to work out is how to call ‘create_model’ in a generic way that doesn’t relate to the data, and then call it multiple times with different data.
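In case it helps illustrate what I mean, here is a hedged sketch of the pattern I’m after, with made-up shapes and a one-layer stand-in for create_model (this is not DeepSpeech’s actual code): build the graph once around a placeholder instead of the iterator’s batch_x, then feed different feature arrays through the same graph.

```python
# Sketch only: build the graph once around a placeholder, then feed it
# different audio features per call. Shapes and the tiny "model" below
# are illustrative stand-ins, not DeepSpeech's real create_model.
import numpy as np
import tensorflow as tf

tfv1 = tf.compat.v1
tfv1.disable_eager_execution()

# Stand-in for the feature input (batch, time, features).
batch_x = tfv1.placeholder(tf.float32, shape=[None, None, 26], name='input')

# Stand-in for create_model: a single dense projection over the feature axis.
with tfv1.variable_scope('model'):
    w = tfv1.get_variable('weights', shape=[26, 4])
    logits = tf.tensordot(batch_x, w, axes=[[2], [0]])

sess = tfv1.Session()
sess.run(tfv1.global_variables_initializer())
# (in the real code you would restore checkpoint weights here instead)

# Same graph and weights, different utterances: only feed_dict changes.
a = sess.run(logits, feed_dict={batch_x: np.zeros((1, 100, 26), np.float32)})
b = sess.run(logits, feed_dict={batch_x: np.zeros((2, 50, 26), np.float32)})
print(a.shape, b.shape)  # (1, 100, 4) (2, 50, 4)
```

The key point is that the model is built and initialized exactly once; everything per-utterance goes through feed_dict.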

Given time I think I’ll figure it out but I wondered if there was an example where someone else had done this before?

Just to provide a specific error: if I naively transcribe multiple times (with different audio data) and try to create_model each time, it gives the following error. Which sort of makes sense - I don’t want to call create_model multiple times, just once.

File "", line 155, in test
    await model.compare_audio("146238.wav", target_sentence)

  File "/DeepSpeech/training/deepspeech_training/", line 181, in create_model
    layers['layer_1'] = layer_1 = dense('layer_1', batch_x, Config.n_hidden_1, dropout_rate=dropout[0])
  File "/DeepSpeech/training/deepspeech_training/", line 80, in dense
    bias = variable_on_cpu('bias', [units], tfv1.zeros_initializer())
  File "/DeepSpeech/training/deepspeech_training/", line 54, in variable_on_cpu
    var = tfv1.get_variable(name=name, shape=shape, initializer=initializer)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/", line 1500, in get_variable
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/", line 1243, in get_variable
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/", line 567, in get_variable
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/", line 519, in _true_getter
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/", line 868, in _get_single_variable
    (err_msg, "".join(traceback.format_list(tb))))
ValueError: Variable layer_1/bias already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? 
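For what it’s worth, here is a minimal sketch (not DeepSpeech’s actual code) of what that ValueError is complaining about and the workaround the message itself suggests: with reuse=tf.AUTO_REUSE on the variable scope, a second run of the model-building code returns the existing variables instead of raising. Only building the graph once avoids the problem entirely, of course.

```python
# Sketch: reproduce the "Variable layer_1/bias already exists" situation
# and sidestep it with reuse=tf.AUTO_REUSE, as the error message suggests.
import tensorflow as tf

tfv1 = tf.compat.v1
tfv1.disable_eager_execution()

def build_layer():
    # Mirrors the get_variable call in the traceback, but with AUTO_REUSE
    # so a second call reuses the variable instead of raising ValueError.
    with tfv1.variable_scope('layer_1', reuse=tfv1.AUTO_REUSE):
        return tfv1.get_variable('bias', shape=[2048],
                                 initializer=tfv1.zeros_initializer())

first = build_layer()
second = build_layer()   # no "already exists, disallowed" error this time
print(first.name, second.name)  # layer_1/bias:0 layer_1/bias:0
```

Without AUTO_REUSE (the default, reuse=False), the second build_layer() call raises exactly the ValueError above.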

Just asking in case other people have already done this and/or will want to do this in the future (if I figure it out I’ll reply below).


What works for me is derived from this example:
Edit: actually I took it from the native_client code itself:

I use the sttWithMetadata() method for inference instead of just stt(), to get the timing on the individual words.

You have probably found it already, but nonetheless the Python API is documented here:

Thanks @SanderE ! Yeah, you’ll see that this relies on importing the ‘deepspeech’ pip package. As a result, what you are actually doing here is calling the compiled native DeepSpeech client via its Python wrapper.

That’s usually the right thing to do. In this case however I’m trying to do some experiments with the raw logits and so I’m trying to just use pure python (ie the deepspeech_training package) and Tensorflow.

Ah OK, lower-level poking. In that case I’m afraid I haven’t got any advice.

Although perhaps looking at the way works with TF might help (it is what gets used for the test stage of training and seems to only load the model once).

As @SanderE said, maybe is a better start :slight_smile:

It is, but we use a completely different stack. What you might want to leverage is the checkpoint-loading logic? Loading data is a graph operation itself, so it’s quite tightly bound to the graph.

This is probably the simplest example to follow for that, using feed_dict instead of the full-blown feeding pipeline:
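To sketch the checkpoint-load-once, feed-many pattern discussed above (with illustrative paths and shapes, not DeepSpeech’s real ones): restore the weights with tf.train.Saver a single time, then call sess.run() with a fresh feed_dict per audio file.

```python
# Sketch: restore checkpoint weights once, then loop over inputs with
# feed_dict. Shapes, names, and the toy "model" are all illustrative.
import os
import tempfile
import numpy as np
import tensorflow as tf

tfv1 = tf.compat.v1
tfv1.disable_eager_execution()

x = tfv1.placeholder(tf.float32, [None, 26])
w = tfv1.get_variable('weights', shape=[26, 4])
logits = tf.matmul(x, w)
saver = tfv1.train.Saver()

ckpt = os.path.join(tempfile.mkdtemp(), 'model')

# One session writes a checkpoint (stands in for training)...
with tfv1.Session() as train_sess:
    train_sess.run(tfv1.global_variables_initializer())
    saver.save(train_sess, ckpt)

# ...the inference session restores it exactly once, then loops.
sess = tfv1.Session()
saver.restore(sess, ckpt)           # weights loaded one time only
for _ in range(3):                  # e.g. one iteration per audio file
    out = sess.run(logits, feed_dict={x: np.zeros((1, 26), np.float32)})
print(out.shape)  # (1, 4)
```

The graph construction and saver.restore() happen once up front; only the feed_dict changes per utterance, which is the split between loading the model and loading the data that the original question was after.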