Is it possible to access the activations of the neurons when the model is used to recognize a sentence?
I am not familiar with .pb output graphs; is it possible to convert them to .ckpt to use them in Python?
Thanks a lot
We do release the checkpoints, so you should be able to use them
Thanks for your reply. On the github page, I saw it was possible to download the pretrained model at
https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz
which contains the .pb graph. Is that what you are talking about?
Or is the model also available somewhere in .ckpt format? If yes, do you have a link?
Thanks
The very same page has a link, just above the one you copied, to the checkpoints: https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-checkpoint.tar.gz
Oh I see, I was looking at the README, but it is in the Releases section.
Many thanks!
If I understood correctly, the .pb file contains the graph and the checkpoint contains the weights of the model. Now, if I want to access the activations of the neurons, I need to feed a wav file to the input. But I don't understand where the preprocessing of the wav file happens, i.e. the conversion to MFCC features before they are fed to the model.
model.py seems to be an interface (SWIG) to something else I don't know how to access.
Any idea?
You seem to be mixing inference code (model.py? there is no such thing) and training code. We do use SWIG to generate the Python bindings.
There is support for single-shot inference in DeepSpeech.py, but I'm not sure exactly what you want to do.
So, what do you want to do? Which activations are you interested in?
During training, it should all be done from util/feeding.py; check the call to audiofile_to_input_vector.
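For reference, here is a minimal sketch of that preprocessing step done standalone. It assumes audiofile_to_input_vector is importable from util.audio (its exact location and signature can differ between DeepSpeech versions, so check your checkout) and that it takes (wav_path, numcep, numcontext):

# Convert a wav file into the MFCC feature matrix the network consumes.
# Assumption: the helper lives in util/audio.py and takes
# (audio_filename, numcep, numcontext), as in the v0.1.x tree.
from util.audio import audiofile_to_input_vector

N_FEATURES = 26  # MFCC coefficients per frame (n_input in the training code)
N_CONTEXT = 9    # frames of context on each side of the current frame

features = audiofile_to_input_vector('sample.wav', N_FEATURES, N_CONTEXT)
# Depending on the version, the context window may already be folded into the
# second dimension, i.e. roughly (timesteps, N_FEATURES * (2 * N_CONTEXT + 1)).
print(features.shape)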
In native_client/python/client.py, there is ds (the DeepSpeech model, I guess):
"ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)"
on which "ds.stt(audio, fs)" is called later; stt lives in the deepspeech module, in the script "model.py".
What I want is to visualize the activations of the neurons for a given audio file. For example, take the mean activation of every neuron, store these in a matrix and visualize it with imshow().
Can you describe exactly what you mean by "activation of the neurons"? Do you want every neuron? That's going to be a lot of weights to deal with.
Anyway, I think you should do that by playing with the single-shot inference code path in DeepSpeech.py instead; it will be easier for you to hack on.
You can write some TensorFlow code to fetch the "logits" node instead of the decoded output. You could modify the do_single_file_inference function in DeepSpeech.py to fetch 'logits' instead of outputs['outputs']: https://github.com/mozilla/DeepSpeech/blob/b6c78264ee5101c7363a6e8f36b553132451b983/DeepSpeech.py#L1778-L1781
It'll be a tensor of shape [timesteps, batch_size, num_classes], where timesteps is variable and depends on the length of the audio file, batch_size is 1 by default, and num_classes is the size of the used alphabet plus one (for the CTC blank label).
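As a rough sketch of that change inside do_single_file_inference (reusing the session, inputs and features the function already sets up; the tensor name 'logits' and the exact feed_dict entries are assumptions, so copy them from the existing outputs['outputs'] call in your version):

# Fetch the raw pre-decoder activations instead of the decoded transcript.
# Assumption: the raw output tensor is named 'logits' in the inference graph;
# depending on the version it may instead be exposed directly in the outputs
# dict returned by create_inference_graph.
logits_tensor = session.graph.get_tensor_by_name('logits:0')
raw_logits = session.run(logits_tensor, feed_dict={
    inputs['input']: features,
    inputs['input_lengths']: features_len,
})

# [timesteps, batch_size, num_classes]; batch_size is 1 for single-file
# inference, num_classes is the alphabet size plus one for the CTC blank.
print(raw_logits.shape)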
Thanks a lot, it seems interesting, I'll check that out!
Hi,
The "logits" variable seems to correspond to the raw predictions (or activations) of the model for each timestep, i.e. the output of the final layer of the model.
The "_" variable captures the layers of the model, and I'm trying to figure out whether these layers also contain activations at each timestep and, if so, how to extract them.
Is there a way to obtain the activations for each of the hidden layers?
Reference to the V0.6.1 code: https://github.com/mozilla/DeepSpeech/blob/v0.6.1/DeepSpeech.py#L890-L926
Thanks!
Using the code below right after the point where the logits are currently obtained (i.e. right after https://github.com/mozilla/DeepSpeech/blob/v0.6.1/DeepSpeech.py#L924) results in what seem to be the activations of the hidden layers.
full_logits = {}
# '_' here is the layers dict returned by create_inference_graph; skip the
# entries that are not per-timestep layer activations.
for layer_ref in _:
    if layer_ref in ['input_reshaped', 'rnn_output_state', 'raw_logits']:
        continue
    # Run the graph once per layer with the same feeds used for the logits.
    full_logits[layer_ref] = session.run(
        _[layer_ref],
        feed_dict={
            inputs['input']: features,
            inputs['input_lengths']: features_len,
            inputs['previous_state_c']: previous_state_c,
            inputs['previous_state_h']: previous_state_h,
        })
The shapes of the obtained logits seem to be correct: [timesteps, n_neurons_in_layer].
Should this indeed capture the neuron activations of the hidden layers, and would this be a convenient way of obtaining them?
In Python, _ is used to denote a variable you're not interested in, an unused value. Rather than using it directly, you should rename the variable to something else; in this case it could be layers, for example.
In addition, you don't have to loop. TensorFlow lets you fetch as many tensors from the graph as you want in a single session.run. Just fetch all of the layers at once, it'll be much faster.
If you look at the model definition code, you'll see that each layer's activations are already added to the layers object (_ in your code), so you can just fetch them directly.
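A minimal sketch of that single-run variant against the v0.6.1-style code above (layers is the renamed _, the other names mirror do_single_file_inference, and the 'layer_1' key used for plotting is an assumption; use whichever keys your layers dict actually contains):

import matplotlib.pyplot as plt

# Keep only the per-timestep activations, as in the loop above.
skip = {'input_reshaped', 'rnn_output_state', 'raw_logits'}
wanted = {name: tensor for name, tensor in layers.items() if name not in skip}

# Passing a dict of fetches to session.run returns a dict of numpy arrays with
# the same keys, so the whole network is evaluated in a single pass.
activations = session.run(wanted, feed_dict={
    inputs['input']: features,
    inputs['input_lengths']: features_len,
    inputs['previous_state_c']: previous_state_c,
    inputs['previous_state_h']: previous_state_h,
})

# Each entry is [timesteps, n_neurons_in_layer]; imshow gives a quick overview.
plt.imshow(activations['layer_1'].T, aspect='auto', origin='lower')
plt.xlabel('timestep')
plt.ylabel('neuron')
plt.show()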