Access activations of the neurons

Is it possible to access the activations of the neurons when the model is used to recognize a sentence?
I am not familiar with .pb output graphs; is it possible to convert them to .ckpt so I can use them in Python?

Thanks a lot

We do release the checkpoints, so you should be able to use them :slight_smile:

Thanks for your reply. On the GitHub page, I saw it was possible to download the pretrained model at
https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz

which contains the .pb graph. Is that what you are talking about?

Or is the model also available somewhere in .ckpt format? If so, do you have a link?

Thanks

The very same page has a link, just above the one you copied, to the checkpoint: https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-checkpoint.tar.gz

Oh I see, I was looking at the README page, but it is in the Releases section.
Many thanks! :slight_smile:

If I understood correctly, the .pb file contains the graph and the checkpoint contains the weights of the model. But now, if I want to access the activations of the neurons, I need to feed a WAV file to the input, and I don’t understand where the preprocessing of the WAV file into MFCC features happens before it reaches the model.
The model.py file seems to be a SWIG interface to something else that I don’t know how to access.
Any idea ?

You seem to be mixing up inference code (model.py? there is no such thing) and training code. That said, we do use SWIG to generate the Python bindings.

There is support for single-shot inference in DeepSpeech.py, but I’m not sure exactly what you want to do.

So, what do you want to do? Which activations are you interested in?

During training, it should all be done from util/feeding.py; check the call to audiofile_to_input_vector.
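If you want to run that preprocessing on its own, something along these lines should work. This is only a sketch: the import path and the 26-feature / 9-context settings are assumptions based on the v0.1.x tree (they match the N_FEATURES and N_CONTEXT constants in native_client/python/client.py), so double-check them against your version.

# Sketch only: assumes audiofile_to_input_vector lives in util/audio.py (v0.1.x layout)
from util.audio import audiofile_to_input_vector

N_FEATURES = 26  # MFCC coefficients per frame (assumed release default)
N_CONTEXT = 9    # frames of context on each side (assumed release default)

features = audiofile_to_input_vector('some_audio.wav', N_FEATURES, N_CONTEXT)
# Expected: a 2-D array with one row per timestep, each row holding the centre
# frame's MFCCs plus the flattened context windows.
print(features.shape)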

In native_client/python/client.py, there is ds (the DeepSpeech model, I guess):
"ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)"
which is later called as “ds.stt(audio, fs)”, and that lives in the deepspeech module, in the script “model.py”.

What I want is to visualize the activations of the neurons for a given audio file. For example, take the mean activation of every neuron, store the values in a matrix, and visualize it with an imshow().

Can you describe exactly what you mean by “activation of the neurons”? Do you want every neuron? That’s going to be a lot of values to deal with.

Anyway, I think you should do that by playing with the single-shot inference code path in DeepSpeech.py instead; it will be easier for you to hack.

You can write some TensorFlow code to fetch the “logits” node instead of the decoded output. You could modify the do_single_file_inference function in DeepSpeech.py to fetch 'logits' instead of outputs['outputs']: https://github.com/mozilla/DeepSpeech/blob/b6c78264ee5101c7363a6e8f36b553132451b983/DeepSpeech.py#L1778-L1781

It’ll be a tensor of shape [timesteps, batch_size, num_classes], where timesteps is variable and depends on the length of the audio file, batch_size is 1 by default, and num_classes is the size of the alphabet used plus one (for the CTC blank label).
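To make that concrete, the change inside do_single_file_inference could look roughly like this. It is only a sketch: the tensor name 'logits' and the feed_dict keys mirror the code linked above, but the exact names (and any extra model state inputs) can differ between versions.

import numpy as np
import matplotlib.pyplot as plt

# Sketch: fetch the raw logits tensor instead of the decoded output.
# 'logits:0' assumes the op is actually named 'logits' in your graph.
logits = session.run('logits:0', feed_dict={
    inputs['input']: features,
    inputs['input_lengths']: features_len,
})

# Shape [timesteps, 1, num_classes]: drop the batch dimension and plot, which is
# one way to get the imshow()-style visualization described earlier in the thread.
activations = np.squeeze(logits, axis=1)
plt.imshow(activations.T, aspect='auto', origin='lower')
plt.xlabel('timestep')
plt.ylabel('class (alphabet + CTC blank)')
plt.show()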

Thanks a lot, it seems interesting, I’ll check that out !

Hi,

The ‘logits’ variable seems to correspond to the raw predictions (or activations) of the model for each timestep, i.e. the output of the final layer of the model.

The ‘_’ variable captures the layers of the model, and I’m trying to figure out whether these layers also contain activations at each timestep and, if so, how to extract them.

Is there a way to obtain the activations for each of the hidden layers?

Reference to the V0.6.1 code: https://github.com/mozilla/DeepSpeech/blob/v0.6.1/DeepSpeech.py#L890-L926

Thanks!

Inserting the code below right after where the logits are currently obtained (i.e. right after https://github.com/mozilla/DeepSpeech/blob/v0.6.1/DeepSpeech.py#L924) yields what seem to be the activations of the hidden layers.

full_logits = {}
# '_' is the layers dict returned by create_inference_graph
for layer_ref in _:
    # skip entries that are not hidden-layer activations
    if layer_ref in ['input_reshaped', 'rnn_output_state', 'raw_logits']:
        continue
    full_logits[layer_ref] = session.run(
        _[layer_ref],
        feed_dict={
            inputs['input']: features,
            inputs['input_lengths']: features_len,
            inputs['previous_state_c']: previous_state_c,
            inputs['previous_state_h']: previous_state_h,
        }
    )

The shapes of the obtained logits seem to be correct: [timesteps, n_neurons_in_layer].

Should this indeed capture the neuron activations of the hidden layers, and would this be a convenient way of obtaining them?

In Python, _ is used to denote a variable you’re not interested in, an unused value. Rather than using it directly, you should rename the variable to something else, for example in this case it could be layers.

In addition, you don’t have to loop. TensorFlow lets you fetch as many tensors from the graph as you want in a single session.run. Just fetch all of the layers at once; it’ll be much faster.

If you look at the model definition code, you’ll see that each layer’s activations are added to the layers object (_ in your code) already, so you can just fetch them directly.
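Putting those two points together, a single fetch could look roughly like this (a sketch under the same assumptions as your snippet: layers is the renamed third return value of create_inference_graph, and the feed_dict keys match your version of the code):

# Fetch every hidden layer's activation tensor in one session.run call.
wanted = {name: tensor for name, tensor in layers.items()
          if name not in ('input_reshaped', 'rnn_output_state', 'raw_logits')}

full_logits = session.run(
    wanted,
    feed_dict={
        inputs['input']: features,
        inputs['input_lengths']: features_len,
        inputs['previous_state_c']: previous_state_c,
        inputs['previous_state_h']: previous_state_h,
    }
)
# full_logits now maps each layer name to an ndarray of its activations.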
