Is it possible to access the activations of the neurons when the model is used to recognize a sentence?
I am not familiar with .pb output graphs; is it possible to convert them to .ckpt to use them in Python?
Thanks a lot
We do release the checkpoints, so you should be able to use them
Thanks for your reply. On the github page, I saw it was possible to download the pretrained model at
https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz
which contains the .pb graph. Is that what you are talking about?
Or is the model also available somewhere in .ckpt format? If yes, do you have a link?
Thanks
The very same page has a link, just above the one you copied, to the checkpoints: https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-checkpoint.tar.gz
Oh I see, I was looking at the README, but it is in the Releases section.
Many thanks!
If I understood correctly, the .pb file contains the graph and the checkpoint contains the weights of the model. Now, if I want to access the activations of the neurons, I need to feed a wav file to the input. But I don't understand where the preprocessing of the wav file happens, i.e. the conversion to MFCC features before they are fed to the model.
model.py seems to be an interface (SWIG) to something else I don't know how to access.
Any idea?
You seem to be mixing inference code (model.py? there is no such thing) and training code. We do use SWIG to generate the Python bindings.
There is support for single-shot inference in DeepSpeech.py, but I'm not sure exactly what you want to do.
So, what do you want to do? Which activations are you interested in?
During training, it should all be done from util/feeding.py; check the call to audiofile_to_input_vector.
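For reference, here is a minimal sketch of that preprocessing step done standalone. It assumes audiofile_to_input_vector is importable from util.audio (its exact location and signature can differ between DeepSpeech versions, so check your checkout) and that it takes (wav_path, numcep, numcontext):

# Convert a wav file into the MFCC feature matrix the network consumes.
# Assumption: the helper lives in util/audio.py and takes
# (audio_filename, numcep, numcontext), as in the v0.1.x tree.
from util.audio import audiofile_to_input_vector

N_FEATURES = 26  # MFCC coefficients per frame (n_input in the training code)
N_CONTEXT = 9    # frames of context on each side of the current frame

features = audiofile_to_input_vector('sample.wav', N_FEATURES, N_CONTEXT)
# Depending on the version, the context window may already be folded into the
# second dimension, i.e. roughly (timesteps, N_FEATURES * (2 * N_CONTEXT + 1)).
print(features.shape)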
In native_client/python/client.py, there is ds (the DeepSpeech model, I guess):
"ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)"
on which "ds.stt(audio, fs)" is called later; stt lives in the deepspeech module, in the script "model.py".
What I want is to visualize the activations of the neurons for a given audio file. For example, take the mean activation of every neuron, store these in a matrix and visualize it with imshow().
Can you describe exactly what you mean by "activation of the neurons"? Do you want every neuron? That's going to be a lot of weights to deal with.
Anyway, I think you should do that by playing with the single-shot inference code path in DeepSpeech.py instead; it will be easier for you to hack on.
You can write some TensorFlow code to fetch the "logits" node instead of the decoded output. You could modify the do_single_file_inference function in DeepSpeech.py to fetch 'logits' instead of outputs['outputs']: https://github.com/mozilla/DeepSpeech/blob/b6c78264ee5101c7363a6e8f36b553132451b983/DeepSpeech.py#L1778-L1781
It'll be a tensor of shape [timesteps, batch_size, num_classes], where timesteps is variable and depends on the length of the audio file, batch_size is 1 by default, and num_classes is the size of the used alphabet plus one (for the CTC blank label).
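As a rough sketch of that change inside do_single_file_inference (reusing the session, inputs and features the function already sets up; the tensor name 'logits' and the exact feed_dict entries are assumptions, so copy them from the existing outputs['outputs'] call in your version):

# Fetch the raw pre-decoder activations instead of the decoded transcript.
# Assumption: the raw output tensor is named 'logits' in the inference graph;
# depending on the version it may instead be exposed directly in the outputs
# dict returned by create_inference_graph.
logits_tensor = session.graph.get_tensor_by_name('logits:0')
raw_logits = session.run(logits_tensor, feed_dict={
    inputs['input']: features,
    inputs['input_lengths']: features_len,
})

# [timesteps, batch_size, num_classes]; batch_size is 1 for single-file
# inference, num_classes is the alphabet size plus one for the CTC blank.
print(raw_logits.shape)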
Thanks a lot, it seems interesting, I'll check that out!
Hi,
The "logits" variable seems to correspond to the raw predictions (or activations) of the model for each timestep, i.e. the output of the final layer of the model.
The "_" variable captures the layers of the model, and I'm trying to figure out whether these layers also contain activations at each timestep and, if so, how to extract them.
Is there a way to obtain the activations for each of the hidden layers?
Reference to the V0.6.1 code: https://github.com/mozilla/DeepSpeech/blob/v0.6.1/DeepSpeech.py#L890-L926
Thanks!
Using the code below right after the point where the logits are currently obtained (i.e. right after https://github.com/mozilla/DeepSpeech/blob/v0.6.1/DeepSpeech.py#L924) results in what seem to be the activations of the hidden layers.
full_logits = {}
# '_' here is the layers dict returned by create_inference_graph; skip the
# entries that are not per-timestep layer activations.
for layer_ref in _:
    if layer_ref in ['input_reshaped', 'rnn_output_state', 'raw_logits']:
        continue
    # Run the graph once per layer with the same feeds used for the logits.
    full_logits[layer_ref] = session.run(
        _[layer_ref],
        feed_dict={
            inputs['input']: features,
            inputs['input_lengths']: features_len,
            inputs['previous_state_c']: previous_state_c,
            inputs['previous_state_h']: previous_state_h,
        })
The shapes of the obtained logits seem to be correct: [timesteps, n_neurons_in_layer].
Should this indeed capture the neuron activations of the hidden layers, and would this be a convenient way of obtaining them?
In Python, _ is used to denote a variable you're not interested in, an unused value. Rather than using it directly, you should rename the variable to something else; in this case it could be layers, for example.
In addition, you don't have to loop. TensorFlow lets you fetch as many tensors from the graph as you want in a single session.run. Just fetch all of the layers at once, it'll be much faster.
If you look at the model definition code, you'll see that each layer's activations are already added to the layers object (_ in your code), so you can just fetch them directly.
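A minimal sketch of that single-run variant against the v0.6.1-style code above (layers is the renamed _, the other names mirror do_single_file_inference, and the 'layer_1' key used for plotting is an assumption; use whichever keys your layers dict actually contains):

import matplotlib.pyplot as plt

# Keep only the per-timestep activations, as in the loop above.
skip = {'input_reshaped', 'rnn_output_state', 'raw_logits'}
wanted = {name: tensor for name, tensor in layers.items() if name not in skip}

# Passing a dict of fetches to session.run returns a dict of numpy arrays with
# the same keys, so the whole network is evaluated in a single pass.
activations = session.run(wanted, feed_dict={
    inputs['input']: features,
    inputs['input_lengths']: features_len,
    inputs['previous_state_c']: previous_state_c,
    inputs['previous_state_h']: previous_state_h,
})

# Each entry is [timesteps, n_neurons_in_layer]; imshow gives a quick overview.
plt.imshow(activations['layer_1'].T, aspect='auto', origin='lower')
plt.xlabel('timestep')
plt.ylabel('neuron')
plt.show()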