Load model, get layer_5 and add extra layers

goodmorning · November 26, 2020, 1:29pm

Dear community,

I am trying to integrate a DeepSpeech trained model into a larger network. I have read the paper and the documentation several times, and also several posts here. However, I am still struggling to understand several things.
Apologies if this has been asked before or if what I say does not make sense, as I am still relatively new to this.

What I am trying to do is the following:

1. Load the pre-trained DeepSpeech model (deepspeech-0.9.1-models.pbmm)
2. Obtain the output of layer_5
3. Add a CNN layer
4. Train

My main problem is how to load the model as a tensorflow model, so then I can add more layers and train them. I have looked into the client.py as it creates a Model using the python API, but I do not seem to be able to access any layers from the Model (I think I may have been confusing things here). I have also looked into train.py/create_model, but again I am not sure how to use that externally as if I do something like this:

from deepspeech_training.train import create_model
model = create_model(batch_x=input_tensor, batch_size=batch_size, seq_length=None, dropout=no_dropout)

I get a RuntimeError(“Global configuration not yet initialized.”).

I have also tried adding:

from deepspeech_training.util.config import Config, initialize_globals
initialize_globals()

But some arguments seem to be missing, and I am obviously doing something wrong…
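For readers hitting the same error, here is a minimal sketch of the pattern that seems to be behind it: a global configuration object that refuses attribute access until it has been filled in from parsed flags. This is a simplified stand-in, not the deepspeech_training code; `create_model_stub`, `try_build`, and the `flags` argument of `initialize_globals` are hypothetical names used only for illustration.

```python
# Simplified stand-in for a lazily initialized global configuration.
# NOT the deepspeech_training implementation, only an illustration of the pattern.

class ConfigSingleton:
    _config = None  # filled in by initialize_globals()

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, e.g. Config.n_hidden.
        if ConfigSingleton._config is None:
            raise RuntimeError("Global configuration not yet initialized.")
        if name not in ConfigSingleton._config:
            raise RuntimeError("Configuration option {} not found in config.".format(name))
        return ConfigSingleton._config[name]


Config = ConfigSingleton()


def initialize_globals(flags):
    """Derive the global configuration from already-parsed flags (hypothetical signature)."""
    ConfigSingleton._config = {"n_hidden": flags["n_hidden"]}


def create_model_stub():
    """Stands in for any function that reads Config while building the graph."""
    return "dense layer with {} units".format(Config.n_hidden)


def try_build():
    """Build the stub model, reporting a configuration error instead of raising."""
    try:
        return create_model_stub()
    except RuntimeError as err:
        return "failed: {}".format(err)


before = try_build()                    # configuration is still empty here
initialize_globals({"n_hidden": 2048})  # 2048 is an assumed value
after = try_build()                     # now the stub can read Config.n_hidden
print(before)
print(after)
```

The point of the sketch is the ordering: the flags have to be defined and parsed before the configuration is initialized, and the configuration has to be initialized before anything that builds the graph is called. The training script appears to follow that order itself, which may be why calling `initialize_globals()` on its own from an external script complains about missing arguments.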

I also have the checkpoint file, but as far as I understand I still need a model to load the values…

If anyone could give me some indications on how to load this model for the purpose I explained, it would be very appreciated. The documentation shows how to perform transfer-learning but using the command line only, which is not what I need.

Details

Platform OS

Linux

Python Environment

Python 3.7.7
Virtual env: Venv / virtualenv

othiele · November 26, 2020, 1:48pm

Thanks for writing a good post. You’ll need to set everything up as you would if you were training a normal model. So no client, but look into the deepspeech_training folder. Most of the TensorFlow code is in train.py, but maybe start by following the drop_source_layers flag from util/flags.py, as it drops the last layer and this might be a good starting point.

Let us know how it goes, it would be great to hear if a CNN layer makes stuff better.

goodmorning · December 3, 2020, 3:41pm

Thank you so much @othiele!

Unfortunately, I am still struggling with this… My main worry is that what I am trying to do may be something I am not supposed to do. So my question is: as a starting point, does it make sense to try to create a model using something like this?

from deepspeech_training.train import create_model
model = create_model(batch_x=input_tensor, batch_size=batch_size, seq_length=None, dropout=no_dropout)

Then load checkpoint, get output of layer_5, etc…

Thank you in advance!

lissyx · December 3, 2020, 3:54pm

Not being able to access the layers through the Python API is on purpose: the goal of the API is to make use of the model, not to access its internals.

I’m not sure our code is easily reusable for that use case.

I doubt you really want to load the pbmm file; you would rather load the weights from the checkpoints.

Maybe you should take inspiration from the export code path, but loading the model and adding a new layer will require extensive hacking.

Yes and no. It’s more that the code is not intended for this use case, so you have to hack.

We have six layers. Are you intending to replace the softmax with a CNN, or do you really want to drop layers 5 and 6 and replace them with a CNN before the softmax output?

Anyway, since you are changing the shape of the network, reusing the weights will be very complicated.
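To make the shape concern concrete, here is a small sketch that only does shape bookkeeping, with no TensorFlow involved. The sizes (a layer_5 width of 2048 and 29 output classes) and the variable name `layer_6/weights` are assumptions for illustration, as are the helper names `conv1d_output_shape` and `layer6_reusable`.

```python
# Shape bookkeeping only; no TensorFlow needed. All sizes are assumptions.
N_HIDDEN = 2048   # assumed feature size of the layer_5 output
N_CLASSES = 29    # assumed output size of layer_6 (alphabet plus CTC blank)

# Shape of the pre-trained layer_6 weight matrix under those assumptions.
checkpoint_shapes = {"layer_6/weights": (N_HIDDEN, N_CLASSES)}


def conv1d_output_shape(time_steps, filters, kernel_size, stride=1, padding="valid"):
    """Return (new_time_steps, channels) for a 1D convolution over the time axis."""
    if padding == "same":
        new_steps = -(-time_steps // stride)  # ceiling division
    elif padding == "valid":
        new_steps = (time_steps - kernel_size) // stride + 1
    else:
        raise ValueError("unknown padding: {}".format(padding))
    return new_steps, filters


def layer6_reusable(conv_filters):
    """The old layer_6 weights fit only if the conv keeps the feature size unchanged."""
    needed = (conv_filters, N_CLASSES)
    return checkpoint_shapes["layer_6/weights"] == needed


# A conv with 512 filters and kernel size 5 placed after layer_5, over 100 time steps.
steps_out, channels = conv1d_output_shape(time_steps=100, filters=512, kernel_size=5)
print(steps_out, channels)
print(layer6_reusable(channels))   # feature size changed, old weights do not fit
print(layer6_reusable(N_HIDDEN))   # feature size preserved, old weights still fit
```

Under these assumptions, a convolution that changes the feature size means the following layer has to be re-initialized, and a convolution with "valid" padding also shortens the time axis, so the sequence lengths passed to the CTC loss would need adjusting as well.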

If you were expecting to use the pbmm file to reuse the shape of the network and NOT its weights, then it’s “easier” and you should just hack into create_model, I guess.

reuben · December 3, 2020, 3:57pm

The easiest way to do that is by modifying the training code rather than writing everything from scratch, as you’ll have to conform to all the assumptions the training code makes. Follow Olaf’s advice and just understand how the --drop_source_layers flag works and it’ll get you an understanding of how to implement this.
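As a rough illustration of the idea behind --drop_source_layers, the sketch below splits a list of variable names into those restored from the checkpoint and those initialized from scratch, counting dropped layers from the output end. The variable names, the `DROPPABLE` list, and `split_variables` are hypothetical; the real training code may name and select variables differently.

```python
# Hypothetical variable names; the real graph may name them differently.
SOURCE_VARIABLES = [
    "layer_1/weights", "layer_1/bias",
    "layer_2/weights", "layer_2/bias",
    "layer_3/weights", "layer_3/bias",
    "lstm/kernel", "lstm/bias",
    "layer_5/weights", "layer_5/bias",
    "layer_6/weights", "layer_6/bias",
]

# Layer tags ordered from input to output; layer_1 is always kept in this sketch.
DROPPABLE = ["layer_2", "layer_3", "lstm", "layer_5", "layer_6"]


def split_variables(variable_names, drop_source_layers):
    """Return (to_load, to_init) when dropping the last `drop_source_layers` layers."""
    if not 0 <= drop_source_layers <= len(DROPPABLE):
        raise ValueError("drop_source_layers must be between 0 and {}".format(len(DROPPABLE)))
    # Guard against 0, since DROPPABLE[-0:] would select the whole list.
    dropped = DROPPABLE[-drop_source_layers:] if drop_source_layers > 0 else []
    to_init = [v for v in variable_names
               if any(v.startswith(tag + "/") for tag in dropped)]
    to_load = [v for v in variable_names if v not in to_init]
    return to_load, to_init


# Dropping one layer keeps everything up to layer_5 and re-initializes the output layer.
to_load, to_init = split_variables(SOURCE_VARIABLES, 1)
print(to_load)
print(to_init)
```

For the use case in this thread, dropping only the output layer would keep the pre-trained weights up to layer_5. Any added CNN layer and the new output layer would then be initialized from scratch and trained, which is the part that needs changes to the training code.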