Transfer learning between different languages

On the transfer-learning2 branch there is a parameter that specifies how many of the last layers should be dropped. If you specify at least 1, you should be fine.

I am aware of this parameter. I tried to load the English model with drop_source_layers = 2, but restoring the weights fails because the number of nodes in the last layer doesn’t match (the German alphabet is bigger). Have you had different experiences?

That’s interesting. I use a German alphabet with 3 umlaut characters, but did not experience any problems with transfer learning. I’ll have a closer look and try to report my findings tomorrow.

Thanks, that would be very helpful. I suspect that the drop_source_layers parameter only drops the weights after they have been initially loaded, and that the initial loading doesn’t work if the network (or the alphabet, in this case) differs.

This is the concrete error when using the drop_source_layers flag:

deepspeech_asr_1 | E InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
deepspeech_asr_1 | E
deepspeech_asr_1 | E Assign requires shapes of both tensors to match. lhs shape= [2048,33] rhs shape= [2048,29]
deepspeech_asr_1 | E [[node save/Assign_32 (defined at ]]
deepspeech_asr_1 | E
deepspeech_asr_1 | E The checkpoint in /model/model.v0.5.1 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /model/model.v0.5.1.
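
For readers hitting the same message: the two shapes in the error differ only in the output dimension, which in DeepSpeech follows the alphabet size plus one extra label for the CTC blank. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and the 32-character alphabet is only an illustration of an alphabet that would yield 33 outputs, not the poster's actual alphabet.txt.

```python
def output_width(alphabet):
    """Width of the final layer: one unit per character plus the CTC blank."""
    return len(alphabet) + 1


def shapes_match(n_hidden, source_alphabet, target_alphabet):
    """Compare the output weight shapes implied by two alphabets."""
    src = (n_hidden, output_width(source_alphabet))
    tgt = (n_hidden, output_width(target_alphabet))
    return src == tgt, src, tgt


# English: 26 letters + space + apostrophe = 28 characters.
english = list("abcdefghijklmnopqrstuvwxyz") + [" ", "'"]

# Hypothetical larger alphabet (assumption): English plus four extra characters.
german = english + ["ä", "ö", "ü", "ß"]

ok, src, tgt = shapes_match(2048, english, german)
print(ok, src, tgt)  # False (2048, 29) (2048, 33)
```

Any change to the alphabet changes the last dimension, so the output layer of the source checkpoint cannot be restored as-is into the new graph.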

It looks fine on my end, I haven’t seen similar errors as yours.

If I don’t drop any layer I get:

Initializing model from /home/ben/Downloads/deepspeech-0.5.1-checkpoint
Loading layer_1/bias
Loading layer_1/weights
Loading layer_2/bias
Loading layer_2/weights
Loading layer_3/bias
Loading layer_3/weights
Loading lstm_fused_cell/kernel
Loading lstm_fused_cell/bias
Loading layer_5/bias
Loading layer_5/weights
Loading layer_6/bias
Traceback (most recent call last):
  File "/home/ben/PycharmProjects/DeepSpeech/", line 893, in <module>
  File "/home/ben/PycharmProjects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow/python/platform/", line 125, in run
  File "/home/ben/PycharmProjects/DeepSpeech/", line 877, in main
  File "/home/ben/PycharmProjects/DeepSpeech/", line 483, in train
    v.load(ckpt.get_tensor(, session=session)
  File "/home/ben/PycharmProjects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow/python/ops/", line 2175, in load, {self._initializer_op.inputs[1]: value})
  File "/home/ben/PycharmProjects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow/python/client/", line 929, in run
  File "/home/ben/PycharmProjects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow/python/client/", line 1128, in _run
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(33,)'

Process finished with exit code 1

If I drop only the last layer:

Initializing model from /home/ben/Downloads/deepspeech-0.5.1-checkpoint
Loading layer_1/bias
Loading layer_1/weights
Loading layer_2/bias
Loading layer_2/weights
Loading layer_3/bias
Loading layer_3/weights
Loading lstm_fused_cell/kernel
Loading lstm_fused_cell/bias
Loading layer_5/bias
Loading layer_5/weights
Loading global_step
Loading beta1_power
Loading beta2_power
Loading layer_1/bias/Adam
Loading layer_1/bias/Adam_1
Loading layer_1/weights/Adam
Loading layer_1/weights/Adam_1
Loading layer_2/bias/Adam
Loading layer_2/bias/Adam_1
Loading layer_2/weights/Adam
Loading layer_2/weights/Adam_1
Loading layer_3/bias/Adam
Loading layer_3/bias/Adam_1
Loading layer_3/weights/Adam
Loading layer_3/weights/Adam_1
Loading lstm_fused_cell/kernel/Adam
Loading lstm_fused_cell/kernel/Adam_1
Loading lstm_fused_cell/bias/Adam
Loading lstm_fused_cell/bias/Adam_1
Loading layer_5/bias/Adam
Loading layer_5/bias/Adam_1
Loading layer_5/weights/Adam
Loading layer_5/weights/Adam_1
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:02 | Steps: 2 | Loss: 145.359764

So it looks fine. Which branch are you using? transfer-learning2?
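
The two logs above suggest what happens: with one layer dropped, the `layer_6` variables (and their Adam slots) are no longer read from the checkpoint and are initialized fresh instead. A rough sketch of that selection, assuming the layer order visible in the log; `LAYER_ORDER` and `split_variables` are hypothetical names for illustration, not the branch's actual code:

```python
# Layer prefixes in the order they appear in the log output (assumption).
LAYER_ORDER = ["layer_1", "layer_2", "layer_3",
               "lstm_fused_cell", "layer_5", "layer_6"]


def split_variables(var_names, drop_source_layers):
    """Split variable names into (load from checkpoint, initialize fresh)."""
    if not 0 <= drop_source_layers <= len(LAYER_ORDER):
        raise ValueError("drop_source_layers out of range")
    # Slice from an explicit start index so that 0 means "drop nothing".
    dropped = LAYER_ORDER[len(LAYER_ORDER) - drop_source_layers:]
    load, init = [], []
    for name in var_names:
        layer = name.split("/")[0]  # e.g. "layer_6/bias/Adam" -> "layer_6"
        (init if layer in dropped else load).append(name)
    return load, init


names = ["layer_5/weights", "layer_6/bias", "layer_6/weights", "global_step"]
load, init = split_variables(names, 1)
print("load:", load)
print("init:", init)
```

With `drop_source_layers=1` only the `layer_6` entries end up in the re-initialized group, which matches the second log, where loading stops at `layer_5`.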

Yes, I was using transfer-learning2. I was able to find the issue: I additionally had to pass the parameters ‘--load init’ and ‘--source_model_checkpoint_dir /model’. Before, I only passed the checkpoint_dir, which apparently wasn’t enough. I am not entirely sure why, though.

Anyway, thank you very much for the help.
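
For reference, the flags named in this exchange can be put together as follows. This is only a sketch: `build_args` is a hypothetical helper, the target directory is a placeholder, and the flag spellings are taken from the posts above, so check them against the branch you actually use.

```python
import shlex


def build_args(source_ckpt_dir, target_ckpt_dir, drop_source_layers):
    """Assemble the transfer-learning flags mentioned in the thread."""
    return [
        "--load", "init",
        "--source_model_checkpoint_dir", source_ckpt_dir,
        "--checkpoint_dir", target_ckpt_dir,
        "--drop_source_layers", str(drop_source_layers),
    ]


# "/path/to/new/checkpoints" is a placeholder, not a path from the thread.
args = build_args("/model", "/path/to/new/checkpoints", 1)
print(" ".join(shlex.quote(a) for a in args))
```

Keeping the source checkpoint directory separate from the directory that receives the new checkpoints also avoids overwriting the original model.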


I have Hindi-English mixed data with transcripts in Roman script at an 8k sample rate. I previously trained a 0.5.1 model on that data and am now trying to add new data using the transfer-learning2 branch, but I am unable to load the checkpoints and get this error:

“Key lstm_fused_cell/bias not found in checkpoint, [[node save/RestoreV2 (defined at ./ ]]”

The checkpoints load fine when I fine-tune the weights using the DeepSpeech 0.5.1 branch, but I get this error when I use the transfer-learning2 branch.

Why do you need to do transfer learning if you are just adding more data? Just use the checkpoints and continue training.

Actually, the new data is from a somewhat different vertical with some new keywords, and is about 300 hrs. When I fine-tuned those checkpoints, the validation loss remained the same as I got previously. So I thought maybe my weights are not updating much, because I tried to update the weights of all layers and it did not change my training and validation loss, while my WER increased a bit.

That’s why I tried to use transfer learning.

I have not seen such an error; maybe someone else can give you a hint. But to get it right you should definitely try the current master. Transfer learning has been merged there now.

OK, thanks for responding.
My base model was trained on 0.5.1. If I switch to the master branch now, I won’t be able to use the previous checkpoints and will have to train from scratch, which will take time.

By the way, could you please tell me in which cases transfer learning is useful? For example:

  1. If my base model is from the same domain (Hinglish) but I want to train it for a different vertical.
  2. If I have a totally different model (like the Mozilla English model) and use transfer learning to fine-tune it on my data (Hinglish).


I am trying transfer learning on my Hinglish data, using the DeepSpeech 0.7.1 English checkpoints.
Can you tell me which parameters you used (dropout, batch_size, epochs, total hours), and from which model you got a WER of 11,7% (1.17 WER)?

11,7% WER is 0.117 WER. That is the result I got when training from scratch on my data (around 600 hours), but I think the test set was not very representative, so the real results would probably have been worse.

Besides, I used the default training values given for the specific release.
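
To make the units concrete: WER is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words, so 11,7% and 0.117 are the same value. A small self-contained sketch follows; the function names and the example sentences are made up for illustration and this is not DeepSpeech's own evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed ref words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def percent_to_fraction(percent):
    """Convert a WER given in percent to the fractional form."""
    return percent / 100.0


# One missing word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(percent_to_fraction(11.7))
```

A fractional WER above 1.0 is possible only when the hypothesis needs more edits than the reference has words, which is why 1.17 would indicate a very poor model.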

Could someone please post the parameters used for transfer learning? I’ve tried it, and while the output seems phonetically close to the actual audio, the words are just scrambled letters.
I’ve seen that in normal training I can pass --lm_binary_path and --lm_trie_path for my vocabulary. Is it possible to pass these arguments in transfer learning as well?

The scorer only comes into play at the test step.

Thanks for the response. To build a general-purpose scorer for another language (as deepspeech-0.7.3-models.scorer is for English: “Our KenLM language model was generated from the LibriSpeech normalized LM training text”), would we have to include a custom vocabulary of as many words in that language as possible? Is there any further documentation or information on how to build such a collection? Thank you!

What is the question here? The source data is a text file. How to source that data depends on your language, so I can’t tell you how …

But many people take a Wikipedia dump and use that, maybe this could be a start.
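
Whatever text source is used, it usually needs cleaning before a language model is built from it, so that it only contains characters the acoustic model can emit. A minimal sketch of such a filter, assuming one sentence per line; `normalize_line`, `build_corpus` and the example alphabet are hypothetical and not part of DeepSpeech's tooling:

```python
import re


def normalize_line(line, alphabet):
    """Lowercase, collapse whitespace, and reject lines with unknown characters."""
    text = re.sub(r"\s+", " ", line.lower()).strip()
    if not text:
        return None
    if any(ch not in alphabet for ch in text):
        return None
    return text


def build_corpus(lines, alphabet):
    """Keep only the lines that survive normalization."""
    normalized = (normalize_line(line, alphabet) for line in lines)
    return [text for text in normalized if text is not None]


# Example alphabet (assumption): lowercase letters, umlauts, space, apostrophe.
alphabet = set("abcdefghijklmnopqrstuvwxyzäöü '")
print(build_corpus(["Guten  Morgen", "Preis: 5 €", ""], alphabet))
```

Dropping whole lines is the simplest policy; spelling out digits or stripping punctuation instead would keep more text, at the cost of more language-specific rules.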

It was a bit unclear to me whether the scorer is just a list of all possible words in the language, like a dictionary, or whether it also contains commonly used expressions and the like. Now I understand that it’s the latter, but I couldn’t find any material that describes in more detail how to make a generic language scorer, and I read all the mentioned resources I could get my hands on.
Your answers did help me, thanks! I’ll check out Wikipedia as a starting point.

You can find everything about building a custom scorer in the docs.