Transfer learning between different languages

@daniel.cruzado I was thinking of transfer learning from English to Arabic. Is it worth trying?


Has anyone tried this? I’m also trying to train a Spanish model.

I tried transfer learning from English to German. After removing the last 2 layers and fine-tuning with a learning rate of 0.0001, I was able to bring the WER down from 11.7% to 9.4%.
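
In case it helps, the invocation was roughly along these lines (just a sketch: every path and CSV is a placeholder, and --drop_source_layers, --source_model_checkpoint_dir and --load are the transfer-learning2 branch flags discussed further down the thread):

    # Rough sketch of an English -> German transfer-learning run on the
    # transfer-learning2 branch; all paths and CSVs below are placeholders.
    python3 DeepSpeech.py \
      --train_files de_train.csv \
      --dev_files de_dev.csv \
      --test_files de_test.csv \
      --alphabet_config_path data/alphabet_de.txt \
      --source_model_checkpoint_dir deepspeech-0.5.1-checkpoint/ \
      --load init \
      --drop_source_layers 2 \
      --learning_rate 0.0001 \
      --checkpoint_dir checkpoints_de/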

@reyxuan here are my results for Spanish using transfer learning: see “Non native english with transfer learning from V0.5.1 Model”, which has the right branch, method and discussion.

@carlfm01 I don’t get what you did there. Did you train with English and test with Spanish?

@Jendker Thank you very much. I’ll try that. How many hours of audio did you use?

I had about 500 hours. I did not try any data augmentation though; maybe that would help further.

Sorry for the confusion, no: starting from the English model, I did transfer learning to Spanish using 500 hours of Spanish, with both training and testing in Spanish.

Perfect! I’ll try a model from scratch first and then the transfer learning method. How did you get 500 hours of Spanish? Using data augmentation?

No, using data from VoxForge, OpenSLR, LibriVox, and private speech.

You can download my 120 hours of clean Spanish here: https://www.kaggle.com/carlfm01/120h-spanish-speech/
From LibriVox, in the public domain :slight_smile:
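
If you use the Kaggle CLI, something like this should fetch it (assuming you have an API token set up; the dataset slug is taken from the URL above and the archive name may differ):

    # Download via the Kaggle CLI (requires an API token in ~/.kaggle/kaggle.json), then unzip.
    kaggle datasets download -d carlfm01/120h-spanish-speech
    unzip 120h-spanish-speech.zip -d spanish-120h/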

I’d already downloaded it. Thanks! I mixed it with these datasets:

Be careful with the TEDx dataset; it has a lot of grammar mistakes, and Caito did not converge for me.
As for the crowdsourced data, I think it needs silence trimming.
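
If it helps, one way to strip leading and trailing silence is a SoX one-liner like this (just a sketch; the 0.1 s duration and 1% threshold are guesses and need tuning per recording):

    # Trim leading silence, reverse, trim again (i.e. the original trailing silence), reverse back.
    sox in.wav out.wav silence 1 0.1 1% reverse silence 1 0.1 1% reverse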

I’ve already trimmed the audio. I’ll check what you said about TEDx!

Did you substitute ä => ae and so on? If not, how were you able to restore the weights from the English model, given that the German alphabet has more letters and therefore more nodes in the last layer?

On the transfer-learning2 branch there is a parameter to specify how many of the last layers should be dropped. If you specify at least 1 you should be fine.

I am aware of this parameter. I tried to load the English model with drop_source_layers = 2, but restoring the weights fails because the number of nodes in the last layer doesn’t match (since the German alphabet is bigger). Have you had a different experience?

That’s interesting; I use a German alphabet with the 3 umlaut characters, but did not experience any problems with transfer learning. I’ll have a closer look and try to report my findings tomorrow.

Thanks, that would be very helpful. I suspect that the drop_source_layers parameter only drops the weights after they have already been loaded, but the initial loading doesn’t work if the network (or rather the alphabet) deviates.

This is the concrete error when using the drop_source_layers flag:

deepspeech_asr_1 | E InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
deepspeech_asr_1 | E
deepspeech_asr_1 | E Assign requires shapes of both tensors to match. lhs shape= [2048,33] rhs shape= [2048,29]
deepspeech_asr_1 | E [[node save/Assign_32 (defined at DeepSpeech.py:448) ]]
deepspeech_asr_1 | E
deepspeech_asr_1 | E The checkpoint in /model/model.v0.5.1 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /model/model.v0.5.1.
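
(I assume the 33 vs. 29 mismatch is simply alphabet size plus the CTC blank: the 0.5.1 English alphabet.txt has 28 symbols (a–z, space, apostrophe), so 28 + 1 = 29 outputs, while my German alphabet adds ä, ö, ü and ß for 32 symbols, i.e. 32 + 1 = 33.)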

It looks fine on my end; I haven’t seen errors similar to yours.

If I don’t drop any layers I get:

Initializing model from /home/ben/Downloads/deepspeech-0.5.1-checkpoint
Loading layer_1/bias
Loading layer_1/weights
Loading layer_2/bias
Loading layer_2/weights
Loading layer_3/bias
Loading layer_3/weights
Loading lstm_fused_cell/kernel
Loading lstm_fused_cell/bias
Loading layer_5/bias
Loading layer_5/weights
Loading layer_6/bias
Traceback (most recent call last):
  File "/home/ben/PycharmProjects/DeepSpeech/DeepSpeech.py", line 893, in <module>
    tf.app.run(main)
  File "/home/ben/PycharmProjects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/ben/PycharmProjects/DeepSpeech/DeepSpeech.py", line 877, in main
    train()
  File "/home/ben/PycharmProjects/DeepSpeech/DeepSpeech.py", line 483, in train
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/home/ben/PycharmProjects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 2175, in load
    session.run(self._initializer_op, {self._initializer_op.inputs[1]: value})
  File "/home/ben/PycharmProjects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/ben/PycharmProjects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(33,)'

Process finished with exit code 1

If I drop only the last layer:

Initializing model from /home/ben/Downloads/deepspeech-0.5.1-checkpoint
Loading layer_1/bias
Loading layer_1/weights
Loading layer_2/bias
Loading layer_2/weights
Loading layer_3/bias
Loading layer_3/weights
Loading lstm_fused_cell/kernel
Loading lstm_fused_cell/bias
Loading layer_5/bias
Loading layer_5/weights
Loading global_step
Loading beta1_power
Loading beta2_power
Loading layer_1/bias/Adam
Loading layer_1/bias/Adam_1
Loading layer_1/weights/Adam
Loading layer_1/weights/Adam_1
Loading layer_2/bias/Adam
Loading layer_2/bias/Adam_1
Loading layer_2/weights/Adam
Loading layer_2/weights/Adam_1
Loading layer_3/bias/Adam
Loading layer_3/bias/Adam_1
Loading layer_3/weights/Adam
Loading layer_3/weights/Adam_1
Loading lstm_fused_cell/kernel/Adam
Loading lstm_fused_cell/kernel/Adam_1
Loading lstm_fused_cell/bias/Adam
Loading lstm_fused_cell/bias/Adam_1
Loading layer_5/bias/Adam
Loading layer_5/bias/Adam_1
Loading layer_5/weights/Adam
Loading layer_5/weights/Adam_1
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:02 | Steps: 2 | Loss: 145.359764

So it looks fine. Which branch are you using? transfer-learning2? https://github.com/mozilla/DeepSpeech/tree/transfer-learning2

Yes, I was using transfer-learning2. I was able to find the issue: I had to additionally pass the parameters '--load init' and '--source_model_checkpoint_dir /model'. Before, I only gave the checkpoint_dir, which apparently wasn’t enough. I am not entirely sure why, though.
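
For reference, the combination that worked for me looked roughly like this (every path and CSV here is a placeholder from my setup):

    # Sketch of the working combination on the transfer-learning2 branch:
    # point --source_model_checkpoint_dir at the English checkpoint and pass
    # --load init in addition to the usual --checkpoint_dir.
    python3 DeepSpeech.py \
      --source_model_checkpoint_dir /model \
      --load init \
      --drop_source_layers 1 \
      --checkpoint_dir /model/german-checkpoints \
      --alphabet_config_path data/alphabet_de.txt \
      --train_files de_train.csv \
      --dev_files de_dev.csv \
      --test_files de_test.csv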

Anyway, thank you very much for the help.
