Hello @caucheteux
Here are some insights from my Spanish model that may be useful for you.
FYI, here’s the branch used for the tests; it’s just a few days behind the current master of DeepSpeech: https://github.com/carlfm01/DeepSpeech/tree/layers-testing
To use this branch, you will need to pass the following params (see the example invocation after this list):

- `--fine_tune` Whether or not to fine-tune the layers transferred from the source model
- `--drop_source_layers` A single integer for how many layers to drop from the source model (to drop just the output layer == 1, drop the penultimate and output layers == 2, etc.)
- `--source_model_checkpoint_dir` The path to the trained source model; it loads all the layers and then drops the specified ones
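To give you an idea, here’s roughly how I launch a run on that branch. Only `--fine_tune`, `--drop_source_layers` and `--source_model_checkpoint_dir` come from the branch; the rest are the usual DeepSpeech flags, and the paths and exact flag names are just placeholders from my setup, they can differ depending on your DeepSpeech version:

```bash
# Sketch of a transfer-learning run on the layers-testing branch:
# load the source (English) checkpoint, drop only the output layer
# (--drop_source_layers 1) and fine-tune the remaining layers (--fine_tune).
python3 DeepSpeech.py \
  --train_files es/train.csv \
  --dev_files es/dev.csv \
  --test_files es/test.csv \
  --alphabet_config_file data/alphabet-es.txt \
  --checkpoint_dir checkpoints/es-transfer/ \
  --source_model_checkpoint_dir checkpoints/english-source/ \
  --drop_source_layers 1 \
  --fine_tune=True \
  --learning_rate 0.00012 \
  --dropout_rate 0.24 \
  --train_batch_size 12 \
  --epoch 1
```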
Things you can’t do with the current branch: fine-tune specific layers, drop specific layers, or freeze specific layers.
For the following results I only dropped the last layer and fine-tuned the others.
Total hours | LR | Dropout | Epochs | Mode | Batch Size | Test set | WER |
---|---|---|---|---|---|---|---|
500 | 0.00012 | 0.24 | 1 | Transfer Learning | 12 | Train-es-common voice | 27% |
500 | 0.000001 | 0.11 | 2 | Transfer Learning | 10 | Train-es-common voice | 46% |
500 | 0.0001 | 0.22 | 6 | From scratch | 24 | Train-es-common voice | 50% |
For 500 hours of data, one epoch seems to be enough when dropping the last layer and fine-tuning the other ones.
As @lissyx mentioned, I think your way to go is to just fine-tune the existing model with your data, using a very low learning rate like “0.000001”.
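For that plain fine-tuning case, something like this on stock DeepSpeech should do it (again, just a sketch; paths are placeholders and the standard flag names may differ between versions). You point `--checkpoint_dir` at the downloaded English checkpoint and training simply resumes from it:

```bash
# Sketch: continue training the released English checkpoint on your own data
# with a very low learning rate so the pretrained weights move only slightly.
python3 DeepSpeech.py \
  --train_files my-data/train.csv \
  --dev_files my-data/dev.csv \
  --test_files my-data/test.csv \
  --checkpoint_dir deepspeech-english-checkpoint/ \
  --learning_rate 0.000001 \
  --epoch 2
```

Note this only works if you keep the same alphabet as the released model.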
The transfer learning approach, I feel, is more about solving the issue of different alphabets.