Yes, it’s going to be way too big. We limit ourselves in the current importers to 10-15 seconds max, to balance between batch size and GPU memory usage.
I have one last question, maybe a dumb one, but I have to ask; maybe you or @josh_meyer can help me.
Say I use transfer learning, retraining only the last layer from the v0.5.1 checkpoint, then stop, test my model, and later want to resume retraining only the last layer. Which checkpoint should I use? Do I have to restart from the v0.5.1 checkpoint?
Or can I start from my retrained checkpoint? If so, should I remove the --drop_source_layers 2 flag?
Hmmm… What’s my solution then? Find another dataset, or try to cut this one into 10-15 second parts, with the risk of splitting sentences in two?
Is it possible with 35 s? I have another dataset with the same sentence pronounced by a lot of people with different accents, but the sentence is quite long.
Continuing the discussion from Non native english with transfer learning from V0.5.1 Model, right branch, method and discussion:
Edited the title, as the discussion covers more than just the choice of the right branch for transfer learning.
I guess with VAD and forced alignment you should get something. We have @nicolaspanel, who contributed ~182h of LibriVox as TrainingSpeech this way.
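If it helps, one very rough way to find candidate split points is plain silence detection with ffmpeg. This is not a real VAD or forced alignment, and the file names, thresholds, and timestamps below are only placeholders:

```
# List silence intervals in a long recording (silencedetect logs to stderr)
ffmpeg -i long_recording.wav -af silencedetect=noise=-35dB:d=0.4 -f null - 2>&1 \
  | grep 'silence_'

# Once start/end times between sentences are chosen, cut a 10-15 s segment
# and convert it to 16 kHz mono, which is what the importers expect
ffmpeg -i long_recording.wav -ss 4.2 -to 16.8 -ar 16000 -ac 1 segment_0001.wav
```

For matching the transcript to the resulting segments, forced-alignment tools such as Gentle or aeneas are the usual route.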
It might still be a low batch size, even though it fits into GPU RAM.
Honestly I have not tested this branch for long.
Hello @caucheteux
Here are some insights from my Spanish model that may be useful for you.
FYI, here’s the branch used for the tests; it’s just a few days behind the current master of DeepSpeech: https://github.com/carlfm01/DeepSpeech/tree/layers-testing
To use this branch, you will need to pass the following params (see the example invocation below):
--fine_tune Whether or not to fine-tune the transferred layers from the source model
--drop_source_layers A single integer for how many layers to drop from the source model (to drop just the output layer == 1, drop the penultimate and output layers == 2, etc.)
--source_model_checkpoint_dir The path to the trained model; it will load all the layers and drop the specified ones
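As a sketch only (the paths are placeholders and this is not the exact command behind the results below; the hyperparameters are taken from the first row of the table), dropping just the output layer could look something like:

```
python DeepSpeech.py \
  --source_model_checkpoint_dir path/to/deepspeech-0.5.1-checkpoint \
  --drop_source_layers 1 \
  --fine_tune True \
  --checkpoint_dir path/to/new_checkpoints \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --learning_rate 0.00012 \
  --dropout_rate 0.24 \
  --train_batch_size 12 \
  --epochs 1
```

Depending on how the boolean flag is defined in the branch, --fine_tune may also need to be passed as --fine_tune=True or just --fine_tune.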
Things you can’t do with the current branch: fine-tune specific layers, drop specific layers, and freeze specific layers.
For the following results I only dropped the last layer and fine-tuned the others.
| Total hours | LR | Dropout | Epochs | Mode | Batch size | Test set | WER |
|---|---|---|---|---|---|---|---|
| 500 | 0.00012 | 0.24 | 1 | Transfer learning | 12 | Train-es-common voice | 27% |
| 500 | 0.000001 | 0.11 | 2 | Transfer learning | 10 | Train-es-common voice | 46% |
| 500 | 0.0001 | 0.22 | 6 | From scratch | 24 | Train-es-common voice | 50% |
For 500 h, 1 epoch seems to be enough when dropping the last layer and fine-tuning the other ones.
As @lissyx mentioned, I think your way to go is to just fine-tune the existing model with your data using a very low learning rate like “0.000001”.
The transfer learning approach, I feel, is meant to solve the issue of different alphabets.
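A minimal sketch of that plain fine-tuning run (placeholder paths and epoch count; no layers are dropped, so the stock master branch is enough):

```
python DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir path/to/deepspeech-0.5.1-checkpoint \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --learning_rate 0.000001 \
  --epochs 3 \
  --export_dir path/to/exported_model
```

The difference from the transfer-learning runs above is that all layers are kept and simply continue training from the v0.5.1 weights.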
Thanks Carlos!
It helps a lot! I’ll test some of your configs.
Here you talk about 500 hours of data for your training, and afterwards you test on Common Voice data? Just to be sure I understand it correctly.
Thanks again, I’ll keep you posted.
Yes, 500 h for training, then testing on the Common Voice Spanish set. Almost all my training set is from the same domain, so I’d better use the Common Voice set to avoid biased results.
Got it! Very helpful, thanks!
And what were your machine specs? GPU?
You needed 500 h because you had to transfer from English to Spanish, but for my accent problem I think I don’t need that much data, since I stay in English.
Meaning only the output layer? Or the output layer and the last hidden layer?
I just re-trained my model with the CV dataset (14 epochs, --drop_source_layers 2, lr 0.00001) and the result is worse than before retraining (WER goes from 48% to 56%). That shows that indeed more retraining makes it worse…
Next try: 1 epoch with the same parameters.
An Azure NC instance with a K80 GPU; it took 1 day to complete 1 epoch.
Only the output layer.
What are the advantages of specifically using the transfer-learning2 branch? I ran the following command successfully from the master branch, to retrain the pretrained English model on some differently-accented English voice data:
(voice_to_text) mepstein@pop-os:~/voice_to_text/vendor/DeepSpeech$ python DeepSpeech.py \
--n_hidden 2048 \
--checkpoint_dir ../../deepspeech-0.5.1-checkpoint/ \
--epochs 50 \
--train_files ../../data/ds_csvs/train.csv \
--dev_files ../../data/ds_csvs/val.csv \
--test_files ../../data/ds_csvs/test.csv \
--learning_rate 0.0001 \
--train_batch_size 4 \
--dev_batch_size 4 \
--test_batch_size 4 \
--es_steps 15 \
--lm_binary_path models-0.5.1/lm.binary \
--lm_trie_path models-0.5.1/trie \
--export_dir ../../models_retrained_1
"""
Is the transfer learning branch for dropping or freezing certain layers, and can that not be done from master? If not, is there a timetable for merging the transfer learning branch into master?
Yes
@josh_meyer might be able to answer that, I don’t know how much work that is though. Do you think we should merge it? @kdavis
As a user, it seems to me like it would be helpful to merge tf2 into master because it makes it less likely that over time tf2 will fall far behind master.
If there’s a set of issues you are aware of that the tf2 branch needs resolved before it can be merged into master, I’d be happy to take a shot at a PR to help get it there.
Josh will know better, but my understanding is that there aren’t really any outstanding problems to be solved, it’s just that we didn’t want to give the impression that transfer learning is supposed to be a simple thing, just set this flag and it works. We already get people who think training DeepSpeech is supposed to be like this… But other than that, I’m not opposed to merging the code into master.
Perhaps a big warning in the documentation? I know people might still get training and transfer learning wrong, but it could help.
Honestly, we already have a lot of things clearly and straightforwardly documented, and still people don’t read it.
Yeah, I know, but there is no other solution. I hope that with time people will read the documentation before going straight into training or usage.
I’d love to share your hope, but I don’t think it will change over time. That being said, a doc PR is always welcome and always useful to show people “hey, we documented that”, so if you feel the need, it’s worth sending a PR.
Thanks for all the comments everyone.
Re: the point about transfer learning being “a simple thing”:
I don’t know whether I’d call it “simple” or not, but in my experience transfer learning is a “standard” application of a deep-learning library. It would be great to have it in master! Or rather, I’ve already done transfer learning from master (retraining from your pretrained model checkpoint), but it would be great to be able to freeze layers as part of my transfer-learning model development and experiments, without relying on a non-master branch.
Just my two cents. Much appreciated if you decide to merge in tf2, but also, thanks regardless for all the great work on DeepSpeech already!