Non-native English with transfer learning from the v0.5.1 model: right branch, method, and discussion

That’s what I thought too…

Yes, I read that you have a flag where you can choose which layers you retrain. I tried it, comparing the original model vs. the retrained model on the Common Voice dataset. Not a great improvement after a few epochs (training stopped early because the loss wasn’t improving enough)… I don’t know yet if it’s because I messed up my parameters or if I just didn’t train enough.
I’ll try retraining again from the checkpoints of my last run!

Don’t know if it’s useful to you, but there is a dataset of African-accented French speakers: http://www.openslr.org/57/

It’s always useful to know about; I’m obviously not spending enough time on the OpenSLR website :slight_smile:

I have the idea to start from an importer like the LibriVox importer. Don’t know where it’ll lead or how much time it’ll take, though…
Is the fact that each recording of the AMI corpus is ~1h30 long a problem? I saw that it was an issue in v0.3.

Yes, it’s going to be way too big. We limit ourselves in current importers to 10-15 secs max, to balance between batch size and GPU memory usage.
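
If you want to sanity-check your clips against that limit, here is a minimal shell sketch; the clips/ directory is a made-up example, and it assumes the SoX tools are installed:

    # List any WAV file longer than 15 seconds; soxi -D prints the duration in seconds.
    for f in clips/*.wav; do
        d=$(soxi -D "$f")
        awk -v d="$d" -v f="$f" 'BEGIN { if (d + 0 > 15) print f ": " d " s" }'
    done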

I have one last question; it may be dumb, but I have to ask. Maybe you or @josh_meyer can help me.

If I use transfer learning, retraining only the last layer from the v0.5.1 checkpoint, then stop, test my model, and restart retraining only the last layer, which checkpoints should I use? Do I have to restart from the v0.5.1 checkpoint?
Or can I start from my retrained checkpoint? If so, should I remove the --drop_source_layers 2 flag?

Hmmm… What’s my solution then? Find another dataset, or try to split this one into 10-15 s parts, with the risk of cutting sentences in two?

Is it possible with 35 s? I have another dataset with the same sentence pronounced by a lot of people with different accents, but the sentence is quite long :confused:

Continuing the discussion from Non-native English with transfer learning from the v0.5.1 model: right branch, method, and discussion:

Edited the title, as the discussion is about more than just the choice of the right branch for transfer learning.

I guess with VAD and forced alignment you should get something. We have @nicolaspanel, who contributed ~182 h of LibriVox as TrainingSpeech this way.
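
For instance, a rough sketch of the splitting idea using ffmpeg’s silencedetect filter; this is not the actual TrainingSpeech pipeline, and the file name and thresholds are made up:

    # Log silence_start/silence_end timestamps for every pause quieter than
    # -35 dB lasting at least 0.4 s; the decoded audio itself is discarded.
    ffmpeg -i long_recording.wav -af silencedetect=noise=-35dB:d=0.4 -f null - 2>&1 \
        | grep -E 'silence_(start|end)'

    # Then cut the recording between two detected pauses, e.g. a ~12 s chunk:
    ffmpeg -i long_recording.wav -ss 12.3 -to 24.7 -c copy chunk_001.wav

You would still need forced alignment of the transcript against the resulting chunks to get per-chunk text.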

It might still be the low batch size, even though it fits into GPU RAM.

Honestly, I have not tested this branch for long.

Hello @caucheteux

Here are some insights from my Spanish model that may be useful to you.

FYI, here’s the branch used for the tests; it’s just a few days behind the current master of DeepSpeech: https://github.com/carlfm01/DeepSpeech/tree/layers-testing

To use this branch, you will need to add and understand the following params:

--fine_tune: whether to fine-tune the transferred layers from the source model

--drop_source_layers: a single integer for how many layers to drop from the source model (to drop just the output layer == 1, to drop the penultimate and output layers == 2, etc.)

--source_model_checkpoint_dir: the path to the trained source model; it will load all the layers and drop the specified ones
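
To make that concrete, here’s a sketch of how those flags might be combined; the paths, alphabet file, and hyperparameters are placeholders, and the exact boolean syntax of --fine_tune may differ in the branch:

    python DeepSpeech.py \
        --source_model_checkpoint_dir ../deepspeech-0.5.1-checkpoint/ \
        --drop_source_layers 1 \
        --fine_tune \
        --alphabet_config_path data/alphabet.txt \
        --train_files train.csv \
        --dev_files dev.csv \
        --test_files test.csv \
        --learning_rate 0.00012 \
        --train_batch_size 12 \
        --epochs 1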

Things you can’t do with the current branch: fine-tune specific layers, drop specific layers, and freeze specific layers.

For the following results I only dropped the last layer and fine-tuned the others.

| Total hours | LR | Dropout | Epochs | Mode | Batch size | Test set | WER |
|---|---|---|---|---|---|---|---|
| 500 | 0.00012 | 0.24 | 1 | Transfer Learning | 12 | Train-es-common voice | 27% |
| 500 | 0.000001 | 0.11 | 2 | Transfer Learning | 10 | Train-es-common voice | 46% |
| 500 | 0.0001 | 0.22 | 6 | From scratch | 24 | Train-es-common voice | 50% |

For 500 h, 1 epoch seems enough when dropping the last layer and fine-tuning the other ones.

As @lissyx mentioned, I think your way to go is to just fine-tune the existing model with your data, using a very low learning rate like 0.000001.
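
For that plain fine-tuning (no dropped layers, just continuing from the released checkpoint on master), the command might look roughly like this; paths and batch sizes are placeholders:

    python DeepSpeech.py \
        --n_hidden 2048 \
        --checkpoint_dir ../deepspeech-0.5.1-checkpoint/ \
        --train_files train.csv \
        --dev_files dev.csv \
        --test_files test.csv \
        --learning_rate 0.000001 \
        --train_batch_size 4 \
        --epochs 3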

I feel the transfer-learning approach is there to solve the issue of different alphabets.


Thanks Carlos!

It helps a lot! I’ll test some of your configs.

Here you’re talking about 500 hours of data for your training, and afterwards you test on Common Voice data? Just to be sure I understand it well :slight_smile:

Thanks again, I’ll keep you posted

Yes, 500 h for training, then testing on the Common Voice Spanish set. Almost all my training set is from the same domain, so I’d better use the Common Voice set to avoid biased results.

Got it! Very helpful, thanks!
And what were your machine specs? GPU?

You needed 500 h because you had to transfer from English to Spanish, but for my accent problem, I think I don’t need that much data since I’m staying in English.

Meaning only the output layer? Or the output layer and the last hidden layer?

I just retrained my model with the Common Voice dataset (14 epochs, --drop_source_layers 2, lr 0.00001), and the result is worse than before retraining (the WER goes from 48% to 56%). That shows that indeed more retraining makes it worse…

Next try: 1 epoch with the same parameters.

Azure NC instance with a K80 GPU; it took 1 day to complete 1 epoch.

Only the output layer.


What are the advantages of specifically using the transfer-learning2 branch? I ran the following command successfully from the master branch, to retrain the pretrained English model on some differently-accented English voice data:

    (voice_to_text) mepstein@pop-os:~/voice_to_text/vendor/DeepSpeech$ python DeepSpeech.py \
        --n_hidden 2048 \
        --checkpoint_dir ../../deepspeech-0.5.1-checkpoint/ \
        --epochs 50 \
        --train_files ../../data/ds_csvs/train.csv \
        --dev_files ../../data/ds_csvs/val.csv \
        --test_files ../../data/ds_csvs/test.csv \
        --learning_rate 0.0001 \
        --train_batch_size 4 \
        --dev_batch_size 4 \
        --test_batch_size 4 \
        --es_steps 15 \
        --lm_binary_path models-0.5.1/lm.binary \
        --lm_trie_path models-0.5.1/trie \
        --export_dir ../../models_retrained_1
    """ 

Is the transfer-learning branch for dropping or freezing certain layers? Can that not be done from master? If not, is there a timetable for merging the transfer-learning branch into master?

Yes

@josh_meyer might be able to answer that; I don’t know how much work that is, though. Do you think we should merge it? @kdavis

As a user, it seems to me like it would be helpful to merge tf2 into master because it makes it less likely that over time tf2 will fall far behind master.

If there’s a set of issues you’re aware of that need to be resolved before the tf2 branch can be merged into master, I’d be happy to take a shot at a PR to help get it there.

@josh_meyer might know …

Josh will know better, but my understanding is that there aren’t really any outstanding problems to be solved; it’s just that we didn’t want to give the impression that transfer learning is supposed to be a simple thing: just set this flag and it works. We already get people who think training DeepSpeech is supposed to be like that… But other than that, I’m not opposed to merging the code into master.


Perhaps a big warning in the documentation? I know people might still get training and transfer learning wrong, but it could help :slight_smile:

Honestly, we already have a lot of things clearly documented and straightforward, and still people don’t read it :slight_smile: