I started training from scratch, but it gives an error while exporting the model.

I used this:

./DeepSpeech.py --train_files my-train.csv --dev_files my-dev.csv  --epochs 3  --save_checkpoint_dir ../checkpoint/ --train_cudnn true --export_dir ../checkpoint/ --alphabet_config_path /home/dimanshu/alpha.txt

The output is:

I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:01:04 | Steps: 410 | Loss: 114.940936                                                                                                                                             
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 10 | Loss: 152.060320 | Dataset: my-dev.csv                                                                                                                        
I Saved new best validating model with loss 152.060320 to: ../checkpoint/best_dev-410
Epoch 1 |   Training | Elapsed Time: 0:01:00 | Steps: 410 | Loss: 111.319498                                                                                                                                             
Epoch 1 | Validation | Elapsed Time: 0:00:00 | Steps: 10 | Loss: 144.299709 | Dataset: my-dev.csv                                                                                                                        
I Saved new best validating model with loss 144.299709 to: ../checkpoint/best_dev-820
Epoch 2 |   Training | Elapsed Time: 0:01:00 | Steps: 410 | Loss: 111.206717
Epoch 2 | Validation | Elapsed Time: 0:00:00 | Steps: 10 | Loss: 141.602309 | Dataset: my-dev.csv                                                                                                                        
I Saved new best validating model with loss 141.602309 to: ../checkpoint/best_dev-1230
I FINISHED optimization in 0:03:18.015672
I Exporting the model...
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
E All initialization methods failed (['best', 'last']).

When loading checkpoints, the code respects the --load_checkpoint_dir flag. When saving, it respects the --save_checkpoint_dir flag. You should be able to run again with --load_checkpoint_dir and the export flags, and it’ll pick up the checkpoint saved during training.
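For example, an export-only follow-up run might look like this (a sketch based on the paths in your command; ../export/ is just an illustrative separate export directory, and the alphabet must be the same one used for training):

./DeepSpeech.py --load_checkpoint_dir ../checkpoint/ --export_dir ../export/ --alphabet_config_path /home/dimanshu/alpha.txt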

Thanks @reuben, you solved my problem.

There is one more problem, on a different issue:

Can you please look into this?

Hey @reuben,
I trained on the existing model:

./DeepSpeech.py --n_hidden 2048 --save_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint --load_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint --epochs 100 --train_files my-train.csv --dev_files my-dev.csv --test_files my-test.csv --learning_rate 0.0001 --train_cudnn true --alphabet_config_path /home/dimanshu/alpha.txt --export_dir /home/dimanshu/latestcheckpoiint/checkpoint



But after completing the training, when I check it with sample data, it shows no result:

--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 137.519836
 - wav: file:///home/dimanshu/mydatadeepspeech/youtube-course-1/final_sound/5c45ebc9-8e10-4079-9a03-0688fbc3b96c.wav
 - src: "every literals this called axiom now in"
 - res: ""


Loading model from file /home/dimanshu/latestcheckpoiint/checkpoint/output_graph.pb
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-04-23 07:58:18.297305: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.146s.
Running inference.
  
Inference took 4.137s for 2.490s audio file.

It does not show any result.

It looks like you have very few input files. How many hours of input do you use, and what do you want to do with the model?

training data = 20k files
dev = 4k
test = 1.5k

@lissyx suggested that I fine-tune my model:

python3 DeepSpeech.py --drop_source_layers 1 --alphabet_config_path /home/dimanshu/alpha.txt  --load_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint  --save_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint/ --train_files train.csv   --test_files test.csv --dev_files dev.csv --train_cudnn true --export_dir /home/dimanshu/best_path

One epoch takes 40 minutes to complete.

Epoch 0 |   Training | Elapsed Time: 0:38:34 | Steps: 18999 | Loss: 75.033748                                                                                                                                            
Epoch 0 | Validation | Elapsed Time: 0:04:26 | Steps: 4852 | Loss: 95.878292 | Dataset: dev.csv                                                                                                                          
I Saved new best validating model with loss 95.878292 to: /home/dimanshu/latestcheckpoiint/checkpoint/best_dev-404775
Epoch 1 |   Training | Elapsed Time: 0:38:28 | Steps: 18999 | Loss: 75.318631                                                                                                                                            
Epoch 1 | Validation | Elapsed Time: 0:04:26 | Steps: 4852 | Loss: 95.129001 | Dataset: dev.csv     

Done with 75 epochs:

WER: 1.000000, CER: 1.000000, loss: 194.206146
 - wav: file:///home/dimanshu/mydatadeepspeech/youtube-course-1/final_sound/d5b17000-a6d4-416d-8d97-5f4852536d8a.wav
 - src: "language C++ Java PHP Python JavaScript"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 183.460220
 - wav: file:///home/dimanshu/mydatadeepspeech/youtube-course-1/final_sound/78ec2f60-1272-40c2-8385-a14684178572.wav
 - src: "when I use Moe followed by an underscore"
 - res: ""

Why is the result still blank?

What GPU are you using?

What language are you training?

What is your alphabet file like and is it the same for training/testing?

Use train and dev batch sizes.

Train from scratch, not from a checkpoint.

And 20k files is typically not enough for a full language model. 200k is more like it.

GPU = Tesla T4, but now I'm using a V100
language = English
alphabet file = capital and small letters, special characters, and numbers
train and dev batch size = default
It would take too much time to train from scratch, so I'm training on top of the latest released checkpoint, v0.6.1.

After 75 epochs:
Loading model from file …/best_path/output_graph.pb
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-04-27 05:04:41.270571: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 1.61s.
Running inference.
n l sth js

expected output = am global to make sure that it just
output from the model = n l sth js

How much more data and training is required?

If you are running 20k files at 6 secs each, it should take 5 mins or so per epoch. So increase the batch size and that should work.

Judging from the output, you are not using the GPU? Either way, check that you are, and that you get about 5 min per epoch with a batch size of 64 or so.
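For reference, the batch sizes are set with dedicated flags; a sketch based on your earlier command (tune 64 down if you run out of GPU memory):

./DeepSpeech.py --n_hidden 2048 --load_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint --save_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint --train_files train.csv --dev_files dev.csv --test_files test.csv --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64 --learning_rate 0.0001 --train_cudnn true --alphabet_config_path /home/dimanshu/alpha.txt --export_dir /home/dimanshu/best_path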

Typically the alphabet is just letters, with no special characters and no numbers. Check num2words for converting the numbers.
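For example, the num2words Python package (an assumption here; install it separately) can expand digits into words when you prepare the transcripts:

pip3 install num2words
python3 -c "from num2words import num2words; print(num2words(42))"
# prints: forty-two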

Maybe check what this repo does and you’ll get decent results:

Hi @othiele, I changed the batch size to 64 and now every epoch takes 1 minute.
I started training for 800 epochs, and around epoch 560 the loss became constant:
Epoch 561 | Training | Elapsed Time: 0:01:00 | Steps: 296 | Loss: 48.228566
Epoch 561 | Validation | Elapsed Time: 0:00:06 | Steps: 75 | Loss: 53.046018 | Dataset: dev.csv
Epoch 562 | Training | Elapsed Time: 0:01:01 | Steps: 296 | Loss: 48.011230
Epoch 562 | Validation | Elapsed Time: 0:00:06 | Steps: 75 | Loss: 53.025083 | Dataset: dev.csv
Epoch 563 | Training | Elapsed Time: 0:01:01 | Steps: 296 | Loss: 48.257540
Epoch 563 | Validation | Elapsed Time: 0:00:06 | Steps: 75 | Loss: 52.856465 | Dataset: dev.csv

I used this:
./DeepSpeech.py --n_hidden 2048 --save_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint --load_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint --epochs 5 --train_files train.csv --test_files test.csv --dev_files dev.csv --learning_rate 0.0001 --train_cudnn true --alphabet_config_path /home/dimanshu/alpha.txt --export_dir /home/dimanshu/best_path/ --train_batch_size 64 --test_batch_size 64 --dev_batch_size 64

  1. How do I reduce the loss further?
  2. What parameters should I change, the dropout layers or anything else?
  3. How do I see the WER after every epoch?

You don’t want that.

Your data, your knowledge, we can’t teach you.

Depends on how your network learns. Again, your data, your training, your knowledge.

Adding to @lissyx: try a dropout of 0.4, but as I said, you may need about 200k files to get a better WER. Your results are OK for that amount of data if the language is quite diverse.
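In command form, that means adding --dropout_rate 0.4 to the training invocation (assuming --dropout_rate is the flag name in your DeepSpeech version; check the training flags documentation), for example:

./DeepSpeech.py --dropout_rate 0.4 --learning_rate 0.0001 --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64 --train_files train.csv --dev_files dev.csv --test_files test.csv --train_cudnn true --alphabet_config_path /home/dimanshu/alpha.txt --load_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint --save_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint --export_dir /home/dimanshu/best_path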

@othiele, I have some questions.
I have a dataset of 80k files.

When I started training, it showed this result:

Epoch 0 | Training | Elapsed Time: 0:03:43 | Steps: 1049 | Loss: 30.588450
Epoch 0 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 28.513289 | Dataset: dev.csv
I Saved new best validating model with loss 28.513289 to: /home/dimanshu/latestcheckpoiint/new/best_dev-234833
Epoch 1 | Training | Elapsed Time: 0:03:34 | Steps: 1049 | Loss: 20.601734
Epoch 1 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 27.018155 | Dataset: dev.csv
I Saved new best validating model with loss 27.018155 to: /home/dimanshu/latestcheckpoiint/new/best_dev-235882
Epoch 2 | Training | Elapsed Time: 0:03:33 | Steps: 1049 | Loss: 17.995548
Epoch 2 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 26.553436 | Dataset: dev.csv
I Saved new best validating model with loss 26.553436 to: /home/dimanshu/latestcheckpoiint/new/best_dev-236931
Epoch 3 | Training | Elapsed Time: 0:03:33 | Steps: 1049 | Loss: 16.707786
Epoch 3 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 25.606835 | Dataset: dev.csv
I Saved new best validating model with loss 25.606835 to: /home/dimanshu/latestcheckpoiint/new/best_dev-237980
Epoch 4 | Training | Elapsed Time: 0:03:38 | Steps: 1049 | Loss: 15.702921
Epoch 4 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 24.864427 | Dataset: dev.csv
I Saved new best validating model with loss 24.864427 to: /home/dimanshu/latestcheckpoiint/new/best_dev-239029
Epoch 5 | Training | Elapsed Time: 0:03:38 | Steps: 1049 | Loss: 14.961168
Epoch 5 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 25.519202 | Dataset: dev.csv
Epoch 6 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 15.021698
Epoch 6 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 25.594796 | Dataset: dev.csv
Epoch 7 | Training | Elapsed Time: 0:03:38 | Steps: 1049 | Loss: 14.374933
Epoch 7 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 25.584334 | Dataset: dev.csv
Epoch 8 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 14.407759
Epoch 8 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 25.351115 | Dataset: dev.csv
Epoch 9 | Training | Elapsed Time: 0:03:38 | Steps: 1049 | Loss: 14.583124
Epoch 9 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 26.003742 | Dataset: dev.csv
Epoch 10 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 15.178494
Epoch 10 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 25.597253 | Dataset: dev.csv
Epoch 11 | Training | Elapsed Time: 0:03:38 | Steps: 1049 | Loss: 15.777427
Epoch 11 | Validation | Elapsed Time: 0:00:16 | Steps: 187 | Loss: 27.344462 | Dataset: dev.csv
Epoch 12 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 18.312971
Epoch 12 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 33.670939 | Dataset: dev.csv
Epoch 13 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 27.368952
Epoch 13 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 94.934918 | Dataset: dev.csv
Epoch 14 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 58.726450
Epoch 14 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 68.836654 | Dataset: dev.csv
Epoch 15 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 49.482339
Epoch 15 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 52.932693 | Dataset: dev.csv
Epoch 16 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 47.158406
Epoch 16 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 58.368865 | Dataset: dev.csv
Epoch 17 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 48.946441
Epoch 17 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 61.969229 | Dataset: dev.csv
Epoch 18 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 47.054758
Epoch 18 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 107.120515 | Dataset: dev.csv
Epoch 19 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 62.782858
Epoch 19 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 59.996260 | Dataset: dev.csv
Epoch 20 | Training | Elapsed Time: 0:03:36 | Steps: 1049 | Loss: 49.313696
Epoch 20 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 55.930255 | Dataset: dev.csv
Epoch 21 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 50.548152
Epoch 21 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 61.734633 | Dataset: dev.csv
Epoch 22 | Training | Elapsed Time: 0:03:36 | Steps: 1049 | Loss: 49.997154
Epoch 22 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 83.610665 | Dataset: dev.csv
Epoch 23 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 70.947716
Epoch 23 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 65.014022 | Dataset: dev.csv
Epoch 24 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 53.777456
Epoch 24 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 53.490376 | Dataset: dev.csv
Epoch 25 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 46.280939
Epoch 25 | Validation | Elapsed Time: 0:00:15 | Steps: 187 | Loss: 54.534843 | Dataset: dev.csv
Epoch 26 | Training | Elapsed Time: 0:03:37 | Steps: 1049 | Loss: 47.923410

  1. Why are the training and validation losses increasing?

  2. After this completes, if I start the training again it will resume from the best checkpoint, which is from the 4th epoch, so is everything after the 4th epoch pointless?

  3. My dataset consists of only letters and numbers:
    http://34.83.214.234/show/meta-train.csv

  4. How should I train this so that the validation loss decreases? Am I doing something wrong?
    Also, I can't see the WER.

./DeepSpeech.py --n_hidden 2048 --save_checkpoint_dir /home/dimanshu/latestcheckpoiint/load --load_checkpoint_dir /home/dimanshu/latestcheckpoiint/load --epochs 100 --train_files train.csv --test_files test.csv --dev_files dev.csv --learning_rate 0.0001 --train_cudnn true --alphabet_config_path /home/dimanshu/alpha.txt --export_dir /home/dimanshu/best_path --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64

After completing the training, the result on test.csv is empty:
WER: 1.000000
res: ""

  1. Because validation changes the hyperparameters.

  2. Training didn’t get any better, therefore 4th.

  3. Link is dead …

  4. Please read my comment above, I can’t help you if you don’t.

http://34.83.214.234/show/meta_train.csv

Yes, I will add more data to make it 200k.
1) So I have to fine-tune my model first and then start the training? And I will use dropout = 0.4 and learning rate = 0.0001.

Ah, you can read :) The dropout should get you a lot further.
