Very high error rate for this audio clip with my own model

kausthubnaarayan · March 15, 2018, 7:30am

I recorded my own voice to test DeepSpeech, and I want the model to recognise the recording when it is played back to the model.

These are the steps I followed:

1> Prepared train.csv, test.csv and dev.csv, each containing the following single entry:

/Users/kausthub.naarayan/speech_project/long_route.wav|1|the driver took a long route|
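A minimal sketch of generating such a manifest from Python, so that the size field is taken from the file itself. The column names `wav_filename`, `wav_filesize` and `transcript` are an assumption based on the comma-separated layout the DeepSpeech importers of that period produced; `write_manifest` is a hypothetical helper, not part of DeepSpeech.

```python
import csv
import os


def write_manifest(csv_path, entries):
    """Write a training manifest with one row per (wav_path, transcript) pair.

    Assumed layout: wav_filename, wav_filesize (in bytes), transcript.
    """
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in entries:
            writer.writerow([
                os.path.abspath(wav_path),
                os.path.getsize(wav_path),  # size read from disk, in bytes
                transcript.lower().strip(),
            ])
```

Calling `write_manifest("train.csv", [("long_route.wav", "the driver took a long route")])` would then produce a header row plus a single entry.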

2> I then ran this command to train the model on the recording I had provided:

python -u DeepSpeech.py \
  --train_files ../new_model-2/train-2.csv \
  --dev_files ../new_model-2/dev-2.csv \
  --test_files ../new_model-2/test-2.csv \
  --train_batch_size 80 \
  --dev_batch_size 80 \
  --test_batch_size 40 \
  --n_hidden 375 \
  --epoch 33 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 100 \
  --use_seq_length False \
  --export_dir ../new_model-2/ \
  --decoder_library_path ../libctc_decoder_with_kenlm.so \
  --alphabet_config_path ../models/alphabet.txt \
  --lm_binary_path ../models/lm.binary \
  --lm_trie_path ../models/trie \
  "$@"

I am using the lm.binary, alphabet.txt and trie from the pre-trained model distributed with DeepSpeech.

This outputs a new model.

3> I ran the audio file against this new model and the existing language model with the following command:

../DeepSpeech/deepspeech output_graph.pb ../models/alphabet.txt ../models/lm.binary ../models/trie ../long_route.wav

This gives an output that is not even close to the transcript.

This is the result I got:

the rotototogroe

The actual transcript is:
the driver took a long route
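To put a number on how far apart the two strings are, one common measure is the word error rate (WER): the word-level edit distance divided by the number of words in the reference. The sketch below is a plain illustration of that measure; the function names are hypothetical and this is not the scoring code DeepSpeech itself uses.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete a reference token
                cur[j - 1] + 1,          # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


reference = "the driver took a long route"
hypothesis = "the rotototogroe"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # 5 edits / 6 words
```

For this pair, only "the" matches; one word is substituted and four are deleted, giving 5 edits over 6 reference words.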

Can anyone please help me find out what mistake I am making?

These are the details of the audio file:
Input File : 'long_route.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:06.41 = 102516 samples ~ 480.544 CDDA sectors
File Size : 205k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
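The same properties can be read back from Python with the standard-library `wave` module, which is handy for checking every clip in a dataset. This is a sketch; `describe_wav` and `matches_format` are hypothetical helper names, and the default target values are simply the ones reported in the listing above.

```python
import wave


def describe_wav(path):
    """Return the basic format properties of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        sample_rate = w.getframerate()
        bits = 8 * w.getsampwidth()  # sample width is reported in bytes
        frames = w.getnframes()
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits,
        "frames": frames,
        "duration_s": frames / sample_rate,
        "bit_rate": channels * sample_rate * bits,
    }


def matches_format(info, channels=1, sample_rate=16000, bits_per_sample=16):
    """Check a describe_wav() result against the expected format."""
    return (
        info["channels"] == channels
        and info["sample_rate"] == sample_rate
        and info["bits_per_sample"] == bits_per_sample
    )
```

The figures in the listing are self-consistent: 102516 samples at 16000 Hz is about 6.41 seconds, and 1 channel × 16000 Hz × 16 bits is 256k bits per second.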

Link to my audio file: https://vocaroo.com/i/s0qUMwH3qqUF

Thanks in advance

kdavis · March 15, 2018, 8:21am

What happens when you run bin/run-ldc93s1.sh?

What happens when you replace data/ldc93s1/ldc93s1.csv in bin/run-ldc93s1.sh with your .csv?

kausthubnaarayan · March 15, 2018, 9:35am

@kdavis What do you mean by "what happens"?
I ran the script and it completed.
After replacing the csv file, I ran it again and it ran successfully; it didn't error out.
I'm not sure what to look for.

Can you please help?

Thanks

kdavis · March 15, 2018, 10:53am

If it worked when you replaced data/ldc93s1/ldc93s1.csv with your .csv, then as far as I can see it is working with your data.

kausthubnaarayan · March 15, 2018, 11:00am

@kdavis The model gets built.
But the speech-to-text output has more than 60% error.
I think I am doing something wrong, but I'm not sure what it is or how to improve this.

Thanks

kdavis · March 15, 2018, 11:06am

What happens if you change this line in run-ldc93s1.sh from

...
 --epoch 50 \
...

to

...
 --epoch 100 \
...

kausthubnaarayan · March 15, 2018, 11:13am

@kdavis I wanted to understand the meaning of the “loss” that gets printed while training the model. Is it better the closer it gets to 0?
So do more epochs mean more iterations and better learning?
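As a toy illustration of the question only: the sketch below fits a single weight by gradient descent and records the training loss after each epoch (one pass over the data). It is a deliberately tiny least-squares problem, not DeepSpeech's loss or training loop, and all names and values in it are made up for the example. On this toy problem the training loss shrinks toward 0 as epochs accumulate; a low training loss on its own does not say how well a model handles audio it has not seen.

```python
def train(epochs, learning_rate=0.05):
    """Fit y = w * x by gradient descent; return the loss after each epoch."""
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]  # generated with w = 2
    w = 0.0
    losses = []
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
        # Mean squared error on the training data after this epoch's update.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        losses.append(loss)
    return losses


for n in (5, 10, 20):
    print(f"epochs={n:2d}  final training loss={train(n)[-1]:.6g}")
```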
