Hi Team,
Thanks for the awesome model.
I downloaded the pretrained model checkpoints from “https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/deepspeech-0.4.1-checkpoint.tar.gz” and trained the model on my own data for an additional 3 epochs using the following command:
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir /media/santhosh/Data/SpeechRecognition/Other_languages/Gujarati/gujarati_male_english/checkout/ --epoch -3 --train_files /media/santhosh/Data/SpeechRecognition/Other_languages/Gujarati/gujarati_male_english/train/train.csv --dev_files /media/santhosh/Data/SpeechRecognition/Other_languages/Gujarati/gujarati_male_english/dev/dev.csv --test_files /media/santhosh/Data/SpeechRecognition/Other_languages/Gujarati/gujarati_male_english/test/test.csv --learning_rate 0.0001 --export_dir /media/santhosh/Data/SpeechRecognition/Other_languages/Gujarati/gujarati_male_english/model_export/
Training completed successfully.
I then ran inference on an audio file from the generated checkpoint using this command:
python3 -u DeepSpeech.py --checkpoint_dir /media/santhosh/Data/SpeechRecognition/Other_languages/Gujarati/gujarati_male_english/checkout/ --one_shot_infer /media/santhosh/Data/Arctic_a0023.wav --train 0 --test 0
The output is:
WARNING:root:frame length (1536) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.
a combination of canadian capital quickly organized and petitioned for the same privileges
This is the correct transcription.
However, I then ran inference on the same audio file with the exported output graph using this command:
deepspeech --audio /media/santhosh/Data/Arctic_a0023.wav --alphabet data/alphabet.txt --lm data/lm/lm.binary --trie data/lm/trie --model /media/santhosh/Data/SpeechRecognition/Other_languages/Gujarati/gujarati_male_english/model_export/output_graph.pb
The output is:
Loading model from file /media/santhosh/Data/SpeechRecognition/output_graph.pb
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-0-g0e40db6
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-02-15 11:17:34.488296: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0837s.
Loading language model from files data/lm/lm.binary data/lm/trie
Loaded language model in 0.269s.
Warning: original sample rate (48000) is different than 16kHz. Resampling might produce erratic speech recognition.
Running inference.
2019-02-15 11:17:35.202634: W tensorflow/core/framework/allocator.cc:122] Allocation of 134217728 exceeds 10% of system memory.
2019-02-15 11:17:35.306563: W tensorflow/core/framework/allocator.cc:122] Allocation of 134217728 exceeds 10% of system memory.
2019-02-15 11:17:35.647207: W tensorflow/core/framework/allocator.cc:122] Allocation of 134217728 exceeds 10% of system memory.
2019-02-15 11:17:35.707844: W tensorflow/core/framework/allocator.cc:122] Allocation of 134217728 exceeds 10% of system memory.
2019-02-15 11:17:35.768958: W tensorflow/core/framework/allocator.cc:122] Allocation of 134217728 exceeds 10% of system memory.
a coming sun a canyon capital quickly unanatomical fie the fame of ages
Inference took 7.522s for 35.268s audio file.
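The log above includes a warning that the file's sample rate (48000) differs from 16 kHz. I don't know whether that explains the difference between the two transcriptions, but as a minimal sketch for checking it, the following reads the rate from the WAV header using Python's standard `wave` module. The function names and the `EXPECTED_RATE_HZ` constant are illustrative choices of mine, not part of DeepSpeech.

```python
import wave

# Rate named in the client warning above; an assumption for this sketch.
EXPECTED_RATE_HZ = 16000


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in the WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE_HZ):
    """Return True if the file's sample rate differs from expected_rate."""
    return wav_sample_rate(path) != expected_rate
```

For example, `needs_resampling("/media/santhosh/Data/Arctic_a0023.wav")` would return True if the header reports 48000 as the warning suggests.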
Please help me get the correct inference from the output graph. Thanks a lot.