I have trained a model on my own data: 1000 training files and 100 validation files, each 40 seconds long.
(1) I am trying to train a speech recognition model to recognize digits.
(2) My model is trained, but it outputs only one transcript, “seven”, for all the test files.
(3) I used this blog post to prepare my data and train the model:
Tutorial How to build your homemade deepspeech model from scratch
Adapt links and params with your needs…
For my robotic project, I needed to create a small monospeaker model, with nearly 1000 sentences orders (not just single word !)
I recorded wav’s with a Respeaker Microphone Array :
Wav’s were recorded with the following params : mono / 16 bits / 16 k.
The use of the google vad lib helped me t…
Please help me figure out what is wrong.
((slow to reply) [NOT PROVIDING SUPPORT])
June 24, 2019, 12:27pm
Can you describe your issue better? I don’t understand the problem you are facing.
I have an issue with accuracy and with the result:
I am getting the transcript “seven” for all my test files.
((slow to reply) [NOT PROVIDING SUPPORT])
June 25, 2019, 8:40am
What are your training setup, dataset, and parameters?
I have 1000 files, each 40 seconds long
100 in dev
100 in test
stereo, 44 kHz
((slow to reply) [NOT PROVIDING SUPPORT])
June 25, 2019, 8:51am
First, that’s not a lot of data. Second, you don’t document your training setup. Third, have you made sure you properly adapted the code to handle stereo / 44 kHz? We default to mono / 16 kHz, so it’s possible there are some changes to make.
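Before touching the code, it may help to confirm what the recordings actually contain. The following is a minimal sketch using Python's standard `wave` module; the function names `describe_wav` and `matches_defaults` are hypothetical helpers, not part of DeepSpeech, and the sketch assumes the files are uncompressed PCM WAV.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        duration = wav.getnframes() / rate
    return channels, rate, bits, duration


def matches_defaults(path, expected_rate=16000, expected_channels=1):
    """True if the file is already mono / 16 kHz, the defaults mentioned above."""
    channels, rate, _, _ = describe_wav(path)
    return channels == expected_channels and rate == expected_rate
```

Running `describe_wav` over a few training files shows whether they are really stereo / 44.1 kHz or already match the defaults.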
Please help me adapt the code to handle stereo / 44 kHz.
I have checked the documentation for the training setup.
((slow to reply) [NOT PROVIDING SUPPORT])
June 25, 2019, 9:14am
I’m asking you to share what you did …
Have a look at the codebase and search for 16000; you will find places to update / flags to use.
This is the file I executed:
set -xe
if [ ! -f DeepSpeech.py ]; then
    echo "Please make sure you run this from DeepSpeech's top level directory."
    exit 1
fi

python -u DeepSpeech.py \
  --train_files /home/lucifers/Deepspeech/train/train.csv \
  --dev_files /home/lucifers/Deepspeech/dev/dev.csv \
  --test_files /home/lucifers/Deepspeech/test/test.csv \
  --train_batch_size 8 \
  --dev_batch_size 8 \
  --test_batch_size 4 \
  --n_hidden 375 \
  --epoch 33 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 100 \
  --use_seq_length False \
  --export_dir /home/lucifers/Deepspeech/results/model_export/ \
  --checkpoint_dir /home/lucifers/Deepspeech/results/checkout/ \
  --decoder_library_path /home/lucifers/Deepspeech/tensorflow/bazel-bin/native_client/libctc_decoder_with_kenlm.so \
  --alphabet_config_path /home/lucifers/Deepspeech//alphabet.txt \
  --lm_binary_path /home/lucifers/Deepspeech/lm.binary \
  --lm_trie_path /home/lucifers/Deepspeech/trie
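Since the script points at three CSV files, one cheap sanity check before retraining is to look at which words the transcripts contain. This is only a sketch: it assumes the CSVs have the `wav_filename`, `wav_filesize`, `transcript` columns that the DeepSpeech importers write, and `transcript_word_counts` is a hypothetical helper name.

```python
import csv
from collections import Counter


def transcript_word_counts(csv_path):
    """Count how often each word appears in the 'transcript' column of a CSV file."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts.update(row["transcript"].split())
    return counts
```

If the counts for train.csv do not look like the digits that were recorded, the data preparation is worth revisiting before tuning any parameters.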
Where can I find the codebase?
June 25, 2019, 11:11am
--n_hidden 375
Why did you select this value? It seems too small.
OK, I will change the parameter, but my question is: for a 40-second test file that contains 20 words, why is the transcript only one word, “seven”?
(1) Is this a problem with the parameters or the data?
(2) Or did something else go wrong?
June 25, 2019, 11:18am
With n_hidden equal to 375 the model is not large enough to answer your question.
OK, so I should increase the number of hidden units.
What about the hidden layers, should I increase those as well?
June 25, 2019, 11:40am
Just leave the layers as they are for now.
OK, I will increase it to 1000.
Please let me know how to adapt the code to handle stereo / 44 kHz instead of mono / 16 kHz.
Where should I look for the codebase, and what should I change?
June 25, 2019, 2:11pm
@lissyx’s advice on 44 kHz vs 16 kHz is applicable in this case.
OK, I could only find stats.py, which has values to be altered.
Is it the only file, or do I need to change more files?
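A different route, not proposed in this thread, is to leave the code alone and convert the recordings to the default mono / 16 kHz format instead. The sketch below assumes the `sox` command-line tool is installed and uses its standard `-r` (rate), `-c` (channels), and `-b` (bits) output options; `build_sox_command` and `convert` are hypothetical helper names.

```python
import subprocess


def build_sox_command(src_path, dst_path, rate=16000, channels=1, bits=16):
    """Build a sox invocation that writes dst_path with the given rate, channels, and bit depth."""
    # Output format options are placed before the output file name.
    return [
        "sox", src_path,
        "-r", str(rate),
        "-c", str(channels),
        "-b", str(bits),
        dst_path,
    ]


def convert(src_path, dst_path):
    """Run sox; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_sox_command(src_path, dst_path), check=True)
```

If the files are converted this way, any file sizes recorded in the CSVs would no longer match and would need to be regenerated.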
June 26, 2019, 7:17am
Use grep. For example, looking only at .py files:
(.virtualenv) kdavis-19htdh:DeepSpeech kdavis$ find . -name "*.py" -exec grep 16000 {} /dev/null \;
./util/flags.py: f.DEFINE_integer('audio_sample_rate', 16000, 'sample rate value expected by model')
./bin/import_cv2.py:SAMPLE_RATE = 16000
./bin/import_fisher.py: origAudios = [librosa.load(wav_file, sr=16000, mono=False) for wav_file in wav_files]
./bin/import_swb.py: audioData, frameRate = librosa.load(temp_wav_file, sr=16000, mono=True)
./bin/import_ts.py:SAMPLE_RATE = 16000
./bin/import_cv.py:SAMPLE_RATE = 16000
./bin/import_gram_vaani.py:SAMPLE_RATE = 16000
./bin/import_lingua_libre.py:SAMPLE_RATE = 16000
./bin/import_aishell.py: durations = (df['wav_filesize'] - 44) / 16000 / 2
./examples/vad_transcriber/wavTranscriber.py: audio_length = len(audio) * (1 / 16000)
./examples/vad_transcriber/wavTranscriber.py: assert sample_rate == 16000, "Only 16000Hz input WAV files are supported for now!"
./examples/vad_transcriber/wavSplit.py: assert sample_rate in (8000, 16000, 32000)
./examples/mic_vad_streaming/mic_vad_streaming.py: RATE_PROCESS = 16000
./examples/mic_vad_streaming/mic_vad_streaming.py: """Return a block of audio data resampled to 16000hz, blocking if necessary."""
./examples/mic_vad_streaming/mic_vad_streaming.py: DEFAULT_SAMPLE_RATE = 16000
./stats.py: parser.add_argument("--sample-rate", type=int, default=16000, required=False, help="Audio sample rate")
./native_client/python/client.py: sox_cmd = 'sox {} --type raw --bits 16 --channels 1 --rate 16000 --encoding signed-integer --endian little --compression 0.0 --no-dither - '.format(quote(audio_path))
./native_client/python/client.py: return 16000, np.frombuffer(output, np.int16)
./native_client/python/client.py: if fs != 16000:
./native_client/python/client.py: audio_length = fin.getnframes() * (1/16000)
./native_client/python/__init__.py: def setupStream(self, pre_alloc_frames=150, sample_rate=16000):
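For anyone without `find` and `grep` at hand, roughly the same search can be sketched in Python. This is an illustration rather than a replacement for the command above; `find_literal` is a hypothetical helper, and it reports line numbers in addition to the matching lines.

```python
from pathlib import Path


def find_literal(root, needle="16000", pattern="*.py"):
    """Return (path, line_number, line) for every line under root that contains needle."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Skip unreadable entries (for example, a directory whose name ends in .py).
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_literal(".")` from the DeepSpeech top level directory lists the candidate places to review.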
OK, thanks a lot.
(1) So now I have to change 16000 to 44100 everywhere, right?
(2) Is it the same procedure to change the channel count from mono to stereo in all files?
((slow to reply) [NOT PROVIDING SUPPORT])
June 26, 2019, 1:32pm
Well, everywhere you might need it; I am not sure you have to change all the importers or all the examples …
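One reason a blind search-and-replace may not be enough: some of the matches carry other assumptions besides the sample rate. The duration expression in bin/import_aishell.py from the grep output, `(wav_filesize - 44) / 16000 / 2`, reads as a 44-byte header, 16000 samples per second, and 2 bytes per sample on a single channel. A generalized sketch, with the hypothetical name `wav_duration_seconds` and assuming 16-bit PCM with a 44-byte header:

```python
def wav_duration_seconds(file_size_bytes, sample_rate=16000, channels=1,
                         bytes_per_sample=2, header_bytes=44):
    """Estimate the duration of a PCM WAV file from its size on disk."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return (file_size_bytes - header_bytes) / bytes_per_second


# A 40-second, stereo, 44.1 kHz, 16-bit file (the bit depth is an assumption):
size = 44 + 40 * 44100 * 2 * 2

print(wav_duration_seconds(size))                                 # 220.5 with the mono / 16 kHz defaults
print(wav_duration_seconds(size, sample_rate=44100, channels=2))  # 40.0
```

Changing only 16000 to 44100 in such an expression would still be off by a factor of two for stereo files, since the channel count also enters the calculation.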