Trained model on my own data

I have trained a model on my own data:
1000 training files and 100 validation files, each 40 seconds long.

(1) I am trying to train a speech recognition model to recognize digits.
(2) The model trains, but it outputs only one transcript, "seven", for all the test files.
(3) I used this blog to prepare my data and train the model.


Please help me figure out what is wrong.

Can you describe your issue in more detail? I don't understand the problem you are facing.

I have an issue with the accuracy and the results:
I am getting the transcript "seven" for all my test files.

What are your training setup, dataset, and parameters?

I have 1000 training files, each 40 seconds long,
100 in dev,
100 in test,
all stereo, 44.1 kHz.

First, that's not a lot of data. Second, you don't document your training setup. Third, have you made sure you properly adapted the code to handle stereo / 44.1 kHz? We default to mono / 16 kHz, so there may be some changes to make.
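
(A practical alternative, in case adapting the code proves tricky: convert the audio itself to mono / 16 kHz before training. A minimal sketch using SoX; the filenames are placeholders.)

# Check what a file actually contains (soxi ships with SoX):
soxi clip.wav
# Resample to 16 kHz, downmix to mono, 16-bit PCM:
sox clip.wav -r 16000 -c 1 -b 16 clip_16k_mono.wav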

Help me adapt the code to handle stereo / 44.1 kHz.
I have checked the training setup documentation.

I’m asking you to share what you did …

Have a look at the codebase, search for 16000, you will find places to update / flags to use.

This is the file I executed:
#!/bin/sh
set -xe
if [ ! -f DeepSpeech.py ]; then
    echo "Please make sure you run this from DeepSpeech's top level directory."
    exit 1
fi

python -u DeepSpeech.py \
  --train_files /home/lucifers/Deepspeech/train/train.csv \
  --dev_files /home/lucifers/Deepspeech/dev/dev.csv \
  --test_files /home/lucifers/Deepspeech/test/test.csv \
  --train_batch_size 8 \
  --dev_batch_size 8 \
  --test_batch_size 4 \
  --n_hidden 375 \
  --epoch 33 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 100 \
  --use_seq_length False \
  --export_dir /home/lucifers/Deepspeech/results/model_export/ \
  --checkpoint_dir /home/lucifers/Deepspeech/results/checkout/ \
  --decoder_library_path /home/lucifers/Deepspeech/tensorflow/bazel-bin/native_client/libctc_decoder_with_kenlm.so \
  --alphabet_config_path /home/lucifers/Deepspeech/alphabet.txt \
  --lm_binary_path /home/lucifers/Deepspeech/lm.binary \
  --lm_trie_path /home/lucifers/Deepspeech/trie \
  "$@"

Where do I find the codebase?

Why did you select this value? It seems too small.

OK, I will change the parameter, but my doubt is this: for a 40-second test file that contains 20 words, why is the transcript only the single word "seven"?
(1) Is this a problem with the parameters or the data?
(2) Or did something else go wrong?

To answer your question: with n_hidden equal to 375, the model is simply not large enough.

OK, so I should increase the hidden units.
And what about the hidden layers, should I increase those as well?

Just leave the layers as they are for now.

OK, I will increase it to 1000.
Also, please let me know how to adapt the code to go from stereo / 44.1 kHz to mono / 16 kHz.
Where do I look in the codebase, and what do I change?
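
(For reference, a minimal sketch of that change, assuming the training script above is saved as run_training.sh, which is a hypothetical name; 1000 is simply the value proposed in this thread, not a tuned recommendation.)

# Bump n_hidden from 375 to 1000 in the run script (GNU sed shown):
sed -i 's/--n_hidden 375/--n_hidden 1000/' run_training.sh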

@lissyx's advice on 44 kHz vs. 16 kHz applies in this case.

OK, I only found stats.py, which has the values to be altered.
Is it the only file, or do I need to change more files?

Use grep. For example, just looking at .py files:

(.virtualenv) kdavis-19htdh:DeepSpeech kdavis$ find . -name "*.py" -exec grep 16000 {} /dev/null \;
./util/flags.py:    f.DEFINE_integer('audio_sample_rate', 16000, 'sample rate value expected by model')
./bin/import_cv2.py:SAMPLE_RATE = 16000
./bin/import_fisher.py:            origAudios = [librosa.load(wav_file, sr=16000, mono=False) for wav_file in wav_files]
./bin/import_swb.py:                audioData, frameRate = librosa.load(temp_wav_file, sr=16000, mono=True)
./bin/import_ts.py:SAMPLE_RATE = 16000
./bin/import_cv.py:SAMPLE_RATE = 16000
./bin/import_gram_vaani.py:SAMPLE_RATE = 16000
./bin/import_lingua_libre.py:SAMPLE_RATE = 16000
./bin/import_aishell.py:            durations = (df['wav_filesize'] - 44) / 16000 / 2
./examples/vad_transcriber/wavTranscriber.py:    audio_length = len(audio) * (1 / 16000)
./examples/vad_transcriber/wavTranscriber.py:    assert sample_rate == 16000, "Only 16000Hz input WAV files are supported for now!"
./examples/vad_transcriber/wavSplit.py:        assert sample_rate in (8000, 16000, 32000)
./examples/mic_vad_streaming/mic_vad_streaming.py:    RATE_PROCESS = 16000
./examples/mic_vad_streaming/mic_vad_streaming.py:        """Return a block of audio data resampled to 16000hz, blocking if necessary."""
./examples/mic_vad_streaming/mic_vad_streaming.py:    DEFAULT_SAMPLE_RATE = 16000
./stats.py:    parser.add_argument("--sample-rate", type=int, default=16000, required=False, help="Audio sample rate")
./native_client/python/client.py:    sox_cmd = 'sox {} --type raw --bits 16 --channels 1 --rate 16000 --encoding signed-integer --endian little --compression 0.0 --no-dither - '.format(quote(audio_path))
./native_client/python/client.py:    return 16000, np.frombuffer(output, np.int16)
./native_client/python/client.py:    if fs != 16000:
./native_client/python/client.py:    audio_length = fin.getnframes() * (1/16000)
./native_client/python/__init__.py:    def setupStream(self, pre_alloc_frames=150, sample_rate=16000):

OK, thanks a lot.
(1) So now I should change 16000 to 44100 everywhere, right?
(2) And is the procedure the same for changing the channel from mono to stereo, in all the files?

Well, everywhere you might need it, not sure you have to change all the importers, nor all the examples …
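
(Following up on the grep output above: util/flags.py already exposes an --audio_sample_rate flag, so for training itself you may not need to edit any source at all. A minimal sketch, assuming the run script above is saved as run_training.sh, a hypothetical name; its trailing "$@" forwards extra arguments to DeepSpeech.py.)

# Tell the model to expect 44.1 kHz input instead of the 16 kHz default:
./run_training.sh --audio_sample_rate 44100

Note this only covers the sample rate; the stereo-to-mono question is separate (see the SoX sketch earlier in the thread).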