Loss Not Reducing

Hi

I’m training the DeepSpeech model with the following parameters, on an audio dataset of 72 hrs 27 min 34 sec split between train:dev:test in the ratio 60:20:20.
But the loss does not seem to reduce after 4 epochs. I don’t know what the problem is; any help is much appreciated.
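
For reference, a 60:20:20 split of a DeepSpeech manifest can be produced roughly like this; just a sketch, where the combined all.csv and the fixed shuffle seed are illustrative:

import pandas as pd

# Sketch: shuffle a combined manifest, then cut it 60:20:20.
# DeepSpeech CSVs have wav_filename, wav_filesize, transcript columns.
df = pd.read_csv("all.csv").sample(frac=1.0, random_state=42)

n = len(df)
train_end = int(n * 0.60)
dev_end = train_end + int(n * 0.20)

df.iloc[:train_end].to_csv("other-train.csv", index=False)
df.iloc[train_end:dev_end].to_csv("other-dev.csv", index=False)
df.iloc[dev_end:].to_csv("other-test.csv", index=False)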

Parameters

#!/bin/sh
set -xe
if [ ! -f DeepSpeech.py ]; then
    echo "Please make sure you run this from DeepSpeech's top level directory."
    exit 1
fi
python -u DeepSpeech.py \
  --train_files /datadrive/dalon/video_datasets/other-train.csv \
  --dev_files /datadrive/dalon/video_datasets/other-dev.csv \
  --test_files /datadrive/dalon/video_datasets/other-test.csv \
  --train_batch_size 16 \
  --dev_batch_size 8 \
  --test_batch_size 8 \
  --validation_step 1 \
  --display_step 5 \
  --limit_train 0 \
  --limit_dev 0 \
  --limit_test 0 \
  --n_hidden 2048 \
  --epoch 50 \
  --checkpoint_dir /datadrive/dalon/video_datasets/checkpoint/ \
  --export_dir /datadrive/dalon/video_datasets/model_export/ \
  --decoder_library_path /datadrive/dalon/DeepSpeech/native_client/libctc_decoder_with_kenlm.so \
  --alphabet_config_path /datadrive/dalon/pre_trained/models/alphabet.txt \
  --lm_binary_path /datadrive/dalon/lm_models/ds_full_lm_o5.binary \
  --lm_trie_path /datadrive/dalon/lm_models/ds_full_lm_trie \
  --fulltrace True \
  --use_warpctc True \
  --early_stop False \
  --report_count 100 \
  --use_seq_length False \
"$@"

Output
[screenshot of training output, 2018-01-16]

With only 72 hours, if you take 60% of it for the train set, you have about 43 hours. We use n_hidden=2048 with roughly 100 times that amount of audio, so I guess you should reduce that parameter. Try much lower values, and do a binary search to find an appropriate width :).
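
For example, a sweep along these lines could drive that search; just a sketch reusing the flags from your script (the candidate widths and the shortened epoch count are arbitrary):

import subprocess

# Sketch: train briefly at a few widths, keep the one with the best
# dev loss, then narrow the range binary-search style and repeat.
for width in (256, 512, 1024):
    subprocess.run(
        [
            "python", "-u", "DeepSpeech.py",
            "--train_files", "/datadrive/dalon/video_datasets/other-train.csv",
            "--dev_files", "/datadrive/dalon/video_datasets/other-dev.csv",
            "--test_files", "/datadrive/dalon/video_datasets/other-test.csv",
            "--n_hidden", str(width),
            "--epoch", "10",
            "--checkpoint_dir", f"/datadrive/dalon/video_datasets/checkpoint_{width}/",
            "--alphabet_config_path", "/datadrive/dalon/pre_trained/models/alphabet.txt",
        ],
        check=True,
    )

Note that each width needs its own checkpoint_dir, otherwise training would try to resume from a checkpoint whose variable shapes no longer match.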


@lissyx thanks, I’ll try reducing the hidden layer width

I’d also suggest adding the following hyperparameter values:

...
  --dropout_rate 0.30 \
  --default_stddev 0.046875 \
  --learning_rate 0.0001 \
...

If you plan on changing n_hidden, you should also adjust default_stddev accordingly

@kdavis Could you please explain how default_stddev should depend on n_hidden?

There are various schools of thought on this.

However, generally what I’ve been guided by is Xavier initialization[1].
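
Concretely, Glorot initialization sets the stddev from the layer’s fan-in and fan-out, so for a square n_hidden × n_hidden weight matrix it scales as 1/sqrt(n_hidden). A minimal sketch (the helper name is just for illustration):

import math

def xavier_stddev(fan_in, fan_out):
    # Glorot & Bengio (2010): scale the init stddev so activation
    # variance stays roughly constant from layer to layer.
    return math.sqrt(2.0 / (fan_in + fan_out))

# For a square hidden layer, fan_in == fan_out == n_hidden,
# so halving n_hidden grows the suggested stddev by sqrt(2).
print(xavier_stddev(2048, 2048))  # ~0.0221
print(xavier_stddev(1024, 1024))  # ~0.0313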
