If you’ve found a bug, or have a feature request, then please create an issue with the following information:
- Have I written custom code (as opposed to running examples on an unmodified clone of the repository) : NO
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04) : Ubuntu 18.04
- TensorFlow installed from (our builds, or upstream TensorFlow) : Upstream TensorFlow r1.15.3 (with GPU)
- TensorFlow version (use command below) : TensorFlow r1.15.3 (with GPU)
- Python version : Python 3
- Bazel version (if compiling from source) : TensorFlow compiled from source with Bazel 0.26.1
- GCC/Compiler version (if compiling from source) : gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
- CUDA/cuDNN version : CUDA 10.2 / CUDNN v7.6.5
- GPU model and memory : NVIDIA 1080 Ti (11 GB)
- Exact command to reproduce :
python3 DeepSpeech.py \
--n_hidden 2048 \
--drop_source_layers 1 \
--alphabet_config_path data/new_alphabet.txt \
--save_checkpoint_dir /data/Self/test/DeepSpeech/train_3/ \
--load_checkpoint_dir /data/Self/test/DeepSpeech/checkpoint/ \
--train_files data/clips/train.csv \
--dev_files data/clips/dev.csv \
--test_files data/clips/test.csv \
--learning_rate 0.000005 \
--use_allow_growth true \
--train_cudnn \
--epochs 20 \
--export_dir /data/Self/test/DeepSpeech/train_3/ \
--summary_dir /data/Self/test/DeepSpeech/train_3/summary \
--train_batch_size 32 \
--dev_batch_size 32 \
--test_batch_size 32 \
--export_batch_size 1 \
--dropout_rate=0.30
I’m using DeepSpeech version 0.7.3.
I want to train a DeepSpeech model on my own domain-specific data, so I need to add new characters, such as the digits 0-9, the double quote (") and the period (.), to the existing DeepSpeech alphabet given here.
As a first step, before using my own data, I wanted to do transfer learning on the Common Voice data with the new alphabet, starting from the existing published checkpoint given here. The intention is then to start training on my own data from this new checkpoint (since it covers the new characters) rather than from the published DeepSpeech checkpoint (since its alphabet is limited).
My new alphabet is:
# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric label.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' as a label.
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
'
0
1
2
3
4
5
6
7
8
9
.
"
# The last (non-comment) line needs to end with a newline.
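Before training, it can help to confirm that every character in the training transcripts is actually representable by the new alphabet, since an out-of-alphabet character in a transcript can cause label-encoding errors or silently dropped samples. Below is a minimal sketch of reading an alphabet file in the format shown above; it is not DeepSpeech's own parser, and it assumes single-character labels:

```python
def parse_alphabet_lines(lines):
    """Parse alphabet lines: one label per line, '#' starts a comment,
    '\\#' escapes a literal '#' label."""
    labels = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("\\#"):
            line = "#" + line[2:]
        elif line.startswith("#") or line == "":
            continue
        labels.append(line)
    return labels

def load_alphabet(path):
    with open(path, encoding="utf-8") as f:
        return parse_alphabet_lines(f)

def unknown_chars(transcript, labels):
    """Characters in a transcript that the alphabet cannot represent."""
    return sorted(set(transcript) - set(labels))
```

For example, `unknown_chars('he said "911."', load_alphabet("data/new_alphabet.txt"))` should come back empty for the alphabet above (apart from the space character, if your alphabet file does not include a space label).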
When I ran the command above, the reported training loss starts from a lower value and increases while the epoch is in progress. The terminal output was as follows:
Epoch 0 | Training | Elapsed Time: 0:48:14 | Steps: 7186 | Loss: 51.725979
Epoch 0 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 44.295110 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 44.295110 to: /data/Self/test/DeepSpeech/train_3/best_dev-739708
--------------------------------------------------------------------------------
Epoch 1 | Training | Elapsed Time: 0:48:03 | Steps: 7186 | Loss: 33.749292
Epoch 1 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 41.206871 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 41.206871 to: /data/Self/test/DeepSpeech/train_3/best_dev-746894
--------------------------------------------------------------------------------
Epoch 2 | Training | Elapsed Time: 0:48:01 | Steps: 7186 | Loss: 31.213234
Epoch 2 | Validation | Elapsed Time: 0:01:24 | Steps: 475 | Loss: 39.810643 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 39.810643 to: /data/Self/test/DeepSpeech/train_3/best_dev-754080
--------------------------------------------------------------------------------
Epoch 3 | Training | Elapsed Time: 0:48:03 | Steps: 7186 | Loss: 29.791398
Epoch 3 | Validation | Elapsed Time: 0:01:24 | Steps: 475 | Loss: 39.136365 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 39.136365 to: /data/Self/test/DeepSpeech/train_3/best_dev-761266
--------------------------------------------------------------------------------
Epoch 4 | Training | Elapsed Time: 0:48:02 | Steps: 7186 | Loss: 28.845716
Epoch 4 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 38.489472 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 38.489472 to: /data/Self/test/DeepSpeech/train_3/best_dev-768452
--------------------------------------------------------------------------------
Epoch 5 | Training | Elapsed Time: 0:48:02 | Steps: 7186 | Loss: 28.051135
Epoch 5 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 37.851685 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 37.851685 to: /data/Self/test/DeepSpeech/train_3/best_dev-775638
--------------------------------------------------------------------------------
Epoch 6 | Training | Elapsed Time: 0:48:02 | Steps: 7186 | Loss: 27.403971
Epoch 6 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 37.467827 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 37.467827 to: /data/Self/test/DeepSpeech/train_3/best_dev-782824
--------------------------------------------------------------------------------
Epoch 7 | Training | Elapsed Time: 0:48:02 | Steps: 7186 | Loss: 26.854938
Epoch 7 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 37.366411 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 37.366411 to: /data/Self/test/DeepSpeech/train_3/best_dev-790010
--------------------------------------------------------------------------------
Epoch 8 | Training | Elapsed Time: 0:48:02 | Steps: 7186 | Loss: 26.361723
Epoch 8 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 37.046090 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 37.046090 to: /data/Self/test/DeepSpeech/train_3/best_dev-797196
--------------------------------------------------------------------------------
Epoch 9 | Training | Elapsed Time: 0:48:02 | Steps: 7186 | Loss: 25.926279
Epoch 9 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 36.745385 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 36.745385 to: /data/Self/test/DeepSpeech/train_3/best_dev-804382
--------------------------------------------------------------------------------
Epoch 10 | Training | Elapsed Time: 0:48:08 | Steps: 7186 | Loss: 25.514988
Epoch 10 | Validation | Elapsed Time: 0:01:25 | Steps: 475 | Loss: 36.442261 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 36.442261 to: /data/Self/test/DeepSpeech/train_3/best_dev-811568
--------------------------------------------------------------------------------
Epoch 11 | Training | Elapsed Time: 0:48:07 | Steps: 7186 | Loss: 25.151161
Epoch 11 | Validation | Elapsed Time: 0:01:24 | Steps: 475 | Loss: 36.470721 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
--------------------------------------------------------------------------------
Epoch 12 | Training | Elapsed Time: 0:48:01 | Steps: 7186 | Loss: 24.817111
Epoch 12 | Validation | Elapsed Time: 0:01:24 | Steps: 475 | Loss: 36.185578 | Dataset: /home/tumu/Self/Research/Work/tensorflow_work/models/try/rnnt-speech-recognition/data/clips/dev.csv
I Saved new best validating model with loss 36.185578 to: /data/Self/test/DeepSpeech/train_3/best_dev-825940
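For what it’s worth, the per-epoch numbers can be pulled out of the log programmatically to check the trend (the validation loss above is still decreasing, from 44.3 down to 36.2, just slowly). A small sketch that parses progress lines of the shape shown above:

```python
import re

# Matches DeepSpeech progress lines like:
# "Epoch 0 | Training | Elapsed Time: 0:48:14 | Steps: 7186 | Loss: 51.725979"
LOG_RE = re.compile(r"Epoch (\d+) \| (Training|Validation) \|.*\| Loss: ([0-9.]+)")

def parse_losses(lines):
    """Return {'Training': {epoch: loss}, 'Validation': {epoch: loss}}."""
    losses = {"Training": {}, "Validation": {}}
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            losses[m.group(2)][int(m.group(1))] = float(m.group(3))
    return losses
```

Plotting or diffing the resulting per-epoch series makes it easy to see whether validation loss has actually plateaued or is just improving slowly.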
I’m not getting good results with the new checkpoints (I ran the same exercise for 3 epochs and verified some results against Common Voice audio files from train.tsv), so it seems I’m doing something wrong.
- When I add new characters to the alphabet, do I need to retrain the scorer as well? My new alphabet is mostly a-z, 0-9, and a few punctuation marks such as the period (.), double quote ("), comma (,) and semicolon (;).
- Do I need to change my training parameters? Maybe I’m doing something wrong there.
- I followed the procedure mentioned here, and prepared the data using the command
bin/import_cv2.py --filter_alphabet data/new_alphabet.txt <path_to_common_speech_tsv_files>
- Do I need to perform any additional steps, or drop more layers, to train the model with the new alphabet (new vocabulary)? I would also like to start from the DeepSpeech-provided checkpoint rather than from scratch.
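After running bin/import_cv2.py, one quick diagnostic is to count how often the newly added characters (digits, period, quote) actually occur in the generated transcripts; if they are rare, the network gets very little signal to learn them during fine-tuning. A sketch, assuming the generated CSV has a `transcript` column as produced by the import script:

```python
import csv
from collections import Counter

def char_frequencies(csv_path):
    """Count character occurrences in the 'transcript' column of an
    imported training CSV."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts.update(row["transcript"])
    return counts
```

For example, `char_frequencies("data/clips/train.csv").most_common()` shows whether the digits, period and quote appear often enough relative to the letters.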
Please let me know how to proceed with training a model with a new alphabet. Thank you.