I trained a model on the Spanish Common Voice dataset for 10 epochs. Both the test report printed at the end of training and the inference output obtained by running:
deepspeech --model deepspeech-0.9.1-models.pbmm --scorer deepspeech-0.9.1-models.scorer --audio my_audio_file.wav
return a blank transcription.
Example from the test report:
WER: 1.000000, CER: 0.864865, loss: 107.391472
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1565384151948609.wav
- src: "qué peleas se agarraban entre ustedes"
- res: " "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.852941, loss: 107.340851
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/common_voice_es_19602468.wav
- src: "sentí que cada riff estaba escrito"
- res: " "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.852941, loss: 107.299416
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1562620039745670.wav
- src: "oyó a un grupo releyendo geografía"
- res: " "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.810811, loss: 107.287590
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/common_voice_es_19139609.wav
- src: "en roma estuvo en el colegio de lieja"
- res: " "
--------------------------------------------------------------------------------
Worst WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.333333, loss: 21.902287
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1557876943950223.wav
- src: "non"
- res: " "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 20.615292
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-156452630840292.wav
- src: "rossi"
- res: " "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 20.549049
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1556823942669887.wav
- src: "sisisi"
- res: " "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 17.611378
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1565617749088932.wav
- src: "enid"
- res: " "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.800000, loss: 17.374151
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1556197352940137.wav
- src: "no no"
- res: " "
--------------------------------------------------------------------------------
Training command:
CUDA_VISIBLE_DEVICES=0 python3 DeepSpeech.py \
  --train_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/train.csv \
  --dev_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/dev.csv \
  --test_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/test.csv \
  --automatic_mixed_precision \
  --alphabet_config_path ~/train_deepspeech/alphabet.txt \
  --checkpoint_dir ~/train_deepspeech/deepspeech/checkpoints \
  --export_dir ~/train_deepspeech/deepspeech/checkpoints/export \
  --log_level 0 \
  --epochs 10 \
  --limit_test 5000
Number of entries in each dataset file:
train.csv: 256522
dev.csv: 28611
test.csv: 21574
alphabet.txt:
a
á
à
â
ä
b
c
d
e
é
è
ê
ë
f
g
h
i
í
ì
î
ï
j
k
l
m
n
ñ
o
ó
ò
ô
ö
p
q
r
s
t
u
ú
ù
û
ü
v
w
x
y
z
!
¡
?
¿
´
¨
’
“blank space”
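As a sanity check on the alphabet above, here is a minimal sketch of how the transcripts could be compared against it. It assumes the CSV files have a `transcript` column and that alphabet.txt holds one character per line, with lines starting with `#` treated as comments. The function names (`load_alphabet`, `unknown_characters`, `read_transcripts`, `report`) are hypothetical helpers for illustration and are not part of DeepSpeech.

```python
import csv
from collections import Counter


def load_alphabet(lines):
    """Collect one entry per line, skipping empty lines and '#' comment lines.

    Assumption: the space character is stored as a line holding a single space.
    """
    alphabet = set()
    for line in lines:
        entry = line.rstrip("\r\n")  # keep a lone space, drop only the newline
        if not entry or entry.startswith("#"):
            continue
        alphabet.add(entry)
    return alphabet


def unknown_characters(transcripts, alphabet):
    """Count every transcript character that is missing from the alphabet."""
    counts = Counter()
    for text in transcripts:
        for ch in text:
            if ch not in alphabet:
                counts[ch] += 1
    return counts


def read_transcripts(csv_path):
    """Yield the 'transcript' column of a CSV file (assumed column name)."""
    with open(csv_path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            yield row["transcript"]


def report(alphabet_path, csv_path):
    """Print each character found in the transcripts but not in the alphabet."""
    with open(alphabet_path, encoding="utf-8") as f:
        alphabet = load_alphabet(f)
    missing = unknown_characters(read_transcripts(csv_path), alphabet)
    for ch, n in missing.most_common():
        print(repr(ch), n)
```

Calling `report("alphabet.txt", "train.csv")` with the real paths would list any character that appears in the transcripts but not in the alphabet. An alphabet line that holds more than one character (for example a literal placeholder text instead of an actual space) is stored as a single entry and never matches an individual character, so the space would then show up as missing.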
Environment:
- deepspeech: 0.9.2
- deepspeech-training: 0.9.2
- OS Platform and Distribution: Ubuntu 18.04
- TensorFlow installed from: 1.15.4
- TensorFlow version: 1.15.4
- Python version: 3.6
- CUDA/cuDNN version: CUDA Version 10.0.130 / CUDNN_MAJOR 7
- GPU model and memory: Tesla V100 16 GB