High WER, CER, and loss when training my own model

mustain5 · September 12, 2021, 4:57am

I am trying to build a DeepSpeech model for the Indonesian language.
For the dataset, I am using Common Voice, with a total of 13k hours, and 11k hours for validation.

I used these parameters to train my model:
%cd /content/DeepSpeech/
! python3 DeepSpeech.py \
  --train_files /content/id/cv-corpus-7.0-2021-07-21/id/clips/train.csv \
  --dev_files /content/id/cv-corpus-7.0-2021-07-21/id/clips/dev.csv \
  --test_files /content/id/cv-corpus-7.0-2021-07-21/id/clips/test.csv \
  --checkpoint_dir /content/drive/MyDrive/DeepSpeech/checkpoint_1 \
  --export_dir /content/drive/MyDrive/DeepSpeech/model \
  --alphabet_config_path /content/id/alphabet.txt \
  --scorer data/lm/kenlm.scorer \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 10 \
  --utf8

The results are very bad, and I don’t know what I am doing wrong.

Test epoch | Steps: 3038 | Elapsed Time: 0:13:44
Test on /content/id/cv-corpus-7.0-2021-07-21/id/clips/test.csv - WER: 1.000000, CER: 1.000000, loss: 76.047173

Best WER:

WER: 1.000000, CER: 1.000000, loss: 771.664917

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_22967183.wav

src: “cintai teman kelasmu cintai kedua orang tuamu cintai tanah airmu”

res: “”

WER: 1.000000, CER: 1.000000, loss: 509.124603

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_19773611.wav

src: “dia berkata pada dirinya sendiri aku pasti bisa”

res: “”

WER: 1.000000, CER: 1.000000, loss: 355.355682

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20954734.wav

src: “terima kasih untuk pertolongan anda”

res: “”

WER: 1.000000, CER: 1.000000, loss: 354.010773

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20287820.wav

src: “satu hal bagi lelaki yang sudah menikah adalah jangan pernah melupakan hari perayaan pernikahan”

res: “”

WER: 1.000000, CER: 1.000000, loss: 332.152679

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20954648.wav

src: “apakah saya harus membelikannya barang”

res: “”

Median WER:

WER: 1.000000, CER: 1.000000, loss: 68.902573

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20962340.wav

src: “di sini tempat yang sangat terkenal di jepang”

res: “”

WER: 1.000000, CER: 1.000000, loss: 68.881241

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_19967474.wav

src: “malam ini saya tidak ingin pergi ke mana mana”

res: “”

WER: 1.000000, CER: 1.000000, loss: 68.862602

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_25221497.wav

src: “saya yakin saya akan dapat menemukannya”

res: “”

WER: 1.000000, CER: 1.000000, loss: 68.847015

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_24976979.wav

src: “ketika berbelanja saya menggunakan kartu”

res: “”

WER: 1.000000, CER: 1.000000, loss: 68.812393

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_23967336.wav

src: “sesekali ikutlah acara kami”

res: “”

Worst WER:

WER: 1.000000, CER: 1.000000, loss: 7.721856

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_22412572.wav

src: “keren”

res: “”

WER: 1.000000, CER: 1.000000, loss: 4.876138

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_25221714.wav

src: “perhatian”

res: “”

WER: 1.000000, CER: 1.000000, loss: 4.278893

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_22366433.wav

src: “iya”

res: “”

WER: 1.000000, CER: 1.000000, loss: 2.609815

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_22528019.wav

src: “satu”

res: “”

WER: 1.000000, CER: 1.000000, loss: 1.907428

wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20362675.wav

src: “tidak”

res: “”
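
Every res in the report above is an empty string, which would explain why each WER and CER is exactly 1.000000. The sketch below is not DeepSpeech's own evaluation code; it is a minimal illustration, assuming WER and CER are defined as the Levenshtein distance divided by the reference length (in words and in characters respectively). The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate: char-level edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


# An empty transcription means every reference word and character is a deletion.
print(wer("terima kasih untuk pertolongan anda", ""))
print(cer("terima kasih untuk pertolongan anda", ""))
```

Under this definition, an empty output always scores 1.0 regardless of the reference, so the WER and CER values here say only that the model emitted nothing; the loss column is the only number that varies between samples.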

To generate lm.binary, I collected some Indonesian sentences, which produced a vocabulary of 38,130 words in the vocabulary-500000.txt file.
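
The 500000 in the file name suggests a cap on the number of most frequent words that are kept, so a vocabulary of roughly 38 thousand words would mean the sentence collection has fewer unique words than that cap. The sketch below is an assumption about that step, not the actual language-model script: it counts words in a list of sentences and keeps the top_k most frequent ones. The name top_k_vocabulary is hypothetical.

```python
from collections import Counter


def top_k_vocabulary(sentences, top_k=500000):
    """Return up to top_k most frequent lowercase words, most frequent first."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    return [word for word, _ in counts.most_common(top_k)]


sentences = ["saya yakin saya", "saya pergi"]
vocab = top_k_vocabulary(sentences)
print(len(vocab), vocab[0])
```

If the corpus is small, the cap never takes effect and the vocabulary size is simply the number of distinct words in the sentences.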

For the alphabet, here is what I got from the training, test, and validation datasets:

### Reading in the following transcript files: ###
### ['/content/id/cv-corpus-7.0-2021-07-21/id/clips/train.csv', '/content/id/cv-corpus-7.0-2021-07-21/id/clips/dev.csv', '/content/id/cv-corpus-7.0-2021-07-21/id/clips/test.csv'] ###
### The following unique characters were found in your transcripts: ###
p
”
b
j
f
t
！
c
，
“
—
o
a
ł
ń
s
é
z
m
á
–
n
q
i
e
w
g
‘
’
 
d
'
l
x
y
v
r
h
k
u
### ^^^ You can copy-paste these into data/alphabet.txt ###
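
The list above mixes plain letters with typographic quotes, dashes, full-width punctuation, and accented letters. One possible way to clean the transcripts before building the alphabet is sketched below. It assumes the target alphabet is a to z, space, and apostrophe, and that stripping diacritics is acceptable for this dataset; both are assumptions, and the names ALLOWED, normalize_transcript, and unexpected_characters are hypothetical.

```python
import re
import unicodedata

# Assumed target alphabet: lowercase a-z, apostrophe, and space.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz' ")


def unexpected_characters(transcripts):
    """Return the sorted characters that fall outside the assumed alphabet."""
    return sorted({ch for t in transcripts for ch in t} - ALLOWED)


def normalize_transcript(text):
    """Lowercase, unify apostrophes, strip diacritics, drop other symbols."""
    text = text.lower()
    # Map typographic single quotes to a plain apostrophe.
    text = text.replace("‘", "'").replace("’", "'")
    # Decompose accented letters, then drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # ł has no Unicode decomposition, so it is mapped explicitly.
    text = text.replace("ł", "l")
    # Replace anything still outside the alphabet with a space.
    text = "".join(ch if ch in ALLOWED else " " for ch in text)
    # Collapse runs of whitespace left behind by removed symbols.
    return re.sub(r"\s+", " ", text).strip()


print(unexpected_characters(["iya", "tidak！"]))
print(normalize_transcript("“Café”, kata dia！"))
```

The same function would need to be applied to the transcript column of the train, dev, and test CSV files, and to the sentences used for the language model, so that all of them share one alphabet.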

I have several questions about training my own DeepSpeech model.
Q1: Are there any steps that I missed?
Q2: As you can see, the alphabet is messy: there are some characters that do not belong to Indonesian. Should I clean the dataset first?
Q3: How many sentences should I use to generate lm.binary with KenLM?

Thank you…
