I am trying to build a DeepSpeech model for the Indonesian language.
For the dataset, I used Common Voice, with 13k hours in total and 11k hours for validation.
I used these parameters to train my model:
%cd /content/DeepSpeech/
!python3 DeepSpeech.py \
  --train_files /content/id/cv-corpus-7.0-2021-07-21/id/clips/train.csv \
  --dev_files /content/id/cv-corpus-7.0-2021-07-21/id/clips/dev.csv \
  --test_files /content/id/cv-corpus-7.0-2021-07-21/id/clips/test.csv \
  --checkpoint_dir /content/drive/MyDrive/DeepSpeech/checkpoint_1 \
  --export_dir /content/drive/MyDrive/DeepSpeech/model \
  --alphabet_config_path /content/id/alphabet.txt \
  --scorer data/lm/kenlm.scorer \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 10 \
  --utf8
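The CSVs passed via --train_files, --dev_files and --test_files include a wav_filesize column, so the dataset size can be roughly sanity-checked in hours. This minimal sketch assumes every clip is a 16 kHz, 16-bit mono PCM WAV with a 44-byte header (an assumption about how the clips were converted, not something stated above); estimated_hours is a hypothetical helper name.

```python
import csv
import io
from typing import TextIO

# Assumption: every clip is a 16 kHz, 16-bit, mono PCM WAV with a 44-byte header.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
WAV_HEADER_BYTES = 44


def estimated_hours(csv_file: TextIO) -> float:
    """Estimate total audio duration in hours from a DeepSpeech-style CSV.

    Expects the columns wav_filename, wav_filesize and transcript.
    """
    total_seconds = 0.0
    for row in csv.DictReader(csv_file):
        payload_bytes = max(int(row["wav_filesize"]) - WAV_HEADER_BYTES, 0)
        total_seconds += payload_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE)
    return total_seconds / 3600


if __name__ == "__main__":
    # On the real data this would be, for example:
    # with open("/content/id/cv-corpus-7.0-2021-07-21/id/clips/train.csv", encoding="utf-8") as f:
    #     print(f"{estimated_hours(f):.1f} h")
    demo = io.StringIO(
        "wav_filename,wav_filesize,transcript\n"
        "a.wav,320044,iya\n"    # 320000 bytes of samples -> 10 s
        "b.wav,160044,tidak\n"  # 160000 bytes of samples -> 5 s
    )
    print(f"{estimated_hours(demo) * 3600:.1f} s")  # 15.0 s
```

If the clips were stored at a different sample rate or bit depth, the constants at the top would need to change accordingly.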
The results are very bad, and I don’t know what I am doing wrong:
Test epoch | Steps: 3038 | Elapsed Time: 0:13:44
Test on /content/id/cv-corpus-7.0-2021-07-21/id/clips/test.csv - WER: 1.000000, CER: 1.000000, loss: 76.047173
Best WER:
WER: 1.000000, CER: 1.000000, loss: 771.664917
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_22967183.wav
- src: “cintai teman kelasmu cintai kedua orang tuamu cintai tanah airmu”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 509.124603
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_19773611.wav
- src: “dia berkata pada dirinya sendiri aku pasti bisa”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 355.355682
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20954734.wav
- src: “terima kasih untuk pertolongan anda”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 354.010773
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20287820.wav
- src: “satu hal bagi lelaki yang sudah menikah adalah jangan pernah melupakan hari perayaan pernikahan”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 332.152679
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20954648.wav
- src: “apakah saya harus membelikannya barang”
- res: “”
Median WER:
WER: 1.000000, CER: 1.000000, loss: 68.902573
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20962340.wav
- src: “di sini tempat yang sangat terkenal di jepang”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 68.881241
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_19967474.wav
- src: “malam ini saya tidak ingin pergi ke mana mana”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 68.862602
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_25221497.wav
- src: “saya yakin saya akan dapat menemukannya”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 68.847015
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_24976979.wav
- src: “ketika berbelanja saya menggunakan kartu”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 68.812393
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_23967336.wav
- src: “sesekali ikutlah acara kami”
- res: “”
Worst WER:
WER: 1.000000, CER: 1.000000, loss: 7.721856
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_22412572.wav
- src: “keren”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 4.876138
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_25221714.wav
- src: “perhatian”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 4.278893
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_22366433.wav
- src: “iya”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 2.609815
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_22528019.wav
- src: “satu”
- res: “”
WER: 1.000000, CER: 1.000000, loss: 1.907428
- wav: file:///content/id/cv-corpus-7.0-2021-07-21/id/clips/common_voice_id_20362675.wav
- src: “tidak”
- res: “”
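Every test sample above has an empty result (res: “”), so its WER and CER are 1.0 whatever the loss: with nothing decoded, every reference word and character counts as a deletion. A minimal sketch of the usual edit-distance definitions (WER = (S + D + I) / N over words, CER the same over characters) makes this concrete; it is an illustration, not DeepSpeech's own implementation.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "terima kasih untuk pertolongan anda"
    print(wer(ref, ""))                   # 1.0 (all five words deleted)
    print(cer(ref, ""))                   # 1.0
    print(wer(ref, "terima kasih anda"))  # 0.4 (two deletions out of five words)
```

So the 1.000000 values mean the model decodes no characters at all, rather than decoding the wrong ones.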
To generate lm.binary, I collected some Indonesian sentences; this produced a vocabulary of 38,130 words in the vocabulary-500000.txt file.
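Assuming the 500000 in the file name is a top-k limit (keep at most the k most frequent words), the file can only contain as many distinct words as the collected sentences provide, which would explain 38,130 entries. The sketch below illustrates that assumed behavior; top_k_vocabulary is a hypothetical helper, not the script that generated the file.

```python
from collections import Counter
from typing import Iterable, List


def top_k_vocabulary(sentences: Iterable[str], top_k: int) -> List[str]:
    """Return at most top_k distinct words, most frequent first (hypothetical helper)."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    # most_common(k) returns every word when the corpus has fewer than k of them
    return [word for word, _ in counts.most_common(top_k)]


corpus = [
    "saya yakin saya akan dapat menemukannya",
    "malam ini saya tidak ingin pergi ke mana mana",
]
vocab = top_k_vocabulary(corpus, top_k=500_000)
print(len(vocab))  # 12: only 12 distinct words exist, far below top_k
print(vocab[:2])   # ['saya', 'mana']
```

Under this assumption, the vocabulary only grows toward the limit as more distinct words are added to the sentence collection.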
For the alphabet, here is what I got from the training, test, and validation datasets:
### Reading in the following transcript files: ###
### ['/content/id/cv-corpus-7.0-2021-07-21/id/clips/train.csv', '/content/id/cv-corpus-7.0-2021-07-21/id/clips/dev.csv', '/content/id/cv-corpus-7.0-2021-07-21/id/clips/test.csv'] ###
### The following unique characters were found in your transcripts: ###
p
”
b
j
f
t
!
c
,
“
—
o
a
ł
ń
s
é
z
m
á
–
n
q
i
e
w
g
‘
’
d
'
l
x
y
v
r
h
k
u
### ^^^ You can copy-paste these into data/alphabet.txt ###
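Several characters in this list are punctuation (curly quotes, dashes, "!" and ",") or accented letters that are not part of standard Indonesian spelling (ł, ń, é, á). One possible way to clean the transcript column of each CSV before training is sketched below. The choices are assumptions, not requirements: the target alphabet is a-z, space and the ASCII apostrophe; curly apostrophes become "'"; dashes become spaces; accents are stripped via Unicode NFKD decomposition. The helper names are hypothetical.

```python
import csv
import io
import unicodedata
from typing import Iterable, Set, TextIO

# Assumed target alphabet: lowercase a-z, space and the ASCII apostrophe.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz '")

# Explicit replacements applied before Unicode decomposition.
SPECIAL = {
    "\u0142": "l",  # ł has no NFKD decomposition, so map it by hand
    "\u2018": "'",  # left single quotation mark -> apostrophe
    "\u2019": "'",  # right single quotation mark -> apostrophe
    "\u2013": " ",  # en dash -> word separator
    "\u2014": " ",  # em dash -> word separator
}


def normalize_transcript(text: str) -> str:
    """Lowercase, strip accents and drop characters outside ALLOWED."""
    text = "".join(SPECIAL.get(ch, ch) for ch in text.lower())
    # NFKD splits e.g. "é" into "e" + a combining accent; the accent is then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(ch for ch in decomposed if ch in ALLOWED)
    return " ".join(kept.split())  # collapse runs of whitespace


def unique_characters(transcripts: Iterable[str]) -> Set[str]:
    """Collect every distinct character, to re-check the alphabet after cleaning."""
    chars: Set[str] = set()
    for transcript in transcripts:
        chars.update(transcript)
    return chars


def clean_csv(src: TextIO, dst: TextIO) -> None:
    """Copy a DeepSpeech-style CSV, normalizing the transcript column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["transcript"] = normalize_transcript(row["transcript"])
        if row["transcript"]:  # skip rows that end up empty
            writer.writerow(row)


if __name__ == "__main__":
    src = io.StringIO(
        "wav_filename,wav_filesize,transcript\n"
        "a.wav,1,\u201cDia berkata\u2014aku pasti bisa!\u201d\n"
    )
    dst = io.StringIO()
    clean_csv(src, dst)
    print(dst.getvalue())
    print(sorted(unique_characters(["dia berkata aku pasti bisa"])))
```

After cleaning, running unique_characters over all transcripts should return only characters from the chosen alphabet, which can then be written to alphabet.txt.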
I have several questions about training my own DeepSpeech model:
Q1: Are there any steps I have missed?
Q2: As you can see, the alphabet is messy; there are some characters that do not belong to Indonesian. Should I clean the dataset first?
Q3: How many sentences should I use to generate lm.binary with KenLM?
Thank you…