Training fails with an error about characters

Hi, I got this error when my training job was nearly finished:


Decoding predictions…

[ctc_beam_search_decoder.cpp:29] FATAL: "(class_dim) == (alphabet.GetSize()+1)" check failed. The shape of probs does not match with the shape of the vocabulary

Please help me to resolve this error.

The error message is pretty obvious. Please share more details on your training.

This is the list of characters in my alphabet.txt (it already contains the whitespace char ' '):
a
á
à

ã

ă





â





b
c
d
đ
e
é
è



ê
ế




g
h
i
í
ì

ĩ

k
l
m
n
o
ó
ò
õ


ô





ơ





p
q
r
s
t
u
ú
ù

ũ

ư





v
x
y
ý



And these are all the characters in my 3 datasets, as reported by util/check_characters.py:
train.csv:
['ơ', 'ễ', 'ể', 'è', 'y', 'ặ', 'ư', 'ứ', 'ầ', 'ẫ', 'ẩ', 'b', 'u', 'ọ', 'â', 'm', 'ẳ', 'ề', 'ế', 'ỡ', 'ự', 'c', 'ỗ', 'ổ', 'ử', 'ớ', 'í', 't', 'ả', 'ờ', 'ẵ', 'đ', 'ỷ', 'ừ', 'ộ', 'ị', 'ẹ', 'ỉ', 'ở', 'g', 'ợ', 'ù', 'ồ', 'ủ', 'ó', 'ắ', 'ý', 'ă', ' ', 'e', 'a', 'ã', 'ệ', 'ò', 'n', 'ẻ', 'ẽ', 'ê', 'ì', 'i', 'ằ', 'ấ', 'ạ', 'r', 'ữ', 'k', 'ỹ', 'ũ', 'á', 'd', 'ỳ', 'ậ', 'o', 's', 'ỏ', 'ô', 'p', 'ĩ', 'à', 'õ', 'h', 'q', 'ụ', 'ỵ', 'é', 'v', 'ú', 'x', 'l', 'ố']

dev.csv:
['x', 'v', 'ă', 'ỵ', 'c', 'ấ', 'ẳ', 'h', 'ậ', 'ố', 'ẩ', 'ỗ', 'ề', 'ẽ', 'é', 'ứ', 'ầ', 's', 'ư', 'ằ', 'ẵ', 'ô', 'ử', 'ù', 'ũ', 'ữ', 'ĩ', 'r', 'ặ', 'g', 'ắ', 'ự', ' ', 'p', 'ủ', 'ỡ', 'y', 'ê', 'b', 'ừ', 'ọ', 'à', 'ế', 'ộ', 'ở', 'ớ', 'q', 'í', 'ễ', 'ạ', 'ỏ', 'ỹ', 'd', 'â', 'ì', 'ỷ', 'ệ', 'ò', 'l', 'è', 'o', 'ó', 'đ', 'ồ', 'á', 't', 'u', 'ú', 'ị', 'ỳ', 'ể', 'ẹ', 'ẫ', 'ẻ', 'ợ', 'ỉ', 'ả', 'ơ', 'm', 'n', 'ã', 'ổ', 'ụ', 'k', 'ý', 'e', 'ờ', 'õ', 'i', 'a']

test.csv:
['ế', 'đ', 'ổ', 'ĩ', 'l', 'ả', 'ậ', 'ẹ', 'ạ', 'ý', 'ừ', 'ồ', 'ẫ', 'ẩ', 'ỗ', 't', 'ặ', 'ấ', 'ở', 'p', 'ẽ', 'í', 'ự', 'x', 'ằ', 's', 'ỡ', 'b', 'ỹ', 'a', 'ử', 'ủ', 'y', 'ẻ', 'ị', 'ờ', 'ề', 'e', 'á', 'ọ', 'ứ', 'ắ', 'ữ', 'k', 'â', 'ư', 'n', 'õ', 'é', 'ộ', 'm', 'ỷ', 'd', 'ỉ', 'r', 'ô', 'ợ', ' ', 'c', 'v', 'ã', 'ố', 'ơ', 'u', 'ũ', 'ù', 'g', 'ụ', 'ó', 'è', 'ớ', 'ă', 'ê', 'o', 'ễ', 'ỳ', 'ì', 'ò', 'ẳ', 'ầ', 'ỏ', 'ú', 'q', 'i', 'h', 'à', 'ể', 'ệ']
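To double-check coverage, I also ran a small script of my own (a minimal sketch; the file paths and the `transcript` column name are assumptions about my setup) to verify that every character in the CSV transcripts appears in alphabet.txt:

```python
# Minimal sketch: verify that every character used in the dataset
# transcripts is present in alphabet.txt (paths/column names assumed).
import csv

with open("alphabet.txt", encoding="utf-8") as f:
    # one character per line; lines starting with "#" are comments,
    # and a line holding a single space is the whitespace character
    alphabet = {line.rstrip("\n") for line in f if not line.startswith("#")}

for name in ("train.csv", "dev.csv", "test.csv"):
    chars = set()
    with open(name, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chars.update(row["transcript"])
    missing = chars - alphabet
    print(name, "-> missing from alphabet:", sorted(missing) or "none")
```

It reported nothing missing for any of the three files.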

@lissyx If you need anything else from my training setup to resolve this, just tell me :smiley:


Well, have you checked the dimensions?

I don’t know how to check it, can you help me? :smiley:

The shape of probs does not match with the shape of the vocabulary

So you need to check the size of your output layer with the size of your alphabet.
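In other words, the check the decoder performs is roughly this (a sketch of the invariant from the error message, not the actual DeepSpeech source):

```python
# Sketch of the invariant asserted in ctc_beam_search_decoder.cpp:
# the model's output dimension must equal the alphabet size plus one,
# because CTC reserves an extra class for the blank label.
def check_dims(class_dim: int, alphabet_size: int) -> None:
    if class_dim != alphabet_size + 1:
        raise ValueError(
            f"probs have {class_dim} classes, but the alphabet has "
            f"{alphabet_size} characters (expected {alphabet_size + 1})"
        )

check_dims(class_dim=91, alphabet_size=90)  # passes: 90 chars + 1 blank

try:
    check_dims(class_dim=91, alphabet_size=91)
except ValueError as e:
    print(e)  # the same kind of failure as the FATAL check above
```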

So, again, please explain more about what you are doing. What’s the model, training dataset, language model, etc.?

Thank you @lissyx, I resolved this error by creating a new alphabet.txt file. The old one had an error in it: it should have contained only 90 characters, but when I printed the alphabet in evaluate.py during the decoding step, it was 91.
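For anyone who hits the same thing: the extra entry in my case came from the alphabet file itself, so a quick count-and-duplicate check like this sketch (file name assumed) can catch it:

```python
# Sanity-check alphabet.txt: count entries and flag duplicates or
# stray empty lines (e.g. a trailing newline that silently adds a
# 91st "character"). A single-space line is the legitimate
# whitespace character and is kept.
from collections import Counter

with open("alphabet.txt", encoding="utf-8") as f:
    entries = [line.rstrip("\n") for line in f if not line.startswith("#")]

print("total entries:", len(entries))
for ch, n in Counter(entries).items():
    if n > 1:
        print(f"duplicate entry {ch!r} appears {n} times")
if "" in entries:
    print("empty line(s) found:", entries.count(""))
```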