Training fails with an error about characters

Hi, I got this error when my training job was nearly finished:


Decoding predictions…

[ctc_beam_search_decoder.cpp:29] FATAL: "(class_dim) == (alphabet.GetSize()+1)" check failed. The shape of probs does not match with the shape of the vocabulary

Please help me to resolve this error.

The error message is pretty obvious. Please share more details on your training.

This is the list of characters in my alphabet.txt (it already contains the whitespace character ' '):
a
á
à
ả
ã
ạ
ă
ắ
ằ
ẳ
ẵ
ặ
â
ấ
ầ
ẩ
ẫ
ậ
b
c
d
đ
e
é
è
ẻ
ẽ
ẹ
ê
ế
ề
ể
ễ
ệ
g
h
i
í
ì
ỉ
ĩ
ị
k
l
m
n
o
ó
ò
õ
ỏ
ọ
ô
ố
ồ
ổ
ỗ
ộ
ơ
ớ
ờ
ở
ỡ
ợ
p
q
r
s
t
u
ú
ù
ủ
ũ
ụ
ư
ứ
ừ
ử
ữ
ự
v
x
y
ý
ỳ
ỷ
ỹ
ỵ

And these are all the characters in my 3 datasets, as reported by util/check_characters.py:
train.csv:
[β€˜Ζ‘β€™, β€˜α»…β€™, β€˜α»ƒβ€™, β€˜Γ¨β€™, β€˜y’, β€˜αΊ·β€™, β€˜Ζ°β€™, β€˜α»©β€™, β€˜αΊ§β€™, β€˜αΊ«β€™, β€˜αΊ©β€™, β€˜b’, β€˜u’, β€˜α»β€™, β€˜Γ’β€™, β€˜m’, β€˜αΊ³β€™, β€˜α»β€™, β€˜αΊΏβ€™, β€˜α»‘β€™, β€˜α»±β€™, β€˜c’, β€˜α»—β€™, β€˜α»•β€™, β€˜α»­β€™, β€˜α»›β€™, β€˜Γ­β€™, β€˜t’, β€˜αΊ£β€™, β€˜α»β€™, β€˜αΊ΅β€™, β€˜Δ‘β€™, β€˜α»·β€™, β€˜α»«β€™, β€˜α»™β€™, β€˜α»‹β€™, β€˜αΊΉβ€™, β€˜α»‰β€™, β€˜α»Ÿβ€™, β€˜g’, β€˜α»£β€™, β€˜ΓΉβ€™, β€˜α»“β€™, β€˜α»§β€™, β€˜Γ³β€™, β€˜αΊ―β€™, β€˜Γ½β€™, β€˜Δƒβ€™, ’ ', β€˜e’, β€˜a’, β€˜Γ£β€™, β€˜α»‡β€™, β€˜Γ²β€™, β€˜n’, β€˜αΊ»β€™, β€˜αΊ½β€™, β€˜Γͺ’, β€˜Γ¬β€™, β€˜i’, β€˜αΊ±β€™, β€˜αΊ₯’, β€˜αΊ‘β€™, β€˜r’, β€˜α»―β€™, β€˜k’, β€˜α»Ήβ€™, β€˜Ε©β€™, β€˜Γ‘β€™, β€˜d’, β€˜α»³β€™, β€˜αΊ­β€™, β€˜o’, β€˜s’, β€˜α»β€™, β€˜Γ΄β€™, β€˜p’, β€˜Δ©β€™, β€˜Γ β€™, β€˜Γ΅β€™, β€˜h’, β€˜q’, β€˜α»₯’, β€˜α»΅β€™, β€˜Γ©β€™, β€˜v’, β€˜ΓΊβ€™, β€˜x’, β€˜l’, β€˜α»‘β€™]

dev.csv:
[β€˜x’, β€˜v’, β€˜Δƒβ€™, β€˜α»΅β€™, β€˜c’, β€˜αΊ₯’, β€˜αΊ³β€™, β€˜h’, β€˜αΊ­β€™, β€˜α»‘β€™, β€˜αΊ©β€™, β€˜α»—β€™, β€˜α»β€™, β€˜αΊ½β€™, β€˜Γ©β€™, β€˜α»©β€™, β€˜αΊ§β€™, β€˜s’, β€˜Ζ°β€™, β€˜αΊ±β€™, β€˜αΊ΅β€™, β€˜Γ΄β€™, β€˜α»­β€™, β€˜ΓΉβ€™, β€˜Ε©β€™, β€˜α»―β€™, β€˜Δ©β€™, β€˜r’, β€˜αΊ·β€™, β€˜g’, β€˜αΊ―β€™, β€˜α»±β€™, ’ ', β€˜p’, β€˜α»§β€™, β€˜α»‘β€™, β€˜y’, β€˜Γͺ’, β€˜b’, β€˜α»«β€™, β€˜α»β€™, β€˜Γ β€™, β€˜αΊΏβ€™, β€˜α»™β€™, β€˜α»Ÿβ€™, β€˜α»›β€™, β€˜q’, β€˜Γ­β€™, β€˜α»…β€™, β€˜αΊ‘β€™, β€˜α»β€™, β€˜α»Ήβ€™, β€˜d’, β€˜Γ’β€™, β€˜Γ¬β€™, β€˜α»·β€™, β€˜α»‡β€™, β€˜Γ²β€™, β€˜l’, β€˜Γ¨β€™, β€˜o’, β€˜Γ³β€™, β€˜Δ‘β€™, β€˜α»“β€™, β€˜Γ‘β€™, β€˜t’, β€˜u’, β€˜ΓΊβ€™, β€˜α»‹β€™, β€˜α»³β€™, β€˜α»ƒβ€™, β€˜αΊΉβ€™, β€˜αΊ«β€™, β€˜αΊ»β€™, β€˜α»£β€™, β€˜α»‰β€™, β€˜αΊ£β€™, β€˜Ζ‘β€™, β€˜m’, β€˜n’, β€˜Γ£β€™, β€˜α»•β€™, β€˜α»₯’, β€˜k’, β€˜Γ½β€™, β€˜e’, β€˜α»β€™, β€˜Γ΅β€™, β€˜i’, β€˜a’]

test.csv:
[β€˜αΊΏβ€™, β€˜Δ‘β€™, β€˜α»•β€™, β€˜Δ©β€™, β€˜l’, β€˜αΊ£β€™, β€˜αΊ­β€™, β€˜αΊΉβ€™, β€˜αΊ‘β€™, β€˜Γ½β€™, β€˜α»«β€™, β€˜α»“β€™, β€˜αΊ«β€™, β€˜αΊ©β€™, β€˜α»—β€™, β€˜t’, β€˜αΊ·β€™, β€˜αΊ₯’, β€˜α»Ÿβ€™, β€˜p’, β€˜αΊ½β€™, β€˜Γ­β€™, β€˜α»±β€™, β€˜x’, β€˜αΊ±β€™, β€˜s’, β€˜α»‘β€™, β€˜b’, β€˜α»Ήβ€™, β€˜a’, β€˜α»­β€™, β€˜α»§β€™, β€˜y’, β€˜αΊ»β€™, β€˜α»‹β€™, β€˜α»β€™, β€˜α»β€™, β€˜e’, β€˜Γ‘β€™, β€˜α»β€™, β€˜α»©β€™, β€˜αΊ―β€™, β€˜α»―β€™, β€˜k’, β€˜Γ’β€™, β€˜Ζ°β€™, β€˜n’, β€˜Γ΅β€™, β€˜Γ©β€™, β€˜α»™β€™, β€˜m’, β€˜α»·β€™, β€˜d’, β€˜α»‰β€™, β€˜r’, β€˜Γ΄β€™, β€˜α»£β€™, ’ ', β€˜c’, β€˜v’, β€˜Γ£β€™, β€˜α»‘β€™, β€˜Ζ‘β€™, β€˜u’, β€˜Ε©β€™, β€˜ΓΉβ€™, β€˜g’, β€˜α»₯’, β€˜Γ³β€™, β€˜Γ¨β€™, β€˜α»›β€™, β€˜Δƒβ€™, β€˜Γͺ’, β€˜o’, β€˜α»…β€™, β€˜α»³β€™, β€˜Γ¬β€™, β€˜Γ²β€™, β€˜αΊ³β€™, β€˜αΊ§β€™, β€˜α»β€™, β€˜ΓΊβ€™, β€˜q’, β€˜i’, β€˜h’, β€˜Γ β€™, β€˜α»ƒβ€™, β€˜α»‡β€™]
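For anyone comparing these lists by hand: below is a rough, hypothetical sketch of the same check util/check_characters.py performs, written as a standalone script. It assumes the standard DeepSpeech CSV layout with a `transcript` column and an alphabet.txt where lines starting with `#` are comments; adjust the column name if your CSVs differ.

```python
import csv
import sys

def dataset_characters(csv_path):
    """Collect the set of unique characters in the transcript column of a CSV."""
    chars = set()
    with open(csv_path, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chars.update(row["transcript"])  # assumes a 'transcript' column
    return chars

def alphabet_characters(alphabet_path):
    """Read alphabet.txt: one character per line; lines starting with '#' are comments."""
    with open(alphabet_path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if not line.startswith("#")}

if __name__ == "__main__":
    # Hypothetical usage: python compare_chars.py alphabet.txt train.csv dev.csv test.csv
    alphabet = alphabet_characters(sys.argv[1])
    data = set()
    for path in sys.argv[2:]:
        data |= dataset_characters(path)
    print("in datasets but missing from alphabet.txt:", sorted(data - alphabet))
    print("in alphabet.txt but unused in datasets:", sorted(alphabet - data))
```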

@lissyx If you need anything else about my training setup to resolve this, just tell me :smiley:


Well, have you checked the dimensions?

I don't know how to check it. Can you help me? :smiley:

The shape of probs does not match with the shape of the vocabulary

So you need to check the size of your output layer against the size of your alphabet.

So, again, please explain in more detail what you are doing. What's the model, training dataset, language model, etc.?
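To make that check concrete: the assertion in the decoder says class_dim (the last dimension of the probability matrix produced by the network) must equal alphabet size + 1, where the extra label is the CTC blank. A minimal sketch of the comparison, assuming alphabet.txt uses one character per line with '#'-prefixed comment lines:

```python
def alphabet_size(alphabet_path):
    """Count the labels in alphabet.txt (every non-comment line is one label)."""
    with open(alphabet_path, encoding="utf-8") as f:
        return sum(1 for line in f if not line.startswith("#"))

size = alphabet_size("alphabet.txt")
print("alphabet size:", size)
print("decoder expects class_dim =", size + 1)  # +1 for the CTC blank label
# Compare this value with the last dimension of the probs/logits tensor,
# e.g. by printing probs.shape[-1] just before the decoding step in evaluate.py.
```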

Thank you @lissyx, I have resolved this error by creating a new alphabet.txt file. The previous one had an error somewhere: I expected it to contain only 90 characters, but when I printed the alphabet in evaluate.py during the decoding step it was actually 91.
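For anyone hitting the same off-by-one: a single stray blank line or a duplicated character in alphabet.txt is enough to turn 90 labels into 91. A quick sanity check, again assuming '#' marks comment lines:

```python
from collections import Counter

with open("alphabet.txt", encoding="utf-8") as f:
    labels = [line.rstrip("\n") for line in f if not line.startswith("#")]

print("label count:", len(labels))
print("empty lines:", labels.count(""))
print("duplicates:", [c for c, n in Counter(labels).items() if n > 1])
```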