Training fails with an error about characters

Hi, I got this error when my training job was nearly finished:


Decoding predictions…

[ctc_beam_search_decoder.cpp:29] FATAL: "(class_dim) == (alphabet.GetSize()+1)" check failed. The shape of probs does not match with the shape of the vocabulary

Please help me to resolve this error.

The error message is pretty obvious. Please share more details on your training.

This is the list of characters in my alphabet.txt (it already contains the whitespace char ' '):
a
á
à

ã

ă





â





b
c
d
đ
e
é
è



ê
ế




g
h
i
í
ì

ĩ

k
l
m
n
o
ó
ò
õ


ô





ơ





p
q
r
s
t
u
ú
ù

ũ

ư





v
x
y
ý



And these are all the characters in my 3 datasets, as reported by util/check_characters.py:
train.csv:
['ơ', 'ễ', 'ể', 'è', 'y', 'ặ', 'ư', 'ứ', 'ầ', 'ẫ', 'ẩ', 'b', 'u', 'ọ', 'â', 'm', 'ẳ', 'ề', 'ế', 'ỡ', 'ự', 'c', 'ỗ', 'ổ', 'ử', 'ớ', 'í', 't', 'ả', 'ờ', 'ẵ', 'đ', 'ỷ', 'ừ', 'ộ', 'ị', 'ẹ', 'ỉ', 'ở', 'g', 'ợ', 'ù', 'ồ', 'ủ', 'ó', 'ắ', 'ý', 'ă', ' ', 'e', 'a', 'ã', 'ệ', 'ò', 'n', 'ẻ', 'ẽ', 'ê', 'ì', 'i', 'ằ', 'ấ', 'ạ', 'r', 'ữ', 'k', 'ỹ', 'ũ', 'á', 'd', 'ỳ', 'ậ', 'o', 's', 'ỏ', 'ô', 'p', 'ĩ', 'à', 'õ', 'h', 'q', 'ụ', 'ỵ', 'é', 'v', 'ú', 'x', 'l', 'ố']

dev.csv:
['x', 'v', 'ă', 'ỵ', 'c', 'ấ', 'ẳ', 'h', 'ậ', 'ố', 'ẩ', 'ỗ', 'ề', 'ẽ', 'é', 'ứ', 'ầ', 's', 'ư', 'ằ', 'ẵ', 'ô', 'ử', 'ù', 'ũ', 'ữ', 'ĩ', 'r', 'ặ', 'g', 'ắ', 'ự', ' ', 'p', 'ủ', 'ỡ', 'y', 'ê', 'b', 'ừ', 'ọ', 'à', 'ế', 'ộ', 'ở', 'ớ', 'q', 'í', 'ễ', 'ạ', 'ỏ', 'ỹ', 'd', 'â', 'ì', 'ỷ', 'ệ', 'ò', 'l', 'è', 'o', 'ó', 'đ', 'ồ', 'á', 't', 'u', 'ú', 'ị', 'ỳ', 'ể', 'ẹ', 'ẫ', 'ẻ', 'ợ', 'ỉ', 'ả', 'ơ', 'm', 'n', 'ã', 'ổ', 'ụ', 'k', 'ý', 'e', 'ờ', 'õ', 'i', 'a']

test.csv:
['ế', 'đ', 'ổ', 'ĩ', 'l', 'ả', 'ậ', 'ẹ', 'ạ', 'ý', 'ừ', 'ồ', 'ẫ', 'ẩ', 'ỗ', 't', 'ặ', 'ấ', 'ở', 'p', 'ẽ', 'í', 'ự', 'x', 'ằ', 's', 'ỡ', 'b', 'ỹ', 'a', 'ử', 'ủ', 'y', 'ẻ', 'ị', 'ờ', 'ề', 'e', 'á', 'ọ', 'ứ', 'ắ', 'ữ', 'k', 'â', 'ư', 'n', 'õ', 'é', 'ộ', 'm', 'ỷ', 'd', 'ỉ', 'r', 'ô', 'ợ', ' ', 'c', 'v', 'ã', 'ố', 'ơ', 'u', 'ũ', 'ù', 'g', 'ụ', 'ó', 'è', 'ớ', 'ă', 'ê', 'o', 'ễ', 'ỳ', 'ì', 'ò', 'ẳ', 'ầ', 'ỏ', 'ú', 'q', 'i', 'h', 'à', 'ể', 'ệ']
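To double-check coverage, I also ran a small script of my own (a minimal sketch; the file paths and the `transcript` column name are assumptions about my setup) to verify that every character in the CSV transcripts appears in alphabet.txt:

```python
# Minimal sketch: verify that every character used in the dataset
# transcripts is present in alphabet.txt (paths/column names assumed).
import csv

with open("alphabet.txt", encoding="utf-8") as f:
    # one character per line; lines starting with "#" are comments,
    # and a line holding a single space is the whitespace character
    alphabet = {line.rstrip("\n") for line in f if not line.startswith("#")}

for name in ("train.csv", "dev.csv", "test.csv"):
    chars = set()
    with open(name, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chars.update(row["transcript"])
    missing = chars - alphabet
    print(name, "-> missing from alphabet:", sorted(missing) or "none")
```

It reported nothing missing for any of the three files.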

@lissyx If you need anything else from my training setup to resolve this, just tell me :smiley:


Well, have you checked the dimensions?

I don’t know how to check it, can you help me? :smiley:

The shape of probs does not match with the shape of the vocabulary

So you need to check the size of your output layer with the size of your alphabet.
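In other words, the check the decoder performs is roughly this (a sketch of the invariant from the error message, not the actual DeepSpeech source):

```python
# Sketch of the invariant asserted in ctc_beam_search_decoder.cpp:
# the model's output dimension must equal the alphabet size plus one,
# because CTC reserves an extra class for the blank label.
def check_dims(class_dim: int, alphabet_size: int) -> None:
    if class_dim != alphabet_size + 1:
        raise ValueError(
            f"probs have {class_dim} classes, but the alphabet has "
            f"{alphabet_size} characters (expected {alphabet_size + 1})"
        )

check_dims(class_dim=91, alphabet_size=90)  # passes: 90 chars + 1 blank

try:
    check_dims(class_dim=91, alphabet_size=91)
except ValueError as e:
    print(e)  # the same kind of failure as the FATAL check above
```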

So, again, please explain more about what you are doing. What’s the model, training dataset, language model, etc.?

Thank you @lissyx, I resolved this error by creating a new alphabet.txt file. The old one had an error in it: it should have contained only 90 characters, but when I printed the alphabet in evaluate.py during the decoding step, it was 91.
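For anyone who hits the same thing: the extra entry in my case came from the alphabet file itself, so a quick count-and-duplicate check like this sketch (file name assumed) can catch it:

```python
# Sanity-check alphabet.txt: count entries and flag duplicates or
# stray empty lines (e.g. a trailing newline that silently adds a
# 91st "character"). A single-space line is the legitimate
# whitespace character and is kept.
from collections import Counter

with open("alphabet.txt", encoding="utf-8") as f:
    entries = [line.rstrip("\n") for line in f if not line.startswith("#")]

print("total entries:", len(entries))
for ch, n in Counter(entries).items():
    if n > 1:
        print(f"duplicate entry {ch!r} appears {n} times")
if "" in entries:
    print("empty line(s) found:", entries.count(""))
```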