Hi Everyone,
I am trying to train a very small model first to get used to all the commands.
The language I am trying to train is Gujarati. Is this an issue with the Unicode characters in my language?
I am using the following tutorial,
Reference Guide:
Command:
…/…/native_client/kenlm/build/bin/build_binary -T -s words.arpa lm.binary
Error:
/DeepSpeech/native_client/kenlm/lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&)
[with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException.
This ngram implementation assumes at least a bigram model. Byte: 20
ERROR
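The error text mentions the n-gram order rather than character encoding, so one thing worth checking is which orders words.arpa actually declares in its header. Below is a minimal sketch, assuming the file follows the standard ARPA layout (a `\data\` line followed by `ngram N=count` lines). The helper name `arpa_order` is hypothetical and not part of KenLM or DeepSpeech.

```python
import re


def arpa_order(lines):
    """Return the highest n-gram order declared in an ARPA header (0 if none)."""
    orders = []
    in_header = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            # The header starts at the \data\ marker.
            in_header = True
            continue
        if in_header:
            match = re.match(r"ngram (\d+)=(\d+)$", line)
            if match:
                orders.append(int(match.group(1)))
            elif line.startswith("\\"):
                # A section marker such as \1-grams: ends the header.
                break
    return max(orders) if orders else 0


# Made-up header lines for illustration, not taken from the real file.
unigram_only = ["\\data\\", "ngram 1=5", "", "\\1-grams:"]
with_bigrams = ["\\data\\", "ngram 1=5", "ngram 2=7", "", "\\1-grams:"]

print(arpa_order(unigram_only))
print(arpa_order(with_bigrams))

# To check the real file (assumed to be UTF-8 encoded):
# with open("words.arpa", encoding="utf-8") as f:
#     print(arpa_order(f))
```

If this reports 1 for words.arpa, the file contains only unigrams, which would match the "assumes at least a bigram model" message. In that case, regenerating the ARPA file with order 2 or higher (for example with KenLM's `lmplz -o 2`) would be the first thing to try.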
Thanks in advance for any help.