Has anybody here tried to develop a language model for Hindi or Nepali?

(Roosan Gm) #1

I was planning to develop a language model for the Devanagari script, but it has a large number of letters and variations to account for: क, ख, अ, आ and their combined forms such as के, का, खे, खा, etc.

We also have half letters, for example क् in क्या.

How would you propose building alphabet.txt in cases like these?
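One way to see what these variations and half letters are made of is to list the Unicode code points behind them. The following is only a sketch using Python's standard `unicodedata` module; the helper name `code_points` is made up for illustration.

```python
import unicodedata

def code_points(text):
    """Return (character, code point, Unicode name) for each code point in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Escapes are used so the example does not depend on how the editor
# stores the text: "\u0915\u0947" is के and "\u0915\u094d\u092f\u093e" is क्या.
for word in ["\u0915\u0947", "\u0915\u094d\u092f\u093e"]:
    print(word)
    for ch, cp, name in code_points(word):
        print("   ", cp, name)
```

For क्या this lists four code points, and the half letter shows up as क followed by the virama sign (U+094D) rather than as a separate letter.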

(Roosan Gm) #2

I realized that के is made of two Unicode characters combined: "क" and " े". Would it be okay to have the combined form (के) as a single entry in alphabet.txt?
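An alternative to listing every combined form would be to keep only the individual code points (base letters, vowel signs, the virama) in the alphabet, so that के is covered by its two parts. Below is a rough sketch of that idea, assuming the alphabet file simply lists one symbol per line and that the training transcripts are available as a list of strings. The function name `build_alphabet` and the sample data are hypothetical, and whether a given toolkit accepts combining marks as standalone entries would need to be checked.

```python
import unicodedata

def build_alphabet(lines):
    """Collect the distinct code points used in the transcripts, sorted by code point."""
    symbols = set()
    for line in lines:
        # Normalize first so the same text always yields the same code points.
        symbols.update(unicodedata.normalize("NFC", line.strip()))
    return sorted(symbols)

# Hypothetical sample transcripts: के, क्या, का
sample = ["\u0915\u0947", "\u0915\u094d\u092f\u093e", "\u0915\u093e"]
alphabet = build_alphabet(sample)

# Assumed file layout: one symbol per line.
print("\n".join(alphabet))
```

With this approach the alphabet stays small (roughly the size of the Devanagari Unicode block in use) instead of growing with every consonant and vowel-sign combination.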

(Roosan Gm) #3

Any kind of lead on this would be helpful. Are there other scripts like Devanagari that have been handled this way?