Hi Guys!
I am currently working on a Japanese speech recognition system for TVs that handles simple commands such as "turn on" and "turn off". The commands I would like to train on use a total vocabulary of about 600 words. Given this, how many hours of data do you think would be enough to build a model? My second question: do you think DeepSpeech is suitable for my project? (PS: I previously used the Julius speech recognition engine, but the results were not very good, which is why I think a deep neural network would be a better choice.)