Yep it worked without the dash (thank youu!!!):
Reading words.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Identifying n-grams omitted by SRI
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
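For context, that output is what KenLM's build_binary prints while converting the ARPA file into a binary model. A sketch of a typical invocation, assuming a trie model and placeholder file names:

    # Convert the ARPA LM into a compact binary trie
    # (paths are examples; -a/-q set pointer and quantization bits and are optional)
    kenlm/build/bin/build_binary -a 255 -q 8 trie words.arpa lm.binary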
But generate_package gives me this error:
4860 unique words read from vocabulary file.
Doesn't look like a character based model.
Package created in kenlm.scorer
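That last line suggests the scorer was actually built, and the "Doesn't look like a character based model" message is what appears for an ordinary word-based vocabulary. For comparison, a sketch of a generate_package invocation; the file names and the alpha/beta values here are illustrative, not tuned:

    # Bundle the binary LM and vocabulary into a scorer package
    python3 generate_package.py \
      --alphabet alphabet.txt \
      --lm lm.binary \
      --vocab vocab.txt \
      --package kenlm.scorer \
      --default_alpha 0.93 \
      --default_beta 1.18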
Now it doesn't say anything about the missing header. In the vocabulary file I placed all my transcripts, and they are sentences in Romanian; we have some special characters like ă, î, ş, ţ, â. I also placed them in the alphabet…
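For reference, the alphabet file is one character per line, and each diacritic needs its own line. A minimal sketch, with the letter list truncated for brevity:

    # alphabet.txt: one character per line; lines starting with # are comments.
    # The first (non-comment) entry below is a single space.
     
    a
    b
    # ... the remaining letters c through z ...
    ă
    â
    î
    ş
    ţ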
Well, at the moment I am training it for a single speaker. I have multiple speakers and around 17 hours of recordings provided by my University. I have a lot of transcripts and I will probably be using those. Anyway, thank you so much for your help! As I said, I am working on my Final Project and I will mention you and this helpful community in it, and maybe one day I will be able to help someone else who is training DeepSpeech for the Romanian language. Many, many thanks!
I am not an expert in bash; try running it on the command line first, then use a script.
As for DeepSpeech, you should try increasing the train and dev batch sizes to speed up training. If you have a GPU, use train_cudnn. What to use for n_hidden varies widely, typically powers of 2, so maybe 128 or 256. I didn't see much of a difference between such values, but I use larger inputs.
If you have the space, store more checkpoints, so you can check whether an earlier checkpoint gives better results when you run for 100 epochs.
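Putting those flags together, a sketch of a training run; the CSV paths, batch sizes, and checkpoint count are placeholders to adapt to your data and GPU memory:

    # Example DeepSpeech training invocation (values are illustrative)
    python3 DeepSpeech.py \
      --train_files train.csv \
      --dev_files dev.csv \
      --train_batch_size 32 \
      --dev_batch_size 32 \
      --train_cudnn \
      --n_hidden 256 \
      --epochs 100 \
      --checkpoint_dir checkpoints/ \
      --max_to_keep 10

Here --max_to_keep controls how many recent checkpoints are kept in checkpoint_dir, which is what lets you go back and test an earlier one.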