Mic-vad-streaming

Vlad_Hornai · April 26, 2021, 5:32am

Hi,
I found some examples in a GitHub repo and tested them with the English model and scorer from GitHub, and they work almost perfectly. The problem is when I try to use my own model and scorer for another language: the program recognizes nothing and performs poorly. I also converted the .pb to .pbmm. What could the possible issues be? The dataset has 12 speakers and almost 25 hours of speech.
Thanks!!

lissyx · April 26, 2021, 8:01am

You either need more data or need to use transfer learning.

Vlad_Hornai · April 26, 2021, 2:41pm

Thanks!!! I am going to see what I can do

ftyers · April 27, 2021, 2:24pm

Which language are you trying to train for? Romanian? If so, there is a pretrained model here: https://tepozcatl.omnilingo.cc/ro/
Results are reported as (CER, WER), with and without the language model (LM).
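CER (character error rate) and WER (word error rate) are both edit distances normalized by the length of the reference transcript: over characters for CER and over whitespace-separated words for WER. The sketch below shows how these metrics are usually computed; it is an illustration, not necessarily the exact scoring code behind the numbers on that page. The function names are hypothetical, and the LM only changes the hypothesis being scored, not the metric itself.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] is the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong letter ("mere" -> "pere") costs a whole word in WER
    # but only one character in CER.
    print(wer("ana are mere", "ana are pere"))  # 1 of 3 words
    print(cer("ana are mere", "ana are pere"))  # 1 of 12 characters
```

This is why CER is usually much lower than WER for the same model: a single misrecognized letter counts as one error out of many characters, but as a full word error.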

You’ll want to use transfer learning from the released English model.

Vlad_Hornai · April 27, 2021, 3:00pm

Thanks!!!
Where did you find the model? Does it also include the dataset? My model is weak because I only have 25 hours of speech, which is not enough. How can I do transfer learning from English? I saw that the documentation covers transfer learning, but more hints would help.
Thanks!

ftyers · April 27, 2021, 3:43pm

I trained the model myself; see, e.g., this post. 25 hours of speech should be fine for transfer learning. I'm currently retraining Romanian with 100 epochs instead of 50, which should lead to a slightly better model. But I'm only training with the ~3 hours of training data from Common Voice, so with 25 hours yours should be a lot better.

In terms of how to do transfer learning, it’s described in the documentation, but I use this command.
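The exact command referred to above is not reproduced in this thread. As a hedged sketch only, the Python helper below assembles a fine-tuning command from flags described in the DeepSpeech 0.9 training documentation (`--drop_source_layers`, `--alphabet_config_path`, `--load_checkpoint_dir`, `--save_checkpoint_dir`, plus the usual CSV and `--epochs` flags). The function name, the default of 50 epochs, and all paths are hypothetical placeholders; check the flags against the documentation for the DeepSpeech version you actually use.

```python
import shlex
from pathlib import Path


def transfer_learning_args(
    release_checkpoint: Path,
    output_checkpoint: Path,
    alphabet: Path,
    train_csv: Path,
    dev_csv: Path,
    test_csv: Path,
    drop_source_layers: int = 1,
    epochs: int = 50,
) -> list[str]:
    """Hypothetical helper: build a DeepSpeech.py command that fine-tunes
    from the released English checkpoint on a new-language dataset."""
    return [
        "python3", "DeepSpeech.py",
        # Re-initialize the last N layers. A new alphabet changes the size of
        # the output layer, so at least that last layer has to be dropped.
        "--drop_source_layers", str(drop_source_layers),
        "--alphabet_config_path", str(alphabet),
        # Start from the English release weights, save to a separate directory.
        "--load_checkpoint_dir", str(release_checkpoint),
        "--save_checkpoint_dir", str(output_checkpoint),
        "--train_files", str(train_csv),
        "--dev_files", str(dev_csv),
        "--test_files", str(test_csv),
        "--epochs", str(epochs),
    ]


if __name__ == "__main__":
    # Hypothetical paths: adjust to your own checkpoint and CSV locations.
    cmd = transfer_learning_args(
        release_checkpoint=Path("deepspeech-0.9.3-checkpoint"),
        output_checkpoint=Path("ro-checkpoint"),
        alphabet=Path("ro/alphabet.txt"),
        train_csv=Path("ro/train.csv"),
        dev_csv=Path("ro/dev.csv"),
        test_csv=Path("ro/test.csv"),
    )
    # Only prints the command; run it from a DeepSpeech checkout, for
    # example with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the argument list in Python rather than a shell string makes it easy to vary `drop_source_layers` or `epochs` between runs (e.g. 50 vs. 100 epochs, as mentioned above) without quoting mistakes.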

If you’re interested in more, you can find us on Matrix.

Vlad_Hornai · April 27, 2021, 3:54pm

Thanks @ftyers!!! I really appreciate it! See you on Matrix

mitumitu · May 31, 2021, 8:16am

An experimental Romanian language model:

https://drive.google.com/drive/folders/1rd3RQXDXv_GVp9UgyHu-qjLzcadunJxt

Vlad_Hornai · May 31, 2021, 6:50pm

Thanks! I am going to check it out!