I checked microphone streaming in English with the pre-trained weights, and it works. Could you please explain how I can use this for another language? If I train the network on another language and use the converted PB weight file for microphone streaming inference, will it work for me?
Yes, as long as you keep the versions consistent (if you’re using DeepSpeech v0.6.0, make sure you use the training code from v0.6.0), you can export custom models and they will work with all clients.
I tested with Korean and it predicts Korean without any error. I am training the network on just 10 voice commands, without splitting the commands into separate files, and there are 100 speakers. Each speaker has 10 recordings.
The hyperparameters are:
val_batch = 48
epoch = 75
The train_loss and validation_loss are both close to 0, but the predictions have a problem: when I speak a single command, the network predicts several commands. Could you suggest how I can handle this? Inference on the test data shows a similar problem.
Looks like you just don’t have enough data. Try reducing the model complexity or the learning rate. A loss at 0 obviously looks like overfitting.
I trained the network with a new configuration and split each audio file into 10 single-command audio files. I also augmented the audio by playing it through a speaker at different sound levels and re-recording it. Now I have around 4 GB of data at different sound levels.
New configuration is,
The train loss is below 0.571 and the validation loss is below 1.5023, but the WER is not good. When I test these weights with mic_vad_streaming.py the predictions are correct, but if I speak another language or other words, it still predicts words from the training vocabulary. I am also using a language model.
Could you help me solve this problem, so that the network is stable and ignores commands it was not trained on?
That’s going to be tricky, since it means your network would have to be able to infer those. Given your setup, it’s obviously overfitted to your commands. Also, your dataset is very likely not large enough.
Have you tried just building a command-set language model and using our generic English model?
@lissyx I used only my commands to build the language model, and I am working in Korean, so I am not sure what the problem is. The model is able to predict every command, but the problem is that any kind of speech also gets predicted as one of my commands.
Right, so you can’t use the English model.
Sorry, I don’t understand what you state here.
@lissyx I mean that during microphone streaming inference with mic_vad_streaming.py, the network's predictions are correct for all the commands used in training. But if I speak a command other than the training commands, the network still predicts one of the training commands, even though the spoken command was not in the training set. So how can I ignore the input when the user says something other than a training command?
If we say something else and the network still predicts one of the training commands, it will not be useful in a real application.
As I said, how can you expect a model that has been overfitted on a few commands to output something else, or recognize that what you say is not a known command ?
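One generic workaround for the "ignore unknown commands" part, not something proposed in the replies here, is to post-filter the recognizer output: accept a transcript only if it exactly matches a known command and its score clears a threshold. The sketch below is plain Python and makes several assumptions: the command strings are placeholders, `score` is a hypothetical "higher is better" value that you would have to obtain from your own decoder, and `min_score` would have to be calibrated on held-out recordings of non-command speech. It will not help if the overfitted model is equally confident on everything.

```python
from typing import Optional

# Hypothetical placeholder commands; replace with the real training commands.
COMMANDS = {"light on", "light off", "door open"}


def accept_command(transcript: str, score: float, min_score: float = 0.8) -> Optional[str]:
    """Return the command if the transcript is a known command with a high
    enough score, otherwise None (meaning: ignore this utterance).

    `score` and `min_score` are assumptions: any "higher is better" value works,
    as long as the threshold was calibrated on the same scale.
    """
    # Normalize whitespace so "light  on " still matches "light on".
    text = " ".join(transcript.split())
    if text in COMMANDS and score >= min_score:
        return text
    return None


if __name__ == "__main__":
    print(accept_command("light on", 0.93))     # accepted
    print(accept_command("light on", 0.20))     # rejected: score too low
    print(accept_command("good morning", 0.99)) # rejected: not a known command
```

The exact-match check only filters transcripts that fall outside the command list; the score threshold is what would have to catch other speech that the model maps onto a command.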
Could you give me some ideas for tuning the hyperparameters for a small amount of data? I tried lowering the learning rate (0.00001), but the loss is almost flat after some epochs. It seems there is no learning, just memorizing.
No, it really depends on your dataset, and it really looks like you just do not have enough data.
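More data is the real fix, but a standard way to at least limit the memorizing described above is early stopping: keep the checkpoint from the epoch with the lowest validation loss and stop once it has not improved for a few epochs. The sketch below is generic Python, not DeepSpeech code; `patience`, `min_delta` and the loss values in the usage lines are illustrative assumptions.

```python
from typing import Sequence


def best_epoch(val_losses: Sequence[float], patience: int = 3, min_delta: float = 0.0) -> int:
    """Return the index of the epoch whose checkpoint should be kept.

    Scans the validation losses in order and stops as soon as `patience`
    consecutive epochs fail to improve on the best loss by more than `min_delta`.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_idx = 0
    best_loss = float("inf")
    waited = 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_idx, waited = loss, i, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stalled: stop training here
    return best_idx


if __name__ == "__main__":
    # Made-up validation losses: improvement stalls after the third epoch.
    losses = [2.0, 1.6, 1.5, 1.55, 1.6, 1.7, 1.4]
    print(best_epoch(losses, patience=3))   # stops early, keeps epoch index 2
    print(best_epoch(losses, patience=10))  # never stops early, keeps index 6
```

A small `patience` saves training time but can stop before a late improvement, as the second call shows.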
@lissyx Thank you. I will enlarge the dataset with different sound levels and let you know the result after training. What about changing the pitch and adding some white noise?
It cannot do any harm, if done properly. Is there any community effort around Korean? On Common Voice, for example?
I have no idea about it. I will try to find out and let you know if one is available.
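For the sound-level and white-noise augmentation discussed above, a minimal sketch could look like the following. It uses only the Python standard library and assumes the audio is already loaded as float samples in the range [-1, 1]; the function names, the SNR and gain values, and the synthetic test tone are all illustrative assumptions. Pitch shifting is not shown, since doing it properly (without also changing the duration) needs resampling plus time-stretching from an audio library.

```python
import math
import random
from typing import List, Sequence


def add_white_noise(samples: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Add Gaussian white noise so that signal power / noise power is about snr_db."""
    if not samples:
        return []
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    # Clip to [-1, 1] so the result is still valid normalized audio.
    return [max(-1.0, min(1.0, s + rng.gauss(0.0, sigma))) for s in samples]


def change_gain(samples: Sequence[float], gain_db: float) -> List[float]:
    """Scale the amplitude by gain_db decibels (negative values make it quieter)."""
    factor = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


if __name__ == "__main__":
    # Synthetic one-second 440 Hz tone at 16 kHz, standing in for a real recording.
    rate = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
    noisy = add_white_noise(tone, snr_db=20)
    quieter = change_gain(tone, gain_db=-6)
    print(len(noisy), len(quieter))
```

Varying `snr_db` and `gain_db` per copy, rather than applying one fixed setting, gives the training set more variety from the same recordings.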