Hi everyone. I'm training DeepSpeech from scratch (no checkpoint) with a language model built using KenLM. The dataset is the Persian Common Voice dataset. My configuration is as follows:
- Batch size = 2 (due to CUDA OOM)
- Learning rate = 0.0001
- Num. neurons = 2048
- Num. epochs = 50
- Train set size = 7500
- Test and dev set size = 5000
- Dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)
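For reference, the training invocation corresponding to this configuration looks roughly like the following. The file paths are placeholders for my setup; the flag names follow Mozilla DeepSpeech's `DeepSpeech.py`:

```shell
python3 DeepSpeech.py \
  --train_files data/cv-fa/train.csv \
  --dev_files data/cv-fa/dev.csv \
  --test_files data/cv-fa/test.csv \
  --alphabet_config_path data/alphabet.txt \
  --scorer_path data/kenlm.scorer \
  --train_batch_size 2 \
  --learning_rate 0.0001 \
  --n_hidden 2048 \
  --epochs 50 \
  --dropout_rate 0.2
```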
Both train and validation losses decrease early in training, but after a few epochs the validation loss stops improving. Train loss ends up around 18 and validation loss around 40, and at the end of training all predictions are empty strings. Please help me improve this model. Thanks in advance.
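One thing I have considered checking: since the predictions are all empty, the CTC model may be collapsing to the blank label, and a common cause of that is transcript characters missing from `alphabet.txt` (easy to hit with Persian script). Below is a small sanity-check sketch I could run over my CSVs; the paths and the assumption of a DeepSpeech-style CSV with a `transcript` column are specific to my setup:

```python
import csv

def find_missing_chars(csv_path, alphabet_path):
    """Return the set of characters used in transcripts but absent from alphabet.txt."""
    with open(alphabet_path, encoding="utf-8") as f:
        # DeepSpeech alphabet files list one character per line; '#' lines are comments.
        alphabet = {line.rstrip("\n") for line in f if not line.startswith("#")}
    missing = set()
    with open(csv_path, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for ch in row["transcript"]:
                if ch not in alphabet:
                    missing.add(ch)
    return missing
```

If this returns a non-empty set for the train CSV, those characters would need to be added to the alphabet (and the model retrained), since unrepresentable labels can silently break training.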