Hey there!
We have started seriously training DeepSpeech models, and I have collected a few questions about training.
-
When I started training on our initial data (a couple of hours) I could use a batch size as high as 24, but as we progress and get more data I find myself having to lower the training batch size every time we add a new hour of data, otherwise I run out of GPU memory. How can that be? More data should not lead to more memory consumption per batch (unless some of the new files are notoriously long, but in our case the clip lengths seem pretty even; see the sketch below for how I checked). Is this normal, or is there something I am missing? Could it be related to augmentation?
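For reference, this is roughly how I checked the clip lengths, going by the wav_filesize column in the training CSV (a rough sketch, assuming the standard wav_filename / wav_filesize / transcript layout and 16 kHz, 16-bit mono WAVs; the paths are placeholders):

```python
import csv

# Rough duration check on a DeepSpeech training CSV.
# Assumes 16 kHz, 16-bit mono WAVs, i.e. ~32000 bytes per second of audio
# (ignoring the small WAV header).
BYTES_PER_SECOND = 16000 * 2

durations = []
with open("data/train.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        durations.append(int(row["wav_filesize"]) / BYTES_PER_SECOND)

durations.sort()
n = len(durations)
print(f"clips:  {n}")
print(f"median: {durations[n // 2]:.2f}s")
print(f"p95:    {durations[int(n * 0.95)]:.2f}s")
print(f"max:    {durations[-1]:.2f}s")
```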
-
I have always wondered about the loss pattern during training: at the beginning of an epoch the reported loss is low, and then it steadily increases. How exactly is that printed loss calculated?
(as you can see, we are fine-tuning an existing model)
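My working assumption is that the printed value is a running average of the per-batch CTC loss over the epoch so far, so if the early batches happen to be easier the average starts low and climbs. Something like this, which is just my mental model and not the actual implementation:

```python
# My mental model of the printed training loss: a running mean over the
# batches seen so far in the current epoch (illustrative, not the real code).
batch_losses = [8.2, 9.1, 10.5, 12.0, 14.3]  # made-up per-batch CTC losses

running_total = 0.0
for step, loss in enumerate(batch_losses, start=1):
    running_total += loss
    print(f"step {step}: running avg loss = {running_total / step:.4f}")
```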
-
For training the acoustic model you still have to give it the scorer. Does the scorer actually influence the training itself, or is it only used for dev-set evaluation and picking the best-performing model?
-
Which files is it recommended to run LM optimization on (the lm_optimizer script)? Just the test set files, or train + dev + test? What is your intuition about that? (In case the answer is "all of them", I would just merge the CSVs as sketched below.)
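Sketch of how I would merge the sets for that, assuming all CSVs share the standard wav_filename / wav_filesize / transcript header; paths are placeholders:

```python
import csv

# Merge the train/dev/test CSVs into one file, in case LM optimization
# should be run on more than just the test set.
parts = ["data/train.csv", "data/dev.csv", "data/test.csv"]

with open("data/lm_opt.csv", "w", newline="", encoding="utf-8") as out:
    writer = None
    for path in parts:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if writer is None:
                writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                writer.writeheader()
            for row in reader:
                writer.writerow(row)
```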
Thank you very much