I am trying to build an adaptable speech recognition system based on Mozilla DeepSpeech (a TensorFlow implementation of the DeepSpeech paper).
The idea is:
- Pretrain a model on a certain voice, then save the model and create a checkpoint.
- The saved model is used for transcribing speech to text.
- If the user notices something is transcribed incorrectly, they can provide feedback on what the correct text should be for the audio they just recorded.
- This forms a new training sample. The model is restored from the previous checkpoint and then trained on the new sample. (We would also use some data augmentation techniques to increase the number of samples.)
- The resulting model should now be better adapted to the user's voice/pronunciation.
- If there is another incorrect transcription, repeat from the transcription step.
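To make the question concrete, here is a minimal sketch of the restore-then-fine-tune loop I have in mind, using a small generic `tf.keras` model as a stand-in for the DeepSpeech acoustic model (the model, data, and checkpoint directory here are all illustrative, not DeepSpeech internals):

```python
import tempfile

import numpy as np
import tensorflow as tf

# Stand-in for the acoustic model (the real one would be DeepSpeech's RNN).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
ckpt_dir = tempfile.mkdtemp()  # illustrative checkpoint location
manager = tf.train.CheckpointManager(ckpt, ckpt_dir, max_to_keep=3)

def train_step(batch_x, batch_y):
    with tf.GradientTape() as tape:
        loss = loss_fn(batch_y, model(batch_x))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# 1) Pretraining phase on stand-in data, then save a checkpoint.
x = np.random.randn(32, 10).astype("float32")
y = np.random.randn(32, 4).astype("float32")
for _ in range(5):
    train_step(x, y)
manager.save()  # checkpoint after pretraining

# 2) A user correction arrives: restore the saved checkpoint, then
#    fine-tune on the (augmented) corrected sample only.
ckpt.restore(manager.latest_checkpoint)
new_x = np.random.randn(1, 10).astype("float32")  # stand-in: features of the new recording
new_y = np.random.randn(1, 4).astype("float32")   # stand-in: target from the corrected text
for _ in range(3):
    train_step(new_x, new_y)
manager.save()  # new checkpoint reflecting the adapted model
```

The part I am unsure about is step 2: whether discarding the original training data and fine-tuning only on the single corrected sample is a sound use of the checkpoint mechanism.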
Is this the proper way of using checkpoints? That is, every time I train on a new sample, I restore the last checkpoint and replace the complete training data with just that sample.
Any suggestions would be appreciated!
Thanks in advance!