This is my first attempt at fine-tuning a DeepSpeech model. I have done a lot of reading on how to do this, but none of it quite applies to the TED-LIUM dataset I just downloaded.
Here are some issues:
I know I need a training CSV with the columns (wav_filename, wav_filesize, transcript). However, all the audio files in the TED-LIUM dataset are in .sph format rather than .wav, and the transcripts are in .stm format. How can I generate a CSV file from that? Do I need to convert the audio to .wav and put the actual transcript text in the CSV?
Where should I store the training data itself? Should the “wav” column in the CSV be the path to the audio file?
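For the conversion step, something like this is what I'm imagining — a small sketch that generates one sox command per .sph file (sox itself and the 16 kHz/16-bit target are my guesses; the paths are made up):

```python
# Sketch: build sox commands to convert TED-LIUM .sph files to .wav.
# Assumes sox is installed; directory names are hypothetical examples.
from pathlib import Path

def sph_to_wav_commands(sph_dir, wav_dir):
    """Return one sox command per .sph file (16 kHz, 16-bit, mono)."""
    commands = []
    for sph in sorted(Path(sph_dir).glob("*.sph")):
        wav = Path(wav_dir) / (sph.stem + ".wav")
        commands.append(f"sox {sph} -r 16000 -b 16 -c 1 {wav}")
    return commands
```

Printing the returned list and piping it to a shell (or running each command via subprocess) would then do the actual conversion.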
There already is a TED-LIUM importer, since the dataset is used for the release models. Check what the importer does and you’ll know. DeepSpeech always needs two to three CSV files, for train/dev and optionally test at the end, which point to the wavs (16 kHz, 16-bit).
Best practice is to have one directory containing the CSV files, with the wavs in a subdirectory of that folder called audio or wavs that the CSV points into. Check audiomate for how to build a library of datasets.
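As a rough illustration of what the importer produces (the helper names and paths here are made up; only the wav_filename/wav_filesize/transcript column convention is DeepSpeech's):

```python
# Sketch: turn .stm transcript lines into a DeepSpeech-style CSV.
# Helper names and layout are illustrative, not the importer's own code.
import csv
import os

def parse_stm_line(line):
    """STM fields: file channel speaker start end label transcript..."""
    parts = line.split(maxsplit=6)
    return {"file": parts[0], "start": float(parts[3]),
            "end": float(parts[4]), "transcript": parts[6].strip()}

def write_deepspeech_csv(entries, wav_dir, csv_path):
    """entries: list of (wav_basename, transcript), one row per clip."""
    with open(csv_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["wav_filename", "wav_filesize", "transcript"])
        for name, transcript in entries:
            wav = os.path.join(wav_dir, name)
            w.writerow([wav, os.path.getsize(wav), transcript])
```

Each STM segment (start/end times) corresponds to one clip cut out of the talk's audio, which is why the importer ends up with many short wavs per talk.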
Thank you so much for the fast response. I was able to use that importer for the newer TED-LIUM 3 dataset after changing just a couple of lines of code.
The issue now is that I am training this model in Google Colab Pro. After completing training on TED-LIUM, I will need to download the checkpoints and then continue training with another dataset on another day. How can I ensure that I continue where I left off from the model checkpoints? Do I just upload the weights from the previous training session and reference them in the training script? (Would it automatically use the most recent weights in the checkpoint folder?)
Save the checkpoints. If you start training with a checkpoint directory that already contains a checkpoint, training continues from it. Even though the log shows Epoch 0, you are in fact at epoch x + 0, where x is the number of epochs already trained.
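In Colab that usually means copying the checkpoint directory to Google Drive at the end of a session and pointing the same flag at it again next time. A sketch (the paths are examples — in Colab they would be under /content and your mounted /content/drive/MyDrive; relative paths here so the sketch runs anywhere):

```shell
# Sketch: persist DeepSpeech checkpoints between Colab sessions.
CKPT_DIR=ckpt
DRIVE_DIR=drive_ckpt   # stand-in for your mounted Google Drive folder

# End of a session: archive the checkpoint directory to Drive.
mkdir -p "$CKPT_DIR" "$DRIVE_DIR"
tar -czf "$DRIVE_DIR/ckpt.tar.gz" -C "$CKPT_DIR" .

# Start of the next session: restore, then resume training.
# DeepSpeech continues from the latest checkpoint it finds in
# --checkpoint_dir, so the same flag both saves and resumes:
tar -xzf "$DRIVE_DIR/ckpt.tar.gz" -C "$CKPT_DIR"
# python3 DeepSpeech.py --train_files train.csv --dev_files dev.csv \
#   --checkpoint_dir "$CKPT_DIR" --epochs 10
```

The training invocation is commented out here since it only makes sense inside the Colab runtime with the CSVs in place.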