I have Indian-accented English audio data and the corresponding transcripts. I want to use them to fine-tune the pre-trained model, but I don’t know how to prepare the data for DeepSpeech (.tsv files). Please help.
Please do some research before asking for help
My apologies for the silliness sir.
My configuration is as follows.
DeepSpeech version : 0.7.1
OS : Ubuntu 18.04
Python Version : Python 3.6.9
Tensorflow version : 1.14.0
The issue is that I have collected a dataset of Indian-accented audio files and their corresponding transcripts. Since fine-tuning requires the data in a specific file format, I need to know how to prepare those files from the data I have.
Please use the search function and read the documentation. You need to prepare your data the same way as any other training data for DeepSpeech.
Your TensorFlow version is also not aligned with the docs. If you are working with DeepSpeech 0.7, you should have TensorFlow r1.15, not 1.14.0.
As @othiele said, please refer to the docs. There is no special file format for fine-tuning: you need to pass a set of CSVs, as for any training. You can look at the (numerous) importers for examples.
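For anyone landing here later, this is a rough sketch of what building one of those CSVs could look like. The column names `wav_filename`, `wav_filesize` and `transcript` are the ones the DeepSpeech importers write. Everything else is an assumption for illustration: the helper name `write_deepspeech_csv` is hypothetical, the audio is assumed to already be 16 kHz mono 16-bit WAV, and transcripts are lowercased on the assumption that you use the default English alphabet.

```python
import csv
import os


def write_deepspeech_csv(pairs, csv_path):
    """Write (wav_path, transcript) pairs to a CSV in the DeepSpeech layout.

    Hypothetical helper, not part of DeepSpeech. Assumes each WAV file
    already exists and is 16 kHz, mono, 16-bit PCM.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header expected by DeepSpeech training CSVs.
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in pairs:
            wav_path = os.path.abspath(wav_path)
            # File size in bytes goes in the second column.
            size = os.path.getsize(wav_path)
            # Assumption: default English alphabet, so lowercase the text.
            writer.writerow([wav_path, size, transcript.strip().lower()])


# Example call (paths are placeholders):
# write_deepspeech_csv([("clips/0001.wav", "hello world")], "train.csv")
```

You would typically write three such files (train, dev, test) and pass them with `--train_files`, `--dev_files` and `--test_files`. Check the existing importers for the normalization your transcripts need, since any character outside the alphabet file will cause errors.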