I am a newbie in deep learning and am working with DeepSpeech. I would like to know the required format for the audio and transcriptions when preparing data.
Split your audio files to sentence length (say 1 - 15 seconds). Then create three files, one for training (train.csv), one for development testing (dev.csv), one for evaluation testing (test.csv). The file names are arbitrary. The first line of each must contain column declarations, and there must be at least these columns:
wav_filename,wav_filesize,transcript
There can be any number of other columns if you need them. The subsequent lines contain the data in the order defined by the header line.
- wav_filename corresponds to the path to the audio relative to the csv file.
- wav_filesize is the number of bytes of the audio file.
- transcript is the transcript limited to your alphabet.
A sample of a file could be:
wav_filename,wav_filesize,transcript
clips/sentence1.wav,16444,the cat sat on the mat
clips/sentence2.wav,21010,the bat shat on the cat
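If you want to generate such a file rather than type it by hand, here is a minimal sketch in Python. The function name write_manifest and the (wav_path, transcript) input format are my own assumptions for illustration, not part of DeepSpeech; it only uses the standard library and writes the three required columns.

```python
import csv
import os


def write_manifest(csv_path, samples):
    """Write a CSV with the three required columns.

    `samples` is a list of (wav_path, transcript) pairs (hypothetical
    input format chosen for this sketch).
    """
    csv_dir = os.path.dirname(os.path.abspath(csv_path))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header line with the column declarations.
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in samples:
            # Path relative to the csv file, written with forward slashes.
            rel = os.path.relpath(os.path.abspath(wav_path), csv_dir)
            rel = rel.replace(os.sep, "/")
            # File size in bytes, taken from the file system.
            size = os.path.getsize(wav_path)
            writer.writerow([rel, size, transcript])
```

Calling write_manifest("train.csv", [("clips/sentence1.wav", "the cat sat on the mat")]) would produce a header line followed by one data line per clip.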
The size ratio of train:dev:test is usually 8:1:1, but that’s more of a convention than a generally optimal number.
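One possible way to do the split is sketched below. The helper split_rows and its fixed seed are assumptions made for this example; any shuffling scheme that keeps the three sets disjoint would do.

```python
import random


def split_rows(rows, ratios=(8, 1, 1), seed=0):
    """Shuffle rows and split them into train/dev/test by the given ratio."""
    rows = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)
    total = sum(ratios)
    n_train = len(rows) * ratios[0] // total
    n_dev = len(rows) * ratios[1] // total
    train = rows[:n_train]
    dev = rows[n_train:n_train + n_dev]
    # Whatever is left over after rounding goes to the test set.
    test = rows[n_train + n_dev:]
    return train, dev, test
```

Each of the three returned lists can then be written to its own csv file with the header line on top.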
The audio files must be standard WAV, 16kHz, mono.
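To check your clips before training, something like the following sketch might help. It uses Python's standard wave module; the function name check_wav is made up for this example, and it only inspects the sample rate and channel count in the WAV header.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems; an empty list means the header matches."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != expected_rate:
            problems.append("sample rate is %d Hz" % w.getframerate())
        if w.getnchannels() != expected_channels:
            problems.append("channel count is %d" % w.getnchannels())
    return problems
```

Files that report problems need to be resampled or downmixed with an audio tool of your choice before they go into the csv files.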
HTH
Love the examples, there should be a generator for that
I have created my own data. Now:
Is there any Flask (or any other framework) app to edit my transcripts while listening to the corresponding wav file?
Or any transcription app?
If there is not, I must create one.
Please no hijacking