@dan0 The models for 0.4.0 have resolved most of the transcription errors. Thanks for the response, but I have a few issues with transfer learning.
I have 3 questions regarding a few problems I am facing:
- We tried transfer learning on the model with a few of our own samples: 4 large audio files (technical talks), split with voice activity detection into 740 chunks of around 5 seconds each (500 for training, 100 for dev, 140 for test), with the transcription for each chunk in a CSV file. (The chunking step is sketched right after this question.)
Some transcriptions got better and some got worse.
So how many files are needed for good transfer learning to happen?
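For reference, this is roughly how the chunks were produced. A minimal sketch only: it uses sox's silence effect as a crude stand-in for our actual voice activity detection, and talk.wav / chunk.wav are placeholder names.

# split on pauses: 0.3 s below 1% of full volume counts as silence;
# "newfile : restart" writes chunk001.wav, chunk002.wav, ... at each split
sox talk.wav chunk.wav silence 1 0.3 1% 1 0.3 1% : newfile : restart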
- While transcribing with 0.4.0, I found that when a person speaks fast the transcription goes wrong: either two words merge into one wrong word, or one word splits into two separate wrong words. How do I improve this? Will this also happen after transfer learning when people speak fast, and how many samples are ideal?
- I tried transcribing files with background music but only got around 75% accuracy. I removed the noise with Audacity:
procedure:
- remove voice from audio
- get noise profile
- remove noise from the original sample using the noise profile
The accuracy was 85% after this.
Then I tried to automate this with the sox package on Ubuntu.
procedure:
- remove voice:
  sox audio.wav music.wav oops
- create noise profile:
  sox music.wav -n noiseprof noise.prof
- remove noise from the wav using the profile:
  sox audio.wav output.wav noisered noise.prof 0.21
(I also tried different aggressiveness levels like 0.3, 0.1, and 0.05, swept with the loop shown below, but there was not much change in the transcription.)
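The sweep can be scripted like this (a sketch, using the same placeholder file names as above):

# try several noisered aggressiveness levels and keep each result for comparison
for level in 0.05 0.1 0.21 0.3; do
  sox audio.wav "output_${level}.wav" noisered noise.prof "$level"
done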
The transcription became bad. I think sox damaged the voice audio while reducing the noise. Do you know a better way to do noise reduction and get a better transcription? And if I need to transcribe a file that has background music, is there any other way (e.g. would training help, and how many samples would I need)?
Thanks