Hi,
I am building a speech-to-text (STT) system for a company in the education sector. We have around 700 hours of corrected training data from this sector, which closely matches the type of conversation the system will handle.
To broaden the vocabulary, can I add a corrected set from YouTube, which would be generic, plus 2-3 sets from business verticals such as finance and medical?
The total would be around 2000 hours. I intend to keep the language model (LM) from the education vertical only.
Will this degrade or improve the inference performance of my model?
Is there another approach that would let me build a single model now and, in the future, swap in a different LM for each business vertical?
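To make the "swap the LM per vertical" idea concrete, here is a toy sketch of what I have in mind. It is not tied to any STT toolkit: the names (`VERTICAL_LMS`, `lm_score`, `rescore`), the unigram probabilities, and the acoustic scores are all made up for illustration. The acoustic model's hypotheses stay fixed, and only the LM used for rescoring changes.

```python
import math

# Hypothetical per-vertical LMs: toy unigram probabilities, invented for illustration.
VERTICAL_LMS = {
    "education": {"check": 0.2, "the": 0.3, "syllabus": 0.2},
    "finance": {"bond": 0.3, "yield": 0.3},
}
UNKNOWN_PROB = 1e-4  # assumed probability for words the LM has never seen


def lm_score(words, lm):
    """Sum of log-probabilities of the words under a toy unigram LM."""
    return sum(math.log(lm.get(w, UNKNOWN_PROB)) for w in words)


def rescore(hypotheses, vertical, lm_weight=0.5):
    """Pick the best hypothesis by acoustic score plus weighted LM score.

    hypotheses: list of (text, acoustic_log_prob) pairs from one shared acoustic model.
    vertical:   key selecting which LM to apply at decode time.
    """
    lm = VERTICAL_LMS[vertical]
    best = max(
        hypotheses,
        key=lambda h: h[1] + lm_weight * lm_score(h[0].split(), lm),
    )
    return best[0]


# Same kind of acoustic output, different LM per vertical.
edu_hyps = [("check the syllabus", -5.0), ("check the silly bus", -4.8)]
fin_hyps = [("bond yield", -3.0), ("bond field", -2.9)]

print(rescore(edu_hyps, "education"))
print(rescore(fin_hyps, "finance"))
```

With `lm_weight=0.0` the acoustically closer but wrong hypothesis wins, which is the behaviour I would hope the vertical-specific LM corrects.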
Please advise, or point me to any resource that would be helpful.
Thanks in advance