I would like to train an ASR and a TTS model for Italian based on the Common Voice dataset.
I have some questions.
- DeepSpeech can only train ASR models, is that correct?
- Is the procedure for training my own model the one described in "TUTORIAL : How I trained a specific french model to control my robot"? Should I follow that one?
If so, I would like to apply my model to home automation. Is there any kind of pre-processing, any sound properties, or anything else I need to know to train the model properly? Can I find what I need in the paper https://arxiv.org/abs/1412.5567? Or can you suggest other references?
- Can someone recommend a good architecture for training a TTS model?
Thank you very much!