I’m quite new to all this. I have to take on a project and I’m a bit out of my depth, so I’m trying to work out a viable tooling architecture at a high level before getting my hands dirty.
The project is a full speech-to-speech (S2S) translation of a researcher’s body of work. The source language is English, and I need to end up with S2S translations in Spanish, French, German, Russian and Arabic.
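If the plan is the usual cascade (ASR on the English audio, machine translation of the transcript, then TTS per target language), the data flow might look something like the minimal sketch below. The cascade itself is an assumption, and `run_pipeline`, `Segment` and the stage function types are hypothetical placeholders, not any particular library's API. The main point is that every segment keeps its English start/end timestamps through translation, which is what makes the cheap "reuse the ASR timestamps" syncing idea possible.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

# Target languages from the project brief (ISO 639-1 codes).
TARGET_LANGS = ["es", "fr", "de", "ru", "ar"]


@dataclass(frozen=True)
class Segment:
    start: float  # seconds into the source video, as reported by the ASR
    end: float
    text: str


# Hypothetical stage signatures: plug in whichever ASR and MT tools get chosen.
AsrFn = Callable[[str], List[Segment]]   # audio path -> timed English segments
TranslateFn = Callable[[str, str], str]  # (English text, target lang) -> translation


def run_pipeline(audio_path: str, asr: AsrFn, translate: TranslateFn) -> Dict[str, List[Segment]]:
    """Run ASR once, then translate every segment into each target language.
    Only the text changes; start/end are carried over so each TTS clip
    can later be placed back at the right point in the video."""
    english = asr(audio_path)
    return {
        lang: [replace(seg, text=translate(seg.text, lang)) for seg in english]
        for lang in TARGET_LANGS
    }


if __name__ == "__main__":
    # Dummy stages so the sketch runs without any real models.
    def fake_asr(path: str) -> List[Segment]:
        return [Segment(0.0, 2.5, "Hello"), Segment(3.0, 5.0, "Thank you")]

    def fake_translate(text: str, lang: str) -> str:
        return f"[{lang}] {text}"

    for lang, segs in run_pipeline("talk.wav", fake_asr, fake_translate).items():
        print(lang, [(s.start, s.text) for s in segs])
```

Each translated segment would then go to the TTS for its language, one clip per segment.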
Questions:
- Are pre-trained models for these languages available for Mozilla TTS? If not, can anyone suggest an alternative TTS that has pre-trained models for all of them?
- How do I go about syncing the translated audio to the video? It doesn’t have to be perfect; dub-overs are typically a bit off anyway. Would the cheap way be something like taking the timestamps from the ASR and then feeding them into the TTS? I don’t know whether there is a simple, standard approach.
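The timestamp idea could be sketched roughly as follows, assuming the ASR gives a start time per segment and the TTS clip for each translated segment has already been generated, so its duration is known. `plan_dub`, `Placement` and the 1.3x speed-up cap are hypothetical choices for illustration, not a standard method. Each clip is anchored at its English start time and only sped up when it would run into the next segment; if it still doesn't fit, the following clip is pushed later rather than overlapping.

```python
from dataclasses import dataclass
from typing import List

# Assumption: speeding speech up by more than ~30% tends to sound rushed, so cap it.
MAX_SPEEDUP = 1.3


@dataclass
class Placement:
    start: float  # where the dubbed clip starts on the video timeline (seconds)
    tempo: float  # speed factor for the TTS clip (1.0 = unchanged, >1.0 = faster)
    end: float    # where the clip ends once the tempo change is applied (seconds)


def plan_dub(asr_starts: List[float], video_end: float, tts_durations: List[float]) -> List[Placement]:
    """Anchor each TTS clip at its ASR start time; speed it up (up to MAX_SPEEDUP)
    only if it would run into the next segment. A clip that still doesn't fit
    pushes the following clip later instead of overlapping it."""
    placements: List[Placement] = []
    cursor = 0.0  # end of the previously placed clip
    for i, (asr_start, duration) in enumerate(zip(asr_starts, tts_durations)):
        begin = max(asr_start, cursor)
        next_start = asr_starts[i + 1] if i + 1 < len(asr_starts) else video_end
        window = next_start - begin  # time available before the next segment
        if window <= 0:
            tempo = MAX_SPEEDUP  # already running late: speed up as much as allowed
        else:
            # Never slow a clip down; only speed it up as much as needed (capped).
            tempo = min(max(duration / window, 1.0), MAX_SPEEDUP)
        end = begin + duration / tempo
        placements.append(Placement(begin, tempo, end))
        cursor = end
    return placements


if __name__ == "__main__":
    # English segments start at 0 s, 3 s and 6 s; the second translation came out long.
    for p in plan_dub([0.0, 3.0, 6.0], 10.0, [2.5, 4.5, 2.0]):
        print(f"start={p.start:.2f}s tempo={p.tempo:.2f} end={p.end:.2f}s")
```

The `tempo` values could then be applied per clip, for example with ffmpeg's `atempo` audio filter, before laying the clips onto a new audio track at their `start` times.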