Synthesizing multiple sentences/audio files

Hi
Can I pass a text file to the PWGAN synthesize.py, like I do when training the TTS?

I have noticed that if I run synthesize.py several times with different sentences, the intonation/expression of the voice sometimes changes.

I am setting up variables in Colab so I can easily pass them to the Python script, like so:

out_dir = "'/content/drive/My Drive/output'"
speech = "'This is a test sentence'"

Could I do something like
speech = "'/content/drive/My Drive/sentences.txt'"

I’ve also noticed that I get problems when using
speech = "'I'm a noob and I'm really bad at this shit'"
because the apostrophes clash with the single quotation marks around the sentence. Is there any way around it? Obviously I want to be able to write “I’m” and not “Im”.
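
One possible way around the quoting problem, sketched in Python: the standard library's shlex.quote wraps a string so that a POSIX shell reads it back as a single argument, apostrophes and spaces included. This assumes Colab substitutes $speech into the ! command line as plain text before the shell parses it, which I have not verified against this exact setup. The variable names are the ones from the post above, and the sentence is just an example.

```python
import shlex

sentence = "I'm testing this and I'm keeping my apostrophes"

# shlex.quote adds the outer quotes and escapes the inner apostrophes,
# so there is no need to wrap the text in "'...'" by hand.
speech = shlex.quote(sentence)
out_dir = shlex.quote("/content/drive/My Drive/output")

# shlex.split parses a string the way a POSIX shell would, so it can be
# used to check that each value comes back as exactly one argument.
print(shlex.split(speech))
print(shlex.split(out_dir))
```

With the variables built this way, the !python ... $speech ... $out_dir line itself should not need to change.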

You can iterate over the lines of a text file. Alternatively, you can do it locally using bash.

Embarrassed to ask, but how would I do that using Colab?

All I know is how this syntax works. Do you mean I need to modify some of the Python files?

!python /content/TTS/TTS/bin/synthesize.py $speech $tts_cfg $tts_path $out_dir --vocoder_path $voc_path --vocoder_config_path $voc_cfg --use_cuda True


No need to be embarrassed :slight_smile: I have never tried it, but I imagine something like

with open("file.txt") as f:
    lines = f.readlines()

for line in lines:
    speech = line.strip()  # drop the trailing newline

may work. Although the notebooks spit out one sentence and then you have to run them again, so I don’t know. If you want to do it locally on your computer, you can try something like

cat lines.txt | while IFS= read -r LINE; do python3 synthesize.py "$LINE" config model out_dir; done
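
For Colab, the same loop could be sketched in Python roughly as follows. This is untested against the actual notebook: the argument order simply mirrors the !python command quoted earlier in the thread, the default script path is taken from that command, and the helper names (read_sentences, build_command, synthesize_file) are made up for illustration. Passing the arguments as a list means no shell is involved, so apostrophes in a sentence need no extra quoting.

```python
import subprocess
from pathlib import Path


def read_sentences(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_command(sentence, tts_cfg, tts_path, out_dir, voc_path, voc_cfg,
                  script="/content/TTS/TTS/bin/synthesize.py"):
    """Build the argument list, in the order used earlier in the thread."""
    return [
        "python", script, sentence, tts_cfg, tts_path, out_dir,
        "--vocoder_path", voc_path,
        "--vocoder_config_path", voc_cfg,
        "--use_cuda", "True",
    ]


def synthesize_file(path, run=subprocess.run, **paths):
    """Call synthesize.py once per sentence; `run` can be swapped for testing."""
    for sentence in read_sentences(path):
        # check=True raises an error if the script exits with a failure code.
        run(build_command(sentence, **paths), check=True)
```

It would be called as synthesize_file("/content/drive/My Drive/sentences.txt", tts_cfg=..., tts_path=..., out_dir=..., voc_path=..., voc_cfg=...), with the same values as the Colab variables. Note that this still launches the script once per sentence, so it does not by itself address any difference in sound between runs.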

Thanks, but I kind of wanted to avoid running it several times, because that has caused the output to sound different every time. This is not the case if I run a single long sentence.

Well, you have to choose. LSTMs are not great with long sequences, so it makes sense to synthesize in shorter batches. With a dataset that is extensive, has no transcription errors, and has adequate diphone/triphone distributions, you can synthesize text of up to 5000 characters. With LJSpeech, the longest I have achieved is 1000 characters. This is all in one take.
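
As a rough illustration of synthesizing in shorter batches, a long text could be cut into chunks under a character limit before each chunk is passed to the model. This is only a sketch: chunk_text is a made-up helper, the default of 1000 characters just echoes the LJSpeech figure above, and splitting on ".", "!" and "?" is a naive assumption about where sentences end.

```python
import re


def chunk_text(text, max_chars=1000):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own
    (oversized) chunk rather than being cut in the middle.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for chunk in chunk_text("One short line. Another short line. A third one.", max_chars=36):
    print(len(chunk), chunk)
```

The right limit depends on the model and dataset, so max_chars is something to tune by ear rather than a fixed value.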

I haven’t got the analysis notebooks to work on Colab. There is always some error that prevents me from running them :frowning:

How are you dealing with non-words like “uhm”, “eeh”, “ahh”? Are you keeping them in and adding transcriptions for them?

My datasets usually do not have these. You can label them if you want to keep them. Taco2 can learn to model them if it is presented with a lot of examples.

Yes, I want to keep them to get a more natural flow :slight_smile:

These utterances should be just fine. If they are not, then you need to add them to the phoneme dictionary if you are using the phonemizer.