Synthesizing Multiple sentences/audiofiles

CrazyJoeDevola · September 9, 2020, 10:57am

Hi
Can I pass a text file to the PWGAN synthesize.py like I do when training the TTS?

I have noticed that if I run synthesize.py several times with different sentences, the intonation/expression of the voice sometimes changes.

I am setting up variables in Colab so I can easily pass them to the Python script, like so:

out_dir = "'/content/drive/My Drive/output'"
speech = "'This is a test sentence'"

Could I do something like
speech = "'/content/drive/My Drive/sentences.txt'"

I’ve also noticed that I get problems when using
speech = "'I'm a noob and I'm really bad at this shit'"
due to the single quotation marks. Is there any way around it? Obviously I want to be able to write “I’m” and not “Im”.

georroussos · September 9, 2020, 11:25am

You can iterate over the lines of a text file. Alternatively, you can do it locally using bash.

CrazyJoeDevola · September 9, 2020, 11:29am

Embarrassed to ask, but how would I do that using Colab?

All I know is how this syntax works. Do you mean I need to modify some of the Python files?

!python /content/TTS/TTS/bin/synthesize.py $speech $tts_cfg $tts_path $out_dir --vocoder_path $voc_path --vocoder_config_path $voc_cfg --use_cuda True

georroussos · September 9, 2020, 11:36am

No need to be embarrassed. I have never tried it, but I imagine something like

with open("file.txt") as f:
    for line in f:
        speech = line.strip()  # one sentence per line, trailing newline removed

may work, although the notebooks synthesize one sentence and then you have to run them again, so I don’t know. If you want to do it locally on your computer, you can try something like

while IFS= read -r LINE; do python3 synthesize.py "$LINE" config model out_dir; done < lines.txt
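
For Colab, a rough Python sketch of the same loop might look like the following. It assumes the positional argument order and flags of the !python command posted earlier in this thread, and a text file with one sentence per line. The helper names read_sentences, build_command and synthesize_file are made up for illustration and are not part of TTS. Passing the arguments as a list skips the shell, so a sentence like "I'm" should need no extra quoting; for the !python ... $speech form, shlex.quote from the standard library is one way to build a shell-safe string.

```python
import shlex
import subprocess
from pathlib import Path

# Script path as used earlier in this thread; adjust to your checkout.
SYNTHESIZE = "/content/TTS/TTS/bin/synthesize.py"


def read_sentences(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_command(sentence, tts_cfg, tts_path, out_dir, voc_path, voc_cfg):
    """Build the argument list for one synthesize.py call (hypothetical helper).

    The order mirrors the command from the thread. Because the list is
    passed to subprocess.run without a shell, apostrophes and spaces in
    the sentence or in paths are passed through unchanged.
    """
    return [
        "python", SYNTHESIZE, sentence, tts_cfg, tts_path, out_dir,
        "--vocoder_path", voc_path,
        "--vocoder_config_path", voc_cfg,
        "--use_cuda", "True",
    ]


def synthesize_file(sentences_path, **paths):
    """Run synthesize.py once for every line of the sentences file."""
    for sentence in read_sentences(sentences_path):
        subprocess.run(build_command(sentence, **paths), check=True)


# For the `!python ... $speech` form: quote the sentence for the shell.
speech = shlex.quote("I'm a noob")
```

Calling synthesize_file with the path to sentences.txt and the tts_cfg, tts_path, out_dir, voc_path and voc_cfg values would then produce one run per sentence. Note that this still launches the script once per sentence, just like the bash loop.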

CrazyJoeDevola · September 9, 2020, 11:40am

Thanks, but I kind of wanted to avoid running it several times, because that has caused the output to sound different every time. This is not the case if I run a single long sentence.

georroussos · September 9, 2020, 1:21pm

Well, you have to choose. LSTMs are not great with long sequences, so it makes sense to synthesize shorter batches. With a dataset that is extensive, has no transcription errors and has adequate diphone/triphone distributions, you can synthesize text of up to 5000 characters. With LJSpeech, the longest I have achieved is 1000 characters. This is all in one take.
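
One way to get shorter batches out of a long text is to group whole sentences into chunks below a character budget. The sketch below is only an illustration: split_into_chunks is a made-up helper, the sentence splitting is a naive regex, and the default of 1000 characters is simply the LJSpeech figure mentioned above, not a hard limit of any model.

```python
import re


def split_into_chunks(text, max_chars=1000):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own oversized
    chunk rather than being cut in the middle.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Still within budget, or this is the first sentence of a chunk.
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be synthesized in one take, with the budget tuned to whatever length the trained model handles well.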

CrazyJoeDevola · September 9, 2020, 3:14pm

I haven’t got the analysis notebooks to work on Colab. There is always some error that prevents me from running them.

CrazyJoeDevola · September 9, 2020, 3:15pm

How are you dealing with non-words like “uhm”, “eeh”, “ahh”? Are you keeping them in and adding transcriptions for them?

georroussos · September 9, 2020, 3:53pm

My datasets usually do not have these. You can label them if you want to keep them. Taco2 can learn to model them if it is presented with a lot of examples.

CrazyJoeDevola · September 9, 2020, 4:08pm

Yes, I want to keep them to get a more natural flow.

erogol · September 11, 2020, 10:20am

These utterances should be just fine. If they are not, and you are using the phonemizer, you need to add them to the phoneme dictionary.
