After a chat with @dkreutz, he recommended documenting the progress and lessons learned for the community. As I think this is a good idea, I (and probably Dominik) will update this thread on a regular basis.
At the moment I am listening to all my recordings and categorizing every WAV file as green (good), yellow (needs revision) or red (removed from dataset), while Dominik starts optimizing the files.
After removing the red ones and with Dominik's optimization of the yellow ones, we will hopefully have an acceptable dataset for German TTS generation.
I uploaded some recorded samples to my GitHub page (https://github.com/thorstenMueller/deep-learning-german-tts) so that interested people can get an impression of how my voice sounds.
Lessons learned so far:
- Mimic-Recording-Studio (by Mycroft) records as “RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 44100 Hz”, which is a higher sample rate than required (16000-22050 Hz). Stereo should not be needed either.
- Be aware of your recording room situation (reverb and random noise)
- Always keep some distance between mouth and mic
- Use a good mic and speakers for reviewing your audio (this is mentioned in several places, so please take it seriously)
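For anyone wondering how to get Mimic-Recording-Studio output down to mono and a lower sample rate: below is a minimal sketch using only Python's standard `wave` and `struct` modules. It assumes the input is exactly the 16 bit stereo 44100 Hz PCM format mentioned above. The function name `stereo44k_to_mono22k` is just a hypothetical helper for illustration, and averaging neighbouring sample pairs before dropping every second one is only a crude stand-in for a real anti-aliasing resampler, not what Dominik or anyone else in this thread actually uses.

```python
import struct
import wave


def stereo44k_to_mono22k(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: downmix 16-bit stereo 44.1 kHz WAV to mono 22.05 kHz.

    Crude sketch (assumption): channels are averaged, then each pair of
    consecutive mono samples is averaged and kept once, which halves the rate.
    """
    with wave.open(src_path, "rb") as src:
        fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
        if fmt != (2, 2, 44100):
            raise ValueError(f"expected 16-bit stereo 44100 Hz input, got {fmt}")
        raw = src.readframes(src.getnframes())

    # Interleaved little-endian int16 samples: L, R, L, R, ...
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)

    # Average left and right channel of each frame -> mono signal
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)]

    # Average each pair of mono samples and keep one value per pair (44100 -> 22050 Hz);
    # a trailing odd sample is dropped
    half = [(mono[i] + mono[i + 1]) // 2 for i in range(0, len(mono) - 1, 2)]

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(22050)
        dst.writeframes(struct.pack(f"<{len(half)}h", *half))
```

Usage would be something like `stereo44k_to_mono22k("recording.wav", "recording_22k.wav")`. For the real dataset a dedicated resampler such as SoX will give cleaner results, because the simple pairwise average above filters out high frequencies only very roughly.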
Even though I use this thread to document the progress, it should not be a soliloquy, so feedback of every kind is welcome.