Inspired by @mrthorstenm, I decided to create my own dataset as well. I started at my own desk with a crappy headset microphone, but soon moved on to more professional methods.
In the end I hired two male voice talents, who will each provide me with 20-25 hours of Belgian Dutch voice data over the course of the coming two months. My aim is to create other voices from this data as well, hopefully with a minimum of additional recordings. I asked them to record in mono WAV format, 44.1 kHz and 16-bit audio.
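Since the recordings will come in over two months, it may be worth verifying each delivery against the agreed format before adding it to the dataset. Below is a minimal sketch using only Python's standard-library `wave` module; the function name and the defaults (mono, 44.1 kHz, 16-bit) are just my assumptions based on the spec above, not part of any particular TTS toolkit.

```python
import wave

def check_spec(path, channels=1, rate=44100, sampwidth=2):
    """Return a list of mismatches between the WAV file at `path`
    and the target spec (mono, 44.1 kHz, 16-bit by default).
    An empty list means the file matches."""
    with wave.open(path, "rb") as w:
        problems = []
        if w.getnchannels() != channels:
            problems.append(f"channels: {w.getnchannels()} (want {channels})")
        if w.getframerate() != rate:
            problems.append(f"sample rate: {w.getframerate()} Hz (want {rate})")
        if w.getsampwidth() != sampwidth:
            problems.append(f"sample width: {w.getsampwidth()} bytes (want {sampwidth})")
        return problems
```

Running this over a delivery directory and flagging any file with a non-empty result catches accidental stereo or resampled exports early, before they reach training.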
Should I train two separate Tacotron 2 models, check which one is most suitable, and use transfer learning from that, or is the current state of multi-speaker training good enough and easier to work with for generating future voices?
Are there any other tips or suggestions I should keep in mind?
Any help or input is appreciated.