Do you think it would be possible to make use of some of the recordings from LibriVox? https://librivox.org/
Just an idea, I do not know how practical this would be.
Yes, we will! Thank you for the suggestion.
Another possibility is to take the samples from https://en.wiktionary.org/wiki/Wiktionary:Main_Page, which has recordings of many, many single words. It might be a good little addition to the data set.
This might also be a good source: https://tatoeba.org/eng/sentences/show/2544351. Apparently they have 6,128,636 sentences in 322 languages.
Thanks @rain1! We are definitely working with Tatoeba. That is a great project!