I had some questions about the pre-trained model for 0.4.1.
How many hours of data in total were used to train the pre-trained model?
What are the proportions of each speech corpus used? I.e., is it mainly LibriSpeech, Common Voice, or an even mix of all of them?
It says that the model is optimized for American English but that it uses the English Common Voice corpus, so presumably this isn’t filtered first and thus contains all English accents?
Yes, it contains all accents, but the model is still mainly American English: the training data is dominated by Fisher, which is itself dominated by American English, see the links above.
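For reference, Common Voice releases ship their metadata in a `validated.tsv` with an optional `accent` column, so accent filtering would be possible before training even though the released model doesn't do it. A minimal sketch of that kind of pre-filtering, assuming the `validated.tsv` layout and the `"us"` accent label (check your release's schema), could look like this:

```python
import csv

def filter_us_accent(tsv_in, tsv_out):
    """Keep only clips whose metadata is labelled with a US accent."""
    with open(tsv_in, newline="", encoding="utf-8") as f_in, \
         open(tsv_out, "w", newline="", encoding="utf-8") as f_out:
        reader = csv.DictReader(f_in, delimiter="\t")
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        for row in reader:
            # Many contributors leave the accent field empty; those rows
            # are dropped here along with non-US accents.
            if row.get("accent", "") == "us":
                writer.writerow(row)

filter_us_accent("validated.tsv", "validated_us.tsv")
```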