Hi both, thanks for the discussion.
Indeed, our long-term goal is 10K hours for each language. That said, we may be able to build usable speech technology with far less data (for instance, by using transfer learning with our existing English models).
But in addition to 10K hours, we also want to start creating “usable datasets” in many languages that don’t already have them. If you can build a V1 of a speech product in a language like Chuvash, you can use that product to start collecting more data. But without any Chuvash data, it’s impossible to start.
In addition to kickstarting languages, we are also trying to make our site more fun to use so that people donate more and come back more often. To see some of this work, check out our upcoming wireframes:
http://bit.ly/cv-desktop-ux