Too many recordings?

An addendum to your calculations on “how much time or how many recordings are needed per day”.

For simplicity, assume the following:

  • The average recording duration in your corpus is 3.6 seconds, so 1 hour of recordings means 1000 recordings.
  • It is good to have multiple recordings per sentence, from diverse genders/ages/accents. Say you aim for 3 recordings per sentence on average.
  • Suppose your language and model require 1000 hours to give good results for your application.
  • Suppose your community can produce 1 hour of VALIDATED recordings per day (i.e. 1000 recordings) on average.

As a result, to reach your 1000-hour goal:

  • You would need ~333k different sentences (1000 hours × 1000 recordings per hour ÷ 3 recordings per sentence).
  • You would need 1000 days (2.74 years) to reach your goal.
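
If you want to plug in your own numbers, the arithmetic above can be sketched in a few lines of Python. This is only a rough helper under the same simplifying assumptions; the function name `estimate_effort` and its parameter names are made up for illustration and are not part of any Common Voice tooling.

```python
def estimate_effort(
    avg_clip_seconds: float = 3.6,        # assumed average recording duration
    recordings_per_sentence: float = 3,   # assumed target recordings per sentence
    target_hours: float = 1000,           # assumed hours needed for good results
    validated_hours_per_day: float = 1.0, # assumed community output per day
) -> dict:
    """Back-of-the-envelope estimate; all inputs are assumptions you should adjust."""
    clips_per_hour = 3600 / avg_clip_seconds
    total_clips = target_hours * clips_per_hour
    sentences_needed = total_clips / recordings_per_sentence
    days_needed = target_hours / validated_hours_per_day
    return {
        "clips_per_hour": clips_per_hour,
        "total_clips": total_clips,
        "sentences_needed": sentences_needed,
        "days_needed": days_needed,
        "years_needed": days_needed / 365,
    }


if __name__ == "__main__":
    # Print each figure rounded to two decimals.
    for name, value in estimate_effort().items():
        print(f"{name}: {value:,.2f}")
```

With the defaults it reproduces the figures above: about 333k sentences and 1000 days (roughly 2.74 years). Halving `validated_hours_per_day` doubles the time, which is why the daily validated output matters so much.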

I’m assuming you use ALL validated recordings. If you limit that, it will take longer.

In my experience, 1 hour of validated recordings per day can be reached only with a large contributor base (e.g. English), or with constant effort from community leads who can run many campaigns. The latter was required in our case, with ~1000 diverse voices.
