I’m especially happy to see the efforts around collecting sentences. This is key for languages like Kabyle since, as far as I can see right now, there are only 35.7K sentences in the system.
There are 258 hours recorded for Kabyle. To avoid people repeating the same sentences over and over again, you would need more than 232K sentences to cover all those hours without repetition (at an average of 4 s per sentence).
Remember that Deep Speech then only uses one recording per sentence, so we need to avoid having more recorded hours than our sentences can cover.
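
For anyone who wants to check the numbers, here is a rough sketch of the arithmetic in Python. It assumes the 4-second average per sentence mentioned above holds and that each sentence is recorded exactly once; the function names are just ones I made up for illustration.

```python
import math

# Average clip length in seconds (the 4 s figure quoted above; an assumption).
AVG_SECONDS_PER_SENTENCE = 4


def sentences_needed(hours, avg_seconds=AVG_SECONDS_PER_SENTENCE):
    """Unique sentences required to fill `hours` with one recording each."""
    return math.ceil(hours * 3600 / avg_seconds)


def hours_covered(sentences, avg_seconds=AVG_SECONDS_PER_SENTENCE):
    """Hours of audio obtained if every sentence is recorded exactly once."""
    return sentences * avg_seconds / 3600


print(sentences_needed(258))            # 232200
print(round(hours_covered(35_700), 1))  # 39.7
```

So under these assumptions, the 35.7K sentences currently available would cover roughly 40 hours of non-repeated recordings, well short of the 258 hours already recorded.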