These are all good ideas. One that I would add to the list is error reduction, i.e. focusing on reducing the number of clips that fail validation.
I feel like this requires a combination of educational and technical features. Users would benefit from more guidance on how to record and validate. While there are community guidelines, there also need to be official Mozilla-sanctioned guidelines that are easy to find on the main Common Voice site.
On the technical side of things, a lot of recording problems would be solved if users played their own recordings back before submitting. Users apparently don’t know about core features like playing back recordings and skipping sentences, so there is clearly some kind of UX issue around feature discovery.
Additionally, more feedback in the UI about technical issues like microphone volume would help to combat the large number of low-volume recordings.
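
To make the volume idea a bit more concrete, here is a rough sketch of the kind of check I have in mind. It measures the RMS level of a clip in dBFS and flags it if it falls below a threshold. The function names and the -30 dBFS threshold are my own assumptions for illustration, not anything taken from the Common Voice codebase, and a real implementation would need to tune the threshold against actual recordings.

```python
import math

# Hypothetical threshold chosen for illustration only; it would need tuning.
LOW_VOLUME_DBFS = -30.0


def rms_dbfs(samples):
    """Return the RMS level, in dBFS, of float samples in the range [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        # Pure silence has no finite level in decibels.
        return float("-inf")
    # 10 * log10 of the mean square equals 20 * log10 of the RMS amplitude.
    return 10.0 * math.log10(mean_square)


def is_too_quiet(samples, threshold_dbfs=LOW_VOLUME_DBFS):
    """Return True if the clip's RMS level is below the threshold."""
    return rms_dbfs(samples) < threshold_dbfs
```

A check like this could run right after the user stops recording, so the UI can suggest moving closer to the microphone or re-recording before the clip is submitted.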