It would be great to hear from @josh_meyer or others involved with that team, especially on the issue of acronyms and on how the 14-word limit was chosen. On acronyms specifically, I worry that systems trained on Common Voice won't be able to handle the acronyms that occur all over everyday speech unless we allow them in the dataset.
Another area that still seems to need more clarity is whether non-A-Z characters should be allowed. I was reviewing more of the sentences that are on the site now and wasn't sure whether “Lucas played in the São Paulo soccer team” should be rejected or accepted.
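To make the question concrete, here is a minimal sketch of how a strict A-Z rule would treat that sentence. It assumes a hypothetical validator that accepts only ASCII letters, spaces and basic punctuation, plus the 14-word limit; `check_sentence` and the allowed character set are my own illustration, not the actual Common Voice rules.

```python
import re

# Hypothetical rules for illustration only; NOT the real Common Voice validator.
MAX_WORDS = 14  # the 14-word limit discussed above
# Assumed allowed set: ASCII letters, whitespace and a few punctuation marks.
ASCII_ONLY = re.compile(r"[A-Za-z\s.,!?'-]+")


def check_sentence(sentence):
    """Return a list of reasons the sentence would be flagged (empty if none)."""
    problems = []
    if len(sentence.split()) > MAX_WORDS:
        problems.append("too many words")
    # fullmatch fails if any character (e.g. the "ã" in "São") is outside the set.
    if not ASCII_ONLY.fullmatch(sentence):
        problems.append("contains characters outside A-Z")
    return problems


print(check_sentence("Lucas played in the São Paulo soccer team"))
print(check_sentence("Lucas played in the Sao Paulo soccer team"))
```

Under a rule like this, the “São Paulo” sentence gets flagged purely because of the “ã”, while the stripped “Sao Paulo” spelling passes even though it isn't how the name is actually written. That's exactly the ambiguity I'd like the guidelines to settle.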