Thanks for sharing, @Mte90.
Do you have more details on the reaction from the press and communities before and after you showcased the Italian model? I’m really interested in reading the feedback you have received from them so far.
I see you mention a few existing problems with sentences. We currently have a process to remove any number of problematic sentences from the Wikipedia import at once, and the plan for 2020 is to evolve the tool so it can also extract sentences from other open sources (such as the European Parliament dataset). I think this is the way to go to get a large number of quality sentences.
Considering we have had enough sentences for Italian for a long time already, are there other big issues or blockers in the rest of the workflow (voice recording, voice validation, dataset release, model training)?