I haven’t seen any notes on how the dataset will be used. Will it be non-public only? Free to download? A link provided behind a paywall? Download after paying what the download costs (e.g. via S3, like arXiv)?
Will there be a scientific publication based on this, one that describes the dataset and how to use it?
We have a goal to publish an initial version of the dataset before the end of the year. As @omniscimus mentioned, it will be public domain, free for all.
For Mozilla, we have the DeepSpeech project that will use it. We are not sure how everyone will use it yet though, and that’s one of the things we are investigating.
I’m waiting for support for other languages. I’m recording for the English model, but I’d like to record in my native language. In my opinion, this is the most exciting project of the last few years.