I am part of a small informal team working on speech recognition for Esperanto. In addition to the Esperanto Common Voice dataset, we would like to use other datasets. We have some options for recordings, but they are all rights reserved. The owners are willing to let us use the recordings to train a model, but they do not necessarily want to release the dataset to everyone.

The DeepSpeech release notes (for example, for version 0.9.3) mention, as one of the source training corpora, “approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed to use as training corpora.”

Could you please provide a template license agreement that we could adapt and use with our potential partners? Or could you point me to a license agreement that has already been prepared? Many thanks!

