Hi again, and sorry for the delay. We had a team off-site last week, so I wasn’t able to share this with you sooner.
After checking the feedback in this topic, together with other channels, we drafted an MVP (minimum viable product) that we want to share with you for feedback.
Note:
- This MVP includes the things we considered most important for a first release.
- We will be gathering feedback in this topic until September 23rd.
- Based on feedback, we will iterate on the document and share it with our User Experience experts for a final pass.
- Any visuals here are just quick mockups and are subject to change. They do not represent the final visual direction (no UX expert was involved in them).
Common Voice Sentence Collection MVP
Project needs
- An input of sentences (categorized in language and source)
- A set of validation algorithms (ensuring length, license)
- An input of reviewed sentences.
- A way to transfer reviewed sentences to the final database.
- General metrics (number of sentences, validated, reviewed)
1. An input of sentences (categorized in language and source)
A web form input for text should be available, this form should:
- Allow single and multiple sentences in a form.
- Allow uploading txt files with multiple sentences per line.
- Ask for the source language (auto-detect browser language).
- Ask for the source of the sentence (your own, URL, other).
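To make the form requirements above a bit more concrete, here is a minimal sketch of how submitted text (typed or uploaded as a txt file) could be turned into individual sentence records. Everything named here (`Submission`, `split_sentences`, `build_submissions`, the field names) is hypothetical and only for illustration, and the punctuation-based splitting is a naive assumption, not a decision.

```python
import re
from dataclasses import dataclass


# Hypothetical record for one submitted sentence; field names are illustrative.
@dataclass(frozen=True)
class Submission:
    text: str
    language: str  # e.g. "es", chosen in the form or pre-filled from the browser
    source: str    # e.g. "own", a URL, or "other"


# Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(raw: str) -> list[str]:
    """Split form or file text into sentences, line by line."""
    sentences = []
    for line in raw.splitlines():
        for part in _SENTENCE_END.split(line.strip()):
            if part:  # skip empty lines and empty fragments
                sentences.append(part)
    return sentences


def build_submissions(raw: str, language: str, source: str) -> list[Submission]:
    """Attach the language and source from the form to every sentence."""
    return [Submission(s, language, source) for s in split_sentences(raw)]
```

The same function would serve both the text box and the file upload, since an uploaded txt file is just a longer string once it has been read.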
2. A set of validation algorithms (ensuring length, license)
Once you submit the form, a backend will process all individual sentences and apply different validation algorithms:
- Length: Sentences should be 14 words or less.
- License: Sentences are not recorded as copyrighted material outside the public domain.
If issues are found, the result of this validation will be presented to the user, who can edit the problematic sentences or just submit the validated ones.
Once the sentences are submitted, the user will be asked to keep helping and will be presented with sentences to review.
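As a rough illustration of this validation step, the sketch below applies the two checks to a batch of sentences and separates the ones that pass from the ones the user would be asked to edit. The function names and the `known_copyrighted` lookup are assumptions for the example; how the license check would really work is not defined yet.

```python
MAX_WORDS = 14  # length rule from the list above


def check_length(sentence: str) -> bool:
    """True when the sentence has 14 words or fewer (whitespace-separated)."""
    return len(sentence.split()) <= MAX_WORDS


def validate(sentences, known_copyrighted=frozenset()):
    """Return (valid, problems).

    problems maps each failing sentence to the list of checks it failed.
    known_copyrighted is a hypothetical set of sentences already recorded
    as copyrighted; the real license check may look very different.
    """
    valid, problems = [], {}
    for sentence in sentences:
        failed = []
        if not check_length(sentence):
            failed.append("length")
        if sentence in known_copyrighted:
            failed.append("license")
        if failed:
            problems[sentence] = failed
        else:
            valid.append(sentence)
    return valid, problems
```

Returning both lists matches the flow described above: the user can fix whatever is in `problems`, or go ahead and submit only `valid`.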
3. An input of reviewed sentences.
The user will be presented with sentences from other users in their language (auto-detected from the browser) to validate. People should be able to:
- Validate a sentence right away.
- Reject a sentence right away.
- Edit a sentence and submit it for validation.
The way information is presented should be really similar to the review system for localization tools.
A way to submit more sentences should also be presented to the user in this screen.
Any user should be able to access the review screen anytime and select the language preference to review.
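One possible way to model the three review actions is as transitions on a small sentence record, sketched below. The names (`Status`, `ReviewItem`, `approve`, `reject`, `edit`) are hypothetical, and sending an edited sentence back to a pending state is my assumption about what "submit it for validation" would mean.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    PENDING = "pending"    # waiting for a reviewer
    APPROVED = "approved"  # the "validate" action above
    REJECTED = "rejected"


@dataclass(frozen=True)
class ReviewItem:
    text: str
    language: str
    status: Status = Status.PENDING


def approve(item: ReviewItem) -> ReviewItem:
    """Reviewer accepts the sentence as is."""
    return replace(item, status=Status.APPROVED)


def reject(item: ReviewItem) -> ReviewItem:
    """Reviewer discards the sentence."""
    return replace(item, status=Status.REJECTED)


def edit(item: ReviewItem, new_text: str) -> ReviewItem:
    """Reviewer fixes the text; assumed to go back to pending afterwards."""
    return replace(item, text=new_text, status=Status.PENDING)


def queue_for(items, language: str):
    """Pending sentences in the language the reviewer selected."""
    return [i for i in items if i.language == language and i.status is Status.PENDING]
```

Keeping the records immutable means every action produces a new record, which would make it easier to keep a history of who changed what if that is ever needed.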
5. General metrics (number of sentences, validated, reviewed)
At any given moment, the user should be able to see a quick reference of how many sentences were validated and reviewed for the current language.
Spanish: Validated (1300) Reviewed (567)
A page with metrics for all languages should also exist.
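The counters described here could be computed with something as simple as the sketch below. The record shape (language, validated flag, reviewed flag) and the function names are assumptions for the example, not a proposed schema.

```python
def language_metrics(records):
    """Count validated and reviewed sentences per language.

    records: iterable of (language, validated, reviewed) tuples, where the
    last two are booleans. This shape is hypothetical.
    """
    totals = {}
    for language, validated, reviewed in records:
        entry = totals.setdefault(language, {"validated": 0, "reviewed": 0})
        entry["validated"] += int(validated)
        entry["reviewed"] += int(reviewed)
    return totals


def format_metrics(language, counts):
    """Render one language in the quick-reference style shown above."""
    return f"{language}: Validated ({counts['validated']}) Reviewed ({counts['reviewed']})"


def all_languages_page(records):
    """One line per language, sorted by name, for the overview page."""
    totals = language_metrics(records)
    return [format_metrics(lang, totals[lang]) for lang in sorted(totals)]
```

The per-language line and the all-languages page share the same counts, so the two views cannot drift apart.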
User needs
- Guidance page: Where to find sentences? What is a good sentence?
- An input form to write or an upload mechanism (txt files)
- A way to see post-validation output.
- A system to review other people’s sentences.
- General metrics (number of sentences, validated, reviewed)
1. Guidance page: Where to find sentences? What is a good sentence?
A link to a page with documentation should be present in the tool at any given moment as well as very visible from the submission form.
This page should contain:
- Description of the 3 current good strategies to gather sentences:
  - How do I get public license sentences from large sources? (examples)
  - How do I get linguists involved in the project? (examples)
  - How do I submit original sentences myself?
- Description of what constitutes a good sentence:
  - Hard requirements: Length, license, grammar.
  - Nice to have: Names, cities, diverse sounds…
For 2, 3, 4, 5 see explanation in the previous section.