[ACTION REQUIRED] New Sentence Collector Infrastructure and Improvements

sinumade · October 18, 2020, 9:33am

Yes, what I'm worried about is duplicate sentences and sources, so if the Collector tool automatically detects that, I might not need to be able to search for it.

The reason I'm so fixated on user searches, or on who was involved in collecting the sentences, is that I want users to feel responsible. It's good to see some interesting sentences coming out... but on the other hand, I'm worried about the use of language that could be hurtful to people. It's really nice to see more people participating, but when it comes to self-governance, I have concerns.

Yeah, I'm waiting for the next update.

So containers and profiles are different functions. I didn't know that. Thanks for telling me about it.

@mkohler Did you put an announcements tag on it? Thank you.

I think this topic should be linked from the Collector tool. The Migrate Account page doesn't even have a closing date on it.

As I wrote in the topic Announces "announcements", the announcements are inadequate. Volunteers and visitors are not being notified of significant changes. The information is still stuck in Discourse.

Tell everyone what we do. I'm sure some people will be interested in the detailed changes.

Maybe, really, when a sentence is removed for any reason, we should announce it in the Collector tool. Volunteers will be in disbelief, especially when the numbers have moved significantly. Eventually, they will ask why in Discourse. And the users who collected the sentences may be displeased.

Think about it. Both the reporting of copyright issues and the decision to remove sentences are done "behind the scenes". Certainly the activity itself is public. But if there is no trigger, no one is going to look at it. It's not surprising that many users feel that they are not being taken seriously.

I have shown on my site how many Japanese sentences have been removed and why they were removed. I thought it would be sincere to do so. No matter how small the readership is (even if I were the only volunteer), reasons and changes should be shown.

I know it's hard to deal with 100+ languages. But if we're a team, it has to be shown.

Besides, we need to show that the Tanaka Corpus is a corpus that cannot be used in Common Voice, right?
The Collector tool should also have a complete list of such "disabled corpora".

- Show the changes in the sentence collection and why.
- A list of disabled corpora.
  - Of course it would be nice if the Collector tool could filter it. But it's better to keep a list, so volunteers don't waste their time.

That's it for now.
