Does the Sentence Collector test for duplicates?

I tested it with a sentence that is already part of the collection. I was able to review it, so it looked like there is no check for duplicates. Is this the case?

I guess a lot of people try to add sentences from similar public domain sources, so this should definitely be checked.

Are they the exact same sentences?

Yes, although I may have added a space at one end of the sentence.

EDIT: Strange, I just retested it and it now says “The sentences you submitted already exist.” Does it also test against sentences from the early phase of the project? The sentence I used was definitely added via GitHub, not via the frontend. Now that it is part of the Sentence Collector's collection, everything works.

We only test against the Sentence Collector database. Once sentences get exported, we also check against existing sentences in the voice-web GitHub repo.
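To illustrate the two stages described above, here is a minimal Python sketch. It is not the Sentence Collector's actual code: the function names, the sample sentences, and the whitespace normalization are all assumptions made for illustration. Whether the real tool trims stray spaces before comparing is not confirmed in this thread.

```python
def normalize(sentence: str) -> str:
    # Assumption: trim both ends and collapse runs of whitespace,
    # so a stray space does not hide a duplicate.
    return " ".join(sentence.split())


def filter_duplicates(submitted, existing):
    """Split submitted sentences into (fresh, duplicates) relative to existing."""
    seen = {normalize(s) for s in existing}
    fresh, duplicates = [], []
    for sentence in submitted:
        key = normalize(sentence)
        if key in seen:
            duplicates.append(sentence)
        else:
            seen.add(key)  # also catches repeats within one submission
            fresh.append(key)
    return fresh, duplicates


# Hypothetical stand-ins for the two data sources mentioned above.
collector_db = ["The quick brown fox jumps over the lazy dog."]
repo_sentences = ["She sells seashells by the seashore."]

# Stage 1: at submission time, compare only against the collector database.
fresh, dupes = filter_duplicates(
    ["The quick brown fox jumps over the lazy dog. ", "A brand new sentence."],
    collector_db,
)

# Stage 2: at export time, compare against sentences already in the repo.
exported, export_dupes = filter_duplicates(
    fresh + ["She sells seashells by the seashore."],
    repo_sentences,
)
```

In this sketch a sentence that exists only in the repo passes stage 1 and is caught at stage 2, which matches the behaviour described in the reply above.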

Once sentences get exported, we also check against existing sentences in the voice-web GitHub repo.

Great, thanks. So I don’t have to worry about creating duplicates when I propose a new sentence.