[Help Wanted] Write some nice, short sentences for people to read

Honestly, I find this a little troubling. A true language model would include all words, offensive or otherwise. As someone who depends heavily on voice recognition to get work done at my regular job and to interact with friends and family, I find it very irksome that some words are not recognized as well. I feel censored. How would you feel if your keyboard refused to produce a word that you typed?

There certainly should be room for having offensive words in this language model set.

Wrote some sentences here and tried to include more slang and female pronouns: https://pastebin.com/94aLhZxT

Here are some short common sentences. They each reside in multiple books on Project Gutenberg from a random collection of works I downloaded from different authors. https://pastebin.com/XRpZgdbw

Would also help for sentiment analysis.

Hi, @mhenretty

Because of @fred_trotter 's comment, I created some health-related sentences. https://pastebin.com/6Y797eNp. Let me know if you need more contributions.

I’ve created some sentences here. Maybe they are not simple enough? Let me know; I’d be happy to supply more.

https://pastebin.com/3wcQEnB8

Hopefully this helps.
https://pastebin.com/1VvGnUiV

I created an extra thread to discuss this topic:

Hi! Today I explored some interesting cases in terms of pronunciation. I believe these sentences can help cover a few potential bugs in machine understanding: https://pastebin.com/bTKWpHiU

Hi, @bavencope! My name is Janet, and I am a volunteer for Project Common Voice. Thank you so much for your contribution. I’d like to let you know that your sentences have been added.

Hi, @pro.gadget! My name is Janet, and I am a volunteer for Project Common Voice. Thank you so much for your contribution. I’d like to let you know that your sentences have been added. Cheers!

Hi, @tlcoles! My name is Janet, and I am a volunteer for Project Common Voice. Thank you so much for your contribution. I’d like to let you know that your sentences have been added. Also, I think it would be awesome to have sentences that touch on women’s health, but that’s just my two cents. :slight_smile: Cheers!

Hi, @RLarissa! My name is Janet, and I am a volunteer for Project Common Voice. Thank you so much for your contribution. I’d like to let you know that your sentences have been added.

Hello, @Kieran_Drew! Thank you so much for your contribution. Just letting you know that your sentences have been added. :slight_smile:

https://github.com/mozilla/voice-web/commit/86f5769fad99d9bb844fb2eb61e4211413856ca9

@mlennox Thank you so much for your contribution! Your sentences have been added. :slight_smile:

Here are some sentences, not sure if this is the right place to send them:
https://pastebin.com/raw/RJp7bWpu

Here are 228 sentences of CC0-licensed dialog:

https://pastebin.com/raw/Jb8grcpV

Hope that helps :slight_smile:

And here are 200 CC0 conversational sentences:

https://pastebin.com/raw/EawMHjEx

I have 9.8 MB more of CC0 conversational sentences, but can anyone help me filter them to remove the incorrect ones and verify the rest?

https://ufile.io/8ijp9

@James_Fortune cool, that’s a substantial number of utterances! Since most of them are very technical, I’ve filtered the worst offenders automatically:

http://speech.tools/cc0-75k-conversations.filtered

But that’s still 75k sentences, a lot more than the 7k in the Common Voice v1 release. Not a bad idea to exclude some more utterances manually, I guess, but it’s a start.
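
For anyone who wants to take a further automatic pass before the manual one, here is a minimal sketch of the kind of filter that could be applied. It is not the filter used to produce the file above. The word limits, the allowed-character pattern, and the function names `is_readable` and `filter_sentences` are all assumptions made for illustration.

```python
import re

# Hypothetical limits, chosen only for illustration.
MIN_WORDS = 3
MAX_WORDS = 14

# Assumed rule: letters, spaces, and basic punctuation only (no digits,
# slashes, or symbols), ending in sentence-final punctuation.
ALLOWED = re.compile(r"[A-Za-z][A-Za-z ,.'’?!-]*[.?!]")
PUNCTUATION = ",.'’?!-"


def is_readable(sentence: str) -> bool:
    """Return True if the sentence looks short and easy to read aloud."""
    text = sentence.strip()
    if not ALLOWED.fullmatch(text):
        return False
    words = [w.strip(PUNCTUATION) for w in text.split()]
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        return False
    # Reject all-caps tokens longer than one letter (likely acronyms).
    return not any(len(w) > 1 and w.isupper() for w in words)


def filter_sentences(lines):
    """Keep readable sentences, dropping case-insensitive duplicates."""
    seen = set()
    kept = []
    for line in lines:
        text = line.strip()
        key = text.lower()
        if is_readable(text) and key not in seen:
            seen.add(key)
            kept.append(text)
    return kept
```

Calling `filter_sentences(open("sentences.txt", encoding="utf-8"))` on a local copy of the list (the file name is a placeholder) would return the sentences that pass. A rule-based pass like this only removes obvious problems such as digits, acronyms, and duplicates, so a manual review is still needed afterwards.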
