Here are some short, common sentences. Each of them appears in multiple books on Project Gutenberg, taken from a random collection of works by different authors that I downloaded: https://pastebin.com/XRpZgdbw
They would also be useful for sentiment analysis.
Hi, @mhenretty
Following @fred_trotter's comment, I created some health-related sentences: https://pastebin.com/6Y797eNp. Let me know if you need more contributions.
I’ve created some sentences here, though they may not be simple enough. Let me know; I’d be happy to supply more.
I created an extra thread to discuss this topic:
Hi! Today I explored some interesting pronunciation cases. I believe these sentences (https://pastebin.com/bTKWpHiU) can help catch a few potential bugs in machine understanding.
Hi, @bavencope! My name is Janet, and I am a volunteer for Project Common Voice. Thank you so much for your contribution. I’d like to let you know that your sentences have been added.
Hi, @pro.gadget! My name is Janet, and I am a volunteer for Project Common Voice. Thank you so much for your contribution. I’d like to let you know that your sentences have been added. Cheers!
Hi, @tlcoles! My name is Janet, and I am a volunteer for Project Common Voice. Thank you so much for your contribution. I’d like to let you know that your sentences have been added. Also, I think it would be awesome to have sentences that touch on women’s health, but that’s just my two cents. Cheers!
Hi, @RLarissa! My name is Janet, and I am a volunteer for Project Common Voice. Thank you so much for your contribution. I’d like to let you know that your sentences have been added.
Hello, @Kieran_Drew! Thank you so much for your contribution. Just letting you know that your sentences have been added.
https://github.com/mozilla/voice-web/commit/86f5769fad99d9bb844fb2eb61e4211413856ca9
Here are some sentences, not sure if this is the right place to send them:
https://pastebin.com/raw/RJp7bWpu
I have 9.8 MB more of CC0 conversation sentences. Can anyone help me filter out the incorrect ones and verify the rest?
@James_Fortune Cool, that’s a substantial number of utterances! Since most of them are very technical, I’ve automatically filtered out the worst offenders:
http://speech.tools/cc0-75k-conversations.filtered
That still leaves 75k sentences, far more than the 7k in the Common Voice v1 release. It’s probably still a good idea to exclude some more utterances manually, but it’s a start.
@bmilde I archived your link with the Wayback Machine: https://web.archive.org/web/20180116003608/http://speech.tools/cc0-75k-conversations.filtered
I’ll see if I can get another substantial batch of utterances for Common Voice.
Hello, @Elleo! Thank you so much for your contribution. Just letting you know that your sentences (both batches) have been added.