Sentence-ending punctuation

Someone recently uploaded a batch (~560) of Czech sentences to the Sentence Collector tool, but all of them lack any sentence-ending punctuation. That left me wondering whether such punctuation is required or recommended, and whether sentences without it should be approved or rejected.

@mkohler I’m afraid those sentences should be considered for deletion regardless, because of their sourcing. They were uploaded by user “filipjurcicek”, with the source given as “Snetnces are from 100 years old book RUR by Karel Capek”, but they almost certainly come from the file https://github.com/UFAL-DSG/alex/blob/master/alex/applications/RepeatAfterMe/sentences_cs.txt. The repo is licensed under Apache, and according to the readme the file itself is sourced from two public domain works and one CC-BY-SA work. I can always scrape the two public domain works later, although there is no shortage of Czech sentences waiting for review.

Agreed, we need to be careful of the sources sentences come from. I’ve deleted these.

I’ll let someone else respond here, so take my answer with a grain of salt: as far as I remember, DeepSpeech currently ignores sentence-ending punctuation.
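To illustrate what "ignoring" punctuation could mean in practice, here is a minimal sketch of a transcript normalization step. It is not DeepSpeech's actual code; the function name `strip_punctuation` and the example sentence are made up for illustration, and the assumption is simply that punctuation is dropped before a transcript is used.

```python
import unicodedata


def strip_punctuation(sentence: str) -> str:
    """Hypothetical normalizer: drop punctuation, lowercase, collapse spaces."""
    # Unicode general categories starting with "P" cover all punctuation.
    kept = "".join(
        ch for ch in sentence
        if not unicodedata.category(ch).startswith("P")
    )
    return " ".join(kept.lower().split())


# Made-up example sentence, with and without a final period.
with_period = "Roboti nejsou lidé."
without_period = "Roboti nejsou lidé"

# Both variants normalize to the same string.
print(strip_punctuation(with_period) == strip_punctuation(without_period))
```

If the training pipeline does something along these lines, the missing period would make no difference to the model, although it may still matter for how naturally people read the sentence aloud.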
