What level of background noise should fail validation?


The validation tab only asks whether the person read the sentence accurately, so I have generally approved every clip where that was true, regardless of other factors.

But what level of background noise should count as a fail? I have approved clips where the background noise was very loud and the sentence was barely discernible (but still read correctly), as well as clips with background voices or other people talking during the recording. Should I be failing these?

And are there other situations that would require a fail even if the sentence is read correctly?


Since an STT engine trained on this dataset should be able to handle such situations, it should be OK as long as you can understand the sentence. But I would also like clear rules here.