Long silence in some recordings... Solution pre-trimming?

This mostly happens on longer sentences. Probably, people hit the record button and then read the sentence silently before actually speaking it. This results in a long silence (around 10 seconds) at the start of the clip.

People who are validating these clips would most likely think that the recording is empty and hit the NO button.

Is it possible to pre-process the recordings to trim out the silent parts before pushing them to the listen queue? I saw a normalization request on GitHub too…


This is currently not possible and would be difficult to engineer. Even basic splitting based on noise level would be fairly processor intensive and not necessarily reliable. It’s not just a matter of fixing clips with silence, but also of potentially harming clips without silence (e.g. by cutting into quiet speech). It would also be difficult to tune properly.
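To make the trade-off concrete, here is a minimal sketch of the kind of energy-based trimming being discussed. It is not how the site works; the function names, the RMS threshold of 0.01, the 20 ms frame size and the 100 ms of kept padding are all assumptions picked for illustration, and samples are assumed to be floats in [-1.0, 1.0].

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of float samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_speech_start(samples, sample_rate, threshold=0.01, frame_ms=20):
    """Return the sample index of the first frame louder than `threshold`,
    or None if no frame is (the clip looks entirely silent).
    `threshold` and `frame_ms` are illustrative guesses, not tuned values."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    for start in range(0, len(samples), frame_len):
        if frame_rms(samples[start:start + frame_len]) > threshold:
            return start
    return None


def trim_leading_silence(samples, sample_rate, threshold=0.01,
                         frame_ms=20, keep_ms=100):
    """Drop leading silence, keeping `keep_ms` of padding before the speech."""
    start = find_speech_start(samples, sample_rate, threshold, frame_ms)
    if start is None:
        # Entirely silent: leave untouched, blank-clip handling is a separate problem.
        return samples
    keep = sample_rate * keep_ms // 1000
    return samples[max(0, start - keep):]
```

The tuning problem shows up right in the `threshold` parameter: set it too low and background noise counts as speech, so nothing gets trimmed; set it too high and a quiet speaker's first words get cut off.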

This is one issue that discusses it and this one too (for reference).

I think it’s worthwhile keeping the issue open though. In principle these clips could be recovered by dataset engineers after the fact by mining/processing the invalidated clips to find ones that are incorrectly marked as invalid.
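As a rough idea of what that after-the-fact mining could look like, here is a small sketch that flags invalidated clips which start with a long silence but do contain sound later on. Everything in it is hypothetical: the clip dictionaries with `path`, `samples` and `sample_rate` keys, the 3 second cut-off and the RMS threshold are assumptions for illustration, not part of any released dataset tooling.

```python
import math


def leading_silence_seconds(samples, sample_rate, threshold=0.01, frame_ms=20):
    """Seconds of audio before the first frame louder than `threshold`.
    Returns None when no frame exceeds it (the clip looks truly blank)."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > threshold:
            return start / sample_rate
    return None


def recovery_candidates(invalidated_clips, min_silence_s=3.0):
    """Pick invalidated clips that start with a long silence but are not blank.
    Each clip is a dict with hypothetical keys: path, samples, sample_rate."""
    candidates = []
    for clip in invalidated_clips:
        silence = leading_silence_seconds(clip["samples"], clip["sample_rate"])
        if silence is not None and silence >= min_silence_s:
            candidates.append((clip["path"], silence))
    return candidates
```

Clips flagged this way would still need a human (or a second validation pass) to confirm the sentence was read correctly; the check only says that something audible follows the silence.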


Thank you for the info and pointers :slight_smile:

The blank clip detection in the links is also important. When I hear a blank clip, I can never be sure whether it is really blank or whether it is caused by my current connection (low-bandwidth mobile data).


Maybe this request would “convince” contributors to re-record after hearing their own clip(s).

