The noise reduction component in the pre-transcription processing

Thanks for sharing your approaches. They motivated me to run a benchmark for comparison:

In my experiments, frequency filtering had only a very small impact. Noise reduction (with rnnoise) helped a lot, but it can also lower the accuracy in quieter environments.
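To make "frequency filtering" concrete, here is a minimal sketch of a band-pass filter in plain Python. It is not taken from the benchmark: the first-order filter design, the 300 to 3400 Hz band, the 16 kHz sample rate, and all function names are assumptions chosen for illustration only.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order RC high-pass: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def low_pass(samples, cutoff_hz, sample_rate):
    """First-order RC low-pass: attenuates content above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out = []
    prev_out = 0.0
    for x in samples:
        prev_out = prev_out + alpha * (x - prev_out)
        out.append(prev_out)
    return out


def band_pass(samples, low_hz=300.0, high_hz=3400.0, sample_rate=16000):
    """Keep roughly the speech band (assumed limits, not from the benchmark)."""
    return low_pass(high_pass(samples, low_hz, sample_rate), high_hz, sample_rate)


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate=16000, seconds=0.5):
    """Generate a pure sine tone for testing the filter."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    # Compare how much of a hum-like tone and a speech-band tone survives.
    for freq in (50, 1000, 7000):
        print(freq, "Hz ->", round(rms(band_pass(tone(freq))), 3))
```

A 50 Hz hum is attenuated much more strongly than a 1 kHz tone inside the band. A first-order filter like this has a very gentle slope, so a real pipeline would probably use a steeper design.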

Note that the benchmark did not test transcription accuracy directly, because I run an additional step afterwards (Speech → Text → Intent+Slots).
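For readers wondering what scoring at the Intent+Slots level could look like, here is a rough sketch. The exact metric used in the benchmark is not described here, so the exact-match rule, the dictionary layout, and the function name are all hypothetical.

```python
def intent_slot_accuracy(predictions, references):
    """Fraction of utterances whose intent AND all slots match exactly.

    Each item is assumed to be a dict like
    {"intent": "set_timer", "slots": {"minutes": "5"}}.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred.get("intent") == ref.get("intent")
        and pred.get("slots", {}) == ref.get("slots", {})
    )
    return correct / len(references)


if __name__ == "__main__":
    refs = [
        {"intent": "set_timer", "slots": {"minutes": "5"}},
        {"intent": "play_music", "slots": {"artist": "queen"}},
    ]
    preds = [
        {"intent": "set_timer", "slots": {"minutes": "5"}},
        {"intent": "play_music", "slots": {"artist": "clean"}},  # slot misheard
    ]
    print(intent_slot_accuracy(preds, refs))
```

With a metric like this, a transcription error only counts if it changes the intent or a slot value, which is why the numbers are not directly comparable to word error rate.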

The benchmark code can be found here

Update: The new model version (0.9) has much better accuracy in noisy environments thanks to the noise augmentations used in training. Extra noise reduction now decreases the accuracy, while frequency filtering increases it slightly in very noisy environments.
