I’ve noticed that DeepSpeech will sometimes interpret instrumental music, sound effects and animal noises as speech. How can I tell it that something isn’t speech - would I just provide a blank transcript with the clip?
Or is that not necessary - will DeepSpeech get better at recognizing non-speech audio on its own as it gets more samples of what speech actually is?