I tested the Python one you mentioned today (wiseman/py-webrtcvad on GitHub, a Python interface to the WebRTC Voice Activity Detector) on a small WAV of 55 words (275 characters), 19.968 seconds long.
It split the WAV into 5 WAV files:
00:00:01.59 - 3 words
00:00:06.87 - 21 words
00:00:03.63 - 12 words
00:00:03.33 - 10 words
00:00:02.43 - 9 words
When I play the files back in the original order, there appears to be no word truncation. I don't understand the aggressiveness setting, and simply ran it with 0.
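For reference, the py-webrtcvad README describes the aggressiveness mode as an integer from 0 to 3 (set via `webrtcvad.Vad(mode)` or `vad.set_mode(mode)`), where 0 is the least aggressive about filtering out non-speech and 3 the most. So running with 0 keeps borderline frames as speech, which may be part of why nothing got truncated. The README also says the VAD only accepts 16-bit mono PCM at 8000, 16000, 32000 or 48000 Hz, in frames of 10, 20 or 30 ms. Below is a minimal sketch of slicing raw PCM into such frames; `pcm_frames` is my own hypothetical helper, not part of the library, and in real use each frame would then go to `vad.is_speech(frame, sample_rate)`.

```python
# Sketch: slice raw 16-bit mono PCM into fixed-length frames for the WebRTC VAD.
# pcm_frames is a hypothetical helper, not part of py-webrtcvad.
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)


def pcm_frames(pcm: bytes, sample_rate: int, frame_ms: int = 30):
    """Yield successive frame_ms-long frames; a trailing partial frame is dropped."""
    if sample_rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    # samples per frame, times 2 bytes per 16-bit sample
    frame_bytes = sample_rate * frame_ms // 1000 * 2
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[offset:offset + frame_bytes]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # 1 s of silence at 16 kHz
    frames = list(pcm_frames(one_second, 16000, 30))
    print(len(frames), len(frames[0]))  # prints: 33 960
```

At 16 kHz a 30 ms frame is 480 samples, i.e. 960 bytes, so one second yields 33 whole frames plus a partial one that is discarded.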
While matching the (real) transcript against the audio from this Python tool, I also found that wherever a sentence ended, the audio file also ended. Possibly it is picking up the longer pause at the end of a sentence?
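On the sentence-boundary observation: if I remember the repo's example.py correctly, it does not cut at every non-speech frame. It keeps a ring buffer of about 300 ms worth of 30 ms frames, opens a segment once more than 90% of that window is voiced, and closes it only once more than 90% is unvoiced. Short gaps between words therefore don't end a segment, but a longer end-of-sentence pause does. Here is a simplified sketch of that trigger logic working on per-frame speech flags (which in practice would come from `vad.is_speech`); `collect_segments` and its parameters are hypothetical names of mine, and it returns frame index ranges rather than audio.

```python
from collections import deque


def collect_segments(flags, padding_frames=10, threshold=0.9):
    """Group per-frame speech flags into (start, end) frame ranges, end exclusive.

    Sketch of a padded ring-buffer trigger: a segment opens when more than
    `threshold` of the last `padding_frames` frames are speech, and closes
    when more than `threshold` of them are non-speech.
    """
    ring = deque(maxlen=padding_frames)
    segments = []
    triggered = False
    start = 0
    for i, is_speech in enumerate(flags):
        ring.append(is_speech)
        if not triggered:
            if sum(ring) > threshold * padding_frames:
                triggered = True
                # the buffered lead-in frames become part of the segment
                start = i - len(ring) + 1
                ring.clear()
        elif len(ring) - sum(ring) > threshold * padding_frames:
            triggered = False
            # the trailing non-speech frames are kept as padding
            segments.append((start, i + 1))
            ring.clear()
    if triggered:
        # audio ended mid-segment
        segments.append((start, len(flags)))
    return segments


if __name__ == "__main__":
    # words separated by a 3-frame gap, then a 15-frame "sentence" pause
    flags = ([False] * 5 + [True] * 20 + [False] * 3 + [True] * 20
             + [False] * 15 + [True] * 12 + [False] * 10)
    print(collect_segments(flags))  # prints: [(5, 58), (63, 85)]
```

With 30 ms frames, the 3-frame (90 ms) gap stays inside the first segment, while the 15-frame (450 ms) pause closes it; any non-speech that never fills the trigger window is simply left out of the output files.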
The 5 WAV files total 17.85 seconds, against 19.968 seconds for the original. What was dropped was obviously noise, not speech.
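As a quick cross-check of the figures above: the segment durations add up to the 17.85 s total, the word counts add up to the 55 words in the original, and the difference is about 2.1 s, roughly a tenth of the recording.

```python
# Segment durations (seconds) and word counts as listed above
segments = [(1.59, 3), (6.87, 21), (3.63, 12), (3.33, 10), (2.43, 9)]
original_seconds = 19.968

kept = round(sum(d for d, _ in segments), 3)
words = sum(w for _, w in segments)
dropped = round(original_seconds - kept, 3)

print(kept, words, dropped, f"{dropped / original_seconds:.1%}")  # prints: 17.85 55 2.118 10.6%
```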
hth