I would like to ask how to handle long sentences.
Sometimes, depending on the speaker, a long sentence spoken in one go is aligned as a single fragment with one duration, for example about 10 seconds and 50 words, and all of it ends up in one transcript entry. It would be helpful if such sentences could be broken into smaller fragments. Could you guide me on how to resolve this? Or is there any way to control the number of words per aligned transcript fragment?
/bin/align.sh --output-max-cer 15 --loglevel 10 --audio data/audio.wav --script data/transcript.txt --aligned data/result.json --tlog data/result.log --output-pretty --stt-max-duration 2000
VAD splitting: 0it [00:00, ?it/s]INFO:root:Fragment 0: Audio too long for STT
INFO:root:Fragment 1: Audio too long for STT
With this run, the text for audio above this limit is missing from the result. How can I handle such cases and split the fragments into the smallest possible parts?
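To make the request concrete, here is a rough sketch of the kind of splitting I have in mind, done as a preprocessing step on the transcript text. It is not part of the aligner: the function name split_long_sentences and the max_words parameter are hypothetical names I made up, and I am only assuming that shorter script lines would lead to shorter aligned fragments. It breaks text at punctuation first and falls back to a plain word count when a clause is still too long.

```python
import re


def split_long_sentences(text, max_words=12):
    """Split text into pieces of at most max_words words.

    Hypothetical helper for illustration only; not an aligner API.
    Breaks are placed at punctuation where possible.
    """
    # Break after sentence or clause punctuation followed by whitespace.
    clauses = [c for c in re.split(r"(?<=[.!?;,:])\s+", text.strip()) if c]
    pieces, current = [], []
    for clause in clauses:
        words = clause.split()
        # Flush the buffer if adding this clause would exceed the limit.
        if current and len(current) + len(words) > max_words:
            pieces.append(" ".join(current))
            current = []
        # A clause that is itself too long is cut by raw word count.
        while len(words) > max_words:
            pieces.append(" ".join(words[:max_words]))
            words = words[max_words:]
        current.extend(words)
    if current:
        pieces.append(" ".join(current))
    return pieces


if __name__ == "__main__":
    sample = "one two three, four five six seven eight. nine ten"
    for piece in split_long_sentences(sample, max_words=4):
        print(piece)
```

Each returned piece could then be written on its own line of data/transcript.txt before running the alignment. Whether the aligner treats each line as a separate fragment is an assumption on my side, so please correct me if there is a built-in option for this.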