I forked a GitHub repo and customized it for my own use case.
Maybe my use case is the same as yours: annotating audio files.
I assume you are using the standard CSV file format from the DeepSpeech project and that all your audio files are mono WAV at 16000 Hz.
(Or MP3, for use on Windows. MP3 is the better choice.)
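
If you want to check your data against these assumptions first, here is a minimal Python sketch. It assumes the usual DeepSpeech CSV columns (`wav_filename`, `wav_filesize`, `transcript`); the function names are hypothetical and not part of this project. It only covers WAV files, since the standard-library `wave` module does not read MP3.

```python
import csv
import io
import wave

# Column names assumed from the usual DeepSpeech training CSV layout.
EXPECTED_COLUMNS = ["wav_filename", "wav_filesize", "transcript"]


def check_csv_header(csv_text):
    """Return True if the first CSV row matches the assumed DeepSpeech header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return header == EXPECTED_COLUMNS


def is_mono_16k(wav_file):
    """Return True if the WAV (a path or a binary file object) is mono at 16000 Hz."""
    with wave.open(wav_file, "rb") as w:
        return w.getnchannels() == 1 and w.getframerate() == 16000


def make_silent_wav(channels=1, rate=16000, seconds=0.1):
    """Build a short silent 16-bit WAV in memory (hypothetical helper for trying this out)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    sample_csv = "wav_filename,wav_filesize,transcript\nclip.wav,3244,hello\n"
    print(check_csv_header(sample_csv))
    print(is_mono_16k(make_silent_wav()))
```

For real data, pass the path of each file listed in the `wav_filename` column to `is_mono_16k`.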
Feel free to use it, but be careful: this project is customized for RTL languages. (You can change this for your own language.)