How to use Deep Speech for forced alignment?

I trained a model using Deep Speech. Now, how can I use Deep Speech for forced alignment?

Or, asked another way: is it possible to extract the best of the "chosen paths" of the CTC decoder so that you get per-time-step information? Or something like the time in the audio at which the system recognizes each character?

Timing metadata is now exposed in the API and in the command-line tool with the --json flag.
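
If you want word-level timings rather than per-character ones, here is a minimal sketch of how the per-character timing could be grouped into words. It assumes each token carries its text and a start time in seconds; the `Token` class and `words_with_times` function are hypothetical stand-ins for illustration, not part of the Deep Speech API. In the Python bindings I believe the metadata comes from `sttWithMetadata`, but check the field names for the version you have installed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for one decoded character with its timing."""
    text: str          # a single character, " " marks a word boundary
    start_time: float  # seconds from the start of the audio


def words_with_times(tokens: Iterable[Token]) -> List[Tuple[str, float]]:
    """Group per-character tokens into (word, start_time) pairs."""
    words: List[Tuple[str, float]] = []
    chars: List[str] = []
    word_start = 0.0
    for tok in tokens:
        if tok.text == " ":
            # A space closes the current word, if there is one.
            if chars:
                words.append(("".join(chars), word_start))
                chars = []
        else:
            # The first character of a word fixes the word's start time.
            if not chars:
                word_start = tok.start_time
            chars.append(tok.text)
    # Flush the last word when the transcript does not end with a space.
    if chars:
        words.append(("".join(chars), word_start))
    return words


if __name__ == "__main__":
    demo = [
        Token("h", 0.10), Token("i", 0.14), Token(" ", 0.30),
        Token("y", 0.50), Token("o", 0.56),
    ]
    for word, start in words_with_times(demo):
        print(f"{start:.2f}s  {word}")
```

Note that these are the times at which the decoder emitted each character, so they are only approximate compared with a dedicated forced aligner.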