but considering the cost per minute, I want to use my own engine. I tested DeepSpeech and I think that, with training, I will get good results. The only problem is that the output is raw text, so it is impossible for me to know when each word was pronounced.
Any idea how to reproduce the Speechmatics API result?
Thanks in advance, and sorry for my bad English.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
2
Why don’t you use the library or one of its bindings and build it yourself? Besides, we have no way to produce a timestamp that tells you when each word was spoken. There is already a GitHub issue filed about that.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
3
I think you can achieve something similar with:
- VAD (voice activity detection)
- our streaming API
As you can see, the libdeepspeech API returns just a string, but you can then process that string and produce JSON.
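To make the idea concrete, here is a rough sketch of the "produce JSON" step. It assumes you already have a list of voiced segments from a VAD, each with a start time, an end time, and the string the engine returned for that chunk. The function name `segments_to_json` and the JSON layout are made up for illustration, not part of DeepSpeech or Speechmatics. Since the engine gives no per-word timing, the words are simply spread evenly across each segment, so the times are only approximate.

```python
import json


def segments_to_json(segments):
    """Build a JSON string with approximate word timings.

    segments: iterable of (start_sec, end_sec, text) tuples, one per
    voiced chunk found by the VAD. The text is whatever string the
    speech engine returned for that chunk.
    """
    words = []
    for start, end, text in segments:
        tokens = text.split()
        if not tokens:
            continue  # nothing recognized in this chunk
        # Assumption: every word in a chunk lasts equally long.
        step = (end - start) / len(tokens)
        for i, token in enumerate(tokens):
            words.append({
                "word": token,
                "start": round(start + i * step, 3),
                "duration": round(step, 3),
            })
    return json.dumps({"words": words}, indent=2)
```

For example, `segments_to_json([(0.0, 1.0, "hello world")])` places "hello" at 0.0 s and "world" at 0.5 s. The shorter the chunks the VAD produces, the closer these estimates get to the real word times.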