Hello, I wanted to test DeepSpeech (I am using version 0.9.7) and see how accurate its timestamp recognition is. I tried it on the following audio file: https://drive.google.com/file/d/16e_GzM-AgqYGs37y5fEwifX_p5AJ2moe/view?usp=sharing and found that the results are not as expected:
deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio speech.wav --json
"words": [
{
"word": "as",
"start_time": 0.38,
"duration": 0.12
},
{
"word": "i",
"start_time": 0.58,
"duration": 0.1
},
{
"word": "concluded",
"start_time": 0.76,
"duration": 0.48
},
{
"word": "my",
"start_time": 1.32,
"duration": 0.16
},
{
"word": "term",
"start_time": 1.6,
"duration": 0.34
},
{
"word": "as",
"start_time": 2.06,
"duration": 0.2
},
{
"word": "the",
"start_time": 2.32,
"duration": 0.14
},
{
"word": "forty",
"start_time": 2.54,
"duration": 0.3
},
{
"word": "fifth",
"start_time": 2.92,
"duration": 0.38
},
{
"word": "president",
"start_time": 3.4,
"duration": 0.48
},
{
"word": "of",
"start_time": 3.96,
"duration": 0.06
},
{
"word": "the",
"start_time": 4.06,
"duration": 0.1
},
{
"word": "united",
"start_time": 4.22,
"duration": 0.38
},
{
"word": "states",
"start_time": 4.68,
"duration": 0.76
},
{
"word": "it",
"start_time": 5.58,
"duration": 0.1
}
When I cut out the segment where it says it found a word, the segment doesn't match the actual word. For example, if you cut at 0.38 seconds with a duration of 0.1, you will find that the word "as" is not there, and when you inspect the file in an audio editor (like Audacity) you can see that "as" occurs before the reported time.
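In case it helps to reproduce this, the kind of cut I am describing can be sketched in Python roughly as follows. This is only an illustration, not the exact tool I used: the helper names `segment_frames` and `cut_word` are made up for this post, and it assumes an uncompressed PCM WAV file that the standard `wave` module can read.

```python
import wave


def segment_frames(start_time, duration, sample_rate):
    """Convert a start time and duration (seconds) to (first_frame, frame_count)."""
    first = int(round(start_time * sample_rate))
    count = int(round(duration * sample_rate))
    return first, count


def cut_word(src_path, dst_path, start_time, duration):
    """Copy the frames covering [start_time, start_time + duration) into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        first, count = segment_frames(start_time, duration, src.getframerate())
        src.setpos(first)                 # jump to the first frame of the word
        frames = src.readframes(count)    # read only the frames of the word
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)             # same channels, sample width and rate
        dst.writeframes(frames)           # frame count in the header is fixed on close
```

For the first word in the output above, this would be called as `cut_word("speech.wav", "as.wav", 0.38, 0.12)`, where `as.wav` is just a hypothetical output name.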
Is this normal? Can it be improved in some way?
Thank you!