Hi all,
I am using the “deepspeech-0.6.1” pre-trained model to transcribe WAV files from the Mozilla Common Voice dataset, but I am getting incorrect transcriptions. I want to ask: if the model was trained on this dataset, then why are we getting incorrect transcriptions?
The Common Voice dataset is quite large, both in number of releases and in languages. Could you please be more precise about what you tested?
Also, the fact that a sample was in the training or validation set does not ensure it will be recognized 100% correctly.
Language: English
I am unable to upload audio here. I have tested on sample-000000.wav, sample-000001.wav and sample-000002.wav.
Loading model from file deepspeech-0.6.1-models/output_graph.pbmm
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
2020-04-08 05:22:51.588842: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.00798s.
Loading language model from files deepspeech-0.6.1-models/lm.binary deepspeech-0.6.1-models/trie
Loaded language model in 0.000177s.
Running inference.
without the data that the article useless (output from the DeepSpeech v0.6.1 model)
Inference took 2.208s for 3.192s audio file.
For sample-000000.wav the correct transcription is “Without the dataset the article is useless”.
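For reference, the same run can be reproduced from Python. This is only a minimal sketch, assuming the 0.6.x Python API (Model(path, beam_width), enableDecoderWithLM, stt) and a 16 kHz mono WAV; the beam width and LM weights below are the 0.6 client defaults, not values taken from the log above:

import wave
import numpy as np
from deepspeech import Model

# Paths as in the log above; 500 / 0.75 / 1.85 are the 0.6.x client defaults (assumption).
ds = Model("deepspeech-0.6.1-models/output_graph.pbmm", 500)
ds.enableDecoderWithLM("deepspeech-0.6.1-models/lm.binary",
                       "deepspeech-0.6.1-models/trie", 0.75, 1.85)

# The 0.6 English model expects 16 kHz, 16-bit mono audio.
with wave.open("sample-000000.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))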
You have not answered the question about which Common Voice release you are referring to. There have been multiple English dataset releases.
I don’t know about that specific example, but I have already explained to you that this is not unexpected.
I used version “en_1488h_2019-12-10” of the Common Voice dataset.
So you expect that a model released before that Common Voice release would include its data? No. The 0.6 model was trained on a previous Common Voice release.
Mozilla Discourse wrote:
@lissyx, please let me know which version of the Common Voice dataset to download for DeepSpeech 0.6, or please share the document where this is stated; it would be really helpful.
Thanks in advance.
I don’t know what you want exactly … Unfortunately, the Common Voice team does not yet allow access to older versions of the dataset through their website.
Could you please clearly state what you are trying to achieve?
I am sorry for the confusion.
I want the samples on which the DeepSpeech model gives a correct prediction, without any error in the transcription of the given WAV file. So, do I have to check this manually by running inference on each sample and checking the transcription?
You are chasing ghosts here. First, we don’t have such a list. Second, as I already stated, Common Voice is one of the hardest datasets according to several benchmarks. So even on samples that were in the training set, it would not be surprising to see small errors like the ones you reported.
The error could also be triggered by the language model.
As of now, yes, that looks like the only solution.
It would really be useful if you articulated clearly why you need that.
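If it helps, that manual check can be scripted. This is a rough sketch only, assuming the 0.6.x Python API, the validated.tsv layout of a Common Voice release ("path" and "sentence" columns), and that the MP3 clips have already been converted to 16 kHz mono WAV files:

import csv
import re
import wave
import numpy as np
from deepspeech import Model

# Same setup as the CLI run earlier; 500 / 0.75 / 1.85 are assumed 0.6.x client defaults.
ds = Model("deepspeech-0.6.1-models/output_graph.pbmm", 500)
ds.enableDecoderWithLM("deepspeech-0.6.1-models/lm.binary",
                       "deepspeech-0.6.1-models/trie", 0.75, 1.85)

def normalize(text):
    # Lowercase, drop punctuation and collapse whitespace so the reference
    # sentence can be compared against the model output.
    return " ".join(re.sub(r"[^a-z' ]", " ", text.lower()).split())

def transcribe(wav_path):
    with wave.open(wav_path, "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return ds.stt(audio)

exact_matches = []
# validated.tsv ships with the Common Voice release (assumed column names).
with open("validated.tsv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        wav_path = row["path"].replace(".mp3", ".wav")  # assumes clips were pre-converted to WAV
        if normalize(transcribe(wav_path)) == normalize(row["sentence"]):
            exact_matches.append(wav_path)

print(len(exact_matches), "clips transcribed exactly")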
Thanks for the information and sorry for the inconvenience caused.
You really have a strange request; it sounds like you are trying to cheat on a benchmark, or what else do you need this for?
Usually people want to know how well DeepSpeech can transcribe language, not find perfect examples … Just overfit a small model yourself and it will recognize your phrases perfectly.
So, can we know why you want to do that? It’s frustrating that you don’t want to share, because then we cannot help you properly.
No, it’s not like cheating or anything. I want to perform an adversarial attack on this model, so that’s why I want to take samples that are recognized 100% correctly.
Yes, sure. As above, I want to perform an adversarial attack on this model, which is why I need samples that are recognized 100% correctly.
You could have explained that from the beginning; it would have saved everyone a lot of time.
I guess overfitting on a smaller subset of Common Voice would be the most efficient way to achieve this. You would be more in control of the model, and you could thus perform your attack in a reproducible manner.
Exactly. That way you could figure out what works best, which is a lot harder to see with the larger model.
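As a rough sketch of that approach (assuming the CSV layout produced by the DeepSpeech Common Voice importer, i.e. wav_filename, wav_filesize and transcript columns), you could carve out a small subset to overfit on:

import csv
import random

# train.csv as produced by the Common Voice importer (assumed columns:
# wav_filename, wav_filesize, transcript).
with open("train.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# A couple of hundred clips are more than enough to overfit on.
subset = random.sample(rows, min(200, len(rows)))

with open("overfit.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["wav_filename", "wav_filesize", "transcript"],
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(subset)

# Then point the DeepSpeech training script's --train_files/--dev_files/--test_files
# at overfit.csv and train for enough epochs that the model reproduces these
# transcripts essentially verbatim.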
Yes, I will definitely take care of this before raising an issue next time.