Hi all,
I am using the “deepspeech-0.6.1” pre-trained model to transcribe WAV files from the Mozilla Common Voice dataset, but I am getting incorrect transcriptions. I want to ask: if the model was trained on this dataset, then why are we getting incorrect transcriptions?
The Common Voice dataset is quite large, both in number of releases and in languages. Could you please be more precise about what you tested?
Also, the fact that a sample was in the training or validation set does not ensure it will be recognized 100% correctly.
Language: English
I am unable to upload audio here. I have tested on sample-000000.wav, sample-000001.wav and sample-000002.wav.
Loading model from file deepspeech-0.6.1-models/output_graph.pbmm
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
2020-04-08 05:22:51.588842: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.00798s.
Loading language model from files deepspeech-0.6.1-models/lm.binary deepspeech-0.6.1-models/trie
Loaded language model in 0.000177s.
Running inference.
without the data that the article useless (output from the DeepSpeech v0.6.1 model)
Inference took 2.208s for 3.192s audio file.
For sample-000000.wav the correct transcription is “Without the dataset the article is useless”.
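For reference, the same run can be reproduced from Python. This is only a minimal sketch, assuming the 0.6.x Python API (Model(path, beam_width), enableDecoderWithLM, stt) and a 16 kHz mono WAV; the beam width and LM weights below are the 0.6 client defaults, not values taken from the log above:

import wave
import numpy as np
from deepspeech import Model

# Paths as in the log above; 500 / 0.75 / 1.85 are the 0.6.x client defaults (assumption).
ds = Model("deepspeech-0.6.1-models/output_graph.pbmm", 500)
ds.enableDecoderWithLM("deepspeech-0.6.1-models/lm.binary",
                       "deepspeech-0.6.1-models/trie", 0.75, 1.85)

# The 0.6 English model expects 16 kHz, 16-bit mono audio.
with wave.open("sample-000000.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))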
You have not answered the question about which Common Voice release you are referring to. There have been multiple English dataset releases.
I don’t know about that specific example, but I have already explained to you that this is not unexpected.
I used version “en_1488h_2019-12-10” of the Common Voice dataset.
So you expect that a model released before that Common Voice release would include its data? No. The 0.6 model was trained on a previous Common Voice release.
Mozilla Discourse wrote:
@lissyx, please let me know which version of the Common Voice dataset to download for DeepSpeech 0.6, or please share the document where this is stated; it would be really helpful.
Thanks in advance.
I don’t know what you want exactly … Unfortunately, the Common Voice team does not yet allow access to older versions of the dataset through their website.
Could you please clearly state what you are trying to achieve?
I am sorry for the confusion.
I want the samples on which the DeepSpeech model gives a correct prediction, without any error in the transcription of the given WAV file. So, do I have to check this manually by running inference on each sample and checking the transcription?
You are chasing ghosts here. First, we don’t have such a list. Second, as I already stated, Common Voice is one of the hardest datasets according to several benchmarks. So even on samples that were in the training set, it would not be surprising to see small errors like the ones you reported.
The error could also be triggered by the language model.
As of now, yes, that looks like the only solution.
It would really be useful if you articulated clearly why you need that.
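If it helps, that manual check can be scripted. This is a rough sketch only, assuming the 0.6.x Python API, the validated.tsv layout of a Common Voice release ("path" and "sentence" columns), and that the MP3 clips have already been converted to 16 kHz mono WAV files:

import csv
import re
import wave
import numpy as np
from deepspeech import Model

# Same setup as the CLI run earlier; 500 / 0.75 / 1.85 are assumed 0.6.x client defaults.
ds = Model("deepspeech-0.6.1-models/output_graph.pbmm", 500)
ds.enableDecoderWithLM("deepspeech-0.6.1-models/lm.binary",
                       "deepspeech-0.6.1-models/trie", 0.75, 1.85)

def normalize(text):
    # Lowercase, drop punctuation and collapse whitespace so the reference
    # sentence can be compared against the model output.
    return " ".join(re.sub(r"[^a-z' ]", " ", text.lower()).split())

def transcribe(wav_path):
    with wave.open(wav_path, "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return ds.stt(audio)

exact_matches = []
# validated.tsv ships with the Common Voice release (assumed column names).
with open("validated.tsv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        wav_path = row["path"].replace(".mp3", ".wav")  # assumes clips were pre-converted to WAV
        if normalize(transcribe(wav_path)) == normalize(row["sentence"]):
            exact_matches.append(wav_path)

print(len(exact_matches), "clips transcribed exactly")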
Thanks for the information and sorry for the inconvenience caused.
You really have a strange request; it sounds like you are trying to cheat on a benchmark, or what else do you need this for?
Usually people want to know how well DeepSpeech can transcribe language, not find perfect examples … Just overfit a small model yourself and it will recognize your phrases perfectly.
So, can we know why you want to do that? It’s frustrating that you don’t want to share, because then we cannot help you properly.
No, it’s not like cheating or anything. I want to perform an adversarial attack on this model, so that’s why I want to take samples that are recognized 100% correctly.
Yes, sure. As above, I want to perform an adversarial attack on this model, which is why I need samples that are recognized 100% correctly.
You could have explained that from the beginning; it would have saved everyone a lot of time.
I guess overfitting on a smaller subset of Common Voice would be the most efficient way to achieve this. You would be more in control of the model, and you could thus perform your attack in a reproducible manner.
Exactly. That way you could figure out what works best, which is a lot harder to see with the larger model.
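As a rough sketch of that approach (assuming the CSV layout produced by the DeepSpeech Common Voice importer, i.e. wav_filename, wav_filesize and transcript columns), you could carve out a small subset to overfit on:

import csv
import random

# train.csv as produced by the Common Voice importer (assumed columns:
# wav_filename, wav_filesize, transcript).
with open("train.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# A couple of hundred clips are more than enough to overfit on.
subset = random.sample(rows, min(200, len(rows)))

with open("overfit.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["wav_filename", "wav_filesize", "transcript"],
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(subset)

# Then point the DeepSpeech training script's --train_files/--dev_files/--test_files
# at overfit.csv and train for enough epochs that the model reproduces these
# transcripts essentially verbatim.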
Yes, I will definitely take care of this before raising an issue next time.