No --validate_label_locale specified, your might end with inconsistent dataset

Hello

I am training my own model with portuguese dataset from Mozilla Comon Voice. I am using Colab for that.

Running this line:
!/content/DeepSpeech/bin/import_cv2.py --filter_alphabet /content/DeepSpeech/data/alphabet.txt ‘/content/drive/My Drive/pt-language’

I got this issue:
Loading TSV file: /content/drive/My Drive/pt-language/test.tsv
Importing mp3 files…
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.

Can someone help me, please?
Thanks.

Please search before you post, you are not the first to get that message …

@Alan_Belem As @othiele said, please make some efforts and search: https://deepspeech.readthedocs.io/en/v0.8.2/search.html?q=validate_label_locale&check_keywords=yes&area=default

We have docs, but if you dont read them or people don’t give feedback like “i searched the docs and have not found” , we can’t improve the project.

1 Like

You are right.

I read the docs, I found this " Some importers might require additional code to properly handled your locale-specific requirements. Such handling is dealt with --validate_label_locale flag that allows you to source out-of-tree Python script that defines a validate_label function. Please refer to util/importers.py for implementation example of that function. If you don’t provide this argument, the default validate_label function will be used. This one is only intended for English language, so you might have consistency issues in your data for other languages."

For sure it has to do with my issue, because I am not training for English language.
But I don’t understand what I have to do.
I saw the importers.py code, it really had mentions about validate_label, but I don’t know where I should change.

As documented --validate_label_locale allows you to pass a locale-specific python file containing a function like validate_label to perform cleanup locale-specific.

It worked. I looked for the validate function, find it, then wrote --validate_label_locale /validatefunction.py, as you said and as the docs said.
Now it is training.
Thank you very much!

Obs: For who have the same issue, “validationfunction.py” was just an eg. For me the name was “validate_locale_fra.py”

1 Like