I am training my own model on the Portuguese dataset from Mozilla Common Voice, using Colab.
Running this line:
!/content/DeepSpeech/bin/import_cv2.py --filter_alphabet /content/DeepSpeech/data/alphabet.txt '/content/drive/My Drive/pt-language'
I got this output:
Loading TSV file: /content/drive/My Drive/pt-language/test.tsv
Importing mp3 files…
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
I read the docs and found this: "Some importers might require additional code to properly handled your locale-specific requirements. Such handling is dealt with --validate_label_locale flag that allows you to source out-of-tree Python script that defines a validate_label function. Please refer to util/importers.py for implementation example of that function. If you don't provide this argument, the default validate_label function will be used. This one is only intended for English language, so you might have consistency issues in your data for other languages."
This is surely related to my issue, since I am not training for English.
But I don't understand what I have to do.
I looked at the importers.py code and it does mention validate_label, but I don't know what I should change.
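For readers wondering how a flag can pick up a function from an arbitrary file: below is a generic sketch of loading a `validate_label` function from a file path with Python's standard `importlib`. It is an illustration under that assumption, not the DeepSpeech importer's own loading code, and the helper name `load_validate_label` is hypothetical.

```python
import importlib.util
import os
from typing import Callable, Optional


def load_validate_label(path: str) -> Callable[[str], Optional[str]]:
    """Load a `validate_label` function from an out-of-tree Python file.

    Hypothetical helper: a generic importlib sketch, not DeepSpeech's loader.
    """
    path = os.path.abspath(path)
    if not os.path.exists(path):
        raise FileNotFoundError(f"No such validate_label file: {path}")
    # Use the file name without ".py" as the module name.
    module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {path} as a Python module")
    module = importlib.util.module_from_spec(spec)
    # Execute the file so its top-level definitions become module attributes.
    spec.loader.exec_module(module)
    # The file must define a callable named validate_label.
    func = getattr(module, "validate_label", None)
    if not callable(func):
        raise AttributeError(f"{path} does not define a callable validate_label")
    return func
```

The point of the file-path approach is that locale rules live outside the DeepSpeech tree, so each language can ship its own cleanup without patching util/importers.py.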
lissyx
As documented, --validate_label_locale lets you pass a locale-specific Python file that defines a validate_label function to perform locale-specific cleanup.
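A minimal sketch of such a locale file for Portuguese, assuming the contract the English default in util/importers.py follows: take a raw transcript, return a cleaned lowercase string, or None to drop the sample. The allowed character set and the punctuation rules below are assumptions for illustration; the character set must match the alphabet.txt you train with.

```python
# Hypothetical Portuguese validate_label sketch (e.g. saved as its own .py file).
import re
import unicodedata
from typing import Optional

# Assumed character set: a-z, Portuguese accented letters, apostrophe, space.
# Keep this in sync with the alphabet.txt used for training.
ALLOWED = set("abcdefghijklmnopqrstuvwxyzáàâãéêíóôõúüç' ")


def validate_label(label: str) -> Optional[str]:
    """Return a cleaned transcript, or None if the sample should be dropped."""
    # Compose accents (e.g. "a" + combining tilde -> "ã") so they match ALLOWED.
    label = unicodedata.normalize("NFC", label)
    # Digits and markup-like symbols cannot be spelled out reliably: drop the sample.
    if re.search(r"[0-9]|[(<\[\]&*{]", label):
        return None
    label = label.lower()
    # Normalize the typographic apostrophe ("d’água" -> "d'água").
    label = label.replace("\u2019", "'")
    # Hyphens and underscores become spaces ("guarda-chuva" -> "guarda chuva").
    label = re.sub(r"[-_]", " ", label)
    # Strip common punctuation, including curly and angle quotes.
    label = re.sub(r"[.,;:!?\"“”«»]", "", label)
    # Collapse runs of whitespace and trim.
    label = re.sub(r"\s+", " ", label).strip()
    # Drop anything still containing characters outside the assumed alphabet.
    if not label or any(ch not in ALLOWED for ch in label):
        return None
    return label
```

Rejecting a sentence (returning None) rather than silently stripping unknown characters keeps transcripts honest: a label that no longer matches its audio is worse than a missing sample.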
It worked. I looked for the validate function, found it, then passed --validate_label_locale /validatefunction.py, as you and the docs said.
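Before launching a long import on Colab, it can help to run the locale function over a Common Voice TSV and see how many sentences it would drop. This sketch assumes the TSV is tab-separated with a header row and a `sentence` column, as Common Voice's test.tsv is; `read_cv_tsv` and `count_rejected` are hypothetical helpers, not part of DeepSpeech.

```python
import csv
from typing import Callable, Iterable, List, Optional, Tuple


def read_cv_tsv(path: str) -> List[dict]:
    """Read a Common Voice TSV (tab-separated, header row) into a list of dicts."""
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: sentences may contain literal double quotes.
        return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def count_rejected(
    rows: Iterable[dict], validate_label: Callable[[str], Optional[str]]
) -> Tuple[int, List[str]]:
    """Return (number of rows, sentences that validate_label rejects)."""
    total = 0
    rejected = []
    for row in rows:
        total += 1
        if validate_label(row["sentence"]) is None:
            rejected.append(row["sentence"])
    return total, rejected
```

For example, `count_rejected(read_cv_tsv('/content/drive/My Drive/pt-language/test.tsv'), validate_label)` with your own `validate_label` shows the rejected sentences, so you can decide whether the rules are too strict before importing the audio.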
Now it is training.
Thank you very much!