I installed the requirements from setup.py.
What I am trying to do is use the Chinese pretrained model from Here and evaluate it on my own audio files.
The test.csv file looks like this:
Then I run the command:
CUDA_VISIBLE_DEVICES=7 python evaluate.py --scorer_path deepspeech-0.9.3-checkpoint/deepspeech-0.9.3-models-zh-CN.scorer --test_files data/test.csv --checkpoint_dir deepspeech-0.9.3-checkpoint/
Then I got this error:
ValueError: Alphabet cannot encode transcript "哈哈" while processing sample "data/audios/real.wav", check that your alphabet contains all characters in the training corpus. Missing characters are: ['哈', '哈'].
I do not know the exact reason. The alphabet I am using might be the one meant for English training and inference. Although I could fetch the pretrained Chinese model, I have no idea where to get the Chinese alphabet file.
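To narrow this down, I wrote a small check of which transcript characters are covered by a given alphabet file. This is only a sketch and assumes the usual DeepSpeech alphabet.txt format (one character per line, lines starting with "#" treated as comments); the file path and transcript are just examples:

```python
def missing_chars(alphabet_path, transcript):
    """Return the transcript characters not present in the alphabet file.

    Assumes one character per line; lines starting with '#' are treated as
    comments (note: this would also skip a literal '#' entry).
    """
    with open(alphabet_path, encoding="utf-8") as f:
        alphabet = {line.rstrip("\n") for line in f if not line.startswith("#")}
    return [ch for ch in transcript if ch not in alphabet]

# Example (hypothetical path):
# missing_chars("deepspeech-0.9.3-checkpoint/alphabet.txt", "哈哈")
```

Running this against my current alphabet confirms that the Chinese characters are not in it, so I believe I need the alphabet that the Chinese model was trained with.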
Thank you.