Error in testing model on test.csv when training zh-CN (Chinese model)

877980636 · March 18, 2021, 2:07pm

Have I written custom code (as opposed to running examples on an unmodified clone of the repository) : No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04) : Linux Ubuntu 18.04.1 LTS
TensorFlow installed from (our builds, or upstream TensorFlow) : Using Docker
TensorFlow version (use command below) : Using Docker
Python version : Using Docker
Bazel version (if compiling from source) : Docker
GCC/Compiler version (if compiling from source) : Docker
CUDA/cuDNN version : Docker
GPU model and memory : GeForce GTX 1650/PCIe/SSE2
Exact command to reproduce : Provided below

This is my command:

root@e658b51810f6:/DeepSpeech# python3 DeepSpeech.py \
--train_files deepspeech-data/cv-corpus-6.1-2020-12-11/zh-CN/clips/train.csv \
--dev_files deepspeech-data/cv-corpus-6.1-2020-12-11/zh-CN/clips/dev.csv \
--test_files deepspeech-data/cv-corpus-6.1-2020-12-11/zh-CN/clips/test.csv \
--checkpoint_dir deepspeech-data/checkpoints --export_dir deepspeech-data/exported-model --n_hidden 256 --reduce_lr_on_plateau true --plateau_epochs 8 --plateau_reduction 0.08 --early_stop true --es_epochs 10 --es_min_delta 0.06 --dropout_rate 0.4 --bytes_output_mode --automatic_mixed_precision --train_batch_size 128 --dev_batch_size 128 --test_batch_size 128 --lm_alpha 0.6940122363709647 --lm_beta 4.777924224113021 --epochs 1

The logs of the error I received:

Testing model on deepspeech-data/cv-corpus-6.1-2020-12-11/zh-CN/clips/test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00 Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 958, in main
    test()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 682, in test
    samples = evaluate(FLAGS.test_files.split(','), create_model)
  File "/DeepSpeech/training/deepspeech_training/evaluate.py", line 132, in evaluate
    samples.extend(run_test(init_op, dataset=csv))
  File "/DeepSpeech/training/deepspeech_training/evaluate.py", line 114, in run_test
    cutoff_prob=FLAGS.cutoff_prob, cutoff_top_n=FLAGS.cutoff_top_n)
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 228, in ctc_beam_search_decoder_batch
    for beam_results in batch_beam_results
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 228, in <listcomp>
    for beam_results in batch_beam_results
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 227, in <listcomp>
    [(res.confidence, alphabet.Decode(res.tokens)) for res in beam_results]
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 138, in Decode
    return res.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: invalid continuation byte

After searching, I found this solution from @yang_jiao:

Change the Decode function in the ds_ctcdecoder __init__.py:

def Decode(self, input):
    '''Decode a sequence of labels into a string.'''
    res = super(UTF8Alphabet, self).Decode(input)
    # 'ignore' silently drops any bytes that are not valid UTF-8
    return res.decode('utf-8', 'ignore')
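For readers following along, here is a minimal sketch of what that change does, using only the Python standard library and not the DeepSpeech code itself. The byte string `bad` is a made-up example, built so that the lead byte 0xe5 is followed by a byte that is not a continuation byte, which is the situation the error message describes. Note that `'ignore'` only hides the invalid bytes; it does not explain why the decoder produced them.

```python
# "中文" encodes to two 3-byte UTF-8 sequences.
good = "中文".encode("utf-8")

# Hypothetical bad output: a stray lead byte 0xe5 followed by another lead byte,
# so byte 0 has no valid continuation byte after it.
bad = b"\xe5" + good


def decode_strict(data: bytes) -> str:
    # Same call as the original Decode(): raises on any invalid sequence.
    return data.decode("utf-8")


def decode_lenient(data: bytes) -> str:
    # Same call as the proposed workaround: invalid bytes are dropped silently.
    return data.decode("utf-8", "ignore")


try:
    decode_strict(bad)
except UnicodeDecodeError as err:
    print("strict decode failed:", err)

print("lenient decode:", decode_lenient(bad))
```

The lenient result looks clean, but one byte of model output has been thrown away without any warning, so transcripts can lose characters.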

But when I try to edit the file inside the Docker container, I find that the file is empty:

root@e658b51810f6:/DeepSpeech# cd /usr/local/lib/python3.6/dist-packages/ds_ctcdecoder
root@e658b51810f6:/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder# vim init.py

How can I solve the problem?
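One possible explanation, offered as an assumption because the thread never confirms it: a Python package's entry file is named `__init__.py`, so opening a file called `init.py` in an editor would create a new, empty buffer. A small sketch for finding the real file of an installed package follows. The helper name `package_init_path` is hypothetical, and the standard-library `json` package stands in for `ds_ctcdecoder` so the snippet runs anywhere.

```python
import importlib.util


def package_init_path(package_name: str) -> str:
    """Return the path of the file Python would load for a package."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {package_name!r}")
    return spec.origin


# Stand-in package; inside the container one would pass "ds_ctcdecoder" instead.
print(package_init_path("json"))
```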

BTW, another solution is to use a scorer, so I tried the zh-CN scorer:

root@e658b51810f6:/DeepSpeech# python3 DeepSpeech.py --train_files deepspeech-data/cv-corpus-6.1-2020-12-11/zh-CN/clips/train.csv --dev_files deepspeech-data/cv-corpus-6.1-2020-12-11/zh-CN/clips/dev.csv --test_files deepspeech-data/cv-corpus-6.1-2020-12-11/zh-CN/clips/test.csv --checkpoint_dir deepspeech-data/checkpoints --export_dir deepspeech-data/exported-model --n_hidden 256 --reduce_lr_on_plateau true --plateau_epochs 8 --plateau_reduction 0.08 --early_stop true --es_epochs 10 --es_min_delta 0.06 --dropout_rate 0.4 --bytes_output_mode --automatic_mixed_precision --train_batch_size 128 --dev_batch_size 128 --test_batch_size 128 --lm_alpha 0.6940122363709647 --lm_beta 4.777924224113021 --epochs 1 \
--scorer_path deepspeech-data \
--scorer deepspeech-0.9.3-models-zh-CN.scorer

The error I received:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 949, in main
    early_training_checks()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 934, in early_training_checks
    FLAGS.scorer_path, Config.alphabet)
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 36, in __init__
    raise ValueError('Scorer initialization failed with error code 0x{:X}'.format(err))
ValueError: Scorer initialization failed with error code 0x2005

lissyx · March 18, 2021, 2:50pm

Can you fix your console output and use proper formatting for easing reading?

877980636 · March 18, 2021, 2:52pm

fine…
The scorer problem happened because the scorer path was wrong.
The right path is:

--scorer_path deepspeech-data/deepspeech-0.9.3-models-zh-CN.scorer
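As a rough illustration of the mistake described here (passing the directory instead of the scorer file), the following sketch checks a path before a long training run is launched. It is not part of DeepSpeech; `check_scorer_path` is a hypothetical helper, and the demo uses an empty placeholder file rather than a real scorer.

```python
import os
import tempfile


def check_scorer_path(path: str) -> str:
    """Return path if it names an existing regular file, otherwise raise."""
    if os.path.isdir(path):
        raise IsADirectoryError(
            f"{path!r} is a directory; pass the .scorer file itself"
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no scorer file at {path!r}")
    return path


# Demo with a throwaway directory and an empty placeholder file.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "example.scorer")
open(demo_file, "wb").close()

print("accepted:", check_scorer_path(demo_file))
try:
    check_scorer_path(demo_dir)
except IsADirectoryError as err:
    print("rejected:", err)
```

This only confirms that the path points at a file; it says nothing about whether the scorer matches the model's alphabet or output mode.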

But I still don’t understand why the init file is empty

lissyx · March 18, 2021, 2:54pm

This is likely the wrong solution.

I don’t know

From https://github.com/mozilla/DeepSpeech/blob/7450e5763b2af8f6804205c60b8fd9a0b4cec7db/native_client/ctcdecode/__init__.py#L91-L138 you need to explain how you produced the scorer, because it seems it’s just wrong.

877980636 · March 18, 2021, 2:59pm

Thank you for your answer.
I just succeeded in using the scorer, and with it the UnicodeDecodeError no longer appears.

I appreciate your time.

lissyx · March 18, 2021, 3:01pm

I’m not sure I get your point here: have you figured out the problem? Or are you still trying to fix?

If it’s still needing a fix, then you need to document how you built your scorer.

877980636 · March 19, 2021, 1:31pm

I am still trying to fix it.
I tried using the deepspeech-0.9.3-models-zh-CN.scorer that is provided on the DeepSpeech GitHub.
Sometimes it works, but sometimes it causes other errors.
For example:

  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 138, in Decode
    return res.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: unexpected end of data
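This second message has a different cause from the first one: "unexpected end of data" means the byte string stops in the middle of a multi-byte character. The sketch below reproduces that condition with plain Python and a made-up six-character string. It is an illustration of the error class only, not of what the decoder actually emitted.

```python
# Six Chinese characters encode to 18 bytes (3 bytes each).
full = "语音识别测试".encode("utf-8")

# Drop the final byte so the last character is cut off after 2 of its 3 bytes.
truncated = full[:-1]

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as err:
    # err.start and err.end delimit the incomplete sequence.
    print(err.reason, "at bytes", err.start, "to", err.end - 1)

# errors="replace" keeps the damage visible as U+FFFD instead of hiding it.
visible = truncated.decode("utf-8", "replace")
print(visible)
```

Using `"replace"` rather than `"ignore"` while debugging makes it easier to see how often and where the output is cut off.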

The error above is the one I can’t fix.

I think the error is related to the scorer, and that I should build one myself.
So what dataset should I use to build my scorer?

lissyx · March 19, 2021, 4:14pm

As I said, please share your build steps.

chenminghui0927 · April 23, 2021, 2:51am

It should help you.