DeepSpeech training problem: data feeding

myrainbowandsky · March 6, 2019, 8:52am

Traceback (most recent call last):
  File "./DeepSpeech-0.4.1/util/text.py", line 37, in label_from_string
    return self._str_to_label[string]
KeyError: '我'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./DeepSpeech.py", line 941, in <module>
    tf.app.run(main)
  File "/home/xu/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./DeepSpeech.py", line 893, in main
    train()
  File "./DeepSpeech.py", line 388, in train
    hdf5_cache_path=FLAGS.train_cached_features_path)
  File "/home/xu/DeepSpeech-0.4.1/util/preprocess.py", line 69, in preprocess
    out_data = pmap(step_fn, source_data.iterrows())
  File "/home/xu/DeepSpeech-0.4.1/util/preprocess.py", line 13, in pmap
    results = pool.map(fun, iterable)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/xu/DeepSpeech-0.4.1/util/preprocess.py", line 23, in process_single_file
    transcript = text_to_char_array(file.transcript, alphabet)
  File "/home/xu/DeepSpeech-0.4.1/util/text.py", line 68, in text_to_char_array
    return np.asarray([alphabet.label_from_string(c) for c in original])
  File "/home/xu/DeepSpeech-0.4.1/util/text.py", line 68, in <listcomp>
    return np.asarray([alphabet.label_from_string(c) for c in original])
  File "/home/xu/DeepSpeech-0.4.1/util/text.py", line 48, in label_from_string
    ).with_traceback(e.__traceback__)
  File "/home/xu/DeepSpeech-0.4.1/util/text.py", line 37, in label_from_string
    return self._str_to_label[string]
KeyError: '\n ERROR: You have characters in your transcripts\n which do not occur in your data/alphabet.txt\n file. Please verify that your alphabet.txt\n contains all neccessary characters. Use\n util/check_characters.py to see what characters are in\n your train / dev / test transcripts.\n'

The training CSV is in Chinese. I have double-checked the characters, and they are all included. Both alphabet.txt and the CSV are encoded in UTF-8. What is going on?

I have already checked "KeyError in self._str_to_label[string] of DeepSpeech/util/text.py when training own model", but it did not solve the problem.
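The last frames of the traceback show the mechanism: `text_to_char_array` calls `alphabet.label_from_string(c)` for every character `c` of a transcript, so one character missing from alphabet.txt fails the whole file, whether it is a Chinese character, a space, a digit or full-width punctuation such as `，`. A minimal sketch of that per-character lookup follows; `first_missing` and `text_to_labels` are hypothetical helpers, not DeepSpeech functions, and the plain dict stands in for the alphabet's internal `_str_to_label` mapping.

```python
def first_missing(text, str_to_label):
    """Return (position, character) of the first character of `text` that has
    no label in `str_to_label`, or None if every character is covered."""
    for pos, ch in enumerate(text):
        if ch not in str_to_label:
            return pos, ch
    return None


def text_to_labels(text, str_to_label):
    """Map a transcript to label ids, one dictionary lookup per character,
    like the list comprehension in the traceback. Unlike the generic message
    above, the error names the offending character and its code point."""
    missing = first_missing(text, str_to_label)
    if missing is not None:
        pos, ch = missing
        raise KeyError(
            "character {!r} (U+{:04X}) at position {} is not in the alphabet".format(
                ch, ord(ch), pos
            )
        )
    return [str_to_label[ch] for ch in text]


# Hypothetical four-label alphabet: a space plus three characters.
alphabet = {" ": 0, "我": 1, "你": 2, "好": 3}
print(text_to_labels("你好 我", alphabet))  # [2, 3, 0, 1]
print(first_missing("你好，我", alphabet))  # (2, '，'): full-width comma U+FF0C
```

Run over each transcript, this points at the exact character that has to be added to alphabet.txt or removed from the transcript.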

lissyx · March 6, 2019, 10:04am

Check more carefully, and check all of your CSV files.
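Checking every CSV can be automated. A minimal sketch, assuming the train/dev/test CSVs have a `transcript` column (as `file.transcript` in the traceback suggests) and assuming the alphabet loader behaves like DeepSpeech 0.4.x's `Alphabet` class: lines starting with `#` are comments, and every other line minus its last character (intended to be the newline) becomes a label. Reading the alphabet that way means a stray `\r` or a missing final newline shows up here as missing characters, just as it would during training. The script name `find_missing.py` and its functions are hypothetical; DeepSpeech's own `util/check_characters.py`, named in the error message, lists the characters used in the transcripts.

```python
import csv
import sys


def labels_from_lines(lines):
    """Labels as the assumed loader sees them: '#' lines are skipped and the
    last character of every other line (meant to be the newline) is dropped."""
    return {line[:-1] for line in lines if not line.startswith("#")}


def chars_from_csv_lines(lines, column="transcript"):
    """Every distinct character that appears in one column of a CSV."""
    chars = set()
    for row in csv.DictReader(lines):
        chars.update(row[column])
    return chars


def missing_chars(alphabet_path, csv_paths):
    """Characters used in any transcript but absent from the alphabet."""
    # newline="" disables newline translation, so '\r\n' stays visible.
    with open(alphabet_path, encoding="utf-8", newline="") as f:
        labels = labels_from_lines(f)
    used = set()
    for path in csv_paths:
        with open(path, encoding="utf-8", newline="") as f:
            used |= chars_from_csv_lines(f)
    return sorted(used - labels)


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: python find_missing.py ALPHABET CSV [CSV ...]")
    else:
        for ch in missing_chars(sys.argv[1], sys.argv[2:]):
            print(repr(ch), "U+{:04X}".format(ord(ch)))
```

Every character printed is one the lookup would fail on; the code point helps tell look-alikes apart, such as the ASCII comma (U+002C) and the full-width comma (U+FF0C).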

myrainbowandsky · March 6, 2019, 11:04am

Any other suggestions?

On Wednesday, March 6, 2019 at 18:07, Lissyx via Mozilla Discourse <discourse@mozilla-community.org> wrote:

lissyx · March 6, 2019, 11:08am

Well, there’s no better suggestion, because I already explained that in the GitHub issue …

myrainbowandsky · March 8, 2019, 8:39am

Yes, you are right. alphabet.txt has to be formatted in a specific way, even if all the characters are in it.
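The thread does not say which formatting rule was broken here. As a sketch only, the checker below assumes the format that DeepSpeech's bundled data/alphabet.txt appears to follow: UTF-8, exactly one character per non-comment line (a space gets its own line if transcripts contain spaces), `#` lines as comments, and `\n` line endings including after the last line. It also flags a UTF-8 byte-order mark, which some Windows editors add when saving as UTF-8. `lint_alphabet` is a hypothetical helper, not part of DeepSpeech.

```python
import sys

BOM = b"\xef\xbb\xbf"


def lint_alphabet(raw):
    """Return a list of problems found in the raw bytes of an alphabet file.

    Assumed format: UTF-8 without BOM, one character per non-comment line,
    '#' lines are comments, and every line (including the last) ends in LF.
    """
    problems = []
    if raw.startswith(BOM):
        problems.append("file starts with a UTF-8 BOM")
    # utf-8-sig strips the BOM (already reported) so it is not counted twice;
    # a file that is not valid UTF-8 raises UnicodeDecodeError here.
    text = raw.decode("utf-8-sig")
    if text and not text.endswith("\n"):
        problems.append("no newline after the last line")
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # empty piece after the final newline, not a real line
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue
        if line.endswith("\r"):
            problems.append("line {}: Windows (CRLF) line ending".format(lineno))
            line = line[:-1]
        if len(line) != 1:
            problems.append(
                "line {}: {!r} is not exactly one character".format(lineno, line)
            )
        elif line in seen:
            problems.append("line {}: duplicate label {!r}".format(lineno, line))
        seen.add(line)
    return problems


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python lint_alphabet.py data/alphabet.txt")
    else:
        with open(sys.argv[1], "rb") as f:
            for problem in lint_alphabet(f.read()):
                print(problem)
```

No output means none of these assumed rules is broken; it does not prove the alphabet covers every transcript, which needs a separate transcript-versus-alphabet comparison.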

lissyx · March 8, 2019, 11:00am

I’m not sure which “specific method” you are referring to. Can you share more details? It would also help others facing the same issue.