KeyError in self._str_to_label[string] of DeepSpeech/util/text.py when training own model


(Matti Meikäläinen) #1

Hi!

I am training my own model on Ubuntu as described in the tutorial "TUTORIAL : How I trained a specific french model to control my robot".

However, I get an error:

WARNING: libdeepspeech failed to load, resorting to deprecated code
Refer to README.md for instructions on installing libdeepspeech

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/DeepSpeech/util/feeding.py", line 148, in _populate_batch_queue
    target = text_to_char_array(transcript, self._alphabet)
  File "/home/ubuntu/DeepSpeech/util/text.py", line 40, in text_to_char_array
    return np.asarray([alphabet.label_from_string(c) for c in original])
  File "/home/ubuntu/DeepSpeech/util/text.py", line 30, in label_from_string
    return self._str_to_label[string]
KeyError: u'k'

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/DeepSpeech/util/feeding.py", line 148, in _populate_batch_queue
    target = text_to_char_array(transcript, self._alphabet)
  File "/home/ubuntu/DeepSpeech/util/text.py", line 40, in text_to_char_array
    return np.asarray([alphabet.label_from_string(c) for c in original])
  File "/home/ubuntu/DeepSpeech/util/text.py", line 30, in label_from_string
    return self._str_to_label[string]
KeyError: u'l'

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/DeepSpeech/util/feeding.py", line 148, in _populate_batch_queue
    target = text_to_char_array(transcript, self._alphabet)
  File "/home/ubuntu/DeepSpeech/util/text.py", line 40, in text_to_char_array
    return np.asarray([alphabet.label_from_string(c) for c in original])
  File "/home/ubuntu/DeepSpeech/util/text.py", line 30, in label_from_string
    return self._str_to_label[string]
KeyError: u't'

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/DeepSpeech/util/feeding.py", line 148, in _populate_batch_queue
    target = text_to_char_array(transcript, self._alphabet)
  File "/home/ubuntu/DeepSpeech/util/text.py", line 40, in text_to_char_array
    return np.asarray([alphabet.label_from_string(c) for c in original])
  File "/home/ubuntu/DeepSpeech/util/text.py", line 30, in label_from_string
    return self._str_to_label[string]
KeyError: u'o'

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/DeepSpeech/util/feeding.py", line 148, in _populate_batch_queue
    target = text_to_char_array(transcript, self._alphabet)
  File "/home/ubuntu/DeepSpeech/util/text.py", line 40, in text_to_char_array
    return np.asarray([alphabet.label_from_string(c) for c in original])
  File "/home/ubuntu/DeepSpeech/util/text.py", line 30, in label_from_string
    return self._str_to_label[string]
KeyError: u't'

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/DeepSpeech/util/feeding.py", line 148, in _populate_batch_queue
    target = text_to_char_array(transcript, self._alphabet)
  File "/home/ubuntu/DeepSpeech/util/text.py", line 40, in text_to_char_array
    return np.asarray([alphabet.label_from_string(c) for c in original])
  File "/home/ubuntu/DeepSpeech/util/text.py", line 30, in label_from_string
    return self._str_to_label[string]
KeyError: u't'

If I understand correctly, some characters are not present in the dictionary. But I don't understand why, since 'o', 't', 'l' and the other characters it complains about are all present in alphabet.txt.
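The failing line in the traceback is a plain dictionary lookup keyed by the character string, so "present" has to mean "exactly the same string", not "looks the same". The sketch below is a simplified, hypothetical stand-in (`SimpleAlphabet` is not the real class in util/text.py, and the parsing rule is an assumption) showing how an alphabet file that looks correct can still miss 'o' and 't': one line holds CYRILLIC SMALL LETTER O (U+043E) instead of the Latin 'o', and another ends in a Windows '\r' that survives when only '\n' is stripped.

```python
# Hypothetical, simplified stand-in for an alphabet lookup.
# NOT the actual DeepSpeech util/text.py implementation.


class SimpleAlphabet:
    def __init__(self, lines):
        # Assumption: one symbol per line, and only the trailing '\n' is
        # removed, so any other hidden character stays part of the key.
        self._str_to_label = {}
        for line in lines:
            symbol = line.rstrip("\n")
            if symbol:
                self._str_to_label[symbol] = len(self._str_to_label)

    def label_from_string(self, string):
        # A missing key raises KeyError, like the tracebacks above.
        return self._str_to_label[string]


# To the eye this file reads "k", "l", "o", "t", but the third line is the
# Cyrillic letter U+043E and the fourth line ends in a Windows '\r'.
lines = ["k\n", "l\n", "\u043e\n", "t\r\n"]
alphabet = SimpleAlphabet(lines)

for c in "klot":
    try:
        print(c, "->", alphabet.label_from_string(c))
    except KeyError as err:
        print(c, "-> KeyError:", err)
```

Here 'k' and 'l' resolve to labels 0 and 1, while 'o' and 't' raise KeyError even though the file seems to contain them.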

Any ideas where it could go wrong?


(Lissyx) #2

I remember hitting and debugging the same issue in the past. The trick was that the characters were present in alphabet.txt, but under different UTF-8 code points than in the transcripts. So don't trust your eyes.
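Since the difference is invisible, it helps to print code points instead of glyphs. The following is a minimal diagnostic sketch, not part of DeepSpeech: every function name is hypothetical, and `check_files` assumes the training CSVs have a `transcript` column. It flags alphabet entries that are not exactly one character long (a sign of a BOM or '\r'), and lists every transcript character that has no exact match, with its `U+XXXX` code point and Unicode name.

```python
import csv
import unicodedata


def describe(ch):
    """Return a 'U+XXXX NAME' description of one character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<no name>"))


def load_alphabet(lines):
    # Assumption: one symbol per line. Only the trailing '\n' is removed, so
    # hidden characters such as '\r' or a BOM (U+FEFF) remain visible.
    return {line.rstrip("\n") for line in lines if line.rstrip("\n")}


def suspicious_entries(alphabet):
    """Alphabet entries that are not exactly one character long."""
    return sorted(s for s in alphabet if len(s) != 1)


def unknown_chars(alphabet, transcripts):
    """Transcript characters with no exact match in the alphabet."""
    seen = set()
    for text in transcripts:
        seen.update(text)
    return sorted(seen - alphabet)


def report(alphabet, transcripts):
    for entry in suspicious_entries(alphabet):
        print("suspicious alphabet entry:", repr(entry),
              ", ".join(describe(c) for c in entry))
    for ch in unknown_chars(alphabet, transcripts):
        print("not in alphabet:", repr(ch), describe(ch))


def check_files(alphabet_path, csv_paths):
    # Hypothetical helper; assumes each CSV has a 'transcript' column.
    # newline="" keeps '\r' instead of translating '\r\n' to '\n', and the
    # plain "utf-8" codec keeps a leading BOM as U+FEFF.
    with open(alphabet_path, encoding="utf-8", newline="") as f:
        alphabet = load_alphabet(f)
    transcripts = []
    for path in csv_paths:
        with open(path, encoding="utf-8", newline="") as f:
            transcripts.extend(row["transcript"] for row in csv.DictReader(f))
    report(alphabet, transcripts)


# Demo with in-memory data: a BOM before 'a', a Cyrillic 'о' (U+043E),
# and a Windows line ending on the 't' line.
demo_alphabet = load_alphabet(["\ufeffa\n", "k\n", "l\n", "\u043e\n", "t\r\n", " \n"])
report(demo_alphabet, ["klot", "a lot"])
```

On the demo data it flags the 't\r' and BOM-prefixed 'a' entries and reports 'a', 'o' and 't' as missing. For a French dataset, another likely culprit is composed vs. decomposed accents ('é' as U+00E9 vs. 'e' followed by U+0301); normalizing both the alphabet and the transcripts with `unicodedata.normalize("NFC", text)` is one way to make them agree.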


(Matti Meikäläinen) #3

Yeah, you were right. That was the issue!