Inference Model trained with DeepSpeech.py

Hi @lissyx, I trained with the Chinese language dataset from openslr.org/18/.
My inference model is not predicting any text; I received the following stack trace:

(asr) shenzhen@shenzhen:~/Desktop/zh_servermodel$ ls
alphabet.txt  en_models  output_graph.pb  trie  vocabulart.txt  zh_lm.binary
(asr) shenzhen@shenzhen:~/Desktop/zh_servermodel$ deepspeech output_graph.pb /home/shenzhen/Desktop/Take2_data/Jugs/2017-11-02-15\:19\:00.wav alphabet.txt zh_lm.binary 
Loading model from file output_graph.pb
2018-05-17 15:04:45.930784: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.157s.
Running inference.
2018-05-17 15:04:46.172535: E tensorflow/core/framework/op_segment.cc:53] Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
2018-05-17 15:04:46.172591: E tensorflow/core/common_runtime/executor.cc:643] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
	 [[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice)]]
Error running session: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
	 [[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice)]]
None
Inference took 0.090s for 7.970s audio file.

However, the inference model works perfectly fine with the downloaded pre-trained model.

(asr) shenzhen@shenzhen:~/Desktop/zh_servermodel/en_models$ deepspeech output_graph.pb /home/shenzhen/Desktop/Take2_data/Jugs/2017-11-02-15\:16\:38.wav alphabet.txt lm.binary
Loading model from file output_graph.pb
2018-05-17 14:59:36.900942: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.241s.
Running inference.
for every men i killed field father from home
Inference took 6.228s for 3.560s audio file.

This is mismatching TensorFlow versions. Since you trained yourself, you likely installed a recent TensorFlow (after r1.5), while the v0.1.1 binaries are r1.4-based, and those cannot handle the extra attributes added in 1.5. How did you install the inference client?
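You can confirm which TensorFlow produced your graph by printing the version in the environment you trained in, for example:

# run in the training environment; any version >= 1.5 writes graphs
# that the r1.4-based v0.1.1 client cannot read
import tensorflow as tf
print(tf.__version__)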

Thanks @lissyx,
yes, I trained with TensorFlow 1.8.

I used pip install to install the client. Do I have to build it myself? I think I should.

Okay, what you can do is pip install --upgrade deepspeech==0.2.0a5: we started publishing alpha releases of the npm / Python packages to make this easier. https://pypi.org/project/deepspeech/0.2.0a5/#description

Thanks @lissyx, it worked perfectly after I upgraded.

Since I had trained on pinyin characters, I got pinyin output.

(asr) shenzhen@shenzhen:~/Desktop/zh_servermodel$ deepspeech output_graph.pb alphabet.txt zh_lm.binary /home/shenzhen/Desktop/Take2_data/Jugs/2017-11-02-15:19:00.wav
Loading model from file output_graph.pb
TensorFlow: v1.6.0-16-gc346f2c
DeepSpeech: v0.2.0-alpha.5-0-g7cc8382
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-05-17 17:33:54.194497: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.016s.
Running inference.
pnenshi4 si3 man2 zh1 er2 wei2 ci4ephei4 zheu1 un3 zin4 hluer4 si4 bi4 ne4 feihu3
Inference took 1.036s for 7.970s audio file.

Okay, and is that correct? 🙂

I trained for 100 epochs, and the WER was 0.103503.
When I run prediction on the dev and test datasets, it is almost accurate, i.e. more than 90% accurate (a WER of about 0.10 corresponds to roughly 90% word accuracy).

But when I predict on noisy data, the prediction is a bit off.

Sorry for the screenshot.

With noisy data, that's not surprising. If you can, try to augment your training dataset with noise and noisy recordings; it should help. Even something as simple as the sketch below goes in the right direction.
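A minimal sketch of noise augmentation, assuming 16-bit mono WAV files; the file names and the 10 dB SNR are just examples:

# mix white noise into a clean training wav at a fixed signal-to-noise ratio
import numpy as np
from scipy.io import wavfile

rate, clean = wavfile.read("clean_utterance.wav")
clean = clean.astype(np.float32)

snr_db = 10.0
signal_power = np.mean(clean ** 2)
noise_power = signal_power / (10 ** (snr_db / 10))
noise = np.random.normal(0.0, np.sqrt(noise_power), clean.shape)

# clip back into 16-bit range and write the augmented copy
noisy = np.clip(clean + noise, -32768, 32767).astype(np.int16)
wavfile.write("noisy_utterance.wav", rate, noisy)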

Thanks @lissyx, I'll try that too.

I'll try distributed training first, and will get back to data augmentation. 🙂

Hello, I see you are using pinyin for training. What does your alphabet look like? Is it [a-z], or a per-syllable pinyin format like [n i3 h ao3]?

No, not an alphabet of [a-z]. For the Chinese ASR model, I tried several alphabets: pinyin characters, and all the Chinese characters (from dictionary resources).

Do you mean like [n i3 h ao3]? I have tried an alphabet in a format like that [screenshot], taken from the THCHS-30 corpus, but it always gives a failure like [screenshot].

That error is because your alphabet doesn't contain all the characters that appear in the audio transcripts (the CSV files) and in the corpus used to train the language model.

Right now I don't remember whether mine was [n i3 h ao3] or actual pinyin, but the alphabet has to contain the same characters as the transcripts. A quick way to check coverage is sketched below.
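A small sketch that lists transcript characters missing from alphabet.txt, assuming the standard DeepSpeech CSV layout (wav_filename,wav_filesize,transcript); train.csv is an example name:

import csv

# lines starting with '#' in alphabet.txt are comments
with open("alphabet.txt", encoding="utf-8") as f:
    alphabet = {line.rstrip("\n") for line in f if not line.startswith("#")}

used = set()
with open("train.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        used.update(row["transcript"])

print("missing from alphabet:", sorted(used - alphabet))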

Yes, I know the failure means 'i' is not in the alphabet. What I'm trying to ask is what format the pinyin should take in the alphabet. Like your corpus, I'm guessing your transcripts look like
gen1 ju4 zhe4 xiang4 xie2 yi4 e2 luo2 si1 chu1 kou3 gu3 ba1 de5 shi2 you2 you2 wei3 nei4 rui4 la1 gong1 ying4 er2 wei3 chu1 kou3 de2 guo2 de5 shi2 you2 you2 e2 guo2 ti2 gong1
And if I just put consonants and vowels in the alphabet, even with a vowel on its own line in a format like 'ii' or 'i1', DeepSpeech still reads them as single characters; this is the problem (see the small illustration after this post).

I'm using sentences constructed from vowels and consonants, like
ii i1 x iang1 g ang3 d e5 k ang4 r iz4 j iu4 uu uang2 vv vn4 d ong4 ii i3 j iu4 uu uang2 uu un2 h ua4 vv vn4 d ong4 uu ui2 zh u3 ii iao4 x ing2 sh ix4

If you use the pinyin of a whole Chinese character, like 'xiang1' or 'gang3', on each line of the alphabet, I think you will face the same problem. Since you ran it successfully, it must be a different format. So I'm curious what your alphabet looks like.
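To make concrete what I mean, a toy illustration (plain Python, not DeepSpeech code; the entries are made up) of why a multi-character alphabet entry never matches when the transcript is consumed one character at a time, as character-level label mapping does:

alphabet = {"ii", "i1", "x"}          # hypothetical multi-character entries
transcript = "ii i1 xiang1"
for ch in transcript:                 # per-character iteration
    print(repr(ch), ch in alphabet)   # a bare 'i' is never found -> "not in alphabet"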

It's been some time now, so I don't remember exactly.
As far as I remember, I converted the THCHS-30 transcripts to actual pinyin using a nodejs library; an equivalent Python sketch is below.
E.g. instead of [ni3 za1i zuo1 she3 me], I had [Nǐ zài zuò shénme].
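This is just a sketch with the Python pypinyin package (I actually used a nodejs tool, so this is an equivalent, not the exact script I ran):

from pypinyin import pinyin, Style

text = "你在做什么"
# diacritic pinyin: nǐ zài zuò shén me
print(" ".join(s[0] for s in pinyin(text, style=Style.TONE)))
# numbered tones instead: ni3 zai4 zuo4 shen2 me
print(" ".join(s[0] for s in pinyin(text, style=Style.TONE3)))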
And the alphabet was:

Ā
ā
Ē
ē
Ě
ě
ī
1
2
3
4
5

À
B
C
D
F
G
H
È
É
J
K
L
Ō
ō
M
Ǎ
ǎ
N
ǐ
P
Q
ǒ
R
S
ǔ
T
W
X
ǘ
Y
Z
ǚ
ǜ
à
á
a
b
c
d
e
f
g
h
è
i
é
j
k
ū
ì
l
m
í
n
o
p
q
ò
r
s
ó
t
u
v
w
x
y
ù
ú

z
ü

Also, I did a training run with Chinese characters as well, using characters from the HSK dictionary.

OK, I see. Thank you, it's really helpful. I also noticed that in the picture you posted above, the source sentence src:"gen1 ju4 zhe4 xiang4 xie2 yi4 e2 luo2 si1 chu1 kou3 gu3 ba1 de5 shi2 you2 you2 wei3 nei4 rui4 la1 gong1 ying4 er2 wei3 chu1 kou3 de2 guo2 de5 shi2 you2 you2 e2 guo2 ti2 gong1" uses tones with numbers.
Does that mean using numbered tones can also work?

Hello, I have one more question about the language model. Did you use a pinyin language model for the pinyin training? If so, how did you train your pinyin language model?

I used an open-source tool called KenLM for training the language model. I think it's in Mozilla DeepSpeech's documentation; it's the most compatible one for inference with Mozilla's DeepSpeech model. Roughly, it looked like the sketch below.
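The LM itself is built with KenLM's command-line tools (lmplz and build_binary); the Python kenlm module can then sanity-check the binary. This is a sketch, with example file names and an example n-gram order:

import kenlm

# built beforehand with KenLM's command-line tools, roughly:
#   lmplz -o 3 < pinyin_corpus.txt > zh_lm.arpa
#   build_binary zh_lm.arpa zh_lm.binary
model = kenlm.Model("zh_lm.binary")
print(model.score("ni3 zai4 zuo4 shen2 me", bos=True, eos=True))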

Hmm… the transcript of the training data for KenLM is also in pinyin? I'm a bit curious about the pinyin transcript, because pinyin text is rare on the internet.

Yes, the transcript and the other data used to generate it are in pinyin. I used the nodejs tool to convert Chinese characters to pinyin, as in the sketch above.