Inference Model trained with DeepSpeech.py

Hi @lissyx, I trained with the Chinese language dataset from openslr.org/18/.
My inference model is not predicting any text; I received the following stack trace:

(asr) shenzhen@shenzhen:~/Desktop/zh_servermodel$ ls
alphabet.txt  en_models  output_graph.pb  trie  vocabulart.txt  zh_lm.binary
(asr) shenzhen@shenzhen:~/Desktop/zh_servermodel$ deepspeech output_graph.pb /home/shenzhen/Desktop/Take2_data/Jugs/2017-11-02-15\:19\:00.wav alphabet.txt zh_lm.binary 
Loading model from file output_graph.pb
2018-05-17 15:04:45.930784: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.157s.
Running inference.
2018-05-17 15:04:46.172535: E tensorflow/core/framework/op_segment.cc:53] Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
2018-05-17 15:04:46.172591: E tensorflow/core/common_runtime/executor.cc:643] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
	 [[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice)]]
Error running session: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
	 [[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice)]]
None
Inference took 0.090s for 7.970s audio file.

However, the inference model works perfectly fine with the downloaded pre-trained model.

(asr) shenzhen@shenzhen:~/Desktop/zh_servermodel/en_models$ deepspeech output_graph.pb /home/shenzhen/Desktop/Take2_data/Jugs/2017-11-02-15\:16\:38.wav alphabet.txt lm.binary
Loading model from file output_graph.pb
2018-05-17 14:59:36.900942: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.241s.
Running inference.
for every men i killed field father from home
Inference took 6.228s for 3.560s audio file.

This is mismatching TensorFlow versions. Since you trained yourself, you likely installed a recent TensorFlow (after r1.5), while the v0.1.1 binaries are r1.4-based, and those cannot handle the extra attributes added in 1.5. How did you install the inference client?
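You can confirm which TensorFlow produced your graph by printing the version in the environment you trained in, for example:

# run in the training environment; any version >= 1.5 writes graphs
# that the r1.4-based v0.1.1 client cannot read
import tensorflow as tf
print(tf.__version__)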

Thanks @lissyx,
yes, I trained with TensorFlow 1.8.

I used pip install to install the client. Do I have to build it myself? I think I should.

Okay, what you can do is pip install --upgrade deepspeech==0.2.0a5: we started publishing alpha releases of the npm / Python packages to make this easier. https://pypi.org/project/deepspeech/0.2.0a5/#description

Thanks @lissyx, it worked perfectly after I upgraded.

Since I had trained on pinyin characters, I got pinyin output.

(asr) shenzhen@shenzhen:~/Desktop/zh_servermodel$ deepspeech output_graph.pb alphabet.txt zh_lm.binary /home/shenzhen/Desktop/Take2_data/Jugs/2017-11-02-15:19:00.wav
Loading model from file output_graph.pb
TensorFlow: v1.6.0-16-gc346f2c
DeepSpeech: v0.2.0-alpha.5-0-g7cc8382
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-05-17 17:33:54.194497: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.016s.
Running inference.
pnenshi4 si3 man2 zh1 er2 wei2 ci4ephei4 zheu1 un3 zin4 hluer4 si4 bi4 ne4 feihu3
Inference took 1.036s for 7.970s audio file.

Okay, and is that correct? 🙂

I trained for 100 epochs, and the WER was 0.103503.
When I run prediction on the dev and test datasets, it is almost accurate, i.e. more than 90% accurate (a WER of about 0.10 corresponds to roughly 90% word accuracy).

But when I predict on noisy data, the prediction is a bit off.

Sorry for the screenshot.

With noisy data, that's not surprising. If you can, try to augment your training dataset with noise and noisy recordings; it should help. Even something as simple as the sketch below goes in the right direction.
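A minimal sketch of noise augmentation, assuming 16-bit mono WAV files; the file names and the 10 dB SNR are just examples:

# mix white noise into a clean training wav at a fixed signal-to-noise ratio
import numpy as np
from scipy.io import wavfile

rate, clean = wavfile.read("clean_utterance.wav")
clean = clean.astype(np.float32)

snr_db = 10.0
signal_power = np.mean(clean ** 2)
noise_power = signal_power / (10 ** (snr_db / 10))
noise = np.random.normal(0.0, np.sqrt(noise_power), clean.shape)

# clip back into 16-bit range and write the augmented copy
noisy = np.clip(clean + noise, -32768, 32767).astype(np.int16)
wavfile.write("noisy_utterance.wav", rate, noisy)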

Thanks @lissyx, I'll try that too.

I'll try distributed training first, and will get back to data augmentation. 🙂

Hello, I see you are using pinyin for training. What does your alphabet look like? Is it [a-z], or a per-syllable pinyin format like [n i3 h ao3]?

No, not an alphabet of [a-z]. For the Chinese ASR model, I tried several alphabets: pinyin characters, and all the Chinese characters (from dictionary resources).

Do you mean like [n i3 h ao3]? I have tried an alphabet in a format like that [screenshot], taken from the THCHS-30 corpus, but it always gives a failure like [screenshot].

That error is because your alphabet doesn't contain all the characters that appear in the audio transcripts (the CSV files) and in the corpus used to train the language model.

Right now I don't remember whether mine was [n i3 h ao3] or actual pinyin, but the alphabet has to contain the same characters as the transcripts. A quick way to check coverage is sketched below.
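A small sketch that lists transcript characters missing from alphabet.txt, assuming the standard DeepSpeech CSV layout (wav_filename,wav_filesize,transcript); train.csv is an example name:

import csv

# lines starting with '#' in alphabet.txt are comments
with open("alphabet.txt", encoding="utf-8") as f:
    alphabet = {line.rstrip("\n") for line in f if not line.startswith("#")}

used = set()
with open("train.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        used.update(row["transcript"])

print("missing from alphabet:", sorted(used - alphabet))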

Yes, I know the failure means 'i' is not in the alphabet. What I'm trying to ask is what format the pinyin should take in the alphabet. Like your corpus, I'm guessing your transcripts look like
gen1 ju4 zhe4 xiang4 xie2 yi4 e2 luo2 si1 chu1 kou3 gu3 ba1 de5 shi2 you2 you2 wei3 nei4 rui4 la1 gong1 ying4 er2 wei3 chu1 kou3 de2 guo2 de5 shi2 you2 you2 e2 guo2 ti2 gong1
And if I just put consonants and vowels in the alphabet, even with a vowel on its own line in a format like 'ii' or 'i1', DeepSpeech still reads them as single characters; this is the problem (see the small illustration after this post).

I'm using sentences constructed from vowels and consonants, like
ii i1 x iang1 g ang3 d e5 k ang4 r iz4 j iu4 uu uang2 vv vn4 d ong4 ii i3 j iu4 uu uang2 uu un2 h ua4 vv vn4 d ong4 uu ui2 zh u3 ii iao4 x ing2 sh ix4

If you use the pinyin of a whole Chinese character, like 'xiang1' or 'gang3', on each line of the alphabet, I think you will face the same problem. Since you ran it successfully, it must be a different format. So I'm curious what your alphabet looks like.
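To make concrete what I mean, a toy illustration (plain Python, not DeepSpeech code; the entries are made up) of why a multi-character alphabet entry never matches when the transcript is consumed one character at a time, as character-level label mapping does:

alphabet = {"ii", "i1", "x"}          # hypothetical multi-character entries
transcript = "ii i1 xiang1"
for ch in transcript:                 # per-character iteration
    print(repr(ch), ch in alphabet)   # a bare 'i' is never found -> "not in alphabet"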

It's been some time now, so I don't remember exactly.
As far as I remember, I converted the THCHS-30 transcripts to actual pinyin using a nodejs library; an equivalent Python sketch is below.
E.g. instead of [ni3 za1i zuo1 she3 me], I had [Nǐ zài zuò shénme].
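This is just a sketch with the Python pypinyin package (I actually used a nodejs tool, so this is an equivalent, not the exact script I ran):

from pypinyin import pinyin, Style

text = "你在做什么"
# diacritic pinyin: nǐ zài zuò shén me
print(" ".join(s[0] for s in pinyin(text, style=Style.TONE)))
# numbered tones instead: ni3 zai4 zuo4 shen2 me
print(" ".join(s[0] for s in pinyin(text, style=Style.TONE3)))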
And the alphabet was:

Ā
ā
Ē
ē
Ě
ě
ī
1
2
3
4
5

À
B
C
D
F
G
H
È
É
J
K
L
Ō
ō
M
Ǎ
ǎ
N
ǐ
P
Q
ǒ
R
S
ǔ
T
W
X
ǘ
Y
Z
ǚ
ǜ
à
á
a
b
c
d
e
f
g
h
è
i
é
j
k
ū
ì
l
m
í
n
o
p
q
ò
r
s
ó
t
u
v
w
x
y
ù
ú

z
ü

Also, I did a training run with Chinese characters as well, using characters from the HSK dictionary.

OK, I see. Thank you, it's really helpful. I also noticed that in the picture you posted above, the source sentence src:"gen1 ju4 zhe4 xiang4 xie2 yi4 e2 luo2 si1 chu1 kou3 gu3 ba1 de5 shi2 you2 you2 wei3 nei4 rui4 la1 gong1 ying4 er2 wei3 chu1 kou3 de2 guo2 de5 shi2 you2 you2 e2 guo2 ti2 gong1" uses tones with numbers.
Does that mean using numbered tones can also work?

Hello, I have one more question about the language model. Did you use a pinyin language model for the pinyin training? If so, how did you train your pinyin language model?

I used an open-source tool called KenLM for training the language model. I think it's in Mozilla DeepSpeech's documentation; it's the most compatible one for inference with Mozilla's DeepSpeech model. Roughly, it looked like the sketch below.
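The LM itself is built with KenLM's command-line tools (lmplz and build_binary); the Python kenlm module can then sanity-check the binary. This is a sketch, with example file names and an example n-gram order:

import kenlm

# built beforehand with KenLM's command-line tools, roughly:
#   lmplz -o 3 < pinyin_corpus.txt > zh_lm.arpa
#   build_binary zh_lm.arpa zh_lm.binary
model = kenlm.Model("zh_lm.binary")
print(model.score("ni3 zai4 zuo4 shen2 me", bos=True, eos=True))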

Hmm… the transcript of the training data for KenLM is also in pinyin? I'm a bit curious about the pinyin transcript, because pinyin text is rare on the internet.

Yes, the transcript and the other data used to generate it are in pinyin. I used the nodejs tool to convert Chinese characters to pinyin, as in the sketch above.