Hello everyone,
I am trying to use the transfer-learning2 branch to train a German model from the pre-trained English checkpoints of the DeepSpeech 0.5.1 release. I followed the README in native_client to build generate_trie and install the Python bindings. Training and testing worked very well; the command output was as follows:
Initializing model from /home/ucvis/Dev/DS_transfer_learning/deepspeech-0.5.1-checkpoint
Loading layer_1/bias
Loading layer_1/weights
Loading layer_2/bias
Loading layer_2/weights
Loading layer_3/bias
Loading layer_3/weights
Loading lstm_fused_cell/kernel
Loading lstm_fused_cell/bias
Loading global_step
Loading beta1_power
Loading beta2_power
Loading layer_1/bias/Adam
Loading layer_1/bias/Adam_1
Loading layer_1/weights/Adam
Loading layer_1/weights/Adam_1
Loading layer_2/bias/Adam
Loading layer_2/bias/Adam_1
Loading layer_2/weights/Adam
Loading layer_2/weights/Adam_1
Loading layer_3/bias/Adam
Loading layer_3/bias/Adam_1
Loading layer_3/weights/Adam
Loading layer_3/weights/Adam_1
Loading lstm_fused_cell/kernel/Adam
Loading lstm_fused_cell/kernel/Adam_1
Loading lstm_fused_cell/bias/Adam
Loading lstm_fused_cell/bias/Adam_1
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:01:36 | Steps: 201 | Loss: 79.444831
Epoch 0 | Validation | Elapsed Time: 0:00:04 | Steps: 21 | Loss: 51.888667 | Dataset: /home/ucvis/SpeechData/German/csv/Dev/zamia.csv
I Saved new best validating model with loss 51.888667 to: /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/results/checkouts/2_layer/best_dev-467557
Epoch 1 | Training | Elapsed Time: 0:00:15 | Steps: 48 | Loss: 35.411974 ^C
I FINISHED optimization in 0:01:57.407523
WARNING:tensorflow:From /home/ucvis/miniconda3/envs/transfer_learning/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I Restored variables from best validation checkpoint at /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/results/checkouts/2_layer/best_dev-467557, step 467557
Testing model on /home/ucvis/SpeechData/German/csv/test/zamia.csv
Computing acoustic model predictions | Steps: 43 | Elapsed Time: 0:00:07
Decoding predictions | 100% (43 of 43) |#########################| Elapsed Time: 0:01:19 Time: 0:01:19
Test on /home/ucvis/SpeechData/German/csv/test/zamia.csv - WER: 0.413294, CER: 0.218274, loss: 53.346436
--------------------------------------------------------------------------------
WER: 2.166667, CER: 51.000000, loss: 228.118179
- src: "als sie waren das sah ich"
- res: "werden sie mit orchester vor und schön und milano aus da er an den"
--------------------------------------------------------------------------------
WER: 1.800000, CER: 35.000000, loss: 162.401108
- src: "ich hoffe dass es ihr"
- res: "und jetzt noch in zum ersten male wieder sah"
--------------------------------------------------------------------------------
WER: 1.750000, CER: 26.000000, loss: 123.622856
- src: "ich aber es war"
- res: "das hat nicht waren sie eine minute"
--------------------------------------------------------------------------------
WER: 1.714286, CER: 55.000000, loss: 291.408875
- src: "ungewöhnlich selbstbewusst in der tat sie begann"
- res: "sie sie sagte sie und begann eine lebhafte begleitung auf dem erwiesen"
--------------------------------------------------------------------------------
WER: 1.666667, CER: 16.000000, loss: 52.503277
- src: "ich merkte sofort"
- res: "ich machte er vor dass sie"
--------------------------------------------------------------------------------
WER: 1.571429, CER: 42.000000, loss: 147.998016
- src: "tiefblau wie zur dämmerzeit in den himmel"
- res: "sie war wie zu der martin den himmel in ein attest usl eine frau"
--------------------------------------------------------------------------------
WER: 1.545455, CER: 83.000000, loss: 324.947968
- src: "landedelmann steht vor ihrem sofa mit der tasse in der hand"
- res: "er hat den ich lebe bei beschreiben vergessen habe eine große und bündnerland der mann teveren sofa mit der e in der hand"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 12.000000, loss: 26.731617
- src: "bewachen mich armes waisenkind"
- res: "der sache ich es weisen in"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 10.000000, loss: 39.092022
- src: "warum unausgelastet"
- res: "nun auf geleaste"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 29.000000, loss: 69.979065
- src: "lackierte schnupftabaksdosen und brillenfutterale"
- res: "lage des notars und n furter alle"
--------------------------------------------------------------------------------
I Exporting the model...
WARNING:tensorflow:From /home/ucvis/miniconda3/envs/transfer_learning/lib/python3.7/site-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /home/ucvis/miniconda3/envs/transfer_learning/lib/python3.7/site-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
I Models exported at /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/results/models/2_layer
After that, I ran inference on a German audio file (16-bit, mono, 16 kHz sample rate) containing 571 words, but I got a blank inference:
Loading model from file /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/results/models/2_layer/output_graph.pb
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.0-0-g3db7a99
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-01-14 11:43:12.994098: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-14 11:43:13.108409: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2020-01-14 11:43:13.108447: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2020-01-14 11:43:13.108453: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2020-01-14 11:43:13.108532: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.116s.
Loading language model from files /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/lm_German/lm_training_and_text_5gram/lm.binary /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/lm_German/lm_training_and_text_5gram/trie_new
Loaded language model in 0.648s.
Running inference.
Inference took 141.306s for 373.301s audio file.
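By the way, here is how I double-checked the audio format programmatically; a minimal sketch using Python's standard wave module (the file name is just a placeholder):

```python
import wave

def describe_wav(path):
    """Return the format parameters DeepSpeech expects to be
    16-bit PCM, mono, 16 kHz."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bits": w.getsampwidth() * 8,
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }

# Example (placeholder path):
# print(describe_wav("german_test.wav"))
```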
I considered some possible causes of the issue:
- Poor training data. But the test output above shows the model is good enough to produce reasonable transcriptions.
- A mismatch between the language model/trie and the DeepSpeech version. I checked that transfer-learning2 is based on DeepSpeech 0.5.0a6, so I downloaded the native_client files from https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.0-alpha.6 and used its generate_trie to build a new trie, then tested again, but I still got a blank inference.
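For reference, this is roughly how I invoked generate_trie, with the binary taken from the same v0.5.0-alpha.6 release as the inference client (all paths are placeholders):

```shell
# generate_trie in the 0.5.x native_client takes the alphabet,
# the KenLM binary LM, and the output trie path, in that order.
./generate_trie \
    /path/to/alphabet.txt \
    /path/to/lm.binary \
    /path/to/trie_new
```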
Can someone help me figure out how to solve this problem?