Hello everyone,
I am trying to use the transfer-learning2 branch to train a German model from the pre-trained English checkpoints of the DeepSpeech 0.5.1 release. I followed the README in native_client to build generate_trie and install the Python bindings. Training and testing worked very well; the command output was as follows:
Initializing model from /home/ucvis/Dev/DS_transfer_learning/deepspeech-0.5.1-checkpoint
Loading layer_1/bias
Loading layer_1/weights
Loading layer_2/bias
Loading layer_2/weights
Loading layer_3/bias
Loading layer_3/weights
Loading lstm_fused_cell/kernel
Loading lstm_fused_cell/bias
Loading global_step
Loading beta1_power
Loading beta2_power
Loading layer_1/bias/Adam
Loading layer_1/bias/Adam_1
Loading layer_1/weights/Adam
Loading layer_1/weights/Adam_1
Loading layer_2/bias/Adam
Loading layer_2/bias/Adam_1
Loading layer_2/weights/Adam
Loading layer_2/weights/Adam_1
Loading layer_3/bias/Adam
Loading layer_3/bias/Adam_1
Loading layer_3/weights/Adam
Loading layer_3/weights/Adam_1
Loading lstm_fused_cell/kernel/Adam
Loading lstm_fused_cell/kernel/Adam_1
Loading lstm_fused_cell/bias/Adam
Loading lstm_fused_cell/bias/Adam_1
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:01:36 | Steps: 201 | Loss: 79.444831
Epoch 0 | Validation | Elapsed Time: 0:00:04 | Steps: 21 | Loss: 51.888667 | Dataset: /home/ucvis/SpeechData/German/csv/Dev/zamia.csv
I Saved new best validating model with loss 51.888667 to: /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/results/checkouts/2_layer/best_dev-467557
Epoch 1 | Training | Elapsed Time: 0:00:15 | Steps: 48 | Loss: 35.411974 ^C
I FINISHED optimization in 0:01:57.407523
WARNING:tensorflow:From /home/ucvis/miniconda3/envs/transfer_learning/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I Restored variables from best validation checkpoint at /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/results/checkouts/2_layer/best_dev-467557, step 467557
Testing model on /home/ucvis/SpeechData/German/csv/test/zamia.csv
Computing acoustic model predictions | Steps: 43 | Elapsed Time: 0:00:07
Decoding predictions | 100% (43 of 43) |#########################| Elapsed Time: 0:01:19 Time: 0:01:19
Test on /home/ucvis/SpeechData/German/csv/test/zamia.csv - WER: 0.413294, CER: 0.218274, loss: 53.346436
--------------------------------------------------------------------------------
WER: 2.166667, CER: 51.000000, loss: 228.118179
- src: "als sie waren das sah ich"
- res: "werden sie mit orchester vor und schön und milano aus da er an den"
--------------------------------------------------------------------------------
WER: 1.800000, CER: 35.000000, loss: 162.401108
- src: "ich hoffe dass es ihr"
- res: "und jetzt noch in zum ersten male wieder sah"
--------------------------------------------------------------------------------
WER: 1.750000, CER: 26.000000, loss: 123.622856
- src: "ich aber es war"
- res: "das hat nicht waren sie eine minute"
--------------------------------------------------------------------------------
WER: 1.714286, CER: 55.000000, loss: 291.408875
- src: "ungewöhnlich selbstbewusst in der tat sie begann"
- res: "sie sie sagte sie und begann eine lebhafte begleitung auf dem erwiesen"
--------------------------------------------------------------------------------
WER: 1.666667, CER: 16.000000, loss: 52.503277
- src: "ich merkte sofort"
- res: "ich machte er vor dass sie"
--------------------------------------------------------------------------------
WER: 1.571429, CER: 42.000000, loss: 147.998016
- src: "tiefblau wie zur dämmerzeit in den himmel"
- res: "sie war wie zu der martin den himmel in ein attest usl eine frau"
--------------------------------------------------------------------------------
WER: 1.545455, CER: 83.000000, loss: 324.947968
- src: "landedelmann steht vor ihrem sofa mit der tasse in der hand"
- res: "er hat den ich lebe bei beschreiben vergessen habe eine große und bündnerland der mann teveren sofa mit der e in der hand"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 12.000000, loss: 26.731617
- src: "bewachen mich armes waisenkind"
- res: "der sache ich es weisen in"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 10.000000, loss: 39.092022
- src: "warum unausgelastet"
- res: "nun auf geleaste"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 29.000000, loss: 69.979065
- src: "lackierte schnupftabaksdosen und brillenfutterale"
- res: "lage des notars und n furter alle"
--------------------------------------------------------------------------------
I Exporting the model...
WARNING:tensorflow:From /home/ucvis/miniconda3/envs/transfer_learning/lib/python3.7/site-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /home/ucvis/miniconda3/envs/transfer_learning/lib/python3.7/site-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
I Models exported at /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/results/models/2_layer
After that, I ran inference on a German audio file (16-bit, mono, 16 kHz sample rate) containing 571 words, but I got a blank inference:
Loading model from file /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/results/models/2_layer/output_graph.pb
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.0-0-g3db7a99
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-01-14 11:43:12.994098: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-14 11:43:13.108409: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2020-01-14 11:43:13.108447: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2020-01-14 11:43:13.108453: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2020-01-14 11:43:13.108532: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.116s.
Loading language model from files /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/lm_German/lm_training_and_text_5gram/lm.binary /home/ucvis/Dev/DS_transfer_learning/DeepSpeech/data/lm_German/lm_training_and_text_5gram/trie_new
Loaded language model in 0.648s.
Running inference.
Inference took 141.306s for 373.301s audio file.
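By the way, here is how I double-checked the audio format programmatically; a minimal sketch using Python's standard wave module (the file name is just a placeholder):

```python
import wave

def describe_wav(path):
    """Return the format parameters DeepSpeech expects to be
    16-bit PCM, mono, 16 kHz."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bits": w.getsampwidth() * 8,
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }

# Example (placeholder path):
# print(describe_wav("german_test.wav"))
```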
I considered some possible causes of the issue:
- Poor training data. But the test output above shows the model is good enough to produce reasonable transcriptions.
- A mismatch between the language model/trie and the DeepSpeech version. I checked that transfer-learning2 is based on DeepSpeech 0.5.0a6, so I downloaded the native_client files from https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.0-alpha.6 and used its generate_trie to build a new trie, then tested again, but I still got a blank inference.
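For reference, this is roughly how I invoked generate_trie, with the binary taken from the same v0.5.0-alpha.6 release as the inference client (all paths are placeholders):

```shell
# generate_trie in the 0.5.x native_client takes the alphabet,
# the KenLM binary LM, and the output trie path, in that order.
./generate_trie \
    /path/to/alphabet.txt \
    /path/to/lm.binary \
    /path/to/trie_new
```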
Can someone help me figure out how to solve this problem?