Hi @dara1400 - sorry, I didn’t manage to get onto this yesterday evening, but I’ve now got it working. I’ve attached the LM (zipped up) and did a quick video to demo it working.
And the good news is that it’s pretty effective with your list of words/phrases.
I’ll list out some details below (you may well know some of this from your experiments, but hopefully it’ll help others trying to do the same).
I hope this helps - if you still have issues, post the errors you see (ideally in detail), the point at which they occur, and so on, and we can try to figure it out from there
Key background:
The aim is to take your input (i.e. the list of “one”, “two”, etc.) and produce the two output files: lm.binary and trie
Input file:
- vocabulary.txt - this is the file of phrases that you want your LM to process

Output files:
- words.arpa - used to produce the other outputs; not used directly by DeepSpeech
- lm.binary
- trie
Steps
Install and build KenLM
See details here, or try a pre-built binary for your distro if one exists; the typical source build is sketched below
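For reference, building from source per the KenLM README is roughly the following (I’m assuming Linux here; you may also need the Boost, zlib, bzip2 and liblzma dev packages installed first). This leaves the binaries in build/bin, which is where the commands below expect them:

git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir -p build
cd build
cmake ..
make -j 4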
Create a working directory
mkdir working
mkdir working/training_material
mkdir working/language_models
Create the file for your phrases
vocabulary.txt - store it in training_material, with each sentence on its own line
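For example, with a short list of number words like yours, the file would simply look like this (one phrase per line; keep the text lowercase so it matches the characters in alphabet.txt):

one
two
three
four
five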
From the base of the KenLM folder
build/bin/lmplz --text working/training_material/vocabulary.txt --arpa working/language_models/words.arpa --order 5 --discount_fallback --temp_prefix /tmp/
note: I had previously also played around with --order values of 3, 4 and 5, along with --prune 0 0 0 1 (for order 5) - see the variant below for reference.
I don’t recall exactly why I’d used --prune, but it didn’t seem needed here. However, like your earlier attempts, I did need --discount_fallback (seemingly because the list of phrases is so small)
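For reference, that earlier pruned variant was along these lines (not needed here, just in case it’s useful to someone):

build/bin/lmplz --text working/training_material/vocabulary.txt --arpa working/language_models/words.arpa --order 5 --prune 0 0 0 1 --discount_fallback --temp_prefix /tmp/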
build/bin/build_binary -T -s trie working/language_models/words.arpa working/language_models/lm.binary
Using generate_trie from native_client
See the point above about native_client
/path_to_native_client/generate_trie /path_to_deepspeech/DeepSpeech/data/alphabet.txt working/language_models/lm.binary working/language_models/trie
Testing it out
You’ll need to have installed deepspeech for this part onwards
deepspeech --model /path_to_models/deepspeech-0.5.0-models/output_graph.pbmm --alphabet /path_to_models/deepspeech-0.5.0-models/alphabet.txt --lm working/language_models/lm.binary --trie working/language_models/trie --audio /path_to_test_wavs/p225_27280.wav
Gives output like this (note: my test wav file didn’t have many words from the custom LM, but it shows that the LM is clearly being used):
Loading model from file deepspeech-0.5.0-models/output_graph.pbmm
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.0-alpha.11-0-g1201739
2019-06-15 16:28:50.969519: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-15 16:28:51.046477: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-15 16:28:51.046944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.42GiB
2019-06-15 16:28:51.046955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-15 16:28:51.457048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-15 16:28:51.457068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-15 16:28:51.457072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-15 16:28:51.457141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10088 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-06-15 16:28:51.460453: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-06-15 16:28:51.460467: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-06-15 16:28:51.460473: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-06-15 16:28:51.460598: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.493s.
Loading language model from files /home/neil/main/Projects/kenlm/working/language_models/lm.binary /home/neil/main/Projects/kenlm/working/language_models/trie
Loaded language model in 0.00601s.
Warning: original sample rate (22050) is different than 16kHz. Resampling might produce erratic speech recognition.
Running inference.
2019-06-15 16:28:51.655464: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
the would the form one how one one the one one
Inference took 2.372s for 6.615s audio file.
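If you’d rather test from Python than the CLI, the equivalent with the 0.5.0 Python package looks roughly like this - a minimal sketch based on the client.py bundled with DeepSpeech, where the N_FEATURES / N_CONTEXT / BEAM_WIDTH / LM_ALPHA / LM_BETA values are just that client’s defaults and the paths are the same placeholders as above:

import wave
import numpy as np
from deepspeech import Model

N_FEATURES = 26   # MFCC features per frame
N_CONTEXT = 9     # size of the context window
BEAM_WIDTH = 500
LM_ALPHA = 0.75   # language model weight
LM_BETA = 1.85    # word insertion bonus

ds = Model('/path_to_models/deepspeech-0.5.0-models/output_graph.pbmm',
           N_FEATURES, N_CONTEXT,
           '/path_to_models/deepspeech-0.5.0-models/alphabet.txt',
           BEAM_WIDTH)
# point the decoder at the custom LM and trie built above
ds.enableDecoderWithLM('/path_to_models/deepspeech-0.5.0-models/alphabet.txt',
                       'working/language_models/lm.binary',
                       'working/language_models/trie',
                       LM_ALPHA, LM_BETA)

with wave.open('/path_to_test_wavs/p225_27280.wav', 'rb') as fin:
    fs = fin.getframerate()  # ideally 16000, as per the resampling warning above
    audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)

print(ds.stt(audio, fs))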
Trying it out with the Mic VAD Streaming example
python mic_vad_streaming.py -d 7 -m ../../models/deepspeech-0.5.0-models/ -l /path_to_working/working/language_models/lm.binary -t /path_to_working/working/language_models/trie -w wavs/
(this is using the script here)
custom_lm.zip (2.6 KB)