Hi @dara1400 - sorry, I didn’t manage to get onto this yesterday evening, but I’ve managed to get it working - I’ve attached the LM (zipped up) and did a quick video to demo it working.
And the good news is that it’s pretty effective with your list of words/phrases.
I’ll list out some details below (you may well know some of this from your experiments, but hopefully it will also help others trying to do this).
I hope this helps - if you still have issues, post the errors you see (ideally in detail), the point at which they occur etc, and we can try to figure it out from there.
Key background:
- it’s worth referring here for some basic detail on the LM: DeepSpeech/data/lm at master · mozilla/DeepSpeech · GitHub
- don’t make the mistake of following the KenLM BUILDING instructions in DeepSpeech/native_client/kenlm at master · mozilla/DeepSpeech · GitHub (they’re not required for this)
- you can (and should) use the official KenLM repo (this doesn’t seem to be covered in the requirements, but it would make a useful PR)
- these steps are very similar to what’s shown here - it’s simply that I’ve tested them with the 0.5.0 model: TUTORIAL : How I trained a specific french model to control my robot
- you’ll need to have downloaded the relevant native client tar file for your environment (for me that was native_client.amd64.cuda.linux.tar.xz) and use generate_trie from there, OR build it yourself (that’s more complex and, for speed, I didn’t go that route)
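If you go the download route, it’s roughly the following - note the exact asset name/URL here is my assumption based on the usual release naming, so check the 0.5.0 release page for the right file for your platform:

```
# assumption: asset name/URL per the usual v0.5.0 release naming - check the release page
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.5.0/native_client.amd64.cuda.linux.tar.xz
mkdir -p native_client
tar -Jxvf native_client.amd64.cuda.linux.tar.xz -C native_client
# generate_trie will then be at native_client/generate_trie
```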
The aim is to take your input (ie the list of “one”, “two” etc) and produce the two output files lm.binary and trie
input file:
- vocabulary.txt – the file of phrases that you want your LM to process
output files:
- words.arpa – used to produce the other outputs, not used directly by DeepSpeech
- lm.binary
- trie
Steps
Install and build KenLM
See the details in the official KenLM repo, or try a pre-built binary for your distro if one exists
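If you’re building from source, it’s roughly this (assuming you already have the build dependencies - cmake, a C++ compiler, Boost, zlib etc):

```
git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir -p build
cd build
cmake ..
make -j 4
# lmplz and build_binary (used below) end up in build/bin/
```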
Create a working directory
mkdir working
mkdir working/training_material
mkdir working/language_models
Create the file for your phrases
vocabulary.txt - store it in working/training_material, with each sentence on its own line
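As a purely hypothetical illustration (I don’t have your exact list to hand), the file is just plain text - all lowercase, no punctuation, one phrase per line:

```
one
two
three
turn left
turn right
stop
```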
From the base of the KenLM folder
build/bin/lmplz --text working/training_material/vocabulary.txt --arpa working/language_models/words.arpa --order 5 --discount_fallback --temp_prefix /tmp/
note: I had previously also played around with an --order of 3, 4 and 5, along with --prune 0 0 0 1 (for order 5)
I don’t recall exactly why I’d used --prune, but it didn’t seem needed here. However, like your earlier attempts, I did need --discount_fallback (seemingly because the list of phrases is small)
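For reference, the order-3 variant I’d experimented with is just the same command with a lower --order (which can be plenty for short command phrases):

```
build/bin/lmplz --text working/training_material/vocabulary.txt --arpa working/language_models/words.arpa --order 3 --discount_fallback --temp_prefix /tmp/
```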
build/bin/build_binary -T -s trie working/language_models/words.arpa working/language_models/lm.binary
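You can sanity-check the resulting binary LM with KenLM’s query tool, which reads sentences from stdin and prints per-word scores:

```
echo "one two three" | build/bin/query working/language_models/lm.binary
```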
Using generate_trie from native_client
See above point about native_client
/path_to_native_client/generate_trie /path_to_deepspeech/DeepSpeech/data/alphabet.txt working/language_models/lm.binary working/language_models/trie
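At this point you should have all three files; a quick listing confirms it:

```
ls -lh working/language_models/
# expect to see words.arpa, lm.binary and trie
```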
Testing it out
You’ll need to have installed deepspeech for this part onwards
deepspeech --model /path_to_models/deepspeech-0.5.0-models/output_graph.pbmm --alphabet /path_to_models/deepspeech-0.5.0-models/alphabet.txt --lm working/language_models/lm.binary --trie working/language_models/trie --audio /path_to_test_wavs/p225_27280.wav
This gives output like the following (note: my test wav file didn’t have many words from the custom LM, but it shows that the decoder is clearly using it):
```
Loading model from file deepspeech-0.5.0-models/output_graph.pbmm
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.0-alpha.11-0-g1201739
2019-06-15 16:28:50.969519: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-15 16:28:51.046477: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-15 16:28:51.046944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.42GiB
2019-06-15 16:28:51.046955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-15 16:28:51.457048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-15 16:28:51.457068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-15 16:28:51.457072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-15 16:28:51.457141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10088 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-06-15 16:28:51.460453: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-06-15 16:28:51.460467: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-06-15 16:28:51.460473: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-06-15 16:28:51.460598: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.493s.
Loading language model from files /home/neil/main/Projects/kenlm/working/language_models/lm.binary /home/neil/main/Projects/kenlm/working/language_models/trie
Loaded language model in 0.00601s.
Warning: original sample rate (22050) is different than 16kHz. Resampling might produce erratic speech recognition.
Running inference.
2019-06-15 16:28:51.655464: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
the would the form one how one one the one one
Inference took 2.372s for 6.615s audio file.
```
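A quick way to see the effect of the custom LM is to run the same file again without the --lm and --trie arguments (the 0.5.0 client should then decode without a language model) and compare the transcripts:

```
deepspeech --model /path_to_models/deepspeech-0.5.0-models/output_graph.pbmm --alphabet /path_to_models/deepspeech-0.5.0-models/alphabet.txt --audio /path_to_test_wavs/p225_27280.wav
```

Also note the sample rate warning above: my test wav was 22050 Hz, so resampling to 16 kHz first would avoid any erratic results.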
Trying it out with the Mic VAD Streaming example
python mic_vad_streaming.py -d 7 -m ../../models/deepspeech-0.5.0-models/ -l /path_to_working/working/language_models/lm.binary -t /path_to_working/working/language_models/trie -w wavs/
(this is using the mic_vad_streaming.py script from the examples in the DeepSpeech repo)
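The -d flag is the audio input device index; if you’re not sure which index to use, a quick one-liner with pyaudio (which the example already requires) lists the available devices:

```
python -c "import pyaudio; pa = pyaudio.PyAudio(); [print(i, pa.get_device_info_by_index(i)['name']) for i in range(pa.get_device_count())]"
```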
custom_lm.zip (2.6 KB)