Hi,
I have been following the DeepSpeech documentation to build my own scorer. After running these commands:
cd data/lm
python3 generate_lm.py --input_txt vocabulary.txt --output_dir . \
  --top_k 1500 --kenlm_bins path/to/kenlm/build/bin/ \
  --arpa_order 3 --max_arpa_memory "50%" --arpa_prune "0|0|1" \
  --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
cd data/lm
Download and extract appropriate native_client package:
curl -LO http://github.com/mozilla/DeepSpeech/releases/…
tar xvf native_client.*.tar.xz
./generate_scorer_package --alphabet …/alphabet.txt --lm lm.binary --vocab vocab-1500.txt \
  --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
I get the following errors:
Doesn't look like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: false
Error: Can't parse scorer file, invalid header. Try updating your scorer file.
Error loading language model file: Invalid magic in trie header.
I should mention that I use a custom alphabet that contains other characters in addition to the English ones.
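Since the alphabet is custom, one thing that can be checked independently of the tools is whether every character in the vocabulary file is actually listed in the alphabet file. Below is a minimal sketch of such a check. It is not part of DeepSpeech; the function names are made up for illustration, and it assumes the alphabet file has one symbol per line, that lines starting with `#` are comments, and that `\#` stands for a literal `#`. I am not claiming a mismatch is the cause of the errors above, only that it is easy to rule out.

```python
def load_alphabet(lines):
    """Collect symbols from alphabet.txt-style lines (assumed: one symbol per line)."""
    symbols = set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("\\#"):
            # Assumed escape for a literal '#' character.
            symbols.add("#")
        elif line == "" or line.startswith("#"):
            # Skip empty lines and comment lines.
            continue
        else:
            symbols.add(line)
    return symbols


def uncovered_characters(alphabet_lines, vocab_lines):
    """Return the set of characters used in the vocabulary words
    that do not appear as a symbol in the alphabet."""
    symbols = load_alphabet(alphabet_lines)
    seen = set()
    for raw in vocab_lines:
        # Words are split on whitespace, so spaces themselves are not checked.
        for word in raw.split():
            seen.update(word)
    return seen - symbols
```

Both arguments can be open file objects (opened with `encoding="utf-8"`) for the alphabet file and the vocabulary file. An empty result means every character in the vocabulary is covered by the alphabet under the assumptions above.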