lm::FormatLoadException Error on SageMaker with Persistent Environment

Hey all,

I have been using my own trained model for inference on Google Colab for a while now. However, I now need to move everything into a SageMaker notebook. I created a fresh persistent conda environment (Miniconda) with Python 3.7.

Here are the steps I took:

  1. Installed TensorFlow 2.3.0 into the conda env.
  2. Installed deepspeech into the conda env with %pip install deepspeech.
  3. When I ran the script (vad_transcriber), it failed with no module called “deepspeech”.
  4. So I installed deepspeech outside of the env with !pip install deepspeech.
  5. Now when I run it, I get the error below.
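
Steps 2 to 4 suggest that the script and the notebook kernel may not be using the same Python: %pip installs into the environment of the running kernel, while !pip runs whichever pip is first on the shell PATH. A minimal sketch for checking this, assuming it is run both in a notebook cell and in the same way the script is launched (the helper name describe_environment is made up for illustration):

```python
import importlib.util
import sys


def describe_environment(module_name="deepspeech"):
    """Report which interpreter is running and whether a module is importable from it."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,           # root of the active environment
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the interpreter path differs between the two runs, the package was probably installed into one environment and the script executed in another.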

Pasting the error here also:

DEBUG:root:Transcribing audio file @ first_6.wav
DEBUG:root:Found Model: speech_model/tedlium_checkpoint.pbmm
DEBUG:root:Found scorer: speech_model/tedlium_model.scorer
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2020-12-16 23:18:57.797780: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
DEBUG:root:Loaded model in 0.015s.
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  native_client/kenlm/lm/binary_format.cc:160 in void* lm::ngram::BinaryFormat::LoadBinary(std::size_t) threw FormatLoadException because `file_size != util::kBadSize && file_size < total_map'.

Binary file has size 14680064 but the headers say it should be at least 941209108

I am quite new to using DeepSpeech, so if anyone has any insight, that would be amazing.

Please don’t post images; stick to the guidelines.

  1. Don’t use conda; use a virtual environment instead, like on Colab.

  2. The error message indicates that you didn’t download the whole scorer; you only have part of it. The file is 14680064 bytes, but its header says it should be at least 941209108.
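
To make the second point concrete, here is a rough sketch that pulls the two sizes out of the KenLM message and compares them with the file on disk. The message text and scorer path are copied from the log in the question; the helper names parse_sizes and is_truncated are made up for illustration and are not part of DeepSpeech or KenLM.

```python
import os
import re

# Message text copied from the error output in the question.
KENLM_MESSAGE = (
    "Binary file has size 14680064 but the headers say "
    "it should be at least 941209108"
)


def parse_sizes(message):
    """Return (actual_bytes, expected_min_bytes) from the KenLM size message."""
    match = re.search(r"has size (\d+) .* at least (\d+)", message)
    if match is None:
        raise ValueError("message does not look like a KenLM size error")
    return int(match.group(1)), int(match.group(2))


def is_truncated(path, expected_min_bytes):
    """True if the file on disk is smaller than the size its header requires."""
    return os.path.getsize(path) < expected_min_bytes


if __name__ == "__main__":
    actual, expected = parse_sizes(KENLM_MESSAGE)
    print(f"on disk: {actual} bytes, header expects at least {expected} bytes")
    print(f"fraction present: {actual / expected:.1%}")
    scorer = "speech_model/tedlium_model.scorer"  # path from the log
    if os.path.exists(scorer):
        print("truncated:", is_truncated(scorer, expected))
```

Running the same size check on the original scorer (for example the copy used on Colab) and on the SageMaker copy should show whether the transfer was cut short.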