Is libctc_decoder_with_kenlm needed with version 0.4.1-0?

Hello,

My versions are:

TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-0-g0e40db6

I have trained my own model, but I am getting confusing results:
Evaluating on one test file (when the last epoch is finished) I get decent results, or good enough anyway, but when I do the same inference using the Python native_client client.py, instead of a sentence I get only one or two words, or empty predictions.

So my questions are:
Do I need the --decoder_library_path libctc_decoder_with_kenlm.so flag to be set in the training phase (I didn't see that flag in the help menu …),
or do I need that libctc_decoder_with_kenlm.so when I am using the Python native client client.py?

I searched topics on this subject and read somewhere that poor results from the Python native client's inference are caused by a missing ctc_decoder … That was about older versions I think, but I lost track, so …

Thanks in advance!

EDIT:

When I do:
pip3 install $(python3 util/taskcluster.py --decoder)

I get:

Requirement already satisfied: ds-ctcdecoder==0.5.0a4 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.5.0-alpha.4.cpu-ctc/artifacts/public/ds_ctcdecoder-0.5.0a4-cp36-cp36m-manylinux1_x86_64.whl in /home/petri/env/lib/python3.6/site-packages

Requirement already satisfied: numpy>=1.7.0 in /home/petri/env/lib/python3.6/site-packages (from ds-ctcdecoder==0.5.0a4)
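
To double-check what actually ends up in the virtualenv, I can list the installed packages and print the versions via the same printVersions helper that client.py imports (assuming that helper behaves the same here):

pip3 show ds-ctcdecoder deepspeech
python3 -c "from deepspeech import printVersions; printVersions()"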

You didn’t see the flag in the help menu because it does not exist. v0.4.1 doesn’t use libctc_decoder_with_kenlm.so.

Can you provide more information about how you’re using the Python client?

Thanks for your reply.

Well, I have “hard coded” the needed arguments directly into the code just for test purposes (using the same .bin and .tier as in the training phase) … The code below gives very poor results, nothing compared to the evaluate phase of training.

I should mention that my training material is 8000 Hz, so I have changed that part in client.py so that my .wavs don't get upsampled to 16 kHz … Could this be some ctc_decoder issue, version mismatch etc.? No errors are given when I run inference … everything looks fine, except the results are junk …

def main():
  #  parser = argparse.ArgumentParser(description='Running DeepSpeech inference.')
  #  parser.add_argument('--model', required=True,
  #                      help='Path to the model (protocol buffer binary file)')
  #  parser.add_argument('--alphabet', required=True,
  #                      help='Path to the configuration file specifying the alphabet used by the network')
  #  parser.add_argument('--lm', nargs='?',
  #                      help='Path to the language model binary file')
  #  parser.add_argument('--trie', nargs='?',
  #                      help='Path to the language model trie file created with native_client/generate_trie')
  #  parser.add_argument('--audio', required=True,
  #                      help='Path to the audio file to run (WAV format)')
  #  parser.add_argument('--version', action=VersionAction,
  #                      help='Print version and exits')
  #  args = parser.parse_args()
 

    print('Loading model from file {}'.format('/home/petri/kur/model/output_graph.pbl'), file=sys.stderr)
    model_load_start = timer()
    #ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)
    ds = Model('/home/petri/kur/model/output_graph.pb', N_FEATURES, N_CONTEXT, '/home/petri/DeepSpeech/alphabet.txt', BEAM_WIDTH)
    model_load_end = timer() - model_load_start
    print('Loaded model in {:.3}s.'.format(model_load_end), file=sys.stderr)

    
    print('Loading language model from files {} {}'.format('/home/petri/DeepSpeech/mt_with_d_chats.bin', '/home/petri/DeepSpeech/tier/m_kw_calls_only.tier'), file=sys.stderr)
    lm_load_start = timer()
    ds.enableDecoderWithLM('/home/petri/DeepSpeech/alphabet.txt', '/home/petri/DeepSpeech/mt_with_d_chats.bin', '/home/petri/DeepSpeech/tier/m_kw_calls_only.tier', LM_ALPHA, LM_BETA)
    lm_load_end = timer() - lm_load_start
    print('Loaded language model in {:.3}s.'.format(lm_load_end), file=sys.stderr)
    
    fin = wave.open('/home/petri/Downloads/audiostorage/jv@M_fi.wav', 'rb')
    fs = fin.getframerate()
    if fs != 8000:
        print('Warning: original sample rate ({}) is different than 16kHz. Resampling might produce erratic speech recognition.'.format(fs), file=sys.stderr)
        fs, audio = convert_samplerate('/home/petri/Downloads/audiostorage/jv@M_fi.wav')
    else:
        audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)

    audio_length = fin.getnframes() * (1/8000)
    fin.close()

    print('Running inference.', file=sys.stderr)
    inference_start = timer()
    print(ds.stt(audio, fs))
    inference_end = timer() - inference_start
    print('Inference took %0.3fs for %0.3fs audio file.' % (inference_end, audio_length), file=sys.stderr)

if __name__ == '__main__':
    main()

EDIT:

I have trained the model using an RTX GPU, but this inference is running on CPU …

If you’re passing the same LM binary and trie files, the same LM hyperparameters, and the same audio as used in the evaluation epoch after training, the only remaining variable is the beam width, which is 1024 in the training code but 500 in the clients. If you set it to 1024 in the client, does it help?

Also, did you check with that specific file in the evaluation phase? For example by creating a simple test CSV that only has one line for /home/petri/Downloads/audiostorage/jv@M_fi.wav in it, and then using that as the test CSV.
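
For reference, the test CSV uses the same columns as your training CSVs, so a one-line file along these lines should do (wav_filesize is the size of the file in bytes, and the transcript is whatever you expect the audio to contain):

wav_filename,wav_filesize,transcript
/home/petri/Downloads/audiostorage/jv@M_fi.wav,&lt;size in bytes&gt;,&lt;expected transcript&gt;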

Thanks for your reply! I think the beam width is only 500, so I will update that to 1024. By LM hyperparameters you mean the alpha value etc.? I will make a test CSV, try the output and post it here. I will use DeepSpeech.py with epoch 1, so it will give that “evaluate” output, which I think is “the best”. Secondly I will use the Python native client client.py with the same file … so we can compare.

But this ds-ctcdecoder==0.5.0a4 library/file is used automatically, so I don't have to check that it's working properly or anything else?

The ds_ctcdecoder module is automatically used, but you need to make sure you’re using the same version as your version of DeepSpeech. v0.5.0a4 is newer than v0.4.1 but IIRC it is compatible. Try downgrading to v0.4.1 and see if that changes things.
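
Something along these lines should fetch the matching decoder package (assuming the v0.4.1 decoder artifact is still published under that branch name, which I haven't verified):

pip3 uninstall ds-ctcdecoder
pip3 install $(python3 util/taskcluster.py --decoder --branch v0.4.1)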

Hello, here are the results from the Python client.py and from evaluate after the last epoch of training:

Client.py
Loading model from file /home/petri/kur/model/output_graph.pbl
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-0-g0e40db6
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-05-18 13:15:04.170894: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.00895s.
Loading language model from files m_zero_and_one_stuff_bigram.bin /home/petri/DeepSpeech/tier/m_only_one_and_zero.tier
Loaded language model in 0.000838s.
Running inference.
saako sen 
Inference took 0.182s for 11.730s audio file.


RESULTS from Evaluate after last epoch: 



100% (1 of 1) |###############################################################################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Decoding predictions...
100% (1 of 1) |###############################################################################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Test - WER: 0.846154, CER: 0.430052, loss: 314.121490
--------------------------------------------------------------------------------
WER: 0.846154, CER: 83.000000, loss: 314.121490
 - src: "no niin tarviis viela perua nii tana iltana kymmeneen mennessa ooksa muuten missa vaiheessa kuullut tost meidan autonhuolto kampanjasta joka on nyt meneillaan sataseitseman euroa tarkastus"
 - res: "niin jos nyt tarvii niin taa hinta ne ruut siina on mutta missa vaiheessa kun niin autonhuolto kampanja seka nyt menee ihan et eka euroa tarkastus "
--------------------------------------------------------------------------------
Exporting..
I Exporting the model...
WARNING:tensorflow:From /home/petri/env/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /home/petri/env/lib/python3.6/site-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
I Models exported at /home/petri/kur/model/

So, same wav file, but a huge difference. I tried to downgrade the ds_ctcdecoder by giving the --branch flag, but no luck … only 0.5 (which I am using here in these examples) is found …

LM parameters are:
BEAM_WIDTH = 1024
LM_ALPHA = 0.75
LM_BETA = 1.85

and the audio feature constants are:

N_FEATURES = 26
N_CONTEXT = 9

Any ideas where that huge difference might come from?

Can you share the full command lines you used for training, evaluating and exporting the model, as well as the full command lines you used for inference with the client?

Hello, I deleted my DeepSpeech folder (which was that 0.5-something version I mentioned) and downloaded DeepSpeech v0.4.1-0-g0e40db6 and the prebuilt binaries. I remade the TIER file and trained my model again. Could this generate_tier be version dependent …?
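
For reference, I regenerated it with the prebuilt generate_trie binary, roughly like this (typing from memory, so the exact arguments may be slightly off):

./generate_trie alphabet.txt LM_models/meh_zero_and_one_stuff_bigram.bin tier/trie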

The evaluate after the last epoch is still giving the best inference. The results are now closer, however:

The deepspeech executable gives the second best inference for the same .wav.

The Python client.py (well, this client.py is from a previous installation) inference is missing some of the words that I got from evaluate …

My best guess as to what is causing this is some incompatible libraries in decoding; after all, I didn't delete the virtualenv, just deleted the deepspeech folder …

I will post what you asked for …

Thanks in advance!

This is my training command line:

#!/bin/sh
set -xe
if [ ! -f DeepSpeech.py ]; then
    echo "Please make sure you run this from DeepSpeech's top level directory."
    exit 1
fi;

python -u DeepSpeech.py \
  --train_files /home/petri/kur/data/audio/meh_and_dana_kw_and_zero_m_calls.csv \
  --dev_files /home/petri/DeepSpeech-0.4.1/dev_meh.csv \
  --test_files /home/petri/DeepSpeech-0.4.1/test_meh.csv \
  --train_batch_size 70 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 375 \
  --epoch 70 \
  --validation_step 3 \
  --early_stop False \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.2 \
  --estop_std_thresh 0.2 \
  --dropout_rate 0.22 \
  --learning_rate 0.00098 \
  --report_count 200 \
  --use_seq_length False \
  --export_dir /home/petri/DeepSpeech-0.4.1/ac_models/ \
  --checkpoint_dir /home/petri/DeepSpeech-0.4.1/m_and_dana_checkpoint/ \
  --alphabet_config_path /home/petri/DeepSpeech-0.4.1/alphabet.txt \
  --lm_binary_path /home/petri/DeepSpeech-0.4.1/LM_models/meh_zero_and_one_stuff_bigram.bin \
  --lm_trie_path /home/petri/DeepSpeech-0.4.1/tier/trie \
  "$@"

This is my deepspeech command line after the model is done and I test it with the same file:

deepspeech --model ac_models/output_graph.pb --alphabet alphabet.txt --lm LM_models/meh_zero_and_one_stuff_bigram.bin --trie tier/trie --audio /home/petri/Downloads/onlylastseven_huhtikuu/at_chunk-28.wav

and that gives these warnings before outputting the inference (just showing a few of them):

2019-05-22 17:52:54.475540: W tensorflow/contrib/rnn/kernels/lstm_ops.cc:855] BlockLSTMOp is inefficient when both batch_size and cell_size are odd. You are using: batch_size=1, cell_size=375

2019-05-22 17:52:54.477728: W tensorflow/contrib/rnn/kernels/lstm_ops.cc:850] BlockLSTMOp is inefficient when both batch_size and input_size are odd. You are using: batch_size=1, input_size=375

2019-05-22 17:52:54.477746: W tensorflow/contrib/rnn/kernels/lstm_ops.cc:855] BlockLSTMOp is inefficient when both batch_size and cell_size are odd. You are using: batch_size=1, cell_size=375

and my client.py is:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function

import argparse
import numpy as np
import shlex
import subprocess
import sys
import wave

from deepspeech import Model, printVersions
from timeit import default_timer as timer

try:
    from shhlex import quote
except ImportError:
    from pipes import quote


# These constants control the beam search decoder

# Beam width used in the CTC decoder when building candidate transcriptions
BEAM_WIDTH = 1024

# The alpha hyperparameter of the CTC decoder. Language Model weight
#LM_ALPHA = 0.0
LM_ALPHA = 0.85

# The beta hyperparameter of the CTC decoder. Word insertion bonus.
LM_BETA = 1.85
#LM_BETA = 400

# These constants are tied to the shape of the graph used (changing them changes
# the geometry of the first layer), so make sure you use the same constants that
# were used during training

# Number of MFCC features to use
N_FEATURES = 26

# Size of the context window used for producing timesteps in the input vector
N_CONTEXT = 9


def convert_samplerate(audio_path):
    sox_cmd = 'sox {} --type raw --bits 16 --channels 1 --rate 16000 --encoding signed-integer --endian little --compression 0.0 --no-dither - '.format(quote(audio_path))
    try:
        output = subprocess.check_output(shlex.split(sox_cmd), stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        raise RuntimeError('SoX returned non-zero status: {}'.format(e.stderr))
    except OSError as e:
        raise OSError(e.errno, 'SoX not found, use 16kHz files or install it: {}'.format(e.strerror))

    return 16000, np.frombuffer(output, np.int16)


class VersionAction(argparse.Action):
    def __init__(self, *args, **kwargs):
        super(VersionAction, self).__init__(nargs=0, *args, **kwargs)

    def __call__(self, *args, **kwargs):
        printVersions()
        exit(0)


def main():
    #  parser = argparse.ArgumentParser(description='Running DeepSpeech inference.')
    #  parser.add_argument('--model', required=True,
    #                      help='Path to the model (protocol buffer binary file)')
    #  parser.add_argument('--alphabet', required=True,
    #                      help='Path to the configuration file specifying the alphabet used by the network')
    #  parser.add_argument('--lm', nargs='?',
    #                      help='Path to the language model binary file')
    #  parser.add_argument('--trie', nargs='?',
    #                      help='Path to the language model trie file created with native_client/generate_trie')
    #  parser.add_argument('--audio', required=True,
    #                      help='Path to the audio file to run (WAV format)')
    #  parser.add_argument('--version', action=VersionAction,
    #                      help='Print version and exits')
    #  args = parser.parse_args()

    #print('Loading model from file {}'.format('/home/petri/kur/model/output_graph.pbl'), file=sys.stderr)
    model_load_start = timer()
    #ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)
    ds = Model('ac_models/output_graph.pb', N_FEATURES, N_CONTEXT, 'alphabet.txt', BEAM_WIDTH)
    model_load_end = timer() - model_load_start
    print('Loaded model in {:.3}s.'.format(model_load_end), file=sys.stderr)

    lm_load_start = timer()
    ds.enableDecoderWithLM('alphabet.txt', 'LM_models/meh_zero_and_one_stuff_bigram.bin', 'tier/trie', LM_ALPHA, LM_BETA)
    lm_load_end = timer() - lm_load_start
    print('Loaded language model in {:.3}s.'.format(lm_load_end), file=sys.stderr)

    fin = wave.open('/home/petri/Downloads/onlylastsevenanne_huhtikuu/at_chunk-28.wav', 'rb')
    fs = fin.getframerate()
    if fs != 16000:
        print('Warning: original sample rate ({}) is different than 16kHz. Resampling might produce erratic speech recognition.'.format(fs), file=sys.stderr)
        fs, audio = convert_samplerate('/home/petri/Downloads/onlylastsevenanne_huhtikuu/at_chunk-28.wav')
    else:
        audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)

    audio_length = fin.getnframes() * (1/16000)
    fin.close()

    print('Running inference.', file=sys.stderr)
    inference_start = timer()
    print(ds.stt(audio, fs))
    inference_end = timer() - inference_start
    print('Inference took %0.3fs for %0.3fs audio file.' % (inference_end, audio_length), file=sys.stderr)


if __name__ == '__main__':
    main()

So, three ways to do inference, same model, same LM, same TIER, and three different inference results from the same wav …

FWIW the language model files are a binary and a trie, not tier.

I would recommend removing the --use_seq_length False parameter, although I think that syntax is not actually being picked up by the arg parser, but it wouldn't hurt to make sure. Other than that and the beam width difference between the clients and evaluate.py, the only other reason I can think of is different versions of the training code vs the clients. You mentioned you already double-checked that everything is 0.4.1, so I would check the other things I mentioned.

Do you recommend that I delete my virtualenv env folder and remake a fresh virtual env, to make sure I don't have any misbehaving libraries causing this? Or is it easier to manually check the critical files located in the native_client dir and … ?

Yes, creating a virtualenv from scratch is probably the safest approach.

OK, will do. BTW: do you know if people have made their “own decoders” by editing evaluate.py and running that piece of code directly, and maybe even shared it somewhere? That's my plan E if everything else seems to fail …

No, I haven’t seen that.

Hello, again!

I did that new virtual environment and used:
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-0-g0e40db6

I uninstalled the CPU version of TF and installed the GPU version.

I took away that seq_length parameter.

Evaluate gives the following result:

Decoding predictions...
100% (1 of 1) |####################################################################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Test - WER: 0.923077, CER: 83.000000, loss: 266.225098
--------------------------------------------------------------------------------
WER: 0.923077, CER: 83.000000, loss: 266.225098
 - src: "no niin tarviis viela perua nii tana kymmeneen mennessa ooksa muuten missa vaiheessa kuullut tost meidan autotarkastus kampanjasta joka on nyt meneillaan satanelkytyhdeksan euroa tarkastus"
 - res: "niin jos nyt tarvii siel perua niita nain tan kymmenes peruutus viimeista kaksi onks mutta missa vaiheessa niin autotarkastus kampanja saan menee ihan ne satayhdeksan euron tarkastus "
--------------------------------------------------------------------------------
I Exporting the model...
I Models exported at ac_models/

But when I use the deepspeech executable I get nothing!

Same thing if I use client.py: I get nothing!

What is causing this? My training data is 8000 Hz, and my test data is 8000 Hz. The Python client.py upsamples the wav to 16000: no results. But when I bypassed that and kept the sample rate at 8000 Hz … no effect, same blank inference …

It's getting strange, so the solution must be simple, but I just can't see it …

EDIT:

Returning a blank, or an empty line, was because of a missing alphabet.txt. After fixing that I started to get some words, but we are back to the same problem: it doesn't give the same inference as the evaluate phase after training is over. BEAM_WIDTH is 1024 and still … over half of the words are missing.

I have modified the evaluate.py code so that I can now use it to do inference with the same accuracy as the evaluate phase after the last epoch.

However, I know that's not the right way to do it. I really would like to know why I am not able to get the same results using the deepspeech binary or the Python client code.

In that evaluate.py code I use the same LM and TRIE as in training (I use FLAGS to point to the right LM and TRIE). I use those same LM and TRIE in client.py and give them as arguments to the deepspeech binary, but I'm still getting completely different results. I have a theory why this is happening: evaluate.py uses the last(?) checkpoint, but the deepspeech binary and client.py use the exported model. Could this be the answer, and if yes, what is happening in the exporting phase?
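
For completeness, the checkpoint-based evaluation I run looks roughly like this, with a one-line test CSV for that wav (single_file_test.csv is just my name for that CSV, and I haven't double-checked which of these flags evaluate.py actually reads in 0.4.1, so treat it as a sketch):

python -u evaluate.py \
  --test_files single_file_test.csv \
  --test_batch_size 1 \
  --n_hidden 375 \
  --checkpoint_dir /home/petri/DeepSpeech-0.4.1/m_and_dana_checkpoint/ \
  --alphabet_config_path /home/petri/DeepSpeech-0.4.1/alphabet.txt \
  --lm_binary_path /home/petri/DeepSpeech-0.4.1/LM_models/meh_zero_and_one_stuff_bigram.bin \
  --lm_trie_path /home/petri/DeepSpeech-0.4.1/tier/trie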

Thanks, have to take a look at it.

Do you have some guidelines for me to check what could be the reason why I am getting different results from the same wav depending on which method I am using? (The deepspeech binary and the Python client.py vs. the evaluate phase in training, which gives me the best result after the last training epoch …)

Thanks in advance.

No, because I have still not been able to properly understand your issue … There are too many variables in play that may explain the differences.