Is libctc_decoder_with_kenlm.so needed with version 0.4.1-0?

(pete) #1

Hello,

My versions are:

TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-0-g0e40db6

I have trained my own model, but I'm getting confusing results:
When evaluating on one test file (after the last epoch has finished) I get decent results, or good enough anyway, but when I run the same inference using the Python native_client client.py, instead of a sentence I get one or two words, or empty predictions.

So my questions are:
Do I need the --decoder_library_path flag, pointing at libctc_decoder_with_kenlm.so, to be set in the training phase (I didn't see that flag in the help menu …),
or do I need that libctc_decoder_with_kenlm.so when I am using the Python native client client.py?

I searched topics about this subject and read somewhere that poor results from the Python native client's inference are caused by a missing ctc_decoder … That was for older versions I think, but I lost track, so …

Thanks in advance!

EDIT:

When I do:
pip3 install $(python3 util/taskcluster.py --decoder)

I get:

Requirement already satisfied: ds-ctcdecoder==0.5.0a4 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.5.0-alpha.4.cpu-ctc/artifacts/public/ds_ctcdecoder-0.5.0a4-cp36-cp36m-manylinux1_x86_64.whl in /home/petri/env/lib/python3.6/site-packages

Requirement already satisfied: numpy>=1.7.0 in /home/petri/env/lib/python3.6/site-packages (from ds-ctcdecoder==0.5.0a4)
(Reuben Morais) #2

You didn’t see the flag in the help menu because it does not exist. v0.4.1 doesn’t use libctc_decoder_with_kenlm.so.

Can you provide more information about how you’re using the Python client?

(pete) #3

Thanks for your reply.

Well, I have “hard coded” the needed arguments directly into the code just for test purposes (using the same .bin and .tier files as in the training phase) … The code below gives very poor results, nothing compared to the evaluate phase of training.

I should mention that my training material is 8000 Hz, so I have changed that part in client.py so that my .wavs don't get upsampled to 16 kHz … Could this be some ctc_decoder issue, version mismatch, etc.? No errors are given when I run inference … everything looks fine, except the results are junk …

# Imports as in the stock client.py; N_FEATURES, N_CONTEXT, BEAM_WIDTH,
# LM_ALPHA, LM_BETA and convert_samplerate() are defined earlier in the file.
from deepspeech import Model
from timeit import default_timer as timer
import sys
import wave
import numpy as np

def main():
    # argparse is commented out; arguments are hard-coded below for testing.
    #  parser = argparse.ArgumentParser(description='Running DeepSpeech inference.')
    #  parser.add_argument('--model', required=True,
    #                      help='Path to the model (protocol buffer binary file)')
    #  parser.add_argument('--alphabet', required=True,
    #                      help='Path to the configuration file specifying the alphabet used by the network')
    #  parser.add_argument('--lm', nargs='?',
    #                      help='Path to the language model binary file')
    #  parser.add_argument('--trie', nargs='?',
    #                      help='Path to the language model trie file created with native_client/generate_trie')
    #  parser.add_argument('--audio', required=True,
    #                      help='Path to the audio file to run (WAV format)')
    #  parser.add_argument('--version', action=VersionAction,
    #                      help='Print version and exits')
    #  args = parser.parse_args()

    print('Loading model from file {}'.format('/home/petri/kur/model/output_graph.pb'), file=sys.stderr)
    model_load_start = timer()
    #ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)
    ds = Model('/home/petri/kur/model/output_graph.pb', N_FEATURES, N_CONTEXT, '/home/petri/DeepSpeech/alphabet.txt', BEAM_WIDTH)
    model_load_end = timer() - model_load_start
    print('Loaded model in {:.3}s.'.format(model_load_end), file=sys.stderr)

    print('Loading language model from files {} {}'.format('/home/petri/DeepSpeech/mt_with_d_chats.bin', '/home/petri/DeepSpeech/tier/m_kw_calls_only.tier'), file=sys.stderr)
    lm_load_start = timer()
    ds.enableDecoderWithLM('/home/petri/DeepSpeech/alphabet.txt', '/home/petri/DeepSpeech/mt_with_d_chats.bin', '/home/petri/DeepSpeech/tier/m_kw_calls_only.tier', LM_ALPHA, LM_BETA)
    lm_load_end = timer() - lm_load_start
    print('Loaded language model in {:.3}s.'.format(lm_load_end), file=sys.stderr)

    fin = wave.open('/home/petri/Downloads/audiostorage/jv@M_fi.wav', 'rb')
    fs = fin.getframerate()
    if fs != 8000:
        # Check changed from 16000 to 8000 to match the 8 kHz training data.
        print('Warning: original sample rate ({}) is different than 8kHz. Resampling might produce erratic speech recognition.'.format(fs), file=sys.stderr)
        fs, audio = convert_samplerate('/home/petri/Downloads/audiostorage/jv@M_fi.wav')
    else:
        audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)

    audio_length = fin.getnframes() * (1/8000)
    fin.close()

    print('Running inference.', file=sys.stderr)
    inference_start = timer()
    print(ds.stt(audio, fs))
    inference_end = timer() - inference_start
    print('Inference took %0.3fs for %0.3fs audio file.' % (inference_end, audio_length), file=sys.stderr)

if __name__ == '__main__':
    main()

EDIT:

I have trained the model using an RTX GPU, but this inference is running on CPU …

(Reuben Morais) #4

If you’re passing the same LM binary and trie files, the same LM hyperparameters, and the same audio as is used in the evaluation epoch after training, the only remaining variable is the beam width, which is 1024 in the training code but 500 in the clients. If you set it to 1024 in the client, does it help?
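
For reference, a minimal sketch of that change, assuming the stock v0.4.1 client.py layout where the constant is defined near the top of the file:

BEAM_WIDTH = 1024  # clients ship with 500; training uses 1024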

Also, did you check with that specific file in the evaluation phase? For example by creating a simple test CSV that only has one line for /home/petri/Downloads/audiostorage/jv@M_fi.wav in it, and then using that as the test CSV.
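
A minimal sketch of such a CSV, assuming the standard wav_filename,wav_filesize,transcript header (the file size and transcript below are placeholders; use the real byte size and reference transcript):

wav_filename,wav_filesize,transcript
/home/petri/Downloads/audiostorage/jv@M_fi.wav,123456,your reference transcript here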

(pete) #5

Thanks for the reply! I think the beam width is only 500, so I will update it to 1024. By LM hyperparameters, you mean the alpha values etc.? I will make a test CSV, try the output, and post it here. I will use DeepSpeech.py with epoch 1, so it will give that “evaluate” output, which I think is “the best”. Secondly, I will use the Python native client client.py with the same file … so we can compare.

But this ds-ctcdecoder==0.5.0a4 library is used automatically, so I don't have to check that it's working properly or anything else?
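
For what it's worth, I can at least check which decoder version is installed in my virtualenv with pip:

pip3 show ds-ctcdecoder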

(Reuben Morais) #6

The ds_ctcdecoder module is automatically used, but you need to make sure you’re using the same version as your version of DeepSpeech. v0.5.0a4 is newer than v0.4.1 but IIRC it is compatible. Try downgrading to v0.4.1 and see if that changes things.
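
Something along these lines should fetch the matching wheel, assuming the util/taskcluster.py in your checkout supports the --branch flag:

pip3 uninstall ds-ctcdecoder
pip3 install $(python3 util/taskcluster.py --decoder --branch v0.4.1)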

(pete) #7

Hello, here are the results from the Python client.py and from evaluate after the last epoch of training:

Client.py
Loading model from file /home/petri/kur/model/output_graph.pb
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-0-g0e40db6
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-05-18 13:15:04.170894: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.00895s.
Loading language model from files m_zero_and_one_stuff_bigram.bin /home/petri/DeepSpeech/tier/m_only_one_and_zero.tier
Loaded language model in 0.000838s.
Running inference.
saako sen 
Inference took 0.182s for 11.730s audio file.

Results from evaluate after the last epoch:

100% (1 of 1) |###############################################################################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Decoding predictions...
100% (1 of 1) |###############################################################################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Test - WER: 0.846154, CER: 0.430052, loss: 314.121490
--------------------------------------------------------------------------------
WER: 0.846154, CER: 83.000000, loss: 314.121490
 - src: "no niin tarviis viela perua nii tana iltana kymmeneen mennessa ooksa muuten missa vaiheessa kuullut tost meidan autonhuolto kampanjasta joka on nyt meneillaan sataseitseman euroa tarkastus"
 - res: "niin jos nyt tarvii niin taa hinta ne ruut siina on mutta missa vaiheessa kun niin autonhuolto kampanja seka nyt menee ihan et eka euroa tarkastus "
--------------------------------------------------------------------------------
Exporting..
I Exporting the model...
WARNING:tensorflow:From /home/petri/env/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /home/petri/env/lib/python3.6/site-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
I Models exported at /home/petri/kur/model/

So, same wav file, but a huge difference. I tried to downgrade ds_ctcdecoder using the --branch flag, but no luck … only 0.5 (which I am using here in these examples) is found …

LM parameters are:
BEAM_WIDTH = 1024
LM_ALPHA = 0.75
LM_BETA = 1.85

and (well, below are some audio features):

N_FEATURES = 26
N_CONTEXT = 9

Any ideas where that huge difference might come from?