Is libctc_decoder_with_kenlm needed with version 0.4.1-0?

The client.py you posted above does have some code to handle resampling, yet the log you posted does not show the sample rate conversion warning. Did you remove the sample rate conversion code?

That test wav is 8000 Hz, and the training material is 8000 Hz. I have played around with that client's resampling code … I have tried keeping the upsampling from 8000 Hz to 16000 Hz, and I have tried staying at 8000 Hz (so I either skip the resampling part or let the code convert from 8000 to 16000) … I do get different results depending on that, but still the same number of words, and not even close to the result I am after (the test-phase result: a long sentence, not just a few words) …

Don’t. If you’re testing reproducibility, just convert everything once and keep it converted on disk, do all the conversions with the same tool and the same parameters, then pass the same file to all the different clients, and make sure no automatic resampling is happening.
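For example (a sketch only; the directory names are placeholders, and the sox flags mirror the ones in the client.py posted later in this thread, just writing .wav files at the training rate instead of raw 16 kHz to stdout):

```python
# Sketch: convert every wav once, on disk, with one tool and one set of
# parameters, so every client reads the exact same file afterwards.
# Assumes sox is installed; 'wavs_raw' and 'wavs_8k' are placeholders.
import subprocess
from pathlib import Path

TARGET_RATE = 8000  # same rate as the training material

dst_dir = Path('wavs_8k')
dst_dir.mkdir(exist_ok=True)

for wav in Path('wavs_raw').glob('*.wav'):
    out = dst_dir / wav.name
    # 16-bit mono at a fixed rate; identical parameters for every file
    subprocess.check_call([
        'sox', str(wav),
        '--bits', '16', '--channels', '1', '--rate', str(TARGET_RATE),
        str(out),
    ])
```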

Test wav is 8000 Hz. Training material is 8000 Hz … in the Python client.py I can let it upsample to 16000 Hz or skip that part of the code and leave it at 8000 Hz … the two options give slightly different results, but only a few words either way, not that long sentence I am after …

Oh, wait, if the training material is 8000 Hz you should definitely not be upsampling, but that requires modifying the client to pass the native (8000 Hz) sample rate to the API. So it’s expected that you’ll get different results with resampling.
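Concretely, the change is just this (a sketch; the file paths are placeholders, and the API calls match the 0.4.1-era client.py posted later in this thread):

```python
# Read the wav at its native rate and pass that rate to stt(),
# instead of resampling to 16000 Hz first.
import wave
import numpy as np
from deepspeech import Model

# 0.4.1-style constructor: model, n_features, n_context, alphabet, beam width
ds = Model('output_graph.pb', 26, 9, 'alphabet.txt', 1024)

fin = wave.open('audio_8k.wav', 'rb')
fs = fin.getframerate()  # 8000 here, not a hardcoded 16000
audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
fin.close()

print(ds.stt(audio, fs))  # the native rate goes straight to the API
```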

I have just commented out that sox line without changing parameters …

Yeah, but you are adding more variables to the problem when we need fewer. Honestly, at this point in the thread, it’s completely impossible to get a clear picture of what you train and how, and what you run and how.

You have done some training, but with a loss that high I think your model is just a good random number generator.

Well, I have kept that test wav at 8000 Hz. I did try once converting it to 16000 Hz to see whether that could explain the results, but it didn’t, so 8000 Hz it is.

The Python client.py gives the same result every time. The deepspeech binary also gives the same result every time, but it differs from client.py’s.

Could this be a GPU vs. CPU problem? Training happens on the GPU, but prediction happens on the CPU when you try your new model … It shouldn’t matter, but I’m just shooting at everything.

No, the problem is the resampling. Don’t use the deepspeech binary with a model trained on 8kHz data. It won’t work.

OK, I won’t use the deepspeech binary. So we have this Python client.py … is that version dependent, or only the C-coded parts it uses to decode …

Sorry, what’s the question here?

The question is: that Python-coded client.py should work not just with DeepSpeech 0.4.1 but with other DS versions as well?

client.py is also the filename used by the deepspeech Python package, so that’s a bit confusing, but I’m going to assume you are referring to your own code.

Yes, as long as the API is compatible. You need a model that is also compatible with the DeepSpeech version you want to play with.
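If in doubt, a quick sanity check (a sketch, using only the `printVersions` helper that the client.py posted below already imports) is to print the installed DeepSpeech version and compare it with the release the model was trained and exported on:

```python
# Print the installed DeepSpeech library version so you can confirm it
# matches the release your model was exported with (e.g. 0.4.1).
from deepspeech import printVersions

printVersions()
```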

Let’s try one more time, @lissyx @kdavis, shall we. One more question. Like I said, when I run my edited evaluate.py code I get good enough results. Below is a sample, with the same wav I am using everywhere, but this time getting the results I expect:

```
python3 mnz_evaluate.py
2019-05-30 17:49:50.974290: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-30 17:49:51.109638: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-30 17:49:51.109996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.32GiB
2019-05-30 17:49:51.110008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-05-30 17:49:51.313417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-30 17:49:51.313443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-05-30 17:49:51.313447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-05-30 17:49:51.313575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 9959 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
Preprocessing ['test_m.csv']
Preprocessing done
2019-05-30 17:49:53.046422: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-05-30 17:49:53.046480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-30 17:49:53.046485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-05-30 17:49:53.046488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-05-30 17:49:53.046625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9959 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
Computing acoustic model predictions…
100% (1 of 1) |####################| Elapsed Time: 0:00:00 Time: 0:00:00
Decoding predictions…
100% (1 of 1) |####################| Elapsed Time: 0:00:00 Time: 0:00:00
Test - WER: 0.846154, CER: 90.000000, loss: 287.961700

WER: 0.846154, CER: 90.000000, loss: 287.961700
- src: "no niin tarviis viela perua nii tana iltana kymmeneen mennessa ooksa muuten missa vaiheessa kuullut tost meidan autotarkastus kampanjasta joka on nyt meneillaan satanelkytyhdeksan euroa tarkastus"
- res: "niin jos kavis sielta taa hintaan peruutusmaksu mutta missa lasku tai autotarkastus kampanja elanyt menee janne yhdeksan euron tarkastus "
```
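For reference, WER is the standard word-level edit distance divided by the number of reference words, which is consistent with the report above (0.846154 = 22 word edits over the 26 words of src). A minimal sketch of that computation, not evaluate.py’s actual implementation:

```python
# Word error rate as word-level Levenshtein distance / reference length.
def wer(ref, hyp):
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)] / len(r)
```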

So, I can run that code independently, and by changing the test_m.csv content I can point it at any wav and get speech-to-text … It gives me good enough results even though the validation loss is high.

The question is: what on earth does that evaluate.py do differently than client.py? evaluate.py uses checkpoints, not the exported model (right?) … The language model is the same, the trie is the same, the alphabet is the same.

One last shot :slight_smile:

We have not yet been able to see your client.py. With 0.4.1, if you rely on libdeepspeech.so, it’s not impossible that you also have to rebuild it to change the sample rate? cc @reuben, because I don’t remember.

Here is my client.py (I have commented out the resample part and hardcoded the command-line arguments …)

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function

import argparse
import numpy as np
import shlex
import subprocess
import sys
import wave

from deepspeech import Model, printVersions
from timeit import default_timer as timer

try:
    from shhlex import quote  # typo inherited from the upstream client; always falls back
except ImportError:
    from pipes import quote

# These constants control the beam search decoder

# Beam width used in the CTC decoder when building candidate transcriptions
BEAM_WIDTH = 1024

LM_ALPHA = 0.75
LM_BETA = 1.85

N_FEATURES = 26
N_CONTEXT = 9


def convert_samplerate(audio_path):
    sox_cmd = 'sox {} --type raw --bits 16 --channels 1 --rate 16000 --encoding signed-integer --endian little --compression 0.0 --no-dither - '.format(quote(audio_path))
    try:
        output = subprocess.check_output(shlex.split(sox_cmd), stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        raise RuntimeError('SoX returned non-zero status: {}'.format(e.stderr))
    except OSError as e:
        raise OSError(e.errno, 'SoX not found, use 16kHz files or install it: {}'.format(e.strerror))
    return 16000, np.frombuffer(output, np.int16)


class VersionAction(argparse.Action):
    def __init__(self, *args, **kwargs):
        super(VersionAction, self).__init__(nargs=0, *args, **kwargs)

    def __call__(self, *args, **kwargs):
        # Body as in the upstream client; unused here since the arguments are hardcoded.
        printVersions()
        exit(0)


def main():
    #print('Loading model from file {}'.format('/home/petri/kur/model/output_graph.pbl'), file=sys.stderr)
    model_load_start = timer()
    #ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)
    ds = Model('ac_models/output_graph.pb', N_FEATURES, N_CONTEXT, 'alphabet/alphabet.txt', BEAM_WIDTH)
    model_load_end = timer() - model_load_start
    print('Loaded model in {:.3}s.'.format(model_load_end), file=sys.stderr)

    #print('Loading language model from files {} {}'.format('m_zero_and_one_stuff_bigram.bin', '/home/petri/DeepSpeech/tier/m_only_one_and_zero.tier'), file=sys.stderr)
    lm_load_start = timer()
    ds.enableDecoderWithLM('alphabet/alphabet.txt', 'LM_models/m_zero_and_one_stuff_bigram.bin', 'tier/TRIE_2905', LM_ALPHA, LM_BETA)
    lm_load_end = timer() - lm_load_start
    print('Loaded language model in {:.3}s.'.format(lm_load_end), file=sys.stderr)

    fin = wave.open('mchunk-28.wav', 'rb')
    fs = fin.getframerate()
    audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
    audio_length = fin.getnframes() * (1/8000)
    fin.close()

    print('Running inference.', file=sys.stderr)
    inference_start = timer()
    print(ds.stt(audio, fs))
    inference_end = timer() - inference_start
    print('Inference took %0.3fs for %0.3fs audio file.' % (inference_end, audio_length), file=sys.stderr)


if __name__ == '__main__':
    main()
```

So that’s what gives those different predictions. evaluate.py gives me the results I want. Something is different, and I just don’t see it … Everything is 8000 Hz.

Isn’t that the source code for the deepspeech binary, which has a 16000 Hz default rate like @kdavis mentioned, and the reason not to use it? But does that also affect client.py inference? Is that Python client.py still calling the part of the code you copy-pasted?

It pulls in libdeepspeech.so, which contains the aforementioned code. With 0.5 we have more flexibility, but before that, changing the rate would require changing the code and rebuilding.