This is my training command line:
#!/bin/sh
set -xe
if [ ! -f DeepSpeech.py ]; then
  echo "Please make sure you run this from DeepSpeech's top level directory."
  exit 1
fi
python -u DeepSpeech.py \
  --train_files /home/petri/kur/data/audio/meh_and_dana_kw_and_zero_m_calls.csv \
  --dev_files /home/petri/DeepSpeech-0.4.1/dev_meh.csv \
  --test_files /home/petri/DeepSpeech-0.4.1/test_meh.csv \
  --train_batch_size 70 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 375 \
  --epoch 70 \
  --validation_step 3 \
  --early_stop False \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.2 \
  --estop_std_thresh 0.2 \
  --dropout_rate 0.22 \
  --learning_rate 0.00098 \
  --report_count 200 \
  --use_seq_length False \
  --export_dir /home/petri/DeepSpeech-0.4.1/ac_models/ \
  --checkpoint_dir /home/petri/DeepSpeech-0.4.1/m_and_dana_checkpoint/ \
  --alphabet_config_path /home/petri/DeepSpeech-0.4.1/alphabet.txt \
  --lm_binary_path /home/petri/DeepSpeech-0.4.1/LM_models/meh_zero_and_one_stuff_bigram.bin \
  --lm_trie_path /home/petri/DeepSpeech-0.4.1/tier/trie \
  "$@"
This is my deepspeech command line once the model is done, testing it on the same file:
deepspeech --model ac_models/output_graph.pb --alphabet alphabet.txt --lm LM_models/meh_zero_and_one_stuff_bigram.bin --trie tier/trie --audio /home/petri/Downloads/onlylastseven_huhtikuu/at_chunk-28.wav
and that prints these warnings before the inference output (showing just a few of them):
2019-05-22 17:52:54.475540: W tensorflow/contrib/rnn/kernels/lstm_ops.cc:855] BlockLSTMOp is inefficient when both batch_size and cell_size are odd. You are using: batch_size=1, cell_size=375
2019-05-22 17:52:54.477728: W tensorflow/contrib/rnn/kernels/lstm_ops.cc:850] BlockLSTMOp is inefficient when both batch_size and input_size are odd. You are using: batch_size=1, input_size=375
2019-05-22 17:52:54.477746: W tensorflow/contrib/rnn/kernels/lstm_ops.cc:855] BlockLSTMOp is inefficient when both batch_size and cell_size are odd. You are using: batch_size=1, cell_size=375
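(As far as I can tell these are only performance warnings, not errors: the BlockLSTM kernel takes a slow code path whenever both of the listed sizes are odd, which batch_size=1 at inference plus --n_hidden 375 guarantees. Restating the condition from the log:)

# Restating the warning's condition: the slow BlockLSTMOp path is chosen
# when both sizes are odd. It affects speed only, not the transcription.
batch_size, cell_size = 1, 375  # values from the log lines above
if batch_size % 2 == 1 and cell_size % 2 == 1:
    print('BlockLSTMOp slow path (both sizes odd)')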
and my client.py is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function

import argparse
import numpy as np
import shlex
import subprocess
import sys
import wave

from deepspeech import Model, printVersions
from timeit import default_timer as timer
try:
    from shlex import quote
except ImportError:
    from pipes import quote

# These constants control the beam search decoder

# Beam width used in the CTC decoder when building candidate transcriptions
BEAM_WIDTH = 1024

# The alpha hyperparameter of the CTC decoder. Language Model weight
#LM_ALPHA = 0.0
LM_ALPHA = 0.85

# The beta hyperparameter of the CTC decoder. Word insertion bonus.
LM_BETA = 1.85
#LM_BETA = 400

# These constants are tied to the shape of the graph used (changing them changes
# the geometry of the first layer), so make sure you use the same constants that
# were used during training

# Number of MFCC features to use
N_FEATURES = 26

# Size of the context window used for producing timesteps in the input vector
N_CONTEXT = 9

def convert_samplerate(audio_path):
    sox_cmd = 'sox {} --type raw --bits 16 --channels 1 --rate 16000 --encoding signed-integer --endian little --compression 0.0 --no-dither - '.format(quote(audio_path))
    try:
        output = subprocess.check_output(shlex.split(sox_cmd), stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        raise RuntimeError('SoX returned non-zero status: {}'.format(e.stderr))
    except OSError as e:
        raise OSError(e.errno, 'SoX not found, use 16kHz files or install it: {}'.format(e.strerror))

    return 16000, np.frombuffer(output, np.int16)

class VersionAction(argparse.Action):
    def __init__(self, *args, **kwargs):
        super(VersionAction, self).__init__(nargs=0, *args, **kwargs)

    def __call__(self, *args, **kwargs):
        printVersions()
        exit(0)

def main():
    # parser = argparse.ArgumentParser(description='Running DeepSpeech inference.')
    # parser.add_argument('--model', required=True,
    #                     help='Path to the model (protocol buffer binary file)')
    # parser.add_argument('--alphabet', required=True,
    #                     help='Path to the configuration file specifying the alphabet used by the network')
    # parser.add_argument('--lm', nargs='?',
    #                     help='Path to the language model binary file')
    # parser.add_argument('--trie', nargs='?',
    #                     help='Path to the language model trie file created with native_client/generate_trie')
    # parser.add_argument('--audio', required=True,
    #                     help='Path to the audio file to run (WAV format)')
    # parser.add_argument('--version', action=VersionAction,
    #                     help='Print version and exits')
    # args = parser.parse_args()

    #print('Loading model from file {}'.format('/home/petri/kur/model/output_graph.pbl'), file=sys.stderr)
    model_load_start = timer()
    #ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)
    ds = Model('ac_models/output_graph.pb', N_FEATURES, N_CONTEXT, 'alphabet.txt', BEAM_WIDTH)
    model_load_end = timer() - model_load_start
    print('Loaded model in {:.3}s.'.format(model_load_end), file=sys.stderr)

    lm_load_start = timer()
    ds.enableDecoderWithLM('alphabet.txt', 'LM_models/meh_zero_and_one_stuff_bigram.bin', 'tier/trie', LM_ALPHA, LM_BETA)
    lm_load_end = timer() - lm_load_start
    print('Loaded language model in {:.3}s.'.format(lm_load_end), file=sys.stderr)

    fin = wave.open('/home/petri/Downloads/onlylastsevenanne_huhtikuu/at_chunk-28.wav', 'rb')
    fs = fin.getframerate()
    if fs != 16000:
        print('Warning: original sample rate ({}) is different than 16kHz. Resampling might produce erratic speech recognition.'.format(fs), file=sys.stderr)
        fs, audio = convert_samplerate('/home/petri/Downloads/onlylastsevenanne_huhtikuu/at_chunk-28.wav')
    else:
        audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)

    audio_length = fin.getnframes() * (1/16000)
    fin.close()

    print('Running inference.', file=sys.stderr)
    inference_start = timer()
    print(ds.stt(audio, fs))
    inference_end = timer() - inference_start
    print('Inference took %0.3fs for %0.3fs audio file.' % (inference_end, audio_length), file=sys.stderr)


if __name__ == '__main__':
    main()
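One thing worth noting while posting this: the client hardcodes BEAM_WIDTH=1024 and LM_ALPHA=0.85, while the deepspeech CLI binary and the test epoch at the end of training each apply their own default decoder settings, which (as far as I know) are not necessarily the same values, so the three paths are not guaranteed to decode identically. To separate what the acoustic model outputs from what the LM rescoring does, the same wav can also be decoded once without the LM and once with it. A minimal sketch using only the 0.4.1 API calls already shown above (wav path as first argument, 16 kHz mono assumed):

# Quick A/B check: decode the same wav with and without the LM, to separate
# the acoustic model's output from the effect of LM rescoring.
import sys
import wave
import numpy as np
from deepspeech import Model

# Same constants as client.py: N_FEATURES=26, N_CONTEXT=9, BEAM_WIDTH=1024
ds = Model('ac_models/output_graph.pb', 26, 9, 'alphabet.txt', 1024)

fin = wave.open(sys.argv[1], 'rb')  # path to a 16 kHz mono wav
audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
fs = fin.getframerate()
fin.close()

print('no LM  :', ds.stt(audio, fs))

# Now enable the LM with the same alpha/beta as client.py and decode again.
ds.enableDecoderWithLM('alphabet.txt', 'LM_models/meh_zero_and_one_stuff_bigram.bin',
                       'tier/trie', 0.85, 1.85)
print('with LM:', ds.stt(audio, fs))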
So: three ways to run inference with the same model, the same LM, and the same trie, and I get three different transcriptions from the same wav …