DeepSpeech Training own English model for call center speech recognition

Hi, I am trying to train my own English model. After running DeepSpeech.py, it trains up to epoch 1 and the test output doesn't display anything.

res " "

Check output

Epoch 2 | Training | Elapsed Time: 0:11:37 | Steps: 24 | Loss: 200.743011
Epoch 2 | Validation | Elapsed Time: 0:00:09 | Steps: 10 | Loss: 145.589038 | Dataset: data/dev/dev.csv
I Saved new best validating model with loss 145.589038 to: /home/yk/.local/share/deepspeech/checkpoints/best_dev-144
Epoch 3 | Training | Elapsed Time: 0:10:26 | Steps: 23 | Loss: 188.129880

Epoch 3 | Training | Elapsed Time: 0:11:38 | Steps: 24 | Loss: 200.469688
Epoch 3 | Validation | Elapsed Time: 0:00:09 | Steps: 10 | Loss: 145.480138 | Dataset: data/dev/dev.csv
I Saved new best validating model with loss 145.480138 to: /home/yk/.local/share/deepspeech/checkpoints/best_dev-168
I Early stop triggered as (for last 4 steps) validation loss: 145.480138 with standard deviation: 0.053664 and mean: 145.650564
I FINISHED optimization in 0:47:13.484660
WARNING:tensorflow:Entity <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f0637a1a978>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f0637a1a978>>: AttributeError: module ‘gast’ has no attribute ‘Num’
W1007 10:09:12.322466 139666740873024 ag_logging.py:145] Entity <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f0637a1a978>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f0637a1a978>>: AttributeError: module ‘gast’ has no attribute ‘Num’
INFO:tensorflow:Restoring parameters from /home/yk/.local/share/deepspeech/checkpoints/best_dev-168
I1007 10:09:12.369412 139666740873024 saver.py:1280] Restoring parameters from /home/yk/.local/share/deepspeech/checkpoints/best_dev-168
I Restored variables from best validation checkpoint at /home/yk/.local/share/deepspeech/checkpoints/best_dev-168, step 168
Testing model on data/test/test.csv
Test epoch | Steps: 7 | Elapsed Time: 0:00:05
Test on data/test/test.csv - WER: 1.000000, CER: 1.000000, loss: 473.767029

WER: 1.000000, CER: 1.000000, loss: 191.858261
 - wav: file:///home/yk/DeepSpeech/data/test/044.wav
 - src: "syuui"
 - res: ""

WER: 1.000000, CER: 1.000000, loss: 491.481476
 - wav: file:///home/yk/DeepSpeech/data/test/041.wav
 - src: "oimmyy"
 - res: ""

WER: 1.000000, CER: 1.000000, loss: 507.216888
 - wav: file:///home/yk/DeepSpeech/data/test/043.wav
 - src: "o rrhof i"
 - res: ""

WER: 1.000000, CER: 1.000000, loss: 510.564240
 - wav: file:///home/yk/DeepSpeech/data/test/047.wav
 - src: "o p "
 - res: ""

WER: 1.000000, CER: 1.000000, loss: 512.405273
 - wav: file:///home/yk/DeepSpeech/data/test/045.wav
 - src: "ohndieya"
 - res: ""

WER: 1.000000, CER: 1.000000, loss: 547.843018
 - wav: file:///home/yk/DeepSpeech/data/test/042.wav
 - src: "pfnry nya"
 - res: ""

WER: 1.000000, CER: 1.000000, loss: 554.999878
 - wav: file:///home/yk/DeepSpeech/data/test/046.wav
 - src: "o uu"
 - res: ""

@cryptoaimdy Please use code formatting for console output; this is unreadable. As I already replied to your cross-post on GitHub, you are simply training with too little data. Either add more data or reduce the model geometry.

Have a look at bin/run-ldc93s1.sh, which is a one-sample training example.

@cryptoaimdy How much data you need is hard to define precisely, but from experience you will need a few dozen hours before this geometry starts outputting any characters. For a model that is actually able to learn and generalize, a starting point is more on the order of hundreds of hours.

FTR, I could get ~80% WER on a French model trained from scratch with around 250h.
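
To compare a dataset against the hour counts mentioned above, here is a minimal sketch that estimates total audio duration from a training CSV. It assumes the CSV has the `wav_filename`, `wav_filesize`, `transcript` columns used by the DeepSpeech importers, and that the files are 16-bit mono PCM at 16kHz with a 44-byte WAV header. The function name `estimate_hours` and the default parameters are illustrative, not part of DeepSpeech.

```python
import csv


def estimate_hours(csv_path, sample_rate=16000, bytes_per_sample=2,
                   channels=1, header_bytes=44):
    """Estimate total audio duration in hours from the wav_filesize column.

    Assumes uncompressed PCM WAV files; the header size and audio format
    are assumptions, so treat the result as a rough estimate.
    """
    bytes_per_second = sample_rate * bytes_per_sample * channels
    total_seconds = 0.0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Subtract the assumed header size, never going below zero.
            payload = max(int(row["wav_filesize"]) - header_bytes, 0)
            total_seconds += payload / bytes_per_second
    return total_seconds / 3600.0
```

Calling `estimate_hours("data/train/train.csv")` on a set of 30 short clips would return a small fraction of an hour, which makes the gap to "a few dozen hours" easy to see.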

@lissyx please check.

Okay.

A few questions below:

Q1: I am building speech recognition with DeepSpeech for phone calls, such as call center calls, which use 8kHz audio. Would DeepSpeech be helpful there? If yes, then Q2.

Q2: How do I train from the released DeepSpeech 0.5 model? What should I do to add my corpus to the prebuilt model's train, test, and dev sets? How much more audio should be added?

We only have thorough experience training with 16kHz; training with a different sample rate might require adjustments.

Since you are changing the sampling rate, you cannot re-use the released model.
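
Before deciding between the released model and training from scratch, it can help to confirm what sample rate the recordings actually have. The following is a minimal sketch using Python's standard `wave` module. The helper names `wav_format` and `is_16khz_mono_16bit` are hypothetical, and the mono 16-bit requirement is an assumption added here; the thread itself only states 16kHz.

```python
import wave


def wav_format(path):
    """Return (sample_rate, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()


def is_16khz_mono_16bit(path, expected_rate=16000):
    """Check whether a file matches the assumed 16kHz, mono, 16-bit format."""
    rate, channels, width = wav_format(path)
    return rate == expected_rate and channels == 1 and width == 2
```

Typical call center recordings would report a rate of 8000 here, in which case the released 16kHz model cannot be re-used directly.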

Can you share a useful article or link that could help me train with a different sample rate?

No, we don’t have that because, as I said, we are focusing on training with 16kHz datasets, and we don’t have spare time to try other values, because we don’t have meaningful datasets for them.

A few people have reported interest in that, but nobody seems to have done or shared anything. The current best recommendation would be to hack on the bin/run-ldc93s1.sh example with an 8kHz sample to try to get all the adjustments in place, in both training and inference code.

We are creating an MVP for a company, so we will go with 16kHz for now for demo purposes and later consider training on 8kHz from scratch.

So, to continue training DeepSpeech’s pretrained model with our own audio data, please share something that gives an idea of how to approach it.

Have you read the documentation?

Yes, I read it, but I was unable to understand it. I also read the "Continuing training from a release model" section. When continuing training from the pretrained model, where do I have to place my dev, train, and test files? Also, do I have to create the binary or trie again using my data’s vocabulary?

You just give the existing checkpoint and pass your own data as CSV files. You place them where you want and pass those paths as arguments, as documented.
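
As an illustration of "pass your own data as CSV files", here is a minimal sketch that writes such a file from a list of recordings. It assumes the three-column layout `wav_filename`, `wav_filesize`, `transcript` produced by the DeepSpeech importers. The function name `write_deepspeech_csv` and the input shape are hypothetical.

```python
import csv
import os


def write_deepspeech_csv(entries, csv_path):
    """Write a training CSV from (wav_path, transcript) pairs.

    The column names follow the importer layout assumed above; the file
    size is read from disk, and the path is stored as an absolute path.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in entries:
            writer.writerow([
                os.path.abspath(wav_path),
                os.path.getsize(wav_path),
                transcript.strip(),
            ])
```

The resulting files can live anywhere; their locations are what gets passed to `--train_files`, `--dev_files`, and `--test_files`, as in the commands shown in this thread.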

If this is for general English, rebuilding the binary and trie might not be required. If you have some specific words, you may want to rebuild an LM, either with just your data or by adding your data to the generic LM we provide. Everything is documented under data/lm.
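
To judge whether a dataset contains "specific words" that would justify rebuilding the LM, one rough approach is to gather the transcripts and list the words missing from a known vocabulary. The sketch below assumes the same `transcript` CSV column as above. The helper names and the `known_words` input are hypothetical, and the actual LM build steps are the ones documented under data/lm, not shown here.

```python
import csv


def collect_transcripts(csv_paths):
    """Return all non-empty transcripts, whitespace-normalized, one per item."""
    lines = []
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                text = " ".join(row["transcript"].split())
                if text:
                    lines.append(text)
    return lines


def words_missing_from(lines, known_words):
    """Return the sorted words found in lines but absent from known_words."""
    seen = set()
    for line in lines:
        seen.update(line.split())
    return sorted(seen - set(known_words))
```

The list returned by `collect_transcripts` can also be written out one sentence per line to serve as additional text when building a language model.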

Please, if you feel the documentation should be improved, file an issue on GitHub and be clear about what is unclear, wrong, or missing for you.

I have downloaded the checkpoints and passed their path with the --checkpoint_dir flag:

> python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir prebuiltcheckpoint --epochs 3 --train_files data/train/my-train.csv --dev_files data/dev/my-dev.csv --test_files data/test my_dev.csv --learning_rate 0.0001

and the training started
> Use tf.where in 2.0, which has the same broadcast rule as np.where

I Initializing variables…
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000

Epoch 0 | Training | Elapsed Time: 0:00:20 | Steps: 1 | Loss: 504.214539

Epoch 0 | Training | Elapsed Time: 0:00:41 | Steps: 2 | Loss: 327.344215

Is this okay? Why is the loss so high?

You have just started training; don’t expect a low loss at the beginning. That’s completely normal.

Thank you team for your kind help.

Hi,
I am still getting a blank res "" after continuing from the pretrained model.

yk@andromeda:~/DeepSpeech$ python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir ./checkpoint_old --epochs 3 --train_files data/train/train.csv --dev_files data/dev/dev.csv --test_files data/test/test.csv --learning_rate 0.0001

WARNING:tensorflow:From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It’s easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means tf.py_functions can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.

W1007 19:19:11.504933 140293868754752 deprecation.py:323] From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

WARNING:tensorflow:From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:348: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
W1007 19:19:11.572091 140293868754752 deprecation.py:323] From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:348: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
WARNING:tensorflow:From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:349: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
W1007 19:19:11.572299 140293868754752 deprecation.py:323] From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:349: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
WARNING:tensorflow:From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:351: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
W1007 19:19:11.572416 140293868754752 deprecation.py:323] From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:351: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W1007 19:19:12.182626 140293868754752 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W1007 19:19:12.184339 140293868754752 deprecation.py:506] From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:Entity <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f986841fcc0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f986841fcc0>>: AttributeError: module 'gast' has no attribute 'Num'
W1007 19:19:12.208325 140293868754752 ag_logging.py:145] Entity <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f986841fcc0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f986841fcc0>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:From DeepSpeech.py:232: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1007 19:19:12.278438 140293868754752 deprecation.py:323] From DeepSpeech.py:232: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
I Initializing variables...
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:42 | Steps: 2 | Loss: 327.344215

Epoch 0 |   Training | Elapsed Time: 0:01:27 | Steps: 4 | Loss: 279.011799


Epoch 0 |   Training | Elapsed Time: 0:02:18 | Steps: 6 | Loss: 384.225965

Epoch 0 |   Training | Elapsed Time: 0:06:20 | Steps: 14 | Loss: 273.964598

Epoch 0 |   Training | Elapsed Time: 0:06:58 | Steps: 15 | Loss: 263.730356

Epoch 0 |   Training | Elapsed Time: 0:11:29 | Steps: 21 | Loss: 239.360187

Epoch 0 |   Training | Elapsed Time: 0:12:25 | Steps: 22 | Loss: 238.142157



Epoch 0 |   Training | Elapsed Time: 0:14:55 | Steps: 24 | Loss: 251.470535
Epoch 0 | Validation | Elapsed Time: 0:00:14 | Steps: 10 | Loss: 105.280766 | Dataset: data/dev/dev.csv
I Saved new best validating model with loss 105.280766 to: ./checkpoint_old/best_dev-24
Epoch 1 |   Training | Elapsed Time: 0:00:19 | Steps: 1 | Loss: 72.282936

Epoch 1 |   Training | Elapsed Time: 0:01:28 | Steps: 4 | Loss: 88.395905

Epoch 1 |   Training | Elapsed Time: 0:01:51 | Steps: 5 | Loss: 86.035843

Epoch 1 |   Training | Elapsed Time: 0:05:41 | Steps: 13 | Loss: 105.126752


Epoch 1 |   Training | Elapsed Time: 0:08:16 | Steps: 17 | Loss: 114.653208

Epoch 1 |   Training | Elapsed Time: 0:14:52 | Steps: 24 | Loss: 147.739715
Epoch 1 | Validation | Elapsed Time: 0:00:14 | Steps: 10 | Loss: 100.421697 | Dataset: data/dev/dev.csv
WARNING:tensorflow:From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W1007 19:49:33.111889 140293868754752 deprecation.py:323] From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
I Saved new best validating model with loss 100.421697 to: ./checkpoint_old/best_dev-48
Epoch 2 |   Training | Elapsed Time: 0:05:04 | Steps: 12 | Loss: 98.081915

Epoch 2 |   Training | Elapsed Time: 0:05:41 | Steps: 13 | Loss: 98.861042

Epoch 2 |   Training | Elapsed Time: 0:06:58 | Steps: 15 | Loss: 104.322818

Epoch 2 |   Training | Elapsed Time: 0:07:37 | Steps: 16 | Loss: 106.783278


Epoch 2 |   Training | Elapsed Time: 0:14:55 | Steps: 24 | Loss: 143.179845
Epoch 2 | Validation | Elapsed Time: 0:00:14 | Steps: 10 | Loss: 99.873046 | Dataset: data/dev/dev.csv
I Saved new best validating model with loss 99.873046 to: ./checkpoint_old/best_dev-72
I FINISHED optimization in 0:45:29.704290
WARNING:tensorflow:Entity <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f97d2f48a58>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f97d2f48a58>>: AttributeError: module 'gast' has no attribute 'Num'
W1007 20:04:43.839351 140293868754752 ag_logging.py:145] Entity <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f97d2f48a58>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f97d2f48a58>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W1007 20:04:43.909140 140293868754752 deprecation.py:323] From /home/yk/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from ./checkpoint_old/best_dev-72
I1007 20:04:43.909901 140293868754752 saver.py:1280] Restoring parameters from ./checkpoint_old/best_dev-72
I Restored variables from best validation checkpoint at ./checkpoint_old/best_dev-72, step 72
Testing model on data/test/test.csv
Test epoch | Steps: 7 | Elapsed Time: 0:00:08
Test on data/test/test.csv - WER: 1.000000, CER: 1.000000, loss: 34.051395
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 25.776178
 - wav: file:///home/yk/DeepSpeech/data/test/046.wav
 - src: "o uu"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 29.066339
 - wav: file:///home/yk/DeepSpeech/data/test/047.wav
 - src: "o   p "
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 31.200336
 - wav: file:///home/yk/DeepSpeech/data/test/041.wav
 - src: "oimmyy"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 35.002659
 - wav: file:///home/yk/DeepSpeech/data/test/044.wav
 - src: "syuui"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 36.742115
 - wav: file:///home/yk/DeepSpeech/data/test/045.wav
 - src: "ohndieya"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 37.746620
 - wav: file:///home/yk/DeepSpeech/data/test/043.wav
 - src: "o rrhof i"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 42.825520
 - wav: file:///home/yk/DeepSpeech/data/test/042.wav
 - src: "pfnry nya"
 - res: ""
--------------------------------------------------------------------------------
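
For reference on why an empty `res` always yields `WER: 1.000000, CER: 1.000000`: a minimal sketch of both metrics as Levenshtein distance divided by reference length, over words for WER and over characters for CER. This is the standard textbook definition and is assumed here; DeepSpeech's own evaluation code may differ in detail.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,         # deletion
                           cur[j - 1] + 1,      # insertion
                           prev[j - 1] + cost)) # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

With an empty hypothesis every reference word and character counts as a deletion, so `wer("o uu", "")` and `cer("syuui", "")` both evaluate to 1.0 regardless of how low the loss is.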

Three epochs? How much data? What tag / branch are you using?

Following this link.

Continuing from the pretrained model using a checkpoint.

train has 30 audio files

dev has 10

test has 7

using this one