Training and Testing Accuracy vs Inference Accuracy

hmen97 · April 18, 2020, 8:03am

Hi,
I have been trying to train DeepSpeech (0.7.0-alpha.2) on a custom dataset of 69,000+ audio files. Setup the kenlm scorer after creating a custom lm.binary using the following commands:

./bin/lmplz --text …/…/tts/vocabulary.txt --arpa words.arpa --o 5 --discount_fallback

./bin/build_binary -T -s -v words.arpa lm.binary

cd to DeepSpeech/data/lm, ran the below command:

python3 generate_package.py --alphabet ~/April14/alphabet.txt --lm ~/April14/lm.binary --vocab ~/April14/vocabulary.txt --default_alpha 0.75 --default_beta 1.85 --package ~/April14/kenlm.scorer

I then ran the training with the following configuration flags:

python3 DeepSpeech.py
–train_files /home/ubuntu/April14/train/train.csv
–dev_files /home/ubuntu/April14/dev/dev.csv
–test_files /home/ubuntu/April14/test/test.csv
–test_batch_size 5
–dev_batch_size 5
–train_batch_size 32
–learning_rate 0.000055
–epochs 200
–early_stop True
–augmentation_freq_and_time_masking True
–augmentation_pitch_and_tempo_scaling True
–augmentation_spec_dropout_keeprate 0.8
–automatic_mixed_precision True
–train_cudnn True
–alphabet_config_path /home/ubuntu/April14/viet_alpha.txt
–export_dir /home/ubuntu/April14/results/model_export/
–checkpoint_dir /home/ubuntu/April14/checkout/
–summary_dir /home/ubuntu/April14/summary/
–scorer_path /home/ubuntu/April14/kenlm.scorer
–export_language Vietnamese
–es_epochs 30 \

The result of the training was:

Epoch 79 | Training | Elapsed Time: 0:28:28 | Steps: 2099 | Loss: 0.034167
Epoch 79 | Validation | Elapsed Time: 0:00:15 | Steps: 137 | Loss: 0.300532 | Dataset: /home/ubuntu/April14/dev/dev.csv
I Early stop triggered as the loss did not improve the last 30 epochs
I FINISHED optimization in 1 day, 14:28:35.242939

The result of the testing was:

Test on /home/ubuntu/April14/test/test.csv - WER: 0.001845, CER: 0.003121, loss: 0.295476

So in conclusion it did really well, however the results weren’t so good while creating inferences.
For example:
src: tôi có phải đóng tiền cho nhân viên khi đăng ký vay mua xe không
res: hàng bô đo tquái gàn đaơng vgh loa lưng lôi th lản h vay lua gha la a đ

How can I use the below parameters to get better augmentations for the data?

Standard deviation for Gaussian additive noise: data_aug_features_additive
Standard deviation for Normal distribution around 1 for multiplicative noise: data_aug_features_multiplicative
Standard deviation for speeding-up tempo. If Standard deviation is 0, this augmentation is not performed: augmentation_speed_up_std

PS: if you have any other suggestions please let me know.

lissyx · April 18, 2020, 8:31am

Please analyse training through the training loss and validation loss. Likely you are jsut super overfitting.

hmen97 · April 18, 2020, 8:45am

But the Validation loss seems to have followed the training loss.

If this is overfitting then what is the permissible difference in loss between training and validation?

PS: thank you for the quick response.

reuben · April 18, 2020, 9:27am

10 times lower training loss than validation loss is definitely suspicious. I don’t think we can give you a hard difference value because this will change from dataset to dataset, as well as depend on other hyperparameters. You’ll have to investigate yourself.

Make sure your validation and test sets are well designed, watch for overlap of speakers, repeated transcripts, maybe even repeated audios? Make sure they both accurately match the types of audio you want to do inference on, otherwise the WER/CER numbers you’ll see will mean nothing in regards to the real model performance.

hmen97 · April 18, 2020, 10:10am

I had a doubt that it was doing “too” well. But the test set accuracy and tensorboard visulaization was misleading.
I’ll revisit and try to avoid overfitting. But I still want to know how much does augmentation affect the results. If I can use these augmentation parameters correctly wouldn’t it significantly reduce the chance of overfitting?

reuben · April 18, 2020, 11:36am

The first step is making sure your validation and test sets actually match the audio you want to do inference on. The fact that you got such a low loss and WER on the test set but then when trying other audios the results weren’t good enough suggests that the validation and test sets aren’t actually representative of your intended use case. Without a good validation set you’ll be trying hyperparameters blindly (including augmentation ones), which is just a waste of time.

dabinat · April 18, 2020, 7:20pm

69k files is around 100 hours, depending on audio length. In order to get decent results when training from scratch, your dataset should be in the thousands of hours.

One way around this is to use transfer learning from the English pre-trained checkpoints. This should be easier with 0.7 once the 0.7 checkpoints are available.

hmen97 · April 23, 2020, 3:30pm

I tried to do transfer learning like @dabinat suggested. I read in one of the discourses that we can’t use transfer learning from the master as the transfer_learning branch isn’t merged yet, so it doesn’t allow you to finetune the dropped layers from the pre-trained model. So I cloned the transfer-learning2 branch as it it was something @lissyx had suggested, again in one of the discourse. But the ds_ctcdecoder is missing, I ran the below command:

pip3 install $(python3 util/taskcluster.py --branch v0.5.1 --decoder)

got the below response:

Collecting ds-ctcdecoder==0.5.0a6
ERROR: HTTP error 404 while getting https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.5.1.cpu-ctc/artifacts/public/ds_ctcdecoder-0.5.0a6-cp35-cp35m-manylinux1_x86_64.whl
ERROR: Could not install requirement ds-ctcdecoder==0.5.0a6 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.5.1.cpu-ctc/artifacts/public/ds_ctcdecoder-0.5.0a6-cp35-cp35m-manylinux1_x86_64.whl because of error 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/bftpNURqQkixvDjOFLSlCQ/artifacts/public%2Fds_ctcdecoder-0.5.0a6-cp35-cp35m-manylinux1_x86_64.whl
ERROR: Could not install requirement ds-ctcdecoder==0.5.0a6 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.5.1.cpu-ctc/artifacts/public/ds_ctcdecoder-0.5.0a6-cp35-cp35m-manylinux1_x86_64.whl because of HTTP error 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/bftpNURqQkixvDjOFLSlCQ/artifacts/public%2Fds_ctcdecoder-0.5.0a6-cp35-cp35m-manylinux1_x86_64.whl for URL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.5.1.cpu-ctc/artifacts/public/ds_ctcdecoder-0.5.0a6-cp35-cp35m-manylinux1_x86_64.whl

Specifications:

Python 3.5.2
Ubuntu 16.04 xenial x86_64
DeepSpeech Version 0.5.0-alpha.6

PS: added llvmlite==0.31.0 in the requirements.txt.

Was this the wrong branch or did I do something wrong?

othiele · April 23, 2020, 5:39pm

The current master for the 0.7 branch supports transfer learning and makes it easy to use. I would suggest you clone the current master and check the flags.py file for parameters. There was some mishap with older decoder files which are not longer available, sorry.

lissyx · April 23, 2020, 7:37pm

Unfortunately, those artifacts expired and we can’t republish them. Also, those URLs are for the old TaskCluster infra. You should be able to rebuild native_client/ctcdecoder to produce it.

hmen97 · April 24, 2020, 9:18am

After inputs from @othiele and @lissyx, I went back to the 0.7.0-alpha.2 version of DeepSpeech. To achieve transfer learning, used the following flags:

python3 DeepSpeech.py
–train_files /home/ubuntu/April19/train/train.csv
–dev_files /home/ubuntu/April19/dev/dev.csv
–test_files /home/ubuntu/April19/test/test.csv
–test_batch_size 5
–dev_batch_size 5
–train_batch_size 32
–learning_rate 0.000001
–epochs 200
–early_stop True
–es_epochs 20
–es_min_delta 0.05
–report_count 160
–data_aug_features_additive 1
–augmentation_speed_up_std 0.1
–augmentation_freq_and_time_masking True
–augmentation_pitch_and_tempo_scaling True
–augmentation_spec_dropout_keeprate 0.8
–automatic_mixed_precision True
–drop_source_layers 1
–alphabet_config_path /home/ubuntu/April19/viet_alpha.txt
–export_dir /home/ubuntu/Transfer_learning/April23/results/model_export
–save_checkpoint_dir /home/ubuntu/Transfer_learning/April23/checkout
–load_checkpoint_dir /home/ubuntu/DeepSpeech/deepspeech-0.6.1-checkpoint
–summary_dir /home/ubuntu/Transfer_learning/April23/summary/
–scorer_path /home/ubuntu/April19/kenlm.scorer
–export_language Vietnamese
–export_name april23
–test_output_file /home/ubuntu/Transfer_learning/April23/testing.logs \

I have used the checkpoints from the 0.6.1 checkpoints release. First I tried it on the 0.6.1 release but it has no support for Transfer Learning. So then tried it with the 0.7.0-alpha.2 version.
I got the following error:

/home/ubuntu/DeepSpeech/util/feeding.py:146: UserWarning: Seed 4568 from outer graph might be getting used by function Dataset_map_<class ‘functools.partial’>, if the random op has not been provided any seed. Explicitly set the seed in the function if this is not the intended behavior.
.map(process_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE))
I Enabling automatic mixed precision training.
I Loading best validating checkpoint from /home/ubuntu/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-233784
I Loading variable from checkpoint: cond_1/beta1_power
Traceback (most recent call last):
File “DeepSpeech.py”, line 912, in
absl.app.run(main)
File “/home/ubuntu/.virtualenvs/deepspeech/lib/python3.5/site-packages/absl/app.py”, line 299, in run
_run_main(main, args)
File “/home/ubuntu/.virtualenvs/deepspeech/lib/python3.5/site-packages/absl/app.py”, line 250, in _run_main
sys.exit(main(argv))
File “DeepSpeech.py”, line 885, in main
train()
File “DeepSpeech.py”, line 515, in train
load_or_init_graph(session, method_order)
File “/home/ubuntu/DeepSpeech/util/checkpoints.py”, line 103, in load_or_init_graph
return _load_checkpoint(session, ckpt_path)
File “/home/ubuntu/DeepSpeech/util/checkpoints.py”, line 70, in _load_checkpoint
v.load(ckpt.get_tensor(v.op.name), session=session)
File “/home/ubuntu/.virtualenvs/deepspeech/lib/python3.5/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py”, line 915, in get_tensor
return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key cond_1/beta1_power not found in checkpoint

Is 0.7.x not compatible with 0.6.1 checkpoints? If not, then are there any 0.7.x checkpoints that I could use?

othiele · April 24, 2020, 9:27am

Sorry, I thought you had your own material and trained a model on it and now you wanted to do transfer. There is no 0.7 model yet. You can try the old transfer branch with a lot of work or wait a little til the 0.7 model.

othiele · April 24, 2020, 9:27am

And 0.6 models don’t work for the 0.7 branch.

hmen97 · April 24, 2020, 9:38am

Oh, can you tell me approximately when it would be available? In a month or longer than that?

othiele · April 24, 2020, 9:41am

It’s like the Sars 2 crisis, it takes time but nobody knows how long Sorry, I am not on the team and last I read was it will be released soonish … If you train yourself you know that it’s hard to predict whether the next model will run through smoothly.

hmen97 · April 24, 2020, 9:45am

Haha, guess I’m going to have to get into the 0.6.1 code and figure out ways to get more data. Thanks for the help. @othiele @lissyx @dabinat @reuben

lissyx · April 24, 2020, 10:46am

Hopefully, much much less. We don’t have a date yet, but we’re very close.

hmen97 · April 24, 2020, 11:13am

That’s great. Looking forward to it.

othiele · April 25, 2020, 8:53am

Time moves fast, 0.7 model is out

hmen97 · April 25, 2020, 9:01am

That’s amazing, DeepSpeech team for the win.

Topic		Replies	Views
Noise injection training experiment DeepSpeech learning , feedback , dataset	33	1984	September 16, 2020
Train model but actual prediction is too poor DeepSpeech	53	1687	May 5, 2020
Question with DeepSpeech Transfer Learning DeepSpeech	40	2885	March 28, 2020
DeepSpeech Training own English model for call center speech recognition DeepSpeech	22	3262	October 8, 2019
Is DeepSpeech not meant for one word audio files? DeepSpeech	27	1492	July 30, 2020

Training and Testing Accuracy vs Inference Accuracy

Related topics