Lately I’ve been trying to fine-tune the latest release model to consistently recognize certain medical terms such as “eczema.”
Computing Environment:
- DeepSpeech v0.7.4
- Ubuntu 18.04 LTS
- Python 3.6.9
- TensorFlow 1.15.2
I have not been able to find many other resources on this forum or through Google. I understand that in this release, deepspeech-0.7.4-models.scorer replaced the language model and trie, so my first inclination is that I need to update the scorer as well to fit the newly fine-tuned model. But I’m not sure if that’s the root of the problem.
Here’s what I get when I run the original pre-trained model:
Given Audio:
$ deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav
Loading model from file deepspeech-0.7.4-models.pbmm
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
2020-07-29 11:46:06.984198: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.187s.
Loading scorer from files deepspeech-0.7.4-models.scorer
Loaded scorer in 0.136s.
Running inference.
experience proves this
Inference took 12.036s for 1.975s audio file.
One of my test files: (transcription: “tell me about your eczema”)
$ deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio data/bm_test/Eggs.wav
Loading model from file deepspeech-0.7.4-models.pbmm
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
2020-07-29 12:03:59.036974: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.018s.
Loading scorer from files deepspeech-0.7.4-models.scorer
Loaded scorer in 0.00878s.
Running inference.
tell me about your eggs a
Inference took 10.176s for 2.805s audio file.
Here’s what I get when I run the fine-tuned model:
Given Audio:
$ deepspeech --model exports/output_graph.pb --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav
Loading model from file exports/output_graph.pb
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-07-29 11:46:44.780034: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 1.03s.
Loading scorer from files deepspeech-0.7.4-models.scorer
Loaded scorer in 0.00873s.
Running inference.
iii
Inference took 7.992s for 1.975s audio file.
One of my test files: (transcription: “tell me about your eczema”)
$ deepspeech --model exports/output_graph.pb --scorer deepspeech-0.7.4-models.scorer --audio data/bm_test/Eggs.wav
Loading model from file exports/output_graph.pb
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-07-29 12:09:32.695840: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.227s.
Loading scorer from files deepspeech-0.7.4-models.scorer
Loaded scorer in 0.00892s.
Running inference.
tell me about your eczema
Inference took 9.169s for 2.805s audio file.
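To put a number on the four results above, here is a rough sketch of a word error rate (WER) calculation. It is my own helper, not part of DeepSpeech, and the function name `word_error_rate` is made up for illustration. For the given audio I am assuming the pre-trained model's output ("experience proves this") is the correct reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Pre-trained model on my test file
    print(word_error_rate("tell me about your eczema", "tell me about your eggs a"))
    # Fine-tuned model on the given audio (reference assumed, see above)
    print(word_error_rate("experience proves this", "iii"))
```

By this measure the fine-tuned model gets my test file exactly right but gets every word of the given audio wrong, which is the behaviour I am trying to understand.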
Here’s my .sh file to run the fine-tuning session:
#!/bin/sh
set -xe

bm_dir="./data/bm_test"
bm_csv="${bm_dir}/bm.csv"

if [ ! -f ]; then
    echo "Please make sure you run this from DeepSpeech's top level directory."
    exit 1
fi;

# Force only one visible device because we have a single-sample dataset
# and when trying to run on multiple devices (like GPUs), this will break
export CUDA_VISIBLE_DEVICES=0

python -u --noearly_stop \
  --train_files ${bm_csv} --train_batch_size 1 \
  --dev_files ${bm_csv} --dev_batch_size 1 \
  --test_files ${bm_csv} --test_batch_size 1 \
  --n_hidden 2048 --epochs 10 \
  --max_to_keep 1 --save_checkpoint_dir './fine_tuning_checkpoints' \
  --load_checkpoint_dir './load_checkpoints' \
  --load_cudnn \
  --export_dir './exports' \
  --learning_rate 0.0005 --dropout_rate 0.05 \
  --scorer_path 'deepspeech-0.7.4-models.scorer' | tee /tmp/resume.log

if ! grep "Loading best validating checkpoint from" /tmp/resume.log; then
  echo "Did not resume training from checkpoint"
  exit 1
else
  exit 0
fi
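Note that the script above passes the same bm.csv as the train, dev and test set, so the test results only show how well the model memorised these clips. One thing I could try is holding some clips out. The following is only a sketch of how I might split the CSV. The helper names (`split_rows`, `write_csv`, `split_csv`), the output file names and the split sizes are my own choices, not anything DeepSpeech provides.

```python
import csv
import random


def split_rows(rows, dev_count, test_count, seed=0):
    """Shuffle rows deterministically and return (train, dev, test) lists."""
    if dev_count + test_count >= len(rows):
        raise ValueError("not enough rows left over for training")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:test_count]
    dev = shuffled[test_count:test_count + dev_count]
    train = shuffled[test_count + dev_count:]
    return train, dev, test


def write_csv(path, fieldnames, rows):
    """Write rows (dicts) to path with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def split_csv(source_path, prefix, dev_count=3, test_count=3):
    """Read one CSV and write <prefix>_train.csv, _dev.csv and _test.csv."""
    with open(source_path, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)
    train, dev, test = split_rows(rows, dev_count, test_count)
    for name, part in (("train", train), ("dev", dev), ("test", test)):
        write_csv(f"{prefix}_{name}.csv", fieldnames, part)
```

Calling `split_csv("data/bm_test/bm.csv", "data/bm_test/bm")` would then give three files that could be passed to `--train_files`, `--dev_files` and `--test_files` respectively.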
In the ‘load_checkpoints’ directory I have the checkpoint data from the v0.7.4 release:
$ ls load_checkpoints/
alphabet.txt  best_dev-732522.index  best_dev_checkpoint  best_dev-732522.meta  checkpoint
This is the data/bm_test/ directory I’m using for the training data:
$ ls data/bm_test/
Common.wav         EczemaFlareUps.wav  IBSEffect.wav        ManageIBS.wav         SetIBS.wav        WhatManage.wav
DietGuide.wav      Eggs.wav            IBSPrescription.wav  Metals.wav            ShelterFood.wav   WorkEffect.wav
Difficult.wav      Episode.wav         IBSSymptoms.wav      ObtainPermission.wav  TalkAboutIBS.wav  bm.csv
DiscussIBS.wav     Foods.wav           Instigate.wav        Prescription.wav      Topical.wav
DiscussMetals.wav  HiAlan.wav          Life.wav             ProbsWithIBS.wav      TreatIBS.wav
Eczema.wav         Hygiene.wav         ManageEczema.wav     Religious.wav         WhatHelps.wav
This is my bm.csv:
wav_filename,wav_filesize,transcript
/home/wes/DeepSpeech/data/bm_test/Eggs.wav,89804,tell me about your eczema
/home/wes/DeepSpeech/data/bm_test/HiAlan.wav,101278,hi alan my name is wesley
/home/wes/DeepSpeech/data/bm_test/ObtainPermission.wav,173930,is it alright if we talk in this non clinical setting
/home/wes/DeepSpeech/data/bm_test/Religious.wav,135198,do you have any religious or spiritual affiliations
/home/wes/DeepSpeech/data/bm_test/TalkAboutIBS.wav,88844,tell me about your ibs
/home/wes/DeepSpeech/data/bm_test/ProbsWithIBS.wav,80664,what problems have you been having with your ibs
/home/wes/DeepSpeech/data/bm_test/EczemaFlareUps.wav,73976,please describe you eczema flare ups
/home/wes/DeepSpeech/data/bm_test/Life.wav,138718,how has eczema affected your life
/home/wes/DeepSpeech/data/bm_test/Eczema.wav,66764,eczema
/home/wes/DeepSpeech/data/bm_test/Metals.wav,121438,certain metals can cause eczema to develop
/home/wes/DeepSpeech/data/bm_test/DiscussMetals.wav,114718,discuss metals as an eczema trigger
/home/wes/DeepSpeech/data/bm_test/Topical.wav,117324,have you considered using topical creams for eczema
/home/wes/DeepSpeech/data/bm_test/Prescription.wav,130398,lets talk about possibly using prescriptions for your eczema
/home/wes/DeepSpeech/data/bm_test/ManageEczema.wav,86284,how do you manage your eczema
/home/wes/DeepSpeech/data/bm_test/WhatManage.wav,86878,what do you do to manage your eczema
/home/wes/DeepSpeech/data/bm_test/WorkEffect.wav,134558,how has work afftected your eczema and ibs
/home/wes/DeepSpeech/data/bm_test/SetIBS.wav,87884,what sets off your ibs
/home/wes/DeepSpeech/data/bm_test/Instigate.wav,104524,what instigates your ibs
/home/wes/DeepSpeech/data/bm_test/Foods.wav,109004,do you know of any food that triggers ibs
/home/wes/DeepSpeech/data/bm_test/Common.wav,100684,some common foods can trigger ibs
/home/wes/DeepSpeech/data/bm_test/IBSSymptoms.wav,111518,please describe some of your ibs symptoms
/home/wes/DeepSpeech/data/bm_test/Episode.wav,131690,can you tell me about an ibs episode you have had
/home/wes/DeepSpeech/data/bm_test/IBSEffect.wav,90764,how has ibs affected your life
/home/wes/DeepSpeech/data/bm_test/DiscussIBS.wav,122078,lets talk about how ibs has affected your quality of life
/home/wes/DeepSpeech/data/bm_test/ShelterFood.wav,110878,how does the shelter food affect your ibs
/home/wes/DeepSpeech/data/bm_test/TreatIBS.wav,102238,lets look at some treatment options for ibs
/home/wes/DeepSpeech/data/bm_test/IBSPrescription.wav,89758,have your considered prescriptions for ibs
/home/wes/DeepSpeech/data/bm_test/DietGuide.wav,112158,discuss dietary guidelines for ibs
/home/wes/DeepSpeech/data/bm_test/WhatHelps.wav,83038,what will help your ibs
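In case the CSV itself is part of the problem, this is a rough sanity check I could run on it. It is a sketch, not a DeepSpeech tool: `check_csv` and `unexpected_chars` are names I made up, and I am assuming the alphabet.txt from the release only allows lowercase a to z, space and apostrophe.

```python
import csv
import os
import string

# Assumption: the release alphabet is lowercase a-z, space and apostrophe.
ALLOWED_CHARS = set(string.ascii_lowercase + " '")


def unexpected_chars(transcript):
    """Return the sorted characters in transcript that fall outside the alphabet."""
    return sorted(set(transcript) - ALLOWED_CHARS)


def check_csv(csv_path):
    """Return a list of human-readable problems found in a training CSV."""
    problems = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            wav = row["wav_filename"]
            if not os.path.isfile(wav):
                problems.append(f"{wav}: file not found")
            elif os.path.getsize(wav) != int(row["wav_filesize"]):
                problems.append(f"{wav}: size differs from wav_filesize column")
            bad = unexpected_chars(row["transcript"])
            if bad:
                problems.append(f"{wav}: unexpected characters {bad}")
    return problems
```

Running `check_csv("data/bm_test/bm.csv")` should return an empty list if every file exists, every size matches and every transcript stays inside the assumed alphabet.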
I understand that this is not a large amount of training data. I’m really just trying to get the basics of fine-tuning down first.
I’m not entirely sure whether the poor result on the given audio comes from the way I trained the model or from the way I’m running it for inference. When I searched the internet for how to run a newly trained model, the results referred to older releases rather than v0.7.4, where the scorer has replaced the lm and trie. Please point me in the right direction.