ZeroDivisionError: float division by zero

Hi,

Environment:
Ubuntu VM: 16.04 on Azure
CUDA: 10.0
cuDNN: 7.6.5
tensorflow-gpu: 1.15.4
DeepSpeech: 0.9.3

We are trying to train on a small sample dataset from checkpoints before moving on to our complete dataset. We were able to set up and train successfully on our local GPU setup, and have moved to an Azure Linux VM to take advantage of better GPUs in order to speed up training on the larger dataset. The contents of the train, dev and test CSVs for the sample dataset are given below:

----train.csv----

,wav_filename,wav_filesize,transcript
0,harqen_samples/Aaron_Findling/0/Ad710aa24c8b840b094d_16K.wav,351084,this is a common theme where i work in the icu and health care and general unfortunately it’s not uncommon to walk on the floor and be short staffed whether it’s anciliary staff for actual registered nurses so when it comes to this we you know work together as a team at saint luke where i work we typically try to assess who has the higher loads and availability for time and a kind of chip in and work as a team generally speaking we manage our situations very well it’s amazingly done as well as we have since especially since the outbreak of the corona virus in march
1,harqen_samples/Aaron_Findling/1/A30b4d8c50f314f0e854_16K.wav,397269,a good example of one i provided excellent customer service most recently i can think is we had a patient who was essentially homeless he was living in a hotel cross town with his girl friend he had and spent several days on our unit was well enough to be discharged but couldn’t obtain a ride and i end up volunteering to take him home to short ride home as he was only about half a mile from my house i got him off at the hotel and his girlfriend had actually drawn me a picture of a turkey as it was actually a couple of his last thanksgiving two thanksgivings ago that’s a good example i think of time i provided excellent care
2,harqen_samples/Aaron_Findling/2/Adf28199153094df2870_16K.wav,430079,i recently took care of a patient who’s daughter thought she was not but the patient’s daughter felt the patient was not getting the care she felt that she should be getting this kind of a build up over several days and actually she was so angry that effective communication really wasn’t an option so at that point i realized i just want to listen her that let her then to me when she had finally calm down i was so sure asked her what issues specifically that i had been i could fix and address and after about a fifteen twenty minute conversation she was much happier and you understand we understand the daughters concerns as well as the concerns of the daughter felt that the patient needed address and actually looked up well
3,harqen_samples/Abigail_Wojciechowski/0/A6fe7d406bfb24488a2e_16K.wav,513670,one example of having like a heavy assignment with the my first travel assignment um it was our first week starting out it was my first travel assignment and we had six patients each and it was a little stressful because trying to figure out like the new hospital new charting system the pretty much i just expressed i need a little bit assistance from the charge nurse and she was able to help me so we got through the day and m and then there’s up being more stressful just in the moment but once we got through it it was it was a lot better so i think just taking a deep breath and relaxing and thinking you can do this and you could pretty much get through anything
4,harqen_samples/Abigail_Wojciechowski/1/A75e22c37d0084ac19f3_16K.wav,533941,well i try to go above and beyond the patients because i always try to think of what if it’s me or somebody in my family in their shoes and and how it i want that nurse to treat my family member em one and since i can kind of think of is at the hospital i used to work out i had only spanish speaking patient who was pretty sick and he had a a a child with him that was probably in elementary school and they didn’t know english and there was nobody that could come and pick up the child and the kid hadnt eaten and it was about lunch so i ran up to the cafeteria and got in some lucky charms and milk and some other snacks in and brought it back down to him so he could eat and it just made my day seeing how happy was to eat and watch cartoons and we could take care of him and that made the patient feel more comfortable as well so i’d say that that was my experience with that
5,harqen_samples/Abigail_Wojciechowski/2/A8e5e1a540c0242608e0_16K.wav,646790,i feel like working our level on trauma center getting to like trauma patients frequently and especially during covid um we come across a lot of angry and upset patients and family members um i can’t particularly think of a specific time but am just overall mostly what the patients in the family members are just confused um just leveling with them and try to think like what if i was in there shoes and i would probably be angry as well such as kind of thinking how can i help you kind of be on their level and explain things better um don’t minimize their feelings just recognize that they’re upset and tell me about that something what i can do to make your experience better um and i feel like that typically the best route to go and when somebody’s angry usually they just need information or just want they’re just confused and want things explained to them better and and once you get that out to them they they usually tone down and apologize and say i’m sorry that i was acting that way so and yeah that that happens a lot in the er and i feel like i know how to deal with that pretty well

----test.csv----

,wav_filename,wav_filesize,transcript
0,harqen_samples/Alfredo_Bantug/0/Af0f836c690bc4873bb0_16K.wav,755878,through my respitory care experience as far as it’s concerned having a patient with a a a usual heavy ward is pretty much common premise a a in order to to do the right thing for the patient therapy is for the ball time management and you should know how to how to prioritize things first for in order when you do have a good outcome and a most important thing is quality of care to it doesnt matter how it doesn’t matter how hard the work or the work work load as long you have a good time management and you do know how to prioritize things i’m pretty much sure you could give a good quality of care to the patient and oh i’m pretty much sure the patient will be satisfied with your service to them
1,harqen_samples/Alicia_Norton/0/Ae17d4a13af3947a0987_16K.wav,293615,at my current place of work it’s not unusual for us to be out of ratio as well as having no support such as cna’s or charge nurses or research nurses during those times i find it best to take a breath to prioritize my care and my patience into prioritize my intervention and what needs to be done first and what is most critical during those times i also find it helpful to work with my other fellow nurses being able to work as a team and a team environment and staying positive about our situation has helped make those situations much better and tolerable
2,harqen_samples/Arlena_Quiring/0/A8cbdc4968a754048bf3_16K.wav,101354,sixty patients at night um three deaths and just had two three hours more important and made it through the night

----dev.csv----

,wav_filename,wav_filesize,transcript
0,harqen_samples/Alexan_Tran/2/Ad8ab52eb56b44be78ed_16K.wav,939780,i remember this patient in particular because there is story and how they became sick was very unfortunate they were in a construction accident which left them a quadriplegic they could just barely move their heads and they could still talk but they no longer had news of their extremities and they were very young and because of what happened to them they were very angry and they lashed out at medical professionals when we were just trying to help because they were just frustrated in situation in times like that i just remember grace in that you move with kindness and intention and you recognize that sometimes the anger or frustration behind a patient or family members words isn’t always directed at you per se but rather than the situation and if it is i you then you take it into account and you listen to them and really try to solve the problem if not then try to escalate to person that might be able to solve them and sometimes if something can’t be solved and explain it so that they understand that to the protocols and place and unfortunately their issue cannot be solved that patient in particular was angry that they weren’t able to eat because they were they had to be unpure which means nothing per moutn for the majority of the night so that they could have procedure in the morning the family was also very upset
1,harqen_samples/Allison_White/2/Ae70ace6e7a9048c6bc0_16K.wav,405628,i had encounter with the angry family member before just simple as something not being done by cna or you know something like that so they wasn’t pleased with it so i something and it was resolved easily just by listening i listen to the family member concerned with that was dissatisfied with an i help them to let them know that ok i will personally ensure that its done even if i have to do it myself listening actually work just listening to today concerns let them then on what hasn’t been done what needs to be done and after you listen and let them know that you will personally take care of it and you take care of it that easily resolved the issue
2,harqen_samples/Brooks_Rock/0/A40398ee2d0e64219a35_16K.wav,765282,ah yes the overly intense assignment question so during this pandemic i think a lot of us nurses have dealt with i know quite intense assignment i was recently working on a covid unit and most of my patients were all total care patience and you know gowning up for each of the patients alone takes a good amount of time let alone taking care of four or five total care patients so that day in particular we didn’t have very good support staff with us to help us out and help you know care for these patients feed them et cetera so as a very heavy assignment and i it was undoable for one person really so what i ended up doing was having a conversation with the charge nurse who was working that day stating that you know my concern about the assignment particular and how i feel that the could could have been changed a little bit maybe for the next shift coming on and she then voiced her opinion back stating that would be great you know i always changed the assignment up and i will let you know some pca staff that the patients in particular do need more care a little bit more help you know around the unit that that was really how i went about battering the situation

After moving the sample training to the Azure VM, we get the following error:

(deepspeech_gpu2) azureuser@revathy-gpu:~/cloudfiles/code/Users/Revathy.G/deepspeech_training/DeepSpeech$ python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir /home/azureuser/cloudfiles/code/Users/Revathy.G/deepspeech_training/fine_tuning_checkpoints/ --epochs 1 --train_files ../training_csvs_small/train.csv --dev_files ../training_csvs_small/dev.csv --test_files ../training_csvs_small/test.csv --learning_rate 0.0001 --export_dir ../output_models/ --use_allow_growth true --train_cudnn true --train_batch_size 1
I Loading best validating checkpoint from /home/azureuser/cloudfiles/code/Users/Revathy.G/deepspeech_training/fine_tuning_checkpoints/best_dev-1466475
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:08 | Steps: 1 | Loss: 298.209839                                                              
Epoch 0 | Validation | Elapsed Time: 0:00:01 | Steps: 0 | Loss: 0.000000 | Dataset: ../training_csvs_small/dev.csv                      
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/revathy-gpu/code/Users/Revathy.G/deepspeech_training/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/anaconda/envs/deepspeech_gpu2/lib/python3.7/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/anaconda/envs/deepspeech_gpu2/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/revathy-gpu/code/Users/Revathy.G/deepspeech_training/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
    train()
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/revathy-gpu/code/Users/Revathy.G/deepspeech_training/DeepSpeech/training/deepspeech_training/train.py", line 622, in train
    dev_loss = dev_loss / total_steps
ZeroDivisionError: float division by zero

We went through other community posts on similar errors and found that this error can be caused by issues with the dataset. However, we are able to train with the exact same files on the local setup and only started encountering this error after we migrated to Azure. Can you please help us resolve this, or point us in a possible direction?

Thanks in advance!

The fact that you confirm it works locally means there’s mostly nothing we can do to help there.

You need to triple check your setup on Azure. And please follow the documented guidelines for reaching support: you don’t even share actionable details on your setup (GPU used etc.)

Hi,

Thanks for your reply. Can you please link the documented guidelines for support, so that we can post more useful information to help resolve this? I tried searching, but to no avail.

On Azure, we are training on 4 x NVIDIA Tesla V100 GPUs.

It’s a pinned topic … [READ FIRST] What and how to report if you need support

Hi @lissyx,

Training or just running inference - Training from checkpoint
Mozilla STT branch/version - 0.9.3 (as already mentioned)
OS Platform and Distribution (e.g. Linux Ubuntu 18.04) - Ubuntu 16.04 on Azure VM (as already mentioned)
Python version - 3.7
TensorFlow version - 1.15.4 (tensorflow-gpu, as already mentioned)
CUDA version - 10.0
cuDNN version - 7.6.5
GPU - 4 x NVIDIA Tesla V100

This issue was resolved after we increased the size of the train, test and dev datasets.
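
For anyone hitting the same error, one plausible explanation (an assumption on our side, not something confirmed in this thread) is that with multiple GPUs each step pulls one batch per GPU, and a step that runs out of data partway through is not counted. That would match the log above: 6 training samples on 4 GPUs gave "Steps: 1", while 3 dev samples gave "Steps: 0", and averaging the loss over 0 steps raises the ZeroDivisionError. A minimal sketch of that arithmetic follows; `batches_per_epoch` and `steps_per_epoch` are hypothetical helper names, not DeepSpeech functions, and the dev batch size of 1 is assumed because the command does not set it:

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches the input pipeline yields (the last may be partial)."""
    return math.ceil(num_samples / batch_size)


def steps_per_epoch(num_samples: int, batch_size: int, num_gpus: int) -> int:
    """Completed steps per epoch.

    ASSUMPTION: each step consumes one batch per GPU, and a step that runs
    out of data before every GPU has a batch is not counted.
    """
    return batches_per_epoch(num_samples, batch_size) // num_gpus


if __name__ == "__main__":
    # Row counts taken from the CSVs in this thread; 4 GPUs as reported.
    # batch_size=1 for dev is an assumption (the command does not set it).
    for name, rows in [("train", 6), ("dev", 3)]:
        steps = steps_per_epoch(rows, batch_size=1, num_gpus=4)
        print(f"{name}: {rows} samples -> {steps} step(s)")
        if steps == 0:
            print(f"  {name} set is too small: the loss would be averaged over 0 steps")
```

If this reading is right, each CSV needs at least as many batches as there are GPUs, which would also explain why the same files worked on a local setup with fewer GPUs.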
