How do I estimate the approximate training time for a 3 GB dataset?

Denyz_Pylypenko · February 2, 2021, 10:20am

I downloaded a 3 GB Russian dataset and am running training on a 2-core machine. My most recent checkpoint just passed the 150k mark, and training has been running for the last few months.

Can I somehow estimate the amount of time left until it is done?

othiele · February 2, 2021, 10:24am

If you are running on CPU only, this can take a while.

Take the number of chunks and divide it by the batch size. This gives you the number of steps per epoch.
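As a rough sketch of that arithmetic in Python: the function names and all numbers below are hypothetical, not taken from the training code. It assumes the last partial batch still counts as a step (hence the rounding up), that you know how many epochs you configured, and that you have measured an average time per step from your own logs.

```python
import math


def steps_per_epoch(num_chunks: int, batch_size: int) -> int:
    """Number of training steps in one epoch (assumes a partial last batch counts)."""
    return math.ceil(num_chunks / batch_size)


def estimate_remaining_hours(
    num_chunks: int,
    batch_size: int,
    epochs: int,
    steps_done: int,
    seconds_per_step: float,
) -> float:
    """Estimate the hours left, given the steps already completed."""
    total_steps = steps_per_epoch(num_chunks, batch_size) * epochs
    remaining_steps = max(total_steps - steps_done, 0)
    return remaining_steps * seconds_per_step / 3600


if __name__ == "__main__":
    # Hypothetical values: replace them with your own dataset size,
    # batch size, epoch count, and measured time per step.
    hours = estimate_remaining_hours(
        num_chunks=1000,
        batch_size=32,
        epochs=10,
        steps_done=20,
        seconds_per_step=36.0,
    )
    print(f"Estimated time remaining: {hours:.1f} hours")
```

To get `seconds_per_step`, compare the timestamps of two checkpoints or log lines and divide the elapsed time by the number of steps between them.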

Hint: Use a GPU machine on Google Colab. You’ll have a model after a couple of hours. But it is not permanent, so download your results.

Denyz_Pylypenko · February 2, 2021, 10:43am

Sounds amazing. Will definitely try Google Colab.

Denyz_Pylypenko · February 23, 2021, 7:42pm

Hey, Olaf,
Can you please tell me where I can find a full guide on how to train on Mozilla Voice datasets on Google Colab?

P.S. I am a JavaScript engineer, so I have some difficulty understanding Python.

othiele · February 23, 2021, 8:02pm

I guess you have 2 choices as I don’t have a running Colab currently:

1. Find a Linux server you can use and follow the playbook linked in the guidelines.
2. Search here for a Colab that works. I remember one just recently for inference, but it might have been for training too. Or simply do one step after the other as in the readme. It works. Post everything in a new thread and I am happy to help.