How can I estimate the training time for a 3 GB dataset?

I downloaded a 3 GB Russian dataset and I am running training on a 2-core machine. My most recent checkpoint just passed step 150k, and the job has been running for the last few months.

Can I somehow estimate the amount of time left until it is done?

If you are running on CPU only, this can take a while.

Take the number of chunks and divide it by the batch size. That gives you the number of steps per epoch.
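To turn that into a rough ETA, you can also multiply by the number of epochs you plan to run and by the average time per step from your training log. A minimal sketch (all numbers below are hypothetical placeholders, substitute your own dataset size, batch size, and measured step time):

```python
# Rough training ETA estimate -- every value here is a placeholder.
num_chunks = 150_000       # audio clips in your training set
batch_size = 24            # whatever batch size you train with
current_step = 150_000     # step reported by your latest checkpoint
epochs_target = 75         # total epochs you intend to run
secs_per_step = 2.5        # measure this from timestamps in your log

steps_per_epoch = num_chunks // batch_size
total_steps = steps_per_epoch * epochs_target
remaining_steps = max(total_steps - current_step, 0)
eta_hours = remaining_steps * secs_per_step / 3600

print(f"{steps_per_epoch} steps/epoch, ~{eta_hours:.0f} hours remaining")
```

On CPU, `secs_per_step` is usually the dominant factor, which is why a GPU shortens this so dramatically.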

Hint: use a GPU machine on Google Colab. You’ll have a model after a couple of hours. But the session is not permanent :frowning: so download your results.


Sounds amazing. Will definitely try Google Colab.

Hey Olaf,
can you please tell me where I can find a full guide on how to train on Mozilla Voice datasets with Google Colab?

P.S. I am a JavaScript engineer, so I have some difficulty understanding Python.

I don’t have a running Colab currently, so I guess you have two choices:

  1. Find a Linux server you can use and go by the playbook linked in the guidelines.

  2. Search here for a Colab notebook that works. I remember one posted recently for inference, and it might have covered training too. Or simply do one step after the other as in the README. It works. Post everything in a new thread and I’ll be happy to help.