How long does it take to train the model with Common voice training set?

(jackhuang) #1

I am training the model on a single GPU (Tesla, 12 GB) with the default parameters. After 11 days, the training run (75 epochs) still has not finished.
I have interrupted training twice with Ctrl+C and resumed it with the same checkpoint_dir. I wonder whether my configuration or my procedure is wrong, since training is taking so much time.

(Lissyx) #2

Training on our cluster (16 TITAN X GPUs, 12 GB each) takes several hours per epoch, so it is not surprising that full Common Voice takes a very long time on a single GPU. How many epochs have you completed over the 11 days? Maybe not even one? That would not be that surprising.

(jackhuang) #3

I found that some of the entries in the “train-other.csv” file are defective. For example, some transcripts are blank.
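
Defective rows like that can be filtered out before training. Here is a minimal sketch, assuming the usual DeepSpeech CSV layout with `wav_filename`, `wav_filesize`, and `transcript` columns (the function name is my own, not part of any tool):

```python
import csv

def drop_blank_transcripts(in_path, out_path):
    """Copy a training CSV, skipping rows whose transcript is empty
    or whitespace-only. Returns the number of rows dropped."""
    dropped = 0
    with open(in_path, newline="", encoding="utf-8") as fin, \
         open(out_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row.get("transcript", "").strip():
                writer.writerow(row)
            else:
                dropped += 1
    return dropped
```

You would run this once on each offending CSV and point training at the cleaned copy.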

(jackhuang) #4

Also, when I add “--display_step 1” to my command, training becomes very slow. I am very curious why.

(Lissyx) #5

There’s nothing to be curious about: display_step controls how often the WER is computed, and computing WER is very expensive.
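
For context, WER is the word-level edit (Levenshtein) distance between the reference and the hypothesis, divided by the number of reference words. A minimal sketch (function names are my own) of the quadratic dynamic program behind it:

```python
def edit_distance(ref, hyp):
    """Token-level Levenshtein distance via dynamic programming, O(m*n)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of ref[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
```

Note that the real cost at display time is not just this distance computation but decoding hypotheses for the evaluated utterances, which is why reporting every step slows training down so much.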

(jackhuang) #6

Is there a way to make the program skip computing the WER report after the training process finishes?