What are the options for someone without a proper GPU? Cloud services, VMs or external GPUs?

Contributors don’t really have an advantage, it’s very small, but it sounds good for marketing :slight_smile:

Why don’t you get everything up and running on a Google Colab GPU/TPU before starting your own server? I haven’t tried many hours myself, but judging from the forum some people do all their training there, and it is great for testing.
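If you do try it, a quick first sanity check is to see which GPU the session was assigned. This is just the standard NVIDIA tool, nothing Colab-specific:

# in a Colab cell the leading "!" runs a shell command;
# this shows the assigned GPU (K80, T4, P100, ...) and its memory
!nvidia-smi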


I don’t know, does Google Colab GPU/TPU have any advantage over any other server? With my current provider I only pay by the minute, and I don’t have a credit card (and I don’t want to get one just for this). This is why I could not use Amazon’s AWS. But I haven’t looked into Google Colab yet.

Colab is free with a Google account, but you only have 2 hours or something like that :slight_smile:

As setting everything up and getting the data to start training will take you about 10 hours, this is great if you are on a budget. Once you know everything it’ll take only 15 minutes on the server.

Yes, I am planning to create a little script for the initial setup; there are a few mistakes I make again and again (like forgetting to install sox and libsox-fmt-mp3).
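Roughly, I imagine something like this (a minimal sketch, assuming an Ubuntu server and training from a checkout of the DeepSpeech repo; the exact pip steps depend on the DeepSpeech release you use):

#!/usr/bin/env bash
# rough setup sketch, assuming Ubuntu; adjust to your DeepSpeech version
set -e

# system packages, including the sox mp3 support that is easy to forget
sudo apt-get update
sudo apt-get install -y git python3-pip python3-venv sox libsox-fmt-mp3

# training code in its own virtualenv
git clone https://github.com/mozilla/DeepSpeech.git
cd DeepSpeech
python3 -m venv ../ds-venv
source ../ds-venv/bin/activate
pip3 install --upgrade pip wheel setuptools
# check the training docs of your DeepSpeech release for the exact install step
pip3 install --upgrade -e .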

I paid for 24 hours at LeaderGPU, so I can collect some data now. If it turns out that one epoch is quick, then I can also use free services. But in general I hope I can slowly learn more about DeepSpeech as the dataset grows. Having a small dataset to start with looks like an advantage to me, because my main goal is learning and creating a little nerdy project on the way.

I am willing to invest 30-50 € a month, so I will have a day or two of computation time per month. This should be enough to experiment with a small model.

Mycroft has supported DeepSpeech for a while now.


And just on a side note, thanks to @agarwalaashish20 we have a German tflite model now that can be used for that :slight_smile:


A little update: I’ve chosen https://www.exoscale.com, a Swiss provider where you can get all sorts of GPUs, including V100s (after contacting support if you don’t want to buy a complete month).

I did a first experiment with the old dataset a while ago; one epoch took 3 hours with a 1080 Ti (32 GB). I believe this is a little long, since the train.tsv is a lot smaller than the 35 h of the dataset, and the new release of Common Voice more than doubled the available data to 83 hours. Would this be quicker on a V100 or P100?

My parameters so far are:

python3 DeepSpeech.py --train_files ../eo/clips/train.csv --dev_files ../eo/clips/dev.csv --test_files ../eo/clips/test.csv --automatic_mixed_precision --train_batch_size 16 --epochs 7

What can I do to optimize this? How useful is --use_cudnn_rnn?
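For reference, the adjusted call I have in mind would look something like this (the batch sizes are just guesses to tune against the GPU memory, and --use_cudnn_rnn assumes my DeepSpeech version still has that flag):

python3 DeepSpeech.py \
  --train_files ../eo/clips/train.csv \
  --dev_files ../eo/clips/dev.csv \
  --test_files ../eo/clips/test.csv \
  --automatic_mixed_precision \
  --use_cudnn_rnn \
  --train_batch_size 32 \
  --dev_batch_size 32 \
  --test_batch_size 32 \
  --epochs 7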


Doesn’t sound too bad, but I can’t see how many hours of data you are using and how many chunks that results in. Could you maybe post some output from the logs?

Hello @othiele, if I want to train a 1000 h dataset on a server/workstation with a GPU, what hardware specification would be most efficient? Do the RAM capacity and the number of CPU cores still need to be powerful?

Try a V100 with 10-12 CPUs and matching RAM, that should work fine.

I see. Thank you for your reply. I have one more question.

Assume I train my own model in Docker with cuDNN. Does the Docker container still need a bit more RAM to process the training? For example, if the server has 128 GB RAM, should I assign more than half of it to the container to run smoothly? Or should I just consider the RAM size of the GPU?

Good point, haven’t worked much with Docker yet, but @lissyx, @utunga, @dan.bmh: Do you have some idea how much RAM to use for a Docker instance?

Are we talking about RAM or VRAM?

Training my small German model, which uses about 32 h of data, usually takes 2-3 h on my computer with an RTX 2070 + 16 GB RAM. The bigger 1000 h model took 8 d 6 h on 2x 1080 Ti + 64 GB.

You can find the docker setup I’m using here:

You should also be able to use and extend the project to Esperanto without much effort if you like. For better results I’d also recommend using transfer learning from the English checkpoint.
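Roughly, continuing from the released English checkpoint while re-initialising the output layer looks something like this (flag names as in DeepSpeech 0.9’s transfer-learning docs; the Esperanto alphabet file and the checkpoint/CSV paths are placeholders you would have to adapt):

python3 DeepSpeech.py \
  --drop_source_layers 1 \
  --alphabet_config_path data/alphabet_eo.txt \
  --load_checkpoint_dir deepspeech-0.9.3-checkpoint \
  --save_checkpoint_dir ../eo-checkpoints \
  --train_files ../eo/clips/train.csv \
  --dev_files ../eo/clips/dev.csv \
  --test_files ../eo/clips/test.csv \
  --epochs 7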


According to this article you can train 12h for free on Colab and 9h on Kaggle, which should be enough for your dataset:


I’m no Docker expert, I did not know you had to “dedicate” RAM for that, and I certainly don’t do that when training using https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train

I also don’t set it, but it may be required for the cloud VM. When training with Slurm + Singularity I have such an option and normally just set the RAM to 2x the VRAM size.
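If your setup does require it, the plain Docker equivalent of that rule of thumb would be something like the following (the image name and the mount path are placeholders; a 16 GB GPU would then get a 32 GB memory limit):

# rule of thumb from above: container RAM ~ 2x VRAM, here 16 GB GPU -> 32 GB limit
# "deepspeech-train" is a placeholder for the image built from your Dockerfile.train
docker run --gpus all \
  --memory 32g \
  --shm-size 8g \
  -v /data/cv-corpus:/data \
  -it deepspeech-train bash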


Was this 32 h model useful in any way? What could you do with it?

Nothing :smile: I only use it to run different experiments because training is quite fast …
You can find them under “Voxforge” in the results chapter of my readme.

But with transfer learning and noise augmentation I got the WER down to 0.206 (it’s somewhat easier than Common Voice), so it may be usable, especially if you can use a small domain-specific language model.
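If you want to try that, the rough shape with DeepSpeech 0.9’s scorer tooling is the following (sentences.txt stands for your domain text, the KenLM path and the alpha/beta values are placeholders, and generate_scorer_package comes with the native_client package):

# build a small domain-specific language model from your own sentences
python3 data/lm/generate_lm.py \
  --input_txt sentences.txt \
  --output_dir lm_out \
  --top_k 50000 \
  --kenlm_bins /path/to/kenlm/build/bin \
  --arpa_order 3 \
  --max_arpa_memory "80%" \
  --arpa_prune "0|0|1" \
  --binary_a_bits 255 \
  --binary_q_bits 8 \
  --binary_type trie

# package it as a scorer for decoding
./generate_scorer_package \
  --alphabet alphabet.txt \
  --lm lm_out/lm.binary \
  --vocab lm_out/vocab-50000.txt \
  --package domain.scorer \
  --default_alpha 0.93 \
  --default_beta 1.18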


@stergro You can find some additional data here (~2-3h I’d estimate):
https://tatoeba.org/epo/sentences/search?query=&from=epo&to=und&user=&orphans=no&unapproved=no&has_audio=yes&tags=&list=&native=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort_reverse=&sort=relevance