What are the options for someone without a proper GPU? Cloud services, VMs or external GPUs?

And just on a side note, thanks to @agarwalaashish20 we have a German tflite model now that can be used for that :slight_smile:


A little update: I've chosen https://www.exoscale.com, a Swiss provider where you can get all sorts of GPUs including V100s (after contacting support, if you don't want to buy a complete month).

I did a first experiment with the old dataset a while ago: one epoch took 3 hours on a 1080 Ti (machine with 32 GB RAM). I believe this is a little long, since the train.tsv is a lot smaller than the full 35 h of the dataset, and the new release of Common Voice more than doubled the available data to 83 hours. Would this be quicker on a V100 or P100?

My parameters so far are:

python3 DeepSpeech.py --train_files …/eo/clips/train.csv --dev_files …/eo/clips/dev.csv --test_files …/eo/clips/test.csv --automatic_mixed_precision --train_batch_size 16 --epochs 7

What can I do to optimize this? How useful is --use_cudnn_rnn?
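For reference, this is roughly what I would try next; a sketch only, since flag names differ between DeepSpeech releases (0.6.x has --use_cudnn_rnn, 0.7+ renames it to --train_cudnn), and the .../eo/clips paths and checkpoint directory are placeholders:

```bash
# Sketch only: enable the cuDNN RNN implementation and set explicit
# dev/test batch sizes; check flag names against your DeepSpeech version.
python3 DeepSpeech.py \
  --train_files .../eo/clips/train.csv \
  --dev_files .../eo/clips/dev.csv \
  --test_files .../eo/clips/test.csv \
  --automatic_mixed_precision \
  --use_cudnn_rnn \
  --train_batch_size 32 \
  --dev_batch_size 32 \
  --test_batch_size 32 \
  --epochs 7 \
  --checkpoint_dir .../checkpoints/eo
```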


Doesn't sound too bad, but I can't see how many hours you are using or how many chunks that results in. Could you share some output from the logs?

Hello @othiele, if I want to train a 1000 h dataset on a server/workstation with a GPU, what hardware specification would be most efficient? Do I still need a lot of RAM and CPU cores?

Try a V100 with 10-12 CPU cores and a matching amount of RAM, that should work fine.

I see. Thank you for your reply. I have one more question.

Assuming I train my own model in Docker with cuDNN, does the container still need some extra RAM for the training to run? For example, if the server has 128 GB RAM, should I assign more than half of it to Docker so it runs smoothly, or should I just go by the RAM size of the GPU?

Good point, haven’t worked much with Docker yet, but @lissyx, @utunga, @dan.bmh: Do you have some idea how much RAM to use for a Docker instance?

Are we talking about RAM or VRAM?

Training my small German model, which uses about 32 h of data, usually takes 2-3 h on my computer with an RTX 2070 + 16 GB RAM. The bigger 1000 h model took 8 d 6 h on 2x 1080 Ti + 64 GB.

You can find the docker setup I’m using here:

You should also be able to use and extend the project to Esperanto without much effort if you like. For better results I'd also recommend using transfer learning from the English checkpoint.
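As a rough sketch, the transfer-learning run can look something like this (flags from DeepSpeech 0.7+; all paths and the drop-layer count are placeholders you would need to adapt):

```bash
# Sketch only: fine-tune from the released English checkpoint with a new
# alphabet. --drop_source_layers re-initializes the last layer(s) so the
# output matches the Esperanto alphabet; check flag names for your version.
python3 DeepSpeech.py \
  --train_files data/eo/train.csv \
  --dev_files data/eo/dev.csv \
  --test_files data/eo/test.csv \
  --alphabet_config_path data/eo/alphabet.txt \
  --load_checkpoint_dir checkpoints/english-released \
  --save_checkpoint_dir checkpoints/eo-transfer \
  --drop_source_layers 1 \
  --train_batch_size 16 \
  --epochs 20
```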


According to this article you can train 12h for free on Colab and 9h on Kaggle, which should be enough for your dataset:


I'm no Docker expert, I did not know you had to "dedicate" RAM for that, and I certainly don't do that when training using https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train

I also don't set it, but it may be required for the cloud VM. When training with Slurm + Singularity I have such an option and normally just set the RAM limit to 2x the VRAM size.
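If you do want to cap it in Docker, something along these lines should work; the image name, mount paths and limits below are placeholders, and --gpus requires the NVIDIA container toolkit on the host:

```bash
# Sketch only: Docker does not reserve RAM unless you pass --memory.
# 64g is roughly 2x the VRAM of a 32 GB V100. Image name, mount paths
# and the entrypoint script are placeholders.
docker run --rm -it \
  --gpus all \
  --memory 64g \
  --shm-size 8g \
  -v /data/cv:/data \
  -v /data/checkpoints:/checkpoints \
  my-deepspeech-train-image \
  ./run_training.sh
```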


Was this 32 h model useful in any way? What could you do with it?

Nothing :smile: I only use it to run different experiments because training is quite fast …
You can find them under "Voxforge" in the results chapter of my readme.

But with transfer learning and noise augmentation I got the WER down to 0.206 (Voxforge is somewhat easier than Common Voice), so it may be usable, especially if you can use a small domain-specific language model.
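In case it helps, this is roughly how such a small scorer can be built with the scripts shipped in DeepSpeech 0.7+; domain_sentences.txt is a hypothetical in-domain text file, and you should check the exact flags and output file names against the data/lm documentation for your version:

```bash
# Sketch only: build a small domain-specific KenLM model and package it
# as a scorer. All file names are placeholders; very small corpora may
# additionally need KenLM's --discount_fallback.
python3 data/lm/generate_lm.py \
  --input_txt domain_sentences.txt \
  --output_dir lm/ \
  --top_k 50000 \
  --kenlm_bins /opt/kenlm/build/bin/ \
  --arpa_order 5 \
  --max_arpa_memory "85%" \
  --arpa_prune "0|0|1" \
  --binary_a_bits 255 \
  --binary_q_bits 8 \
  --binary_type trie

# Package the LM together with the alphabet into a .scorer file.
./generate_scorer_package \
  --alphabet alphabet.txt \
  --lm lm/lm.binary \
  --vocab lm/vocab-50000.txt \
  --package domain.scorer \
  --default_alpha 0.93 \
  --default_beta 1.18
```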


@stergro You can find some additional data here (~2-3h I’d estimate):
https://tatoeba.org/epo/sentences/search?query=&from=epo&to=und&user=&orphans=no&unapproved=no&has_audio=yes&tags=&list=&native=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort_reverse=&sort=relevance

Thanks, I know about this. There are also more than 400 MB of additional data on Lingua Libre, but I am not sure how to import it. There are a few scripts for that, but I haven't looked into them in detail yet. Is it easy to import from Tatoeba? Will the import scripts simply add the files to the train.tsv, or will I have to do manual work?

NVIDIA's new A100 lets cloud providers dedicate parts of a single GPU to multiple VMs (Multi-Instance GPU), so they no longer have to over-provision to make sure everyone gets decent speeds. This should reduce prices, in addition to the faster processing speed reducing training times.

I’m waiting for cloud providers to launch their A100 instances (probably in the next 2-3 months) before doing training again. It may require DeepSpeech to support CUDA 11 and cuDNN 8 to get the full benefit of Ampere though.

If you use the project I linked above, you just have to add a new entry with the language code in this file:

Then you can convert and clean them (see the readme section on adding a new language) and combine them with the train.csv from CV.
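Combining is basically just concatenating the CSVs while keeping a single header; a minimal sketch with hypothetical file names (both files are assumed to already use the DeepSpeech columns wav_filename, wav_filesize, transcript):

```bash
# Sketch only: merge the Common Voice train.csv with a converted
# Tatoeba CSV. File names are placeholders; audio paths inside the CSVs
# should be absolute or relative to the same working directory.
head -n 1 cv_eo/train.csv > train_combined.csv
tail -n +2 cv_eo/train.csv >> train_combined.csv
tail -n +2 tatoeba_eo/train.csv >> train_combined.csv
```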


@dan.bmh Thanks for your reference. It rekindled my passion for this work. :laughing:

I will take all of your recommendations on the hardware specification into account when training the Cantonese dataset, and I hope I can share the results with you all once it is completed. Thank you.

Thanks, looks good. There is also a Lingua Libre importer from the French team in the official repo, but I haven't tested it yet:

EDIT: I created a merge request on GitHub for the eo language codes for Tatoeba.


It's working well with French, Italian and some other locales, since we got patches from other contributors for that. There can always be issues of course, but it should be pretty reliable.

If you have issues with Lingua Libre, I'm also in touch with the developer, so we can forward questions / put people in touch.
