What are the options for someone without a proper GPU? Cloud services, VMs or external GPUs?

And just on a side note, thanks to @agarwalaashish20 we have a German tflite model now that can be used for that :slight_smile:


A little update: I've chosen https://www.exoscale.com, a Swiss provider where you can get all sorts of GPUs including V100s (after contacting support, if you don't want to buy a complete month).

I did a first experiment with the old dataset a while ago: one epoch took 3 hours on a 1080 Ti (machine with 32 GB RAM). I believe this is a little long, since the train.tsv is a lot smaller than the full 35 h of the dataset, and the new release of Common Voice more than doubled the available data to 83 hours. Would this be quicker on a V100 or P100?

My parameters so far are:

python3 DeepSpeech.py --train_files …/eo/clips/train.csv --dev_files …/eo/clips/dev.csv --test_files …/eo/clips/test.csv --automatic_mixed_precision --train_batch_size 16 --epochs 7

What can I do to optimize this? How useful is --use_cudnn_rnn?
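For reference, this is roughly what I would try next; a sketch only, since flag names differ between DeepSpeech releases (0.6.x has --use_cudnn_rnn, 0.7+ renames it to --train_cudnn), and the .../eo/clips paths and checkpoint directory are placeholders:

```bash
# Sketch only: enable the cuDNN RNN implementation and set explicit
# dev/test batch sizes; check flag names against your DeepSpeech version.
python3 DeepSpeech.py \
  --train_files .../eo/clips/train.csv \
  --dev_files .../eo/clips/dev.csv \
  --test_files .../eo/clips/test.csv \
  --automatic_mixed_precision \
  --use_cudnn_rnn \
  --train_batch_size 32 \
  --dev_batch_size 32 \
  --test_batch_size 32 \
  --epochs 7 \
  --checkpoint_dir .../checkpoints/eo
```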


Doesn't sound too bad, but I can't see how many hours you are using or how many chunks that results in. Could you share some output from the logs?

Hello @othiele, if I want to train a 1000 h dataset on a server/workstation with a GPU, what hardware specification would be most efficient? Do I still need a lot of RAM and CPU cores?

Try a V100 with 10-12 CPU cores and a matching amount of RAM, that should work fine.

I see. Thank you for your reply. I have one more question.

Assuming I train my own model in Docker with cuDNN, does the container still need some extra RAM for the training to run? For example, if the server has 128 GB RAM, should I assign more than half of it to Docker so it runs smoothly, or should I just go by the RAM size of the GPU?

Good point, haven’t worked much with Docker yet, but @lissyx, @utunga, @dan.bmh: Do you have some idea how much RAM to use for a Docker instance?

Are we talking about RAM or VRAM?

Training my small German model, which uses about 32 h of data, usually takes 2-3 h on my computer with an RTX 2070 + 16 GB RAM. The bigger 1000 h model took 8 d 6 h on 2x 1080 Ti + 64 GB.

You can find the docker setup I’m using here:

You should also be able to use and extend the project to Esperanto without much effort if you like. For better results I'd also recommend using transfer learning from the English checkpoint.
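As a rough sketch, the transfer-learning run can look something like this (flags from DeepSpeech 0.7+; all paths and the drop-layer count are placeholders you would need to adapt):

```bash
# Sketch only: fine-tune from the released English checkpoint with a new
# alphabet. --drop_source_layers re-initializes the last layer(s) so the
# output matches the Esperanto alphabet; check flag names for your version.
python3 DeepSpeech.py \
  --train_files data/eo/train.csv \
  --dev_files data/eo/dev.csv \
  --test_files data/eo/test.csv \
  --alphabet_config_path data/eo/alphabet.txt \
  --load_checkpoint_dir checkpoints/english-released \
  --save_checkpoint_dir checkpoints/eo-transfer \
  --drop_source_layers 1 \
  --train_batch_size 16 \
  --epochs 20
```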


According to this article you can train 12h for free on Colab and 9h on Kaggle, which should be enough for your dataset:


I'm no Docker expert, I did not know you had to "dedicate" RAM for that, and I certainly don't do that when training using https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train

I also don't set it, but it may be required for the cloud VM. When training with Slurm + Singularity I have such an option and normally just set the RAM limit to 2x the VRAM size.
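If you do want to cap it in Docker, something along these lines should work; the image name, mount paths and limits below are placeholders, and --gpus requires the NVIDIA container toolkit on the host:

```bash
# Sketch only: Docker does not reserve RAM unless you pass --memory.
# 64g is roughly 2x the VRAM of a 32 GB V100. Image name, mount paths
# and the entrypoint script are placeholders.
docker run --rm -it \
  --gpus all \
  --memory 64g \
  --shm-size 8g \
  -v /data/cv:/data \
  -v /data/checkpoints:/checkpoints \
  my-deepspeech-train-image \
  ./run_training.sh
```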


Was this 32 h model useful in any way? What could you do with it?

Nothing :smile: I only use it to run different experiments because training is quite fast …
You can find them under "Voxforge" in the results chapter of my readme.

But with transfer learning and noise augmentation I got the WER down to 0.206 (Voxforge is somewhat easier than Common Voice), so it may be usable, especially if you can use a small domain-specific language model.
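In case it helps, this is roughly how such a small scorer can be built with the scripts shipped in DeepSpeech 0.7+; domain_sentences.txt is a hypothetical in-domain text file, and you should check the exact flags and output file names against the data/lm documentation for your version:

```bash
# Sketch only: build a small domain-specific KenLM model and package it
# as a scorer. All file names are placeholders; very small corpora may
# additionally need KenLM's --discount_fallback.
python3 data/lm/generate_lm.py \
  --input_txt domain_sentences.txt \
  --output_dir lm/ \
  --top_k 50000 \
  --kenlm_bins /opt/kenlm/build/bin/ \
  --arpa_order 5 \
  --max_arpa_memory "85%" \
  --arpa_prune "0|0|1" \
  --binary_a_bits 255 \
  --binary_q_bits 8 \
  --binary_type trie

# Package the LM together with the alphabet into a .scorer file.
./generate_scorer_package \
  --alphabet alphabet.txt \
  --lm lm/lm.binary \
  --vocab lm/vocab-50000.txt \
  --package domain.scorer \
  --default_alpha 0.93 \
  --default_beta 1.18
```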


@stergro You can find some additional data here (~2-3h I’d estimate):
https://tatoeba.org/epo/sentences/search?query=&from=epo&to=und&user=&orphans=no&unapproved=no&has_audio=yes&tags=&list=&native=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort_reverse=&sort=relevance

Thanks, I know about this. There are also more than 400 MB of additional data on Lingua Libre, but I am not sure how to import it. There are a few scripts for that, but I haven't looked into them in detail yet. Is it easy to import from Tatoeba? Will the import scripts simply add the files to the train.tsv, or will I have to do manual work?

NVIDIA's new A100 lets cloud providers dedicate parts of a single GPU to multiple VMs (Multi-Instance GPU), so they no longer have to over-provision to make sure everyone gets decent speeds. This should reduce prices, in addition to the faster processing speed reducing training times.

I’m waiting for cloud providers to launch their A100 instances (probably in the next 2-3 months) before doing training again. It may require DeepSpeech to support CUDA 11 and cuDNN 8 to get the full benefit of Ampere though.

If you use the project I linked above, you just have to add a new entry with the language code in this file:

Then you can convert and clean them (see the readme section on adding a new language) and combine them with the train.csv from CV.
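Combining is basically just concatenating the CSVs while keeping a single header; a minimal sketch with hypothetical file names (both files are assumed to already use the DeepSpeech columns wav_filename, wav_filesize, transcript):

```bash
# Sketch only: merge the Common Voice train.csv with a converted
# Tatoeba CSV. File names are placeholders; audio paths inside the CSVs
# should be absolute or relative to the same working directory.
head -n 1 cv_eo/train.csv > train_combined.csv
tail -n +2 cv_eo/train.csv >> train_combined.csv
tail -n +2 tatoeba_eo/train.csv >> train_combined.csv
```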


@dan.bmh Thanks for your reference. It rekindled my passion for this work. :laughing:

I will take all of your recommendations on the hardware specification into account when training the Cantonese dataset, and I hope I can share the results with you all once it is completed. Thank you.

Thanks, looks good. There is also a Lingua Libre importer from the French team in the official repo, but I haven't tested it yet:

EDIT: I created a merge request on GitHub for the eo language codes for Tatoeba.


It's working well with French, Italian and some other locales, since we got patches from other contributors for that. There can always be issues of course, but it should be pretty reliable.

If you have issues with Lingua Libre, I'm also in touch with the developer, so we can forward questions / put people in touch.
