I’m trying to train a custom model using current master 0.5.0alpha7 (TensorFlow 1.13.1) with my own mixture of LibriSpeech, Common Voice and some private data. Training seems to proceed normally, but memory usage keeps creeping up until the entire 32 GB is exhausted and I have to cancel the training manually (it takes around 2 hours to use up all available memory; I don’t have a GPU OOM issue). I am wondering whether this is the expected behaviour, it looks like a memory leak. Has anyone experienced a similar issue? Another thing I noticed is that the rate of step updates also goes down as training progresses.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
2
Well, we don’t see that issue here, but I personally have more memory. How much RAM and swap do you have, and how much is available? There could be memory leaks in our code or in dependencies as well.
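You can check both from a terminal with standard Linux tools, nothing DeepSpeech-specific:

    free -h          # total / used / available RAM and swap
    swapon --show    # size and usage of each active swap device or file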
I have 32 GB RAM and 2 GB swap (default Ubuntu installation). Memory usage starts at around 4.5 GB (no other process running apart from a system monitor and a terminal running DeepSpeech), gradually fills up the RAM and then spills onto swap within ~2 hours. I can increase the swap size, but that only works around the problem if there is indeed a leak.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
4
You say you have Common Voice and others; it might just be that you need more memory?
My dataset mix is dominated by LibriSpeech; the private data is less than two hours (I have trouble using the 0.4.1 checkpoint for fine-tuning, the loss goes to NaN as soon as training starts). What I don’t understand (without knowing how DeepSpeech training works) is what is holding on to all that memory and not releasing it. The MFCC features? Accumulated logits?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
6
It might not be a leak; it may just require more memory than you have. Can you elaborate on the size of the LibriSpeech and Common Voice subsets you are using?
I have 328642 entries in my training.csv. Assuming an average of 10 seconds per audio clip, that’s roughly 912 hours of audio. At the moment I only have a small dev machine that maxes out at 32 GB of RAM.
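Side note: if the CSV follows the usual DeepSpeech wav_filename,wav_filesize,transcript layout and the clips are 16 kHz 16-bit mono WAVs, summing the file sizes gives a better estimate than the 10-second guess; the column position and audio format here are assumptions:

    # ~32000 bytes per second for 16 kHz 16-bit mono PCM; WAV header overhead ignored
    awk -F, 'NR > 1 { bytes += $2 } END { printf "%.1f hours of audio\n", bytes / 32000 / 3600 }' training.csv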
Update: I’ve increased the swap file size to 32 GB and set the feature_cache flag, so I believe all preprocessed features are now cached on disk (commands below). Memory usage no longer grows.
Also, the system freezes most of the time when training starts.
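In case it helps anyone hitting the same wall, this is roughly what that looked like; the swap file path, CSV names and cache location below are placeholders:

    # create and enable a 32G swap file (standard util-linux steps)
    sudo fallocate -l 32G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

    # point the training run at a writable path so precomputed features spill to disk
    python -u DeepSpeech.py \
      --train_files training.csv --dev_files dev.csv --test_files test.csv \
      --feature_cache /tmp/feature_cache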
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
11
You should try some transfer learning with that amount of data. Also, with so little data it’s not really surprising that the loss behaves like that.
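Roughly something like this, reusing the released 0.4.1 checkpoint (plain fine-tuning; the dedicated transfer-learning branch adds layer re-initialisation on top of it). This is only a sketch: paths, batch sizes and the learning rate are placeholders to adjust, and --n_hidden has to match the checkpoint’s geometry:

    # resume training from the released checkpoint; 2048 matches the released model
    python -u DeepSpeech.py \
      --n_hidden 2048 \
      --checkpoint_dir deepspeech-0.4.1-checkpoint/ \
      --train_files train.csv --dev_files dev.csv --test_files test.csv \
      --train_batch_size 24 --dev_batch_size 24 --test_batch_size 24 \
      --learning_rate 0.0001   # well below the usual default, since we resume from a trained model

A learning rate much lower than what you would use from scratch is worth trying first when resuming from a checkpoint; too large a step is a common way to get NaN losses right at the start.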
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
12
@lissyx can I do transfer learning for Urdu from a model trained in another language? If so, could you kindly point me to the documentation for transfer learning in DeepSpeech?
There isn’t any ASR model available for Urdu, hence this effort. I also understand the data is minimal; I will try to augment it.
What is the minimum number of hours of data needed for DeepSpeech?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
14
To get something viable, > 1k hours
Well, I’m not sure whether there’s any way to transliterate Urdu into an English-compatible alphabet. If there is, you can re-use the English checkpoints as well as @josh_meyer’s transfer-learning branch.
Transliteration is possible, but the English alphabet isn’t enough: Urdu has more letters and more phonemes than English. Usually a case-sensitive scheme with two-letter combinations is used to transliterate Urdu fully into an English-compatible alphabet.
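For illustration only, a scheme like that boils down to a longest-match lookup table; the mapping below is a toy example, not a real Urdu romanisation:

    # Purely illustrative mapping: a real Urdu romanisation needs a full,
    # linguistically informed table; digraphs must be matched before single letters.
    TRANSLIT = {
        "بھ": "bh",  # aspirated consonant -> two-letter combination
        "ب": "b",
        "پ": "p",
        "ٹ": "T",    # retroflex -> upper case to keep it distinct from "ت"
        "ت": "t",
    }

    def transliterate(text: str) -> str:
        out = []
        i = 0
        keys = sorted(TRANSLIT, key=len, reverse=True)  # longest match first
        while i < len(text):
            for k in keys:
                if text.startswith(k, i):
                    out.append(TRANSLIT[k])
                    i += len(k)
                    break
            else:
                out.append(text[i])  # pass through anything unmapped (spaces, etc.)
                i += 1
        return "".join(out)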
What are your suggestions now?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
16
Have you tried being less aggressive with the learning rate? Also, I think trying to source new datasets and/or promoting Common Voice contributions in Urdu (adding new sentences to read, getting more people to record) would be a better use of your time at this point.