Transfer-learning1/2 branch, generate trie

I know it’s a widely discussed topic, but as of this release (DeepSpeech 0.8.2) I can’t manage to generate the trie file for transfer learning.

1. Is the transfer-learning branch still the recommended way? (The original docs never mention that we should clone that branch rather than master for transfer learning.)
2. Which is the last version that contains the utility needed to build the trie?

I understood that the trie is no longer needed when training from scratch, but after transfer learning I’m getting this error:
```
I STARTING Optimization
I FINISHED optimization in 0:00:00.000007
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): ../kenlm/lm/model.cc:70 in lm::ngram::detail::GenericModel<Search, VocabularyT>::GenericModel(const char*, const lm::ngram::Config&) [with Search = lm::ngram::trie::TrieSearch<lm::ngram::SeparatelyQuantize, lm::ngram::trie::DontBhiksha>; VocabularyT = lm::ngram::SortedVocabulary] threw FormatLoadException because `new_config.enumerate_vocab && !parameters.fixed.has_vocabulary'.
The decoder requested all the vocabulary strings, but this binary file does not have them. You may need to rebuild the binary file with an updated version of build_binary.
```
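For context, on 0.8.x the trie is replaced by a scorer package, and the error above typically means the KenLM binary inside it was built without the vocabulary. A minimal sketch of rebuilding it with the scripts shipped in a DeepSpeech 0.8.x checkout (all paths, `vocabulary.txt`, and the alpha/beta values are illustrative placeholders):

```shell
# Build a pruned ARPA model and a trie-format KenLM binary from a text corpus.
# Requires a DeepSpeech 0.8.x source checkout and a local KenLM build.
python data/lm/generate_lm.py \
  --input_txt vocabulary.txt \
  --output_dir . \
  --top_k 500000 \
  --kenlm_bins /path/to/kenlm/build/bin/ \
  --arpa_order 5 \
  --max_arpa_memory "85%" \
  --arpa_prune "0|0|1" \
  --binary_a_bits 255 \
  --binary_q_bits 8 \
  --binary_type trie

# Package the binary LM and vocabulary into a .scorer file the 0.8.x
# decoder can load (generate_scorer_package comes from the native client).
./generate_scorer_package \
  --alphabet alphabet.txt \
  --lm lm.binary \
  --vocab vocab-500000.txt \
  --package kenlm.scorer \
  --default_alpha 0.93 \
  --default_beta 1.18
```

Because the binary is rebuilt with the matching KenLM version and packaged with its vocabulary, the decoder no longer needs to enumerate vocabulary strings at load time.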

Please search and read before you post. The transfer-learning branch is deprecated; read here:

https://deepspeech.readthedocs.io/en/latest/TRAINING.html#fine-tuning-same-alphabet

And for question 2, search for “custom language model” and “scorer”.

Well, that’s the trouble: I’ve read the entire documentation and also most of the threads here. The thing is that everywhere on this forum, discussion of transfer learning implies using that branch (and at the time the branch wasn’t deprecated, the documentation didn’t mention switching to it, nor does it mention now that it’s deprecated). That’s why I found it confusing, and others might too.
Thanks for the response and the awesome work!

It looks like you’re trying to use a newer version of the scorer with an older version of the code. Compatible versions of the decoder code should not try to enumerate the vocabulary when loading the scorer package, so this error wouldn’t happen. Make sure you’re not mixing versions of training code and model files.

No, please refer to the docs.

What utility are you talking about?

https://mozilla-voice-stt.readthedocs.io/en/latest/TRAINING.html#transfer-learning-new-alphabet
https://deepspeech.readthedocs.io/en/latest/TRAINING.html#transfer-learning-new-alphabet
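For the new-alphabet case those docs describe, a sketch of the invocation (file paths and the number of dropped layers are placeholders; separate load/save checkpoint directories are required so the source checkpoint isn’t overwritten):

```shell
# Transfer learning to a new alphabet: drop the final layer(s) of the
# source model so the output dimension can match the new alphabet.
python DeepSpeech.py \
  --drop_source_layers 1 \
  --alphabet_config_path my-new-alphabet.txt \
  --load_checkpoint_dir path/to/source-checkpoint/folder \
  --save_checkpoint_dir path/to/output-checkpoint/folder \
  --train_files my-new-language-train.csv \
  --dev_files my-new-language-dev.csv \
  --test_files my-new-language-test.csv
```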

Right under “Training”, so please tell us what would make that easier to find, because it already seems quite easy …