How to make the testing process faster?

jackhuang · February 2, 2018, 2:47am

It takes a long time to test one .wav file (about 20 s without the language model, about 240 s with it). I wonder how to make this process faster?

lissyx · February 2, 2018, 7:35am

Even 20 seconds for a 10-second audio file is suspect. Can you describe your system and how you installed everything?
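To make figures like these comparable between machines, it can help to express them as a ratio of processing time to audio length. The sketch below is only an illustration using the Python standard library; the helper names are hypothetical and not part of DeepSpeech, and the 10-second clip length is the assumption made in the post above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio length (above 1.0 is slower than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures reported in this thread, assuming a 10-second clip.
print(real_time_factor(20.0, 10.0))   # without the language model
print(real_time_factor(240.0, 10.0))  # with the language model
```

In practice the audio length would come from `wav_duration_seconds("your_file.wav")` rather than a hard-coded value.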

jackhuang · February 2, 2018, 7:51am

When I trained the English model, the test process was much quicker. Now I am using DeepSpeech to train a Chinese model. I didn’t change any code; I only swapped in the Chinese training data and a Chinese language model (the alphabet size is about 6000, and the 4-gram model was trained at the Chinese character level on 10,600,000 Chinese sentences, about 700M). The testing process is much slower than before.

lissyx · February 2, 2018, 7:55am

Thanks. This is something we have not yet worked on; it could just be fallout from the increased complexity, because you have many more characters. Maybe you should try playing with some of the language model’s parameters, like beam width?

Also, have you tried a different n-gram order?

jackhuang · February 2, 2018, 9:09am

Does that mean that the larger the beam width is, the more candidate transcriptions the model generates, and that this relationship is linear?

lissyx · February 2, 2018, 10:05am

It means you are exploring things we have not. Besides the size of your alphabet (which you cannot do anything about, of course), one of the parameters that I think might influence speed is the beam width. But you might also have to change the n-gram order used (I cannot remember what we are using right now). I seem to remember it being much faster than 5x when using a beam width of 100 (but that comes with a cost in word error rate as well).

There’s also something suspicious: 20 secs for 10 secs of audio on a GTX1080, when we measured ~2x realtime for a GTX1070 on English. I’m wondering if that is really the best one can get for Chinese, or if you have room for improvement as well.
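As a rough way to reason about the question above, here is a back-of-the-envelope sketch. It assumes a naive CTC beam search in which every beam entry is extended by every label (plus the CTC blank) at every time step; the function name, the English alphabet size of 28, and the 500 time steps are illustrative assumptions, not values measured in DeepSpeech.

```python
def candidate_extensions(beam_width, alphabet_size, time_steps):
    """Count candidate extensions scored by a naive beam search.

    Each of the `beam_width` prefixes is extended by every label
    plus the CTC blank at each time step.
    """
    per_step = beam_width * (alphabet_size + 1)  # +1 for the CTC blank
    return per_step * time_steps


# Assumed values for illustration only.
english = candidate_extensions(beam_width=100, alphabet_size=28, time_steps=500)
chinese = candidate_extensions(beam_width=100, alphabet_size=6000, time_steps=500)

print(english, chinese, chinese / english)
```

Under this simplified model the count grows linearly in both the beam width and the alphabet size. Real run time also depends on language model lookups and on sorting the beam, so this is only a first-order estimate.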

reuben · February 2, 2018, 10:13am

You could try using Baidu’s Warp-CTC. It’s specifically meant to handle large alphabet sizes well.

lissyx · February 2, 2018, 10:17am

To complement this, @jackhuang: we have WarpCTC in the https://github.com/mozilla/tensorflow master branch (it’s TensorFlow r1.4), though we stopped building Python packages. Also, the build instructions have been updated to use --config=monolithic and --copt=-fvisibility=hidden; if you build the Python package, you might not want that.

Using WarpCTC from libdeepspeech should not be too hard though; you likely just have to use the proper headers and switch the CTC codepath to use it.

jackhuang · February 2, 2018, 12:51pm

Did you mean that the warp_ctc function in TensorFlow (https://github.com/mozilla/tensorflow) and Baidu’s Warp-CTC have the same effect?

lissyx · February 2, 2018, 12:56pm

This is the code from Baidu, actually. There was an old TensorFlow fork with it; we’ve taken that and kept it more up to date. It may work, but there may be issues.

jackhuang · February 3, 2018, 6:49am

And can the DeepSpeech native client that is installed by pip also use WarpCTC?

lissyx · February 3, 2018, 10:16am

Not like that; you’d have to make the same kind of changes. You need to make the changes in libdeepspeech.so (deepspeech.cc) and then rebuild the Python / Node / C++ packages, which will then pick them up.

jackhuang · February 20, 2018, 11:14am

Could you please give me some advice on how to process the training data? I trained the language model on a Chinese corpus like “今天早上 …”, in which each sentence is split into characters separated by blanks (so the n-gram model is based on Chinese characters). Should the training data look like “明天中午” (not divided by blanks) or “明 天 中 午” (divided by blanks)?
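For reference, KenLM treats whitespace-separated tokens as the units of its n-grams, so a character-level model as described above needs each character to be its own token in the language model corpus. A minimal sketch of that preprocessing step follows; the function name is hypothetical and this is not code from DeepSpeech.

```python
def split_into_characters(sentence):
    """Put a single space between all non-whitespace characters."""
    return " ".join(ch for ch in sentence if not ch.isspace())


print(split_into_characters("明天中午"))  # 明 天 中 午
```

Note that this also splits Latin letters and digits into single characters, which may or may not be what you want for mixed-language text.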

lissyx · February 21, 2018, 10:04am

I’m sorry, but what is the difference in Vietnamese between the two alternatives? Basically, you should train on text in the same form you expect as output in everyday use.

jackhuang · February 21, 2018, 12:24pm

Well, I am training the Chinese model. In Chinese, words are not separated by blanks. For example, “今天上午吃早餐” is a normal Chinese sentence. In English, however, words are separated by blanks. So I wonder whether I need to separate the Chinese sentences when using DeepSpeech to train the Chinese model.
By the way, I am trying to understand the decoding process with KenLM, but I can only find a “libctc_decoder_with_kenlm.so” file. Could you please point me to the source code of that file so I can see how it works?

lissyx · February 21, 2018, 12:58pm

For the spacing, I don’t think it should be a problem. The decoding code is TensorFlow’s CTC; libctc_decoder_with_kenlm is a custom op that adds KenLM scoring to the TensorFlow CTC Beam Decoder.

lissyx · February 21, 2018, 1:12pm

Here in TensorFlow source:

tensorflow/core/util/ctc/ctc_beam_search.h
tensorflow/core/kernels/ctc_decoder_ops.cc

reuben · February 21, 2018, 1:25pm

And a description of the algorithm itself is here: ftp://ftp.idsia.ch/pub/juergen/icml2006.pdf

reuben · February 21, 2018, 1:27pm

And an explanation of beam search decoding: https://medium.com/corti-ai/ctc-networks-and-language-models-prefix-beam-search-explained-c11d1ee23306

azureapertus01 · October 17, 2018, 12:13pm

Hi @lissyx @jackhuang, I think I’ve found the reason why the testing process takes so long.
The DeepSpeech2 paper mentions that the beam search is pruned further for Mandarin.
Here is the quote from the paper (Section 7.3):
“Rather than considering all characters as viable additions to the beam, we only consider the fewest number of characters whose cumulative probability is at least p.”
@lissyx, would you mind giving me some advice on which class I need to modify? I’ve taken a quick look at deepspeech.cc and I’m not sure whether KenLMBeamScorer is the class I should modify.

Thanks! And sorry for my poor English.
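The pruning rule quoted from the paper can be sketched as follows. This is a standalone illustration in Python, not code from DeepSpeech or TensorFlow; the function name, the optional `max_chars` cap, and the example probabilities are assumptions made for the example.

```python
def prune_by_cumulative_probability(probs, p, max_chars=None):
    """Return indices of the fewest characters whose cumulative probability is at least p.

    `probs` is the per-character probability distribution for one time step.
    `max_chars` optionally caps how many characters are kept.
    """
    # Visit characters from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p or (max_chars is not None and len(kept) >= max_chars):
            break
    return kept


# Only the two most likely characters are needed to reach 0.8.
print(prune_by_cumulative_probability([0.1, 0.7, 0.15, 0.05], p=0.8))  # [1, 2]
```

With an alphabet of about 6000 characters, most of the probability mass at a given time step usually sits on a handful of characters, so a beam search that only extends prefixes with the kept indices would score far fewer candidates per step.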
