The README mentions that if we want to build our own binaries, one of the prerequisites is installing Mozilla’s TensorFlow r1.15 branch, since it fixes some common problems.
Can you explain, in general, what problems were encountered, i.e. why the fork of TensorFlow was needed?
lissyx
First, we need the fork for some CI integration. We also have a few extra fixes that we apply until upstream picks them up, which can take time. And we have some more specific improvements, namely for cross-compilation.
Upstreaming them is always good, but it does not always work out …
No, you only need our fork to build the code; training a model does not require it, you can just use upstream (we do) …
Was one of the reasons for using a custom TensorFlow to use KenLM with the CTC beam search decoder? Or is it possible to use KenLM with the CTC decoder using upstream TensorFlow?
lissyx
I don’t think we had to do funny things because of the CTC decoder. Could you please elaborate on why you are asking?
I have modified the DeepSpeech architecture a little and I am trying to recompile the binaries to match the modified model so I can run inference on it. There isn’t any documentation available for the ds_ctcdecoder package, so it is somewhat difficult to understand the process. That is why I was asking.
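For context, this is roughly how the 0.6-era DeepSpeech training code (evaluate.py) drives ds_ctcdecoder; treat it as a sketch rather than reference documentation, since the exact signatures and file names (lm.binary, trie) vary between releases:

```python
import numpy as np
from util.text import Alphabet  # helper module from the DeepSpeech repo
from ds_ctcdecoder import Scorer, ctc_beam_search_decoder_batch

alphabet = Alphabet('data/alphabet.txt')

# Dummy acoustic output: softmaxed character probabilities, shape (N, T, 29)
probs = np.random.rand(1, 100, 29).astype(np.float32)
seq_lengths = np.array([100], dtype=np.int32)

# KenLM scorer: lm_alpha, lm_beta, language model binary, trie, alphabet
scorer = Scorer(0.75, 1.85, 'lm.binary', 'trie', alphabet)

# Batched beam search: beam width 500, 4 worker processes
decoded = ctc_beam_search_decoder_batch(probs, seq_lengths, alphabet, 500,
                                        num_processes=4, scorer=scorer)
print(decoded[0][0][1])  # top hypothesis for the first sample
```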
lissyx
What kind of changes have you made that require you to rebuild everything?
Please be more specific; there is documentation on how to rebuild that, so I don’t understand your statement.
I added 2 convolutional layers at the beginning of the model. Since the model architecture has changed, will the same binaries work?
By documentation I meant an explanation of how the CTC decoder works behind the scenes.
And if I am building a completely different speech-to-text model, these binaries won’t work, right, because they are built for DeepSpeech? In that case I guess I would have to modify those binaries. Please correct me if I am wrong.
lissyx
That depends on how you performed those changes. Did you modify the user-facing input? Could you share a diff?
We don’t change how the CTC decoder works, we just provide scoring.
I can’t do divination, so please show me the code.
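To illustrate that point: the beam search itself is standard, and the language model only contributes an extra score at word boundaries. A minimal conceptual sketch (not the actual ds_ctcdecoder internals; the lm dict, alpha, and beta here are stand-ins):

```python
import math

def lm_score(word, lm, alpha=0.75, beta=1.85):
    """External scoring: alpha-weighted LM log-probability plus a
    word-insertion bonus beta (hypothetical default values)."""
    return alpha * math.log(lm.get(word, 1e-9)) + beta

def extend_beam(prefix, char, acoustic_logprob, lm):
    """Extend one beam hypothesis by one character; the decoder only
    consults the LM when a word boundary (space) is emitted."""
    score = acoustic_logprob
    if char == ' ' and prefix:
        score += lm_score(prefix.split()[-1], lm)
    return prefix + char, score

# Toy usage: a dict as a stand-in unigram "language model"
lm = {'hello': 0.1, 'world': 0.05}
print(extend_beam('hello', ' ', -1.2, lm))
```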
No, the user-facing input is still the MFCC vectors; after the input, I apply the convolutional layers to it. The output is the character probabilities for each time-step (same as DeepSpeech).
I’m sorry, I haven’t put the code on git yet.
I was referring to the documentation of this package.
Input (audio_input): 26-dimensional MFCC vectors, i.e. (batch_size, time_steps, 26)
Output (prediction): character probabilities for each time-step, i.e. (batch_size, time_steps, 29)
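Something like the following is presumably what the change looks like; a hypothetical sketch in TensorFlow 1.15 graph style, with filter counts and kernel sizes invented for illustration:

```python
import tensorflow as tf  # TensorFlow 1.15

n_input, n_chars = 26, 29

# User-facing input unchanged: (batch_size, time_steps, 26) MFCC vectors
audio_input = tf.placeholder(tf.float32, [None, None, n_input],
                             name='audio_input')

# Two 1-D convolutions over time; 'same' padding keeps time_steps intact,
# so the external shapes stay (N, T, 26) in and (N, T, 29) out
conv1 = tf.layers.conv1d(audio_input, filters=128, kernel_size=5,
                         padding='same', activation=tf.nn.relu)
conv2 = tf.layers.conv1d(conv1, filters=128, kernel_size=5,
                         padding='same', activation=tf.nn.relu)

# ... the rest of the acoustic model (RNN, dense layers) goes here ...
logits = tf.layers.dense(conv2, n_chars)
prediction = tf.nn.softmax(logits, name='prediction')
```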
lissyx
If you made no changes to the input or output, then you should be able to use the binaries we provide.
You could share that here; it’s hard to ensure compatibility this way.
I still don’t understand your question.
I’m unsure we have the TensorFlow ops for the convolutions in the binaries, though. Maybe the TFLite runtime would work out of the box. You should just try both first, before trying to rebuild …
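One cheap way to run that test is to attempt a TFLite conversion of the modified graph and see whether it complains about unsupported ops. A sketch assuming TensorFlow 1.15; the file and node names (output_graph.pb, audio_input, prediction) are the hypothetical ones from the model above:

```python
import tensorflow as tf  # TensorFlow 1.15

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'output_graph.pb',                 # frozen inference graph
    input_arrays=['audio_input'],
    output_arrays=['prediction'],
    # TFLite wants fixed shapes; e.g. 1 sample, 16 time-steps, 26 features
    input_shapes={'audio_input': [1, 16, 26]})

tflite_model = converter.convert()     # raises if an op is unsupported
with open('output_graph.tflite', 'wb') as f:
    f.write(tflite_model)
```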
Alright, understood. I just wanted to use KenLM with the beam search decoder in my new model, and to use the generate_trie binary with it.
I guess it could work if the model input and output are the same. Thanks a lot, this was a huge help.
lissyx
Without looking at the code I can’t be definitive, but that seems to be the case, and the only limitation, as I said, might be the TensorFlow runtime; since we now have TFLite available everywhere, you can try easily. Look for 0.7.0-alpha.0: it is going to stay compatible with 0.6.1 for the moment, and it provides TFLite on every platform.
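For completeness, trying the prebuilt runtime is only a few lines with the deepspeech Python package; a sketch assuming the 0.6-era API (the Model and enableDecoderWithLM signatures changed across releases, so check the docs for the version you install):

```python
import numpy as np
from deepspeech import Model

ds = Model('output_graph.tflite', 500)       # model path, beam width
ds.enableDecoderWithLM('lm.binary', 'trie',  # KenLM binary + trie
                       0.75, 1.85)           # lm_alpha, lm_beta

# 16-bit, 16 kHz mono PCM samples
audio = np.frombuffer(open('audio.raw', 'rb').read(), dtype=np.int16)
print(ds.stt(audio))
```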