Understand native client code

Hi, I am trying to understand the inference code. [Master branch]

In the train.py file, a create_inference_graph function is defined.

As per the comments:

    # This shape is read by the native_client in DS_CreateModel to know the
    # value of n_steps, n_context and n_input. Make sure you update the code
    # there if this shape is changed.

    input_tensor = tfv1.placeholder(
        tf.float32,
        [batch_size,
         n_steps if n_steps > 0 else None,
         2 * Config.n_context + 1,
         Config.n_input],
        name='input_node')
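
If I read the defaults right (these concrete values are my assumptions from the flags and util/config.py defaults, please correct me), the shape works out to:

    # Assumed defaults; see util/config.py and the flags in your checkout.
    batch_size = 1   # single utterance at inference time
    n_steps = 16     # timesteps per inference call when streaming
    n_context = 9    # frames of context on each side of the current frame
    n_input = 26     # MFCC features per frame

    # The window is the current frame plus n_context frames on each side.
    shape = [batch_size, n_steps, 2 * n_context + 1, n_input]
    print(shape)  # [1, 16, 19, 26]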

DS_CreateModel is called, which is defined in deepspeech.cc.

I want to understand the sequence of the code blocks that are called during the inference time.

First, the create_inference_graph function in train.py sets up input_node, previous_state_c, previous_state_h, etc. When are DS_CreateModel in deepspeech.cc and session_->Run in tfmodelstate.cc called?

Any starting point to understand the flow would be helpful.

I want to write the native client inference code in tf.js, so I am beginning to understand how the native client code works. Thanks.

When the API user calls it.

When DS_SpeechToText is called, it triggers infer in the end.
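
Roughly, the chain is (details vary a bit between versions):

    DS_SpeechToText
      -> StreamingState (feature extraction, n_steps-sized windows)
        -> ModelState::infer (TFModelState::infer in the TensorFlow build)
          -> session_->Run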

Thanks for the response.

client.cc calls DS_CreateModel. When/how is client.cc called? Could you please point me to the code?

Do you know about git grep? Reading client.cc, you would see it’s the main C++ deepspeech binary, which is one example of a caller. Have a look at the bindings as well; those are other callers.
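
For example, here is roughly how the Python binding drives that same API (a minimal sketch; the exact Model/stt signatures changed between releases, so check the binding in your version):

    import wave
    import numpy as np
    from deepspeech import Model

    # Loading the model goes through DS_CreateModel, which is where
    # n_steps, n_context and n_input are read from the input_node shape.
    ds = Model('output_graph.pbmm')

    # The API expects 16-bit, 16 kHz, mono PCM samples.
    with wave.open('audio.wav', 'rb') as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

    # stt() maps to DS_SpeechToText: feature extraction, repeated session
    # runs over n_steps-sized windows, then decoding of the logits.
    print(ds.stt(audio))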

Basically, you need to re-implement what is exposed from deepspeech.h and implemented in deepspeech.cc. Details may vary depending on how tf.js works, and maybe you would rather look at tflitemodelstate.cc.

It also depends on exactly what you want to do: just run the model, or provide the same API?

I want my model to be able to train in-browser, and then I should be able to use the model I get.

Yes, looking at that file.

I am not clear on the different folders in native_client.

The java/python/dotnet folders: are these the bindings for the different languages deepspeech can be called from?

versus,

the files directly in the native_client folder (tfmodelstate, modelstate, deepspeech.cc, etc.): are these common to all the java/dotnet/python bindings?

If you can point me to a resource/file explaining the design of the native client, that would work for me as well.

Sorry for these basic questions, I am not very comfortable with C++.

I posted a similar question on github issues as well. This is my current understanding; it would be helpful if you could point out flaws.

OK: client.cc calls DS_CreateModel, which initialises a model backed by a TFModelState. So, while running the model in tf.js, the input (say, X) to model.predict should be of the form defined in TFModelState. Is that correct?

If yes, I have another question:
I was passing an audio file through model.predict, which obviously is not of the form X. Why did that mean I need to convert TFModelState to tf.js (as you stated earlier in the thread)? Shouldn't it just mean that the audio file I am passing should be converted to the form X?

train.py (create_inference_graph) has already defined what the X format means. Why was my input not directly converted into that format? create_inference_graph is not converted to tf.js through tfjs_converter, is it? So basically I need to write code in tf.js to convert my input to form X?

And how would I convey in the tf.js code that model.predict means the tf.js equivalent of Session::Run()?
PS: I can take this to the Discourse forum if this is not the right place to discuss it.
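
To check my understanding of what model.predict has to reproduce: TFModelState::infer feeds named tensors through Session::Run, roughly like this in Python terms (the node names and the n_cell_dim value are my assumptions from reading create_inference_graph, please correct me):

    import numpy as np
    import tensorflow.compat.v1 as tfv1

    # Assumed defaults; verify against your checkout.
    n_steps, n_context, n_input, n_cell_dim = 16, 9, 26, 2048

    features = np.zeros((1, n_steps, 2 * n_context + 1, n_input), np.float32)
    prev_c = np.zeros((1, n_cell_dim), np.float32)  # carried LSTM cell state
    prev_h = np.zeros((1, n_cell_dim), np.float32)  # carried LSTM hidden state

    graph_def = tfv1.GraphDef()
    with open('output_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    graph = tfv1.Graph()
    with graph.as_default():
        tfv1.import_graph_def(graph_def, name='')

    with tfv1.Session(graph=graph) as session:
        # One inference step: feed a window of features plus the carried RNN
        # state, fetch logits plus the updated state. This named-tensor
        # mapping is what model.execute() would have to reproduce in tf.js.
        logits, state_c, state_h = session.run(
            ['logits:0', 'new_state_c:0', 'new_state_h:0'],
            feed_dict={
                'input_node:0': features,
                'input_lengths:0': [n_steps],
                'previous_state_c:0': prev_c,
                'previous_state_h:0': prev_h,
            })

So a single predict() call on raw audio can't work on its own: the audio first has to be turned into MFCC windows of the input_node shape, and previous_state_c/h have to be threaded through successive calls by hand.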

Yes.

For building libdeepspeech.so, which is then used by the others.

We don’t have that; you have to dig into the code.
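
As a starting point, though, this is roughly the layout (my quick summary from a recent checkout, not official docs):

    native_client/
      deepspeech.h, deepspeech.cc           # the DS_* C API
      modelstate.cc/.h                      # abstract backend interface
      tfmodelstate.cc/.h                    # TensorFlow backend (calls Session::Run)
      tflitemodelstate.cc/.h                # TFLite backend
      client.cc                             # example C++ binary built on the API
      python/, java/, dotnet/, javascript/  # language bindings over libdeepspeech.so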