Export TFLite model after additional training

Hi, I am developing an Android app which does some on-device speech recognition. I got this working with the pretrained model (0.5.1), but the results weren’t great.

I wanted to do some further training for my use case. I read this thread, so I checked out DeepSpeech 0.5.1 and installed TensorFlow 1.13.1 via pip. The additional training completed successfully, and the model seems to work quite well when tested via the command line.

Now, I am unable to export my checkpoint to TFLite. I get the error “Exception: TensorFlow Lite currently doesn’t support control flow ops: Merge, Switch.”

I found some threads stating that this has been fixed in DeepSpeech 0.6, but I also understand from the above linked thread that my 0.5.1 model is not compatible with DeepSpeech 0.6.

I wondered if using Mozilla’s TensorFlow fork would help, so I tried to build it, but I ran into build issues and wanted to stop and ask before struggling with it further.

So, is there a way for me to continue training a pretrained model, and then export it as a TFLite model?
Will Mozilla’s fork of TensorFlow help? Should I continue trying to build it?

Thanks in advance.


I’d guess the best course of action would be to use our 0.6.0-alpha.4 model, then follow the 0.6.0-alpha.4 instructions on how to export to TFLite.

However, I’d guess you’d also have to fine-tune using the 0.6.0-alpha.4 code base for your use case.

Hi, thanks for your reply.

I think this sounds like a good plan; I will try to continue training the 0.6a model.

One question - when I originally trained from the 0.5.1 model, I downloaded the checkpoint and used that in addition to the released model files. Is the checkpoint available for 0.6a4?

No, it will be available when we release 0.6.

Sorry, I think I must be misunderstanding something.

How is it possible for me to follow the suggestion from @kdavis without the checkpoint? Is it possible for me to use the 0.6a4 output_graph.pb and 0.6a4 codebase, but train from the 0.5.1 checkpoint? I assumed this would be incompatible.

To continue training you’ll need to patch the tf.train.Saver so it can match the 0.5.1 variable names with the names on master. I don’t know how that will interact with saving the new checkpoints, though; maybe you’ll want two separate savers, one for loading the 0.5.1 checkpoint initially and one for saving the checkpoints from your fine-tuning run. In any case, it’ll require some light modification of the code. I’m attaching the patch with the logic applied to the export function, which is how we got the 0.6a4 exported models for testing, but for your use case you’ll want to apply the same logic to the Saver used during training.

Patch: https://gist.github.com/reuben/b68b9085f7b293580f8431156a33daa9
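
For reference, here’s a minimal sketch of that two-saver idea in TF 1.x - the variable rename shown is a hypothetical placeholder, and the actual 0.5.1-to-master name mapping is the one in the gist:

```python
import tensorflow as tf

# Build the current (master) graph first, then map each variable back to the
# name it had in the 0.5.1 checkpoint. The rename below is an illustrative
# placeholder -- the real 0.5.1 -> master mapping is in the gist above.
name_map = {
    v.op.name.replace('new_scope_name', 'old_scope_name'): v
    for v in tf.global_variables()
}

loader = tf.train.Saver(var_list=name_map)  # restores tensors by their 0.5.1 names
saver = tf.train.Saver()                    # saves checkpoints with the new names

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    loader.restore(session, tf.train.latest_checkpoint('deepspeech-0.5.1-checkpoint'))
    # ... fine-tuning steps ...
    saver.save(session, 'fine_tuned/model')
```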

Awesome, thank you so much! I’ll give it a shot.

The 0.5.1 branch should work as is; it already includes that fixup code.

I gave this a try and was able to export the tflite model successfully (meaning that there were no command-line errors during the export process).

However, the tflite model isn’t working on the Android device (I’m using the Android mozillaspeechlibrary). It looks like the constructor in DeepSpeechModel.java fails: at the end of the constructor, this._msp is still null, which causes a null pointer exception later.
I simply replaced the old .tflite file with the new one in device storage - perhaps I need to do something else to make the new model work?

Replacing the tflite model file should be all you need - I can confirm this works with 0.5.1. There could be something else wrong with your Android code.

BTW, I’d suggest you test your tflite model on an Ubuntu PC first to make sure the exported file works fine. Take a look at ./evaluate_tflite.py.
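
If you just want a quick single-file sanity check before a full evaluate_tflite.py run, something along these lines should work with the 0.5.1 Python bindings - a minimal sketch, assuming a deepspeech package built with TFLite support; the model, alphabet, and WAV file names are placeholders, and note that the Model constructor signature changed in 0.6:

```python
import wave

import numpy as np
from deepspeech import Model

# 0.5.1 Python API: Model(model_path, n_features, n_context, alphabet_path, beam_width)
ds = Model('output_graph.tflite', 26, 9, 'alphabet.txt', 500)

with wave.open('test.wav', 'rb') as w:  # expects 16 kHz, 16-bit mono audio
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    rate = w.getframerate()

print(ds.stt(audio, rate))  # in 0.5.1, stt() also takes the sample rate
```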

I just tested the .tflite file (on Mac, not Ubuntu, but I’m assuming it doesn’t matter).

output_graph.tflite results:
WER: 0.019960, CER: 0.008225, loss: 0.000000

output_graph.pb results, for comparison:
WER: 0.019960, CER: 0.008225, loss: 1.638512

Seems like the .tflite model is working great, so I guess the problem is something on the Android side. I haven’t actually changed the Android code at all from the demo, so I’m not sure what the issue would be. It looks like the call to impl.CreateModel() in the DeepSpeechModel constructor runs into a problem, which causes _msp to be null. I’m honestly not familiar with JNI - I traced the CreateModel() function back as far as I could, but it looks like it’s a native function declared in libdeepspeech.jar. Can I somehow view the native CreateModel function so I can try to debug it?