Building the Native Client & Tensorflow from scratch

(mathematiguy) #1

In our use case, we believe we can benefit from modifying the native_client source code to include orthographical rules about how the alphabet is to be used (e.g. our phonemes always come in consonant-vowel pairs).

We would like to enforce these rules by modifying the C++ code underlying the native client, but in order to test and deploy these changes we first have to be able to build the native client from scratch.
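As a rough illustration of the kind of rule meant here, the following Python sketch checks whether a transcript consists only of consonant-vowel pairs. It is not the native client's C++ decoder code. The vowel set and the function name `follows_cv_pairs` are hypothetical placeholders, and the real alphabet would have to be substituted.

```python
# Hypothetical vowel inventory, used only for illustration.
VOWELS = set("aeiou")


def follows_cv_pairs(text: str) -> bool:
    """Return True if every word is a sequence of consonant-vowel pairs."""
    for word in text.lower().split():
        # A word made of pairs must have an even number of characters.
        if len(word) % 2 != 0:
            return False
        for i in range(0, len(word), 2):
            consonant, vowel = word[i], word[i + 1]
            # First slot: a letter that is not a vowel. Second slot: a vowel.
            if not consonant.isalpha() or consonant in VOWELS:
                return False
            if vowel not in VOWELS:
                return False
    return True


if __name__ == "__main__":
    for sample in ("kata mika", "akta", "kat"):
        print(sample, follows_cv_pairs(sample))
```

A check like this could be prototyped against existing transcripts before porting the rule into the C++ decoder.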

My understanding is that the only way to do this is to compile tensorflow from scratch locally, and then have it build the native client from the source code. So far, building tensorflow has been a mission, even on a fairly regular GPU instance on AWS. In our team, we use Docker as standard, so we are planning to containerise the tensorflow build as well.

Up until now, we’ve been grabbing tensorflow from taskcluster, but if we need to build from scratch this no longer works.

My questions are:

  • Has anyone else tried to build the native client from scratch?
  • Does this sound like the correct approach for what we want to achieve?
  • Any tips about how to do this well?
  • How should we think about following changes to the Mozilla fork of tensorflow? Is it fairly stable and safe to update from, or should we expect occasional major changes that mean a lot of re-syncing work on our end if we want to stay up to date (e.g. with respect to Paddle Paddle CTC updates)?

(Lissyx) #2

No, just follow the instructions in native_client/. You have to set up as you would for a full-blown tensorflow build following upstream instructions, but you don’t need to build more than the native client code.

(Lissyx) #3

A contributor already did that and provided a CUDA-oriented Dockerfile.

(Lissyx) #4

Often, but we try to keep our changes relative to upstream to a minimum, so it should be no different from following tensorflow upstream.

(Sekarpdkt) #5

I was able to build tensorflow in my amd64 arch board using

if you get into any issue, check this closed issue

(Lissyx) #6

Why not use the Dockerfile in our repo?

(Sekarpdkt) #7

I am just looking into it now. As my CPU does not support AVX etc. and I have no GPU, I am just going to remove those build options and try. If it works, I will ping you in the other thread. It looks a bit simpler.

(Lissyx) #8

With no AVX and no GPU, you might get poor execution times.

(mathematiguy) #9

Thanks @lissyx, your advice has been really useful. I’ve made a fair bit of progress as a result (actually I think I’m almost there) but right now I have the following error:

ERROR: /work/tensorflow/native_client/BUILD:6:1: Executing genrule //native_client:ds_git_version failed (Exit 1)
realpath: /work/train/DeepSpeech/native_client/../.git/: Not a directory

It’s saying that /work/train/DeepSpeech/.git isn’t a directory, and that’s correct because DeepSpeech is installed as a git submodule, where .git is a file pointing at the real git directory rather than a directory itself. I’m fond of keeping DeepSpeech installed as a submodule so we can track versions easily, but if I have to clone it into the directory then I can do that.

Basically, if there’s a way to run bazel build with some flags that’ll get it to respect the git submodule structure that would be great. Otherwise, most likely tomorrow I’ll just clone the repo locally for the sake of making further progress.
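To illustrate why `realpath` fails on `.git/` here: in a submodule checkout, `.git` is a small text file whose first line has the form `gitdir: <path>`. The sketch below resolves the real git directory in either layout. The function name `resolve_git_dir` is hypothetical, and this does not change how the bazel genrule behaves. It only shows where the metadata actually lives (`git rev-parse --git-dir` should report the same location).

```python
from pathlib import Path


def resolve_git_dir(worktree: Path) -> Path:
    """Return the git directory for a normal clone or a submodule checkout."""
    dot_git = worktree / ".git"
    if dot_git.is_dir():
        # Normal clone: .git is the repository directory itself.
        return dot_git
    if dot_git.is_file():
        # Submodule (or linked worktree): .git is a file containing
        # "gitdir: <path>", where <path> may be relative to the worktree.
        lines = dot_git.read_text().splitlines()
        prefix = "gitdir: "
        if lines and lines[0].startswith(prefix):
            return (worktree / lines[0][len(prefix):].strip()).resolve()
    raise FileNotFoundError(f"no git metadata found in {worktree}")


if __name__ == "__main__":
    # Assumes the script is run from inside a checkout.
    print(resolve_git_dir(Path.cwd()))
```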

(Lissyx) #10

ask bazel devs maybe?

(Sekarpdkt) #11

I was able to build it for my CPU:

$ sudo lshw | grep -i cpu
description: CPU
product: Intel® Pentium® CPU N3710 @ 1.60GHz
bus info: cpu@0
version: Intel® Pentium® CPU N3710 @ 1.60GHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch epb pti tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat cpufreq
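The capabilities line above can be checked for instruction-set extensions programmatically. The following is a minimal sketch: the function name `missing_cpu_features` and the default list of wanted features are assumptions for illustration, not a list of what the DeepSpeech or tensorflow build actually requires.

```python
def missing_cpu_features(capabilities: str, wanted=("avx", "avx2", "fma")) -> list:
    """Return the wanted features that are absent from an lshw capabilities line."""
    # Drop the leading "capabilities:" label if present, then split on whitespace.
    flags = capabilities.split(":", 1)[-1]
    present = set(flags.lower().split())
    return [feature for feature in wanted if feature not in present]


if __name__ == "__main__":
    # Shortened copy of the capabilities line shown above.
    line = "capabilities: x86-64 fpu sse sse2 ssse3 sse4_1 sse4_2 aes"
    print(missing_cpu_features(line))
```

Any feature reported as missing corresponds to a build option that would need to be removed for this CPU.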

I had some issues, such as being unable to build using the tensorflow repository (or maybe I cannot build both tensorflow and deepspeech together). So I first built tensorflow using the tensorflow repository, then cleaned it and rebuilt deepspeech using the Mozilla tensorflow repository.
I also needed to modify both Makefiles to use python3 instead of python.
I will try softlinking python to python3 inside the Docker container and try again later.

The next step is to try deepspeech on my UDOO, then to cross-compile for armv8a with Python 3.6.

(Lissyx) #12

This does not sound right at all. We document exactly that you should use our fork to perform the build. Why do you depend on python when you just want to build native_client?

Also, which Makefile are you referring to? There is no need to hack anything.

Why don’t you just follow the documentation? It already explains everything.

I don’t know what you are doing, but it really does not look like you are following the documentation we wrote.

(Lissyx) #13

Again, it’s not complicated: follow the docs, set up the tensorflow build using our fork as we document, then bazel build --config=rpi3-armv8 as I said earlier …

(Sekarpdkt) #14

I was referring to these two Makefile steps used in your Dockerfile:

WORKDIR /DeepSpeech/native_client/python
RUN make bindings
RUN pip install dist/deepspeech*
WORKDIR /DeepSpeech/native_client/ctcdecode
RUN make

Basically, the Makefiles in these directories use python ./ and, as python points to python2.7 inside Docker, both make steps failed. I had to change these two Makefiles to use python3 and then ran make bindings and make, which completed successfully.

(Lissyx) #15

There’s no reason it would fail with Python 2.7; we have builds running for that, so if it’s failing then something else is wrong.

(Lissyx) #16

But seriously @sekarpdkt, can you avoid hijacking other people’s threads? This is getting very confusing for everyone to follow.

(mathematiguy) #17

Hey- just following up. I succeeded at running:

 bazel build \
	--config=monolithic \
	-c opt \
	--copt=-O3 \
	--copt="-D_GLIBCXX_USE_CXX11_ABI=0" \
	--copt=-fvisibility=hidden \
	// \

But I’m not sure it worked 100%, because I’m not sure what it was supposed to achieve. Was it supposed to create and generate_trie? If so, does it put them in the native_client folder? I checked and didn’t find them anywhere. Or does it achieve something else, so that I need a later step to get those two files?

In case it didn’t actually work, here were the last few lines from the stdout:

[1,047 / 1,053] Compiling tensorflow/core/kernels/; 14s local ... (5 actions running)
INFO: From Compiling external/snappy/
cc1plus: warning: command line option '-Wno-implicit-function-declaration' is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option '-Wno-shift-negative-value'
INFO: Elapsed time: 789.811s, Critical Path: 71.31s
INFO: Build completed successfully, 1131 total actions

Once I’m finally done, I’ll put a post here to explain what I did in more detail for posterity.


(Lissyx) #18

It should create them, but under the bazel-bin/native_client/ directory in the tensorflow source tree. That’s how bazel works, not our choice.

(mathematiguy) #19

Hey- thanks for your help, that was the last bit I needed to know.

The files weren’t visible because I was building the native_client inside a Docker container, and bazel-bin is a symlink whose target disappears when the container exits, leaving a broken link.

That explained why I couldn’t find the files after the build succeeded, and the solution is obvious: copy the files I want out before the container exits.
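The copy-out step can be sketched as follows, assuming it runs inside the container as the last build step and that the destination is a directory mounted from the host. The function name `copy_build_outputs` and the example paths are hypothetical. Only `bazel-bin/native_client/` and `generate_trie` come from the thread.

```python
import shutil
from pathlib import Path


def copy_build_outputs(bazel_bin: Path, names, dest: Path) -> list:
    """Copy selected build outputs out of bazel-bin into a persistent directory."""
    # Path.exists() follows symlinks, so it is False for a dangling bazel-bin link.
    if not bazel_bin.exists():
        raise FileNotFoundError(f"{bazel_bin} is missing or a broken symlink")
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = bazel_bin / name
        if src.is_file():
            target = dest / src.name
            # copy2 follows symlinks by default, so the real file content is copied.
            shutil.copy2(src, target)
            copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical paths: /work/tensorflow is the source tree from the error
    # message earlier in the thread, /output is an assumed host-mounted volume.
    outputs = copy_build_outputs(
        Path("/work/tensorflow/bazel-bin"),
        ["native_client/generate_trie"],
        Path("/output"),
    )
    print(outputs)
```

Names that are not present under bazel-bin are skipped, so the returned list shows what was actually copied.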