[IMPORTANT] Keeping the build green


(Lissyx) #1

Hello everyone,

For a while, device builds have been broken badly on both pine and m-c. Pine builds were fixed a few days ago, but m-c is still broken as of now. This means anyone looking at the treeherder status cannot check if B2G is broken or not.

This is about to change, but we will need help from anyone interested in B2G to keep it green.

You can track the current status of the pine tree at https://treeherder.mozilla.org/#/jobs?repo=pine&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=device. When “B” is green, the build is okay. When it’s orange, it’s broken and we need to fix it ASAP.
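
For anyone who wants to bookmark several trees, here is a minimal Python sketch of how such a Treeherder link can be assembled. The parameter names (repo, filter-tier, filter-searchStr) are taken from the URL above; the function name and its defaults are only illustrative, not part of any Treeherder tooling.

```python
from urllib.parse import quote, urlencode

# Base of the job view, as it appears in the links quoted in this thread.
TREEHERDER_JOBS = "https://treeherder.mozilla.org/#/jobs"


def treeherder_url(repo, tiers=(1, 2, 3), search=None):
    """Build a Treeherder jobs link for a repo, a set of tiers and a search string."""
    params = [("repo", repo)]
    # One filter-tier entry per tier, exactly like the hand-written links.
    params += [("filter-tier", str(tier)) for tier in tiers]
    if search:
        params.append(("filter-searchStr", search))
    # quote_via=quote encodes spaces as %20 instead of "+".
    return TREEHERDER_JOBS + "?" + urlencode(params, quote_via=quote)


print(treeherder_url("pine", search="device"))
```

Calling it with "mozilla-inbound" and the search string "b2g device" gives a link of the same shape for the integration branch.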

Now, on mozilla-central, this is going to be another story. There are multiple issues. First, the build configuration has not been updated for a while and still references a lot of broken stuff, including use of stlport, missing build flags, and old repo mirrors. The second main issue is that, as of now, nobody can push to try with device builds.

What does this mean? People can test patches using the TryServer as documented on https://wiki.mozilla.org/Build:TryServer, but with the way Docker images for B2G device builds are currently done, this is not possible for device builds. It makes merging tedious because one has to build locally to make sure the device build works. https://bugzilla.mozilla.org/show_bug.cgi?id=1282226, which is about to land, is going to fix this, and this try syntax will work: “-p aries-eng,nexus-5l-eng”. This will issue a device build for Z3c eng (Kitkat base) and Nexus 5 eng (Lollipop base).
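
As a small illustration, the platform part of that try syntax is just a comma-separated list after “-p”. The sketch below only covers the two platform names quoted above; the helper name is made up for this example, and the rest of a full try syntax is out of scope here.

```python
# Platform names and descriptions as given in the post above.
DEVICE_PLATFORMS = {
    "aries-eng": "Z3c eng (Kitkat base)",
    "nexus-5l-eng": "Nexus 5 eng (Lollipop base)",
}


def platform_flag(platforms):
    """Return the "-p ..." part of a try syntax for the given platform names."""
    unknown = [name for name in platforms if name not in DEVICE_PLATFORMS]
    if unknown:
        # Only the names mentioned in this thread are accepted by this sketch.
        raise ValueError("unknown device platform(s): " + ", ".join(unknown))
    return "-p " + ",".join(platforms)


print(platform_flag(["aries-eng", "nexus-5l-eng"]))
```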

So, now, what can you do to help keep the builds green? Anybody can do this:

Anything that does not contain a green “B” means there is a build failure. Clicking on the “B” symbol will give you more details.

It is very important to catch build failures as soon as possible. Monitoring the integration branch is the most efficient way to do that: we can issue a fix (often it’s very simple, missing includes or similar stuff) in time for it to land in mozilla-central.

What to do once you have spotted a build failure?

  • File a bug that blocks https://bugzilla.mozilla.org/show_bug.cgi?id=1245091 so that everyone is aware there is an issue
  • Copy/paste the first line of the error into the bug title
  • If possible, try to find (quickly, don’t waste too much time) which change is probably related to the failure. You can do this by looking at the history of the files mentioned in the error and seeing what changed recently. If you find a suspect, collect its bug number and add it to the “blocks” field of the bug you filed
  • If you have the time/knowledge, you can try to make a patch!
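
To help with the “first line of the error” step, here is a rough Python sketch that scans a saved build log for the first compiler-style error. It assumes errors look like “path:line:column: error: message”, which is the shape of the title of bug 1285157; other kinds of failures (infrastructure errors, for example) will not match, and the function name is only illustrative.

```python
import re

# Assumed shape of a compiler error, modelled on the title of bug 1285157:
#   dom/ipc/ContentChild.cpp:631:24: error: no matching function ...
ERROR_RE = re.compile(r"\S+:\d+:\d+: error: .+")


def first_error_line(log_text):
    """Return the first compiler-style error found in a build log, or None."""
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            # Drop any log prefix before the file path.
            return match.group(0).strip()
    return None


# Tiny sample log put together for this example only.
sample_log = "\n".join([
    "[taskcluster 2016-07-01 09:31:51.616Z] === Task Starting ===",
    "dom/ipc/ContentChild.cpp:631:24: error: no matching function for call",
])
print(first_error_line(sample_log))
```

The returned line is what would go into the bug title; the link to the failing job still has to be added by hand.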

It is very important that more people get involved in this process:

  1. It’s critical that we keep builds green to be able to continue to hack
  2. Those changes are often simple breakages
  3. Because of (2), those patches are often simple to write and thus make good first bugs
  4. It’s a simple way to contribute efficiently to the codebase and get to learn it

Thanks for everyone’s help!


#2

Hi !

I’ve just read your post and I have a little time, so I tried what you said. It’s quite clear, but I just want to be sure I understand correctly (and it will help other people trying to follow the build process).

1- I opened the device builds on the integration branch (https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=b2g%20device)
2- I clicked on the “B” of the “Fri Jul 1, 11:19:10” B2G Device Image debug “Aries[tier 3]”.
This is what I have:

[taskcluster:error] Failure to properly start execution environment.
· 1240477 > [taskcluster:error] Failure to properly start execution environment. HTTP code is 500 which indicates error: server error - Cannot start container : [8] System error: read parent: connection reset by peer
[taskcluster:error] HTTP code is 500 which indicates error: server error - Container command not found or does not exist.

What is the first line of the error: “1240477…” or “[taskcluster:error] Failure to properly start execution environment.”? In the second case, there are two errors.

Thanks in advance for your reply.


(Lissyx) #3

Ok, so as I said, patches were on their way to land. One of them was incomplete after I addressed some suggestions from the reviewer, hence this failure. When https://bugzilla.mozilla.org/show_bug.cgi?id=1283452 reaches central/inbound, it should be all good.

Also, you should click on the log link to analyze the real issue: treeherder might be unable to catch the proper log lines. When you click on “B”, the panel that opens has two icons at the top, one of them labeled “LOG”. You need to click the one right next to the “LOG” icon.


#4

If I understand correctly, before reporting problems, we need to wait until bug 1283452 has landed?

Here is an image to illustrate the icons you mention:

Here is the result:

[taskcluster 2016-07-01 09:31:50.929Z] Task ID: ZcF8yU_IRAmAXxhCImil1A
[taskcluster 2016-07-01 09:31:50.929Z] Worker ID: i-01202c729677dde2c
[taskcluster 2016-07-01 09:31:50.929Z] Worker Group: us-east-1c
[taskcluster 2016-07-01 09:31:50.929Z] Worker Type: flame-kk
[taskcluster 2016-07-01 09:31:50.929Z] Public IP: 54.173.240.64
[taskcluster 2016-07-01 09:31:50.929Z] using cache “level-3-mozilla-inbound-tc-vcs” -> /home/worker/.tc-vcs
[taskcluster 2016-07-01 09:31:50.929Z] using cache “level-3-mozilla-inbound-build-aries-debug” -> /home/worker/workspace
[taskcluster 2016-07-01 09:31:50.929Z] using cache “level-3-mozilla-inbound-build-aries-debug-objdir-gecko” -> /home/worker/objdir-gecko

[taskcluster 2016-07-01 09:31:51.602Z] Image ‘public/image.tar’ from task ‘SBInZyb0QK6Cmb0PEkIm1A’ loaded. Using image ID sha256:d6cd1f74d72c5f3fe7f57ea57aa30a17047f4efadcccec1eefb543db958d6dce.
[taskcluster 2016-07-01 09:31:51.616Z] === Task Starting ===
exec: “checkout-gecko workspace && cd ./workspace/gecko/taskcluster/scripts/phone-builder && buildbot_step ‘Build’ ./build-phone.sh $HOME/workspace\n”: stat checkout-gecko workspace && cd ./workspace/gecko/taskcluster/scripts/phone-builder && buildbot_step ‘Build’ ./build-phone.sh $HOME/workspace
: no such file or directory

[taskcluster:error] Failure to properly start execution environment.

[taskcluster:error] HTTP code is 500 which indicates error: server error - Container command not found or does not exist.

[taskcluster 2016-07-01 09:31:52.588Z] === Task Finished ===
[taskcluster 2016-07-01 09:31:52.687Z] Artifact “private/build” not found at “/home/worker/artifacts/”
[taskcluster 2016-07-01 09:31:52.758Z] Artifact “public/build” not found at “/home/worker/artifacts-public/”
[taskcluster 2016-07-01 09:31:53.237Z] Unsuccessful task run with exit code: -1 completed in 2.308 seconds

In the bug report (I will wait for bug 1283452 before reporting anything), I should copy/paste:

[taskcluster 2016-07-01 09:31:50.929Z] Task ID: ZcF8yU_IRAmAXxhCImil1A

Is that right?

[EDIT] Bold for the part to copy.


(Lissyx) #5

Yeah, this is exactly the bug you need to wait for. But what you should copy/paste for reporting this one would be this:


(Lissyx) #6

For instance, this would be a good example: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c4020fc72a5&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=23218092

Except I don’t know why it fails this way :slight_smile:


(Lissyx) #7

And today this is paying off: we have the B2G Device Aries build green on m-c: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=4222978

Fixes are on their way and we should also have Nexus 5 L eng soon. After that, I will clean up the debug variants a little.


(Ben Francis) #8

\o/


(Lissyx) #9

And this reached inbound as well: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=b2g

Now it’s really time for everyone who is interested in contributing to start doing so actively: monitoring and fixing build issues will be problem #1.


#11

It’s a shame, the build is already broken… :cry:

@lissyx, in your first post you mention 3 links:
[1] Mulet builds on integration branch (mozilla-inbound)
[2] Device builds on integration branch
[3] Device builds on mozilla-central

For [2] and [3], there is no “B” (neither green nor orange). Is that normal?

[1] has an orange B, so the build is broken… “Build ./build-mulet-linux.sh /home/worker/workspace : busted.”

I’ve opened a bug and linked it to bug 1245091:
Bug 1285210 - Mulet builds on integration branch (mozilla-inbound)
Please, if it’s not what you are expecting, say so and next time I will try to do the correct thing :wink:


(Lissyx) #12

That was a good catch, but your logs did not include the real issue, and there was no link to a failing job, so it’s a bit harder to find from your bug :). Sadly, I had already filed bug 1285157 :slight_smile:

But that’s a good catch anyway, this is what is needed!


#13

Hi!
I know you are working on this bug to get the build working.
Just to be sure: do we need to “wait” until the bug is solved before regularly checking that the build is still green?

What do the different colors mean?

  • green: OK
  • orange: broken
  • brown: busted (what is it???)

When I look at bug 1285157, I really don’t understand what you are doing, so I don’t know whether you are close to solving the bug or not. Where can I find information to help me understand how to solve the bug?

https://bugzilla.mozilla.org/show_bug.cgi?id=1285157


(Lissyx) #14

I have had a patch for that since last night; it builds and boots. All we need is some review, and then yes, we can hope the build is green.

So waiting :-/


(Lissyx) #15

Well, it looks like something else landed yesterday evening and broke the build even worse, with something breaking client.mk :frowning:


(Lissyx) #16

For the past few days, device builds have been broken because of TaskCluster cache level issues, and I cannot do anything about that: https://bugzilla.mozilla.org/show_bug.cgi?id=1285732


#17

The builds are green for link [1] (see post 11) \o/
But for the two others, there is no “B”. Is that normal? What should we follow to know whether the builds are running normally?

Thanks for your hard work on the build process.


(Lissyx) #18

The “B” symbol might be missing when your filtering does NOT include tier 3, or when the device build has not yet been triggered and/or treeherder is still loading data.
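
A quick way to rule out the first cause is to check the link itself. This is a small Python sketch that looks for filter-tier=3 in a Treeherder link of the form used in this thread. It assumes that a link without an explicit filter-tier=3 does not show tier 3 jobs; that default is my assumption and is not checked against Treeherder itself.

```python
from urllib.parse import parse_qs, urlsplit


def asks_for_tier3(treeherder_link):
    """Return True if the link carries an explicit filter-tier=3 parameter."""
    # Everything after "#" is the fragment, e.g. "/jobs?repo=...&filter-tier=1".
    fragment = urlsplit(treeherder_link).fragment
    _, _, query = fragment.partition("?")
    tiers = parse_qs(query).get("filter-tier", [])
    return "3" in tiers


link = ("https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound"
        "&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=b2g")
print(asks_for_tier3(link))
```

If this returns False for the link you follow, tier 3 jobs such as the device builds may simply be filtered out.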


#19

So, if I understand correctly, for the moment the device builds for links [2] and [3] (see post 11) have not been triggered, because tier 3 is included and Treeherder seems to have loaded all its data.

When I click on the link [2] this is what I have:


As you can see, tier 3 is activated.

If this is not the right link to follow, can you give me a good one? Thanks.


(Lissyx) #20

Please give links, it’s hard to follow.


(Lissyx) #21

I don’t see any place where a build is missing.