===================================================================
How To Add Your Build Configuration To LLVM Buildbot Infrastructure
===================================================================

Introduction
============

This document contains information about adding a build configuration and
buildbot worker to the LLVM Buildbot Infrastructure.

.. note:: The term "buildmaster" is used in this document to refer to the
  server that manages which builds are run and where. Though we would not
  normally choose to use "master" terminology, it is used in this document
  because it is the term that the Buildbot package currently
  `uses <https://github.com/buildbot/buildbot/issues/5382>`_.

Buildmasters
============

There are two buildmasters running.

* The main buildmaster at `<https://lab.llvm.org/buildbot>`_. All builders
  attached to this machine will notify commit authors every time they break
  the build.
* The staging buildmaster at `<https://lab.llvm.org/staging>`_. All builders
  attached to this machine will be completely silent by default when the build
  is broken. This buildmaster is reconfigured every two hours with any new
  commits from the llvm-zorg repository.

In order to remain connected to the main buildmaster (and thus notify
developers of failures), a buildbot must:

* Be building a supported configuration.  Builders for experimental backends
  should generally be attached to the staging buildmaster.
* Be able to keep up with new commits to the main branch, or at a minimum
  recover to tip of tree within a couple of days of falling behind.

Additionally, we encourage all bot owners to point their bots towards the
staging buildmaster during maintenance windows, instability troubleshooting,
and such.

Roles & Expectations
====================

Each buildbot has an owner who is the responsible party for addressing problems
which arise with said buildbot.  We generally expect the bot owner to be
reasonably responsive.

For some bots, the ownership responsibility is split between a "resource owner"
who provides the underlying machine resource, and a "configuration owner" who
maintains the build configuration.  Generally, operational responsibility lies
with the "config owner".  We do expect "resource owners" - who are generally
the contact listed in a worker's attributes - to proxy requests to the relevant
"config owner" in a timely manner.

Most issues with a buildbot should be addressed directly with a bot owner
via email.  Please CC `Galina Kistanova <mailto:gkistanova@gmail.com>`_.

Steps To Add Builder To LLVM Buildbot
=====================================

Volunteers can provide their build machines to serve as build workers for the
public LLVM Buildbot.

Here are the steps you can follow to do so:

#. Check the existing build configurations to make sure the one you are
   interested in is not covered yet, or that it builds much faster on your
   computer than on the existing one. We prefer faster builds so developers
   will get feedback sooner after changes get committed.

#. The computer you will be registering with the LLVM buildbot
   infrastructure should have all dependencies installed and be able to
   build your configuration successfully. Please check what degree
   of parallelism (``-j`` parameter) would give the fastest build.  You can
   build multiple configurations on one computer.

#. Install buildbot-worker (currently we are using buildbot version 2.8.4).
   This specific version can be installed using ``pip``, with a command such
   as ``pip3 install buildbot-worker==2.8.4``.

#. Create a designated user account for your buildbot-worker to run under,
   and set appropriate permissions.

#. Choose the buildbot-worker root directory (all builds will be placed under
   it), and the buildbot-worker access name and password that the buildmaster
   will use to authenticate your buildbot-worker.

#. Create a buildbot-worker in the context of that buildbot-worker account.
   Point it to the **lab.llvm.org** port **9994** (see `Buildbot documentation,
   Creating a worker
   <http://docs.buildbot.net/current/tutorial/firstrun.html#creating-a-worker>`_
   for more details) by running the following command:

    .. code-block:: bash

       $ buildbot-worker create-worker <buildbot-worker-root-directory> \
                    lab.llvm.org:9994 \
                    <buildbot-worker-access-name> \
                    <buildbot-worker-access-password>

   Only once a new worker is stable, and approval from Galina has been
   received (see last step), should it be pointed at the main buildmaster.

   Now start the worker:

    .. code-block:: bash

       $ buildbot-worker start <buildbot-worker-root-directory>

   This will cause your new worker to connect to the staging buildmaster
   which is silent by default.

   Try this once, then check the log file
   ``<buildbot-worker-root-directory>/worker/twistd.log``. If your settings
   are correct you will see a refused connection. This is good and expected,
   as the credentials have not been established on both ends. Now stop the
   worker and proceed to the next steps.

#. Fill in the buildbot-worker description and admin name/e-mail.  Here is an
   example of the buildbot-worker description::

       Windows 7 x64
       Core i7 (2.66GHz), 16GB of RAM

       g++.exe (TDM-1 mingw32) 4.4.0
       GNU Binutils 2.19.1
       cmake version 2.8.4
       Microsoft(R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86

   See `here <http://docs.buildbot.net/current/manual/installation/worker.html>`_
   for which files to edit.

#. Send a patch which adds your build worker and your builder to
   `zorg <https://github.com/llvm/llvm-zorg>`_. Use the typical LLVM
   `workflow <https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.

   * workers are added to ``buildbot/osuosl/master/config/workers.py``
   * builders are added to ``buildbot/osuosl/master/config/builders.py``

   Please make sure your builder name and its builddir are unique throughout
   the file.

   All new builders should default to using the ``'collapseRequests': False``
   configuration.  This causes the builder to build each commit individually
   and not merge build requests.  To maximize quality of feedback to developers,
   we *strongly prefer* builders to be configured not to collapse requests.
   This flag should be removed only after all reasonable efforts have been
   exhausted to improve build times such that the builder can keep up with
   commit flow.

   It is possible to allow email addresses to unconditionally receive
   notifications on build failure; for this you'll need to add an
   ``InformativeMailNotifier`` to ``buildbot/osuosl/master/config/status.py``.
   This is particularly useful for the staging buildmaster which is silent
   otherwise.

#. Send the buildbot-worker access name and the access password directly to
   `Galina Kistanova <mailto:gkistanova@gmail.com>`_, and wait until she
   lets you know that your changes are applied and the buildmaster is
   reconfigured.

#. Make sure you can start the buildbot-worker and successfully connect
   to the silent (staging) buildmaster. Then set up your buildbot-worker to
   start automatically at boot time.  See the buildbot documentation
   for help.  You may want to restart your computer to see if it works.

#. Check the status of your buildbot-worker on the `Waterfall Display (Staging)
   <http://lab.llvm.org/staging/#/waterfall>`_ to make sure it is
   connected, and the `Workers Display (Staging)
   <http://lab.llvm.org/staging/#/workers>`_ to see if administrator
   contact and worker information are correct.

#. At this point, you have a working builder connected to the staging
   buildmaster.  You can now make sure it is reliably green and keeps
   up with the build queue.  No notifications will be sent, so you can
   keep an unstable builder connected to staging indefinitely.

#. (Optional) Once the builder is stable on the staging buildmaster with
   several days of green history, you can choose to move it to the production
   buildmaster to enable developer notifications.  Please email `Galina
   Kistanova <mailto:gkistanova@gmail.com>`_ for review and approval.

   To move a worker to production (once approved), stop your worker, edit the
   ``buildbot.tac`` file to change the port number from 9994 to 9990, and
   start it again.

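The port change in the last step can be scripted. The following is a minimal
sketch rather than an official tool: ``switch_buildmaster_port`` is a
hypothetical helper name, and the sketch assumes that the ``buildbot.tac``
generated by ``buildbot-worker create-worker`` stores the port on its own line
in the form ``port = 9994``. Check your own file before relying on it.

.. code-block:: bash

   # Hypothetical helper: point a (stopped) worker at a different buildmaster
   # port. Assumes buildbot.tac contains a line such as "port = 9994".
   #   $1: worker root directory, $2: current port, $3: new port
   switch_buildmaster_port() {
       local worker_root="$1" old_port="$2" new_port="$3"
       local tac="${worker_root}/buildbot.tac"

       # Refuse to continue if the expected line is missing.
       if ! grep -Eq "^port[[:space:]]*=[[:space:]]*${old_port}[[:space:]]*\$" "$tac"; then
           echo "no 'port = ${old_port}' line found in ${tac}" >&2
           return 1
       fi

       # Keep a backup (buildbot.tac.bak) and rewrite only the port line.
       sed -E -i.bak \
           "s/^port[[:space:]]*=[[:space:]]*${old_port}[[:space:]]*\$/port = ${new_port}/" \
           "$tac"
   }

For example, after stopping the worker,
``switch_buildmaster_port <buildbot-worker-root-directory> 9994 9990`` would
make the edit described above and keep the original file as
``buildbot.tac.bak``.
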
Testing a Builder Config Locally
================================

It is possible to test a builder running against a local version of LLVM's
buildmaster setup. This allows you to test changes to builder, worker, and
buildmaster configuration. A buildmaster launched in this "local testing" mode
will:

* Bind only to local interfaces.
* Use SQLite as the database.
* Use a single fixed password for workers.
* Disable extras like GitHub authentication.

In order to use this "local testing" mode:

* Within a checkout of `llvm-zorg <https://github.com/llvm/llvm-zorg>`_,
  create and activate a Python `venv
  <https://docs.python.org/3/library/venv.html>`_ and install the necessary
  dependencies.

    .. code-block:: bash

       python -m venv bbenv
       source bbenv/bin/activate
       pip install buildbot{,-console-view,-grid-view,-waterfall-view,-worker,-www}==3.11.7 urllib3

* Initialise the necessary buildmaster files, link to the configuration in
  ``llvm-zorg`` and ask ``buildbot`` to check the configuration. This step can
  be run from any directory.

    .. code-block:: bash

       buildbot create-master llvm-testbbmaster
       cd llvm-testbbmaster
       ln -s /path/to/checkout/of/llvm-zorg/buildbot/osuosl/master/master.cfg .
       ln -s /path/to/checkout/of/llvm-zorg/buildbot/osuosl/master/config/ .
       ln -s /path/to/checkout/of/llvm-zorg/zorg/ .
       BUILDBOT_TEST=1 buildbot checkconfig

* Start the buildmaster.

    .. code-block:: bash

       BUILDBOT_TEST=1 buildbot start --nodaemon .

* After waiting a few seconds for startup to complete, you should be able to
  open the web UI at ``http://localhost:8011``.  If there are any errors or
  this isn't working, check ``twistd.log`` (within the current directory) for
  more information.

* You can now create and start a buildbot worker. Ensure you pick the correct
  name for the worker associated with the build configuration you want to test
  in ``buildbot/osuosl/master/config/builders.py``.

    .. code-block:: bash

       buildbot-worker create-worker <buildbot-worker-root-directory> \
                       localhost:9990 \
                       <buildbot-worker-name> \
                       test
       buildbot-worker start --nodaemon <buildbot-worker-root-directory>

* Either wait until the poller sets off a build, or alternatively force a
  build to start in the web UI.

* Review the progress and results of the build in the web UI.

This local testing configuration defaults to binding only to the loopback
interface for security reasons.

If you want to run the test worker on a different machine, or to run the
buildmaster on a remote server, ssh port forwarding can be used to make the
connection possible. For instance, if running the buildmaster on a remote
server, the following command will suffice to make the web UI accessible via
``http://localhost:8011`` and make it possible for a local worker to connect
to the remote buildmaster by connecting to ``localhost:9990``:

    .. code-block:: bash

       ssh -N -L 8011:localhost:8011 -L 9990:localhost:9990 username@buildmaster_server_address


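When scripting the local setup, it can help to wait for the web UI instead of
guessing how long startup takes. The following is a rough sketch under stated
assumptions: ``wait_for_url`` is a hypothetical helper, not part of Buildbot or
llvm-zorg, and it assumes ``curl`` is installed. It works the same way whether
port 8011 is local or forwarded over ssh.

.. code-block:: bash

   # Hypothetical helper: poll a URL until it answers or the attempts run out.
   #   $1: URL, $2: maximum attempts (default 30), $3: delay in seconds (default 1)
   wait_for_url() {
       local url="$1" attempts="${2:-30}" delay="${3:-1}"
       local i
       for ((i = 1; i <= attempts; i++)); do
           # -s: quiet, -f: treat HTTP errors as failure, -o: discard the body.
           if curl -sf -o /dev/null "$url"; then
               echo "up after ${i} attempt(s): ${url}"
               return 0
           fi
           sleep "$delay"
       done
       echo "gave up after ${attempts} attempt(s): ${url}" >&2
       return 1
   }

For example, ``wait_for_url http://localhost:8011`` could be run before
creating the test worker; if it gives up, ``twistd.log`` is the place to look.
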
Best Practices for Configuring a Fast Builder
=============================================

As mentioned above, we generally have a strong preference for
builders which can build every commit as they come in.  This section
includes best practices and some recommendations as to how to achieve
that end.

The goal
  In 2020, the monorepo had just under 35 thousand commits.  This works
  out to an average of 4 commits per hour.  Already, we can see that a
  builder must cycle in less than 15 minutes to have a hope of being
  useful.  However, those commits are not uniformly distributed.  They
  tend to cluster strongly during US working hours.  Looking at a couple
  of recent (Nov 2021) working days, we routinely see ~10 commits per
  hour during peak times, with occasional spikes as high as ~15 commits
  per hour.  Thus, as a rule of thumb, we should plan for our builder to
  complete ~10-15 builds an hour.

Resource Appropriately
  At 10-15 builds per hour, we need to complete a new build on average every
  4 to 6 minutes.  For anything except the fastest of hardware/build configs,
  this is going to be well beyond the ability of a single machine.  In buildbot
  terms, we are likely going to need multiple workers to build requests in
  parallel under a single builder configuration.  For some rough back of the
  envelope numbers, if your build config takes e.g. 30 minutes, you will need
  something on the order of 5-8 workers.  If your build config takes ~2 hours,
  you'll need something on the order of 20-30 workers.  The rest of this
  section focuses on how to reduce cycle times.

Restrict what you build and test
  Think hard about why you're setting up a bot, and restrict your build
  configuration as much as you can.  Basic functionality is probably
  already covered by other bots, and you don't need to duplicate that
  testing.  You only need to be building and testing the *unique* parts
  of the configuration.  (e.g. For a multi-stage clang builder, you probably
  don't need to be enabling every target or building all the various utilities.)

  It can sometimes be worthwhile splitting a single builder into two or more,
  if you have multiple distinct purposes for the same builder.  As an example,
  if you want to both a) confirm that all of LLVM builds with your host
  compiler, and b) do a multi-stage clang build on your target, you
  may be better off with two separate bots.  Splitting increases resource
  consumption, but makes it easy for each bot to keep up with commit flow.
  Additionally, splitting bots may assist in triage by narrowing attention to
  relevant parts of the failing configuration.

  In general, we recommend Release build types with Assertions enabled.  This
  generally provides a good balance between build times and bug detection for
  most buildbots.  There may be room for including some debug info (e.g. with
  ``-gmlt``), but in general the balance between debug info quality and build
  times is a delicate one.

Use Ninja & LLD
  Ninja really does help build times over Make, particularly for highly
  parallel builds.  LLD helps to reduce both link times and memory usage
  during linking significantly.  With a build machine with sufficient
  parallelism, link times tend to dominate the critical path of the build, and
  are thus worth optimizing.

Use CCache and NOT incremental builds
  Using ccache materially improves average build times.  Incremental builds
  can be slightly faster, but introduce the risk of build corruption due to
  e.g. state changes.  At this point, the recommendation is not to
  use incremental builds and instead use ccache as the latter captures the
  majority of the benefit with less risk of false positives.

  One of the non-obvious benefits of using ccache is that it makes the
  builder less sensitive to which projects are being monitored vs built.
  If a change triggers a build request, but doesn't change the build output
  (e.g. doc changes, python utility changes, etc.), the build will entirely
  hit in cache and the build request will complete in just the testing time.

  With multiple workers, it is tempting to try to configure a shared cache
  between the workers.  Experience to date indicates this is difficult to
  do well, and that having local per-worker caches gets most of the benefit
  anyway.  We don't currently recommend shared caches.

  CCache does depend on the builder hardware having sufficient IO to access
  the cache with reasonable access times - i.e. a fast disk, or enough memory
  for a RAM cache, etc.  For builders without such hardware, incremental
  builds may be your best option, but are likely to require higher ongoing
  involvement from the sponsor.

Enable batch builds
  As a last resort, you can configure your builder to batch build requests.
  This makes the build failure notifications markedly less actionable, and
  should only be done once all other reasonable measures have been taken.

Leave it on the staging buildmaster
  While most of this section has been biased towards builders intended for
  the main buildmaster, it is worth highlighting that builders can run
  indefinitely on the staging buildmaster.  Such a builder may still be
  useful for the sponsoring organization, without concern of negatively
  impacting the broader community.  The sponsoring organization simply
  has to take on the responsibility of all bisection and triage.

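The back-of-the-envelope numbers under "The goal" and "Resource Appropriately"
can be reproduced with a few lines of shell arithmetic. This is only a sketch:
the helper names ``ceil_div`` and ``workers_needed`` are made up for
illustration, 35000 stands in for "just under 35 thousand" commits, and the
model assumes builds are spread perfectly evenly across workers with no
scheduling overhead.

.. code-block:: bash

   # Integer ceiling division: ceil($1 / $2) for positive integers.
   ceil_div() {
       echo $(( ($1 + $2 - 1) / $2 ))
   }

   # Workers needed so that, on average, one build finishes per commit.
   #   $1: build time in minutes
   #   $2: commits per hour the builder should keep up with
   workers_needed() {
       local build_minutes="$1" commits_per_hour="$2"
       # workers = build_minutes / (60 / commits_per_hour), rounded up
       ceil_div $(( build_minutes * commits_per_hour )) 60
   }

   # Average commit rate for 2020, rounded to the nearest integer.
   hours_per_year=$(( 365 * 24 ))
   echo "average commits/hour: $(( (35000 + hours_per_year / 2) / hours_per_year ))"

   # Worker estimates at the peak rates of 10 and 15 commits per hour.
   for build_minutes in 30 120; do
       low=$(workers_needed "$build_minutes" 10)
       high=$(workers_needed "$build_minutes" 15)
       echo "${build_minutes} minute build: ${low}-${high} workers"
   done

Run as a script, this prints ``average commits/hour: 4``,
``30 minute build: 5-8 workers`` and ``120 minute build: 20-30 workers``,
which matches the estimates given in this section.
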
Managing a Worker From The Web Interface
========================================

Tasks such as clearing pending build requests can be done using
the Buildbot web interface. To do this you must be recognised as an admin
of the worker:

* Set your public GitHub profile email to one that was included in the
  ``admin`` information you set up on the worker. It does not matter if this
  is your primary account email or a "verified email". To confirm this has been
  done correctly, go to ``github.com/<your GitHub username>`` and you should
  see the email address listed there.

  A worker can have multiple admins if they are listed in the form
  ``First Last <first.last@example.com>, First2 Last2 <first2.last2@example.com>``.
  You only need to have one of those addresses in your profile to be recognised
  as an admin.

  If you need to add an email address, you can edit the ``admin`` file and
  restart the worker. You should see the new admin details in the web interface
  shortly afterwards.

* Connect GitHub to Buildbot by clicking on the "Anonymous" button on the
  top right of the page, then "Login with GitHub" and authorise the app.

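The comma-separated admin list described above can be generated rather than
typed by hand. The following is a minimal sketch: ``write_admins`` is a
hypothetical helper, and it assumes the ``admin`` file lives at
``info/admin`` under the worker root directory, which is where
``buildbot-worker create-worker`` normally places it. Verify the location on
your own worker.

.. code-block:: bash

   # Hypothetical helper: write a worker's admin list in comma-separated form.
   #   $1: worker root directory
   #   $2...: one "First Last <email>" entry per admin
   write_admins() {
       local worker_root="$1"
       shift
       local joined="" entry
       for entry in "$@"; do
           # Join entries with ", " to match the format shown above.
           joined="${joined:+${joined}, }${entry}"
       done
       mkdir -p "${worker_root}/info"
       printf '%s\n' "$joined" > "${worker_root}/info/admin"
   }

For example,
``write_admins <buildbot-worker-root-directory> "First Last <first.last@example.com>" "First2 Last2 <first2.last2@example.com>"``
writes both admins on a single line; the worker still needs to be restarted
for the change to show up in the web interface.
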
Some tasks don't give immediate feedback, so if nothing happens within a short
time, try again with the browser's web console open. Sometimes you will see
403 errors and other messages that might indicate you don't have the correct
details set up.

397