================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Can I modify LLVM source code and redistribute the modified source?
--------------------------------------------------------------------
Yes.  The modified source distribution must retain the copyright notice and
follow the conditions listed in the `Apache License v2.0 with LLVM Exceptions
<https://github.com/llvm/llvm-project/blob/main/llvm/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
-----------------------------------------------------------------------------------------------------------------------------
Yes. This is why we distribute LLVM under a less restrictive license than GPL,
as explained in the first question above.


Can I use AI coding tools, such as GitHub co-pilot, to write LLVM patches?
-----------------------------------------------------------------------------
Yes, as long as the resulting work can be licensed under the project license, as
covered in the :doc:`DeveloperPolicy`. Using an AI tool to reproduce copyrighted
work does not rinse it of copyright and grant you the right to relicense it.


Source Code
===========

In what language is LLVM written?
----------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
--------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems. LLVM also has excellent support on Windows systems.
Most of the code is written in standard C++ with operating system
services abstracted to a support library.  The tools required to build and
test LLVM have been ported to a plethora of platforms.


What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation?
-----------------------------------------------------------------------------------------------------

In short: you can't. It's actually kind of a silly question once you grok
what's going on. Basically, in code like:

.. code-block:: llvm

    %result = add i32 %foo, %bar

``%result`` is just a name given to the ``Value`` of the ``add``
instruction. In other words, ``%result`` *is* the add instruction. The
"assignment" doesn't explicitly "store" anything to any "virtual register";
the "``=``" is more like the mathematical sense of equality.

Longer explanation: In order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it. However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction, since
instructions can simply keep pointers to any other ``Value`` that they
reference. In fact, the names of dummy numbered temporaries like ``%1`` are
not explicitly represented in the in-memory representation at all (see
``Value::getName()``).
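
As a rough illustration of the in-memory view, here is a minimal sketch using
the C++ ``IRBuilder`` API (the helper function and its arguments are made up
for the example): the pointer returned by ``CreateAdd`` *is* the instruction,
later instructions consume that pointer directly, and the string argument is
only a label used when the IR is printed.

.. code-block:: c++

   #include "llvm/IR/IRBuilder.h"

   using namespace llvm;

   // Build the in-memory equivalent of "%result = add i32 %foo, %bar".
   Value *emitAddAndSquare(IRBuilder<> &Builder, Value *Foo, Value *Bar) {
     // The returned Value is the add instruction itself; nothing is
     // "stored" into a virtual register anywhere.
     Value *Result = Builder.CreateAdd(Foo, Bar, "result");

     // Other instructions refer to it by pointer, not by its name.
     // Leaving the name out just makes it print as a numbered temporary.
     return Builder.CreateMul(Result, Result);
   }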


Source Languages
================

What source languages are supported?
--------------------------------------

LLVM currently has full support for C and C++ source languages through
`Clang <https://clang.llvm.org/>`_. Many other language frontends have
been written using LLVM, and an incomplete list is available at
`projects with LLVM <https://llvm.org/ProjectsWithLLVM/>`_.


I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
---------------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format. Assuming you want to write your
language's compiler in the language itself (rather than C++), there are 3
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM libraries using your language's FFI (foreign
   function interface).**

  * *for:* best tracks changes to the LLVM IR, .ll syntax, and .bc format

  * *for:* enables running LLVM optimization passes without an emit/parse
    overhead

  * *for:* adapts well to a JIT context

  * *against:* lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

  * *for:* very straightforward to get started

  * *against:* the .ll parser is slower than the bitcode reader when
    interfacing to the middle end

  * *against:* it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

  * *for:* can use the more-efficient bitcode reader when interfacing to the
    middle end

  * *against:* you'll have to re-engineer the LLVM IR object model and bitcode
    writer in your language

  * *against:* it may be harder to track changes to the IR

If you go with the first option, the C bindings in ``include/llvm-c`` should
help a lot, since most languages have strong support for interfacing with C.
The most common hurdle with calling C from managed code is interfacing with
the garbage collector. The C interface was designed to require very little
memory management, and so is straightforward in this regard.
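
As a rough sketch of what the first option looks like, the fragment below
drives the C API from C++; the same ``LLVMBuild*`` entry points are what a
front-end written in another language would bind through its FFI. The module
and function names here are invented for the example.

.. code-block:: c++

   #include "llvm-c/Core.h"

   int main(void) {
     // Create a module holding "i32 @sum(i32, i32)".
     LLVMContextRef Ctx = LLVMContextCreate();
     LLVMModuleRef Mod = LLVMModuleCreateWithNameInContext("demo", Ctx);

     LLVMTypeRef I32 = LLVMInt32TypeInContext(Ctx);
     LLVMTypeRef Params[] = {I32, I32};
     LLVMTypeRef FnTy = LLVMFunctionType(I32, Params, 2, /*IsVarArg=*/0);
     LLVMValueRef Fn = LLVMAddFunction(Mod, "sum", FnTy);

     // Emit "%result = add i32 %a, %b" followed by "ret i32 %result".
     LLVMBasicBlockRef Entry = LLVMAppendBasicBlockInContext(Ctx, Fn, "entry");
     LLVMBuilderRef B = LLVMCreateBuilderInContext(Ctx);
     LLVMPositionBuilderAtEnd(B, Entry);
     LLVMValueRef Sum =
         LLVMBuildAdd(B, LLVMGetParam(Fn, 0), LLVMGetParam(Fn, 1), "result");
     LLVMBuildRet(B, Sum);

     LLVMDumpModule(Mod); // print the textual IR to stderr

     LLVMDisposeBuilder(B);
     LLVMDisposeModule(Mod);
     LLVMContextDispose(Ctx);
     return 0;
   }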

What support is there for higher-level source language constructs for building a compiler?
---------------------------------------------------------------------------------------------
Currently, there isn't much. LLVM supports an intermediate representation
which is useful for code representation but does not support the high-level
(abstract syntax tree) representation needed by most compilers. There are no
facilities for lexical or semantic analysis.


I don't understand the ``GetElementPtr`` instruction. Help!
-------------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
--------------------------------------------------------------------
No. C and C++ are inherently platform-dependent languages. The most obvious
example of this is the preprocessor. A very common way that C code is made
portable is by using the preprocessor to include platform-specific code. In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary
between platforms. In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.
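
Both effects can be seen in a small sketch like the one below (illustrative
only; the exact constants depend on the target the front-end was configured
for):

.. code-block:: c++

   #ifdef _WIN32
   const char *Platform = "windows";   // the other branch is thrown away by
   #else                               // the preprocessor, so the IR never
   const char *Platform = "unix";      // sees it at all
   #endif

   unsigned long bytesInLong() {
     // The front-end folds sizeof(long) to a constant immediately: commonly
     // "ret i64 8" on x86-64 Linux, but "ret i32 4" on many 32-bit targets
     // and on 64-bit Windows, where long is only 4 bytes.
     return sizeof(long);
   }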

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.


Questions about code generated by the demo page
================================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
----------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your
``.cpp`` file used ``std::cout``, for example, the object would not
necessarily be automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes ``<iostream>``.  This object has a static constructor
and destructor that initialize and destroy the global iostream objects
before they could possibly be used in the file.  The code that you see in the
``.ll`` file corresponds to the constructor and destructor registration code.
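
Conceptually, the mechanism is roughly the sketch below (a simplified
illustration, not the real library code; the actual object is
``std::ios_base::Init``). The per-translation-unit constructor is what shows
up as an entry in ``llvm.global_ctors`` and as the ``_GLOBAL__I_a...``
function.

.. code-block:: c++

   // Simplified sketch of the guard object <iostream> effectively declares.
   struct IostreamInit {
     IostreamInit()  { if (count++ == 0) { /* construct std::cout, ... */ } }
     ~IostreamInit() { if (--count == 0) { /* flush and tear them down */ } }
     static int count;
   };
   int IostreamInit::count = 0;

   // Every translation unit that includes <iostream> gets its own copy of a
   // static object like this, defined before anything else in the file, so
   // the stream globals are usable from any later static constructor.
   static IostreamInit iostreamGuard;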

If you would like to make it easier to *understand* the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\s to print values.


Where did all of my code go??
------------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in.  Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed.  For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable.  If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.
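
For example (an illustrative sketch; the names are made up), a ``volatile``
global guarantees that the computation feeding it is kept:

.. code-block:: c++

   // Reads and writes of a volatile global can never be optimized away.
   volatile int Sink;

   int main() {
     int Result = 6 * 7;   // without the store below, this could vanish
     Sink = Result;        // the volatile store keeps the value observable
     return Sink;          // reading it back is likewise preserved
   }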


What is this "``undef``" thing that shows up in my code?
----------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined.  You
can get these if you do not initialize a variable before you use it.  For
example, the C function:

.. code-block:: c

   int X() { int i; return i; }

Is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.


Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
------------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem that authors of front-ends using custom calling
conventions run into: you need to make sure to set the right calling
convention on both the function and on each call to the function.  For
example, this code:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define void @bar() {
       call void @foo()
       ret void
   }

Is optimized to:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define void @bar() {
       unreachable
   }

... with "``opt -instcombine -simplifycfg``".  This often bites people because
"all their code disappears".  Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why not
make the verifier reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
this would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code).  The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define internal void @bar(void()* %FP, i1 %cond) {
       br i1 %cond, label %T, label %F
   T:
       call void %FP()
       ret void
   F:
       call fastcc void %FP()
       ret void
   }
   define void @test() {
       %X = or i1 false, false
       call void @bar(void()* @foo, i1 %X)
       ret void
   }

In this example, "test" always passes ``@foo``/``false`` into ``bar``, which
ensures that it is dynamically called with the right calling convention (thus,
the code is perfectly well defined).  If you run this through the inliner, you
get this (the explicit "or" is there so that the inliner doesn't dead code
eliminate a bunch of stuff):

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define void @test() {
       %X = or i1 false, false
       br i1 %X, label %T.i, label %F.i
   T.i:
       call void @foo()
       br label %bar.exit
   F.i:
       call fastcc void @foo()
       br label %bar.exit
   bar.exit:
       ret void
   }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention.  We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code.  In this
case, dead code elimination can trivially remove the undefined code.  However,
if ``%X`` were an input argument to ``@test``, the inliner would produce this:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }

   define void @test(i1 %X) {
       br i1 %X, label %T.i, label %F.i
   T.i:
       call void @foo()
       br label %bar.exit
   F.i:
       call fastcc void @foo()
       br label %bar.exit
   bar.exit:
       ret void
   }

The interesting thing about this is that ``%X`` *must* be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable.  However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is
able to produce:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define void @test(i1 %X) {
   F.i:
       call fastcc void @foo()
       ret void
   }
342