================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Can I modify LLVM source code and redistribute the modified source?
-------------------------------------------------------------------
Yes.  The modified source distribution must retain the copyright notice and
follow the conditions listed in the `Apache License v2.0 with LLVM Exceptions
<https://github.com/llvm/llvm-project/blob/main/llvm/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
--------------------------------------------------------------------------------------------------------------------------
Yes. This is why we distribute LLVM under a less restrictive license than the
GPL, as explained in the first question above.


Source Code
===========

In what language is LLVM written?
---------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
-------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems. LLVM also has excellent support on Windows systems.
Most of the code is written in standard C++, with operating system
services abstracted to a support library.  The tools required to build and
test LLVM have been ported to many platforms.


What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation?
---------------------------------------------------------------------------------------------------

In short: you can't. The question stops making sense once you understand
what is going on. In code like:

.. code-block:: llvm

    %result = add i32 %foo, %bar

``%result`` is just a name given to the ``Value`` of the ``add``
instruction. In other words, ``%result`` *is* the add instruction. The
"assignment" doesn't explicitly "store" anything to any "virtual register";
the "``=``" is more like the mathematical sense of equality.

Longer explanation: In order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it. However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction, since
instructions can simply keep pointers to any other ``Value`` objects that
they reference. In fact, the names of dummy numbered temporaries like ``%1``
are not explicitly represented in the in-memory representation at all (see
``Value::getName()``).


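The in-memory picture described above can be sketched with a toy model. The
``Value``, ``Argument`` and ``AddInst`` types below are hypothetical stand-ins
written for this illustration, not LLVM's real classes. The only point is that
a user of an instruction holds a pointer to the instruction itself, so there
is no register to store into and no name is required.

.. code-block:: c++

   #include <cassert>
   #include <string>

   // Hypothetical stand-ins for illustration only; not LLVM's classes.
   struct Value {
     std::string Name; // Optional; only needed when printing textual IR.
     virtual ~Value() = default;
   };

   struct Argument : Value {};

   // The add instruction *is* the value it computes.
   struct AddInst : Value {
     Value *LHS;
     Value *RHS;
     AddInst(Value *L, Value *R) : LHS(L), RHS(R) {}
   };

   // Models the equivalent of:
   //   %result = add i32 %foo, %bar
   //   %twice  = add i32 %result, %result
   // and reports whether both operands of the second add point at the first.
   bool secondAddUsesFirstAdd() {
     Argument Foo, Bar;
     AddInst Result(&Foo, &Bar);      // Nothing is "stored" anywhere.
     AddInst Twice(&Result, &Result); // Operands are plain pointers.
     assert(Result.Name.empty());     // The value works without a name.
     return Twice.LHS == &Result && Twice.RHS == &Result;
   }

In this sketch, "using ``%result``" just means holding ``&Result``; a name
would only be needed when printing the textual form.

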
Source Languages
================

What source languages are supported?
------------------------------------

LLVM currently has full support for C and C++ source languages through
`Clang <https://clang.llvm.org/>`_. Many other language frontends have
been written using LLVM, and an incomplete list is available at
`projects with LLVM <https://llvm.org/ProjectsWithLLVM/>`_.


I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
----------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format. Assuming you want to write your
language's compiler in the language itself (rather than C++), there are three
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM libraries using your language's FFI (foreign
   function interface).**

   * *for:* best tracks changes to the LLVM IR, ``.ll`` syntax, and ``.bc``
     format

   * *for:* enables running LLVM optimization passes without emit/parse
     overhead

   * *for:* adapts well to a JIT context

   * *against:* lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

   * *for:* very straightforward to get started

   * *against:* the ``.ll`` parser is slower than the bitcode reader when
     interfacing to the middle end

   * *against:* it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

   * *for:* can use the more-efficient bitcode reader when interfacing to the
     middle end

   * *against:* you'll have to re-engineer the LLVM IR object model and bitcode
     writer in your language

   * *against:* it may be harder to track changes to the IR

If you go with the first option, the C bindings in ``include/llvm-c`` should
help a lot, since most languages have strong support for interfacing with C.
The most common hurdle with calling C from managed code is interfacing with the
garbage collector. The C interface was designed to require very little memory
management, and so is straightforward in this regard.

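As a rough illustration of the second option, emitting textual assembly needs
little more than string formatting. The sketch is written in C++ only for
concreteness; the helper name ``emitAddFunction`` and the fixed function body
are assumptions made for this example. A real front-end would produce the text
by walking its own program representation.

.. code-block:: c++

   #include <cassert>
   #include <sstream>
   #include <string>

   // Hypothetical helper: returns textual LLVM IR for a function that adds
   // its two i32 arguments.
   std::string emitAddFunction(const std::string &Name) {
     assert(!Name.empty() && "function needs a name");
     std::ostringstream OS;
     OS << "define i32 @" << Name << "(i32 %a, i32 %b) {\n"
        << "entry:\n"
        << "  %sum = add i32 %a, %b\n"
        << "  ret i32 %sum\n"
        << "}\n";
     return OS.str();
   }

The returned string can be written to a ``.ll`` file and handed to tools such
as ``llvm-as`` or ``opt``. The price is the parsing overhead listed under the
second option above.
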
What support is there for higher-level source language constructs for building a compiler?
--------------------------------------------------------------------------------------------
Currently, there isn't much. LLVM supports an intermediate representation
which is useful for code representation but does not provide the high-level
(abstract syntax tree) representation needed by most compilers. There are no
facilities for lexical or semantic analysis.


I don't understand the ``GetElementPtr`` instruction. Help!
-----------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
-----------------------------------------------------------------
No. C and C++ are inherently platform-dependent languages. The most obvious
example of this is the preprocessor. A very common way that C code is made
portable is by using the preprocessor to include platform-specific code. In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary
between platforms. In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.


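The first two points can be seen in a few lines of source. This is a minimal
sketch; the names ``TargetFamily``, ``targetFamily`` and ``longSizeInBits``
are made up for the example. After preprocessing only one branch of the
``#if`` survives, and ``sizeof(long)`` has already become a constant by the
time any IR is produced.

.. code-block:: c++

   #include <cassert>
   #include <climits>
   #include <cstddef>

   // The preprocessor keeps exactly one branch; the other is gone before
   // the front-end generates any IR.
   #if defined(_WIN32)
   static const char *const TargetFamily = "windows";
   #else
   static const char *const TargetFamily = "non-windows";
   #endif

   const char *targetFamily() { return TargetFamily; }

   // sizeof(long) is folded to a constant for the compilation target
   // (commonly 4 or 8 bytes), so the emitted code carries that number.
   std::size_t longSizeInBits() {
     // The language only guarantees a minimum width of 32 bits.
     assert(sizeof(long) * CHAR_BIT >= 32);
     return sizeof(long) * CHAR_BIT;
   }

Compiling this for two different targets may give different string constants
and different integer constants, and nothing in the output records what the
other target would have chosen.

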
Questions about code generated by the demo page
===============================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
-------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used ``std::cout``, for example, the object would not necessarily be
automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes ``<iostream>``.  This object has a static constructor
and destructor that initialize and destroy the global iostream objects
before they could possibly be used in the file.  The code that you see in the
``.ll`` file corresponds to the constructor and destructor registration code.

If you would like to make it easier to *understand* the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\s to print values.


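The static-object pattern described above can be sketched in miniature.
Everything in the ``mini`` namespace is hypothetical and much simpler than a
real standard library; it only shows why each translation unit ends up with a
constructor and destructor that must be registered to run at startup and
shutdown.

.. code-block:: c++

   #include <cassert>

   // Hypothetical miniature of the pattern; real libraries differ in detail.
   namespace mini {
   int InitCount = 0;        // Zero-initialized before any constructor runs.
   bool StreamReady = false; // Stands in for "the stream objects exist".

   struct Init {
     Init() {
       if (InitCount++ == 0)
         StreamReady = true;  // The first Init object sets the stream up.
     }
     ~Init() {
       assert(InitCount > 0);
       if (--InitCount == 0)
         StreamReady = false; // The last Init object tears it down.
     }
   };
   } // namespace mini

   // In the real pattern this definition comes from the header, so every
   // translation unit that includes it gets its own copy and its own
   // startup registration.
   static mini::Init MiniInitializer;

   bool streamReadyAtStartup() { return mini::StreamReady; }

Because ``MiniInitializer`` has a non-trivial constructor and destructor, the
compiler has to arrange for them to run before and after ``main``. That
arrangement is the registration code the answer above refers to.

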
Where did all of my code go??
-----------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in.  Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed.  For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable.  If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.


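The three situations can be put side by side. This is a small sketch with
made-up function names; what an optimizer actually does with each function
depends on the passes that are run.

.. code-block:: c++

   #include <cassert>

   // The sum is never observed, so an optimizer is free to delete the loop.
   void discarded(int N) {
     int Sum = 0;
     for (int I = 1; I <= N; ++I)
       Sum += I;
     (void)Sum; // Silences the unused-variable warning; keeps nothing alive.
   }

   // Returning the sum makes it observable, so the computation must be
   // preserved (although it may still be simplified).
   int returned(int N) {
     assert(N >= 0 && "sketch assumes a non-negative bound");
     int Sum = 0;
     for (int I = 1; I <= N; ++I)
       Sum += I;
     return Sum;
   }

   // Accesses to a volatile global are not removed by the optimizer.
   volatile int Sink = 0;

   void storedToVolatile(int N) {
     for (int I = 1; I <= N; ++I)
       Sink = Sink + I; // One volatile read and one volatile write per step.
   }

With optimization enabled, ``discarded`` will typically shrink to a bare
return, while the other two keep code that produces the sum.

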
What is this "``undef``" thing that shows up in my code?
--------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined.  You
can get these if you do not initialize a variable before you use it.  For
example, the C function:

.. code-block:: c

   int X() { int i; return i; }

is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.


Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
----------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem run into by authors of front-ends that are using
custom calling conventions: you need to make sure to set the right calling
convention on both the function and on each call to the function.  For
example, this code:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define void @bar() {
       call void @foo()
       ret void
   }

is optimized to:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define void @bar() {
       unreachable
   }

... with "``opt -instcombine -simplifycfg``".  This often bites people because
"all their code disappears".  Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why the
verifier does not simply reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
this would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code).  The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define internal void @bar(void()* %FP, i1 %cond) {
       br i1 %cond, label %T, label %F
   T:
       call void %FP()
       ret void
   F:
       call fastcc void %FP()
       ret void
   }
   define void @test() {
       %X = or i1 false, false
       call void @bar(void()* @foo, i1 %X)
       ret void
   }

In this example, ``@test`` always passes ``@foo``/``false`` into ``@bar``,
which ensures that ``@foo`` is dynamically called with the right calling
convention (thus, the code is perfectly well defined).  If you run this
through the inliner, you get this (the explicit "or" is there so that the
inliner doesn't dead code eliminate a bunch of stuff):

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define void @test() {
       %X = or i1 false, false
       br i1 %X, label %T.i, label %F.i
   T.i:
       call void @foo()
       br label %bar.exit
   F.i:
       call fastcc void @foo()
       br label %bar.exit
   bar.exit:
       ret void
   }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention.  We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code.  In this
case, dead code elimination can trivially remove the undefined code.  However,
if ``%X`` were an input argument to ``@test``, the inliner would produce this:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }

   define void @test(i1 %X) {
       br i1 %X, label %T.i, label %F.i
   T.i:
       call void @foo()
       br label %bar.exit
   F.i:
       call fastcc void @foo()
       br label %bar.exit
   bar.exit:
       ret void
   }

The interesting thing about this is that ``%X`` *must* be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable.  However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is
able to produce:

.. code-block:: llvm

   define fastcc void @foo() {
       ret void
   }
   define void @test(i1 %X) {
   F.i:
       call fastcc void @foo()
       ret void
   }
335