==========
KernelInfo
==========

.. contents::
   :local:

Introduction
============

This LLVM IR pass reports various statistics for code compiled for GPUs.  The
goal of these statistics is to help identify bad code patterns and ways to
mitigate them.  The pass operates at the LLVM IR level so that it can, in
theory, support any LLVM-based compiler for programming languages supporting
GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks.  Example ``opt`` and ``clang``
command lines appear in the next section.

Remarks include summary statistics (e.g., total size of static allocas) and
individual occurrences (e.g., source location of each alloca).  Examples of the
output appear in tests in ``llvm/test/Analysis/KernelInfo``.

Example Command Lines
=====================

To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info

To analyze specified LLVM IR, perhaps previously generated by something like
``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

.. code-block:: shell

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes=kernel-info

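Because remarks are diagnostics, they can be interleaved with warnings and
errors in a tool's output.  As a minimal sketch, assuming a POSIX shell and
assuming that remarks are written to standard error with ``remark:`` somewhere
in each remark line, the following hypothetical helper extracts them from a
saved log.  The helper name ``kernel_info_remarks`` and the file name
``kernel-info.log`` are illustrative only and are not part of LLVM:

.. code-block:: shell

  # Hypothetical helper (not part of LLVM): print only the remark lines
  # found in a saved diagnostics log whose path is given as $1.
  kernel_info_remarks() {
    grep 'remark:' "$1"
  }

  # Example usage, assuming remarks are written to standard error:
  #   opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
  #       -pass-remarks=kernel-info -passes=kernel-info 2> kernel-info.log
  #   kernel_info_remarks kernel-info.log
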
When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still
runs at the end of LTO by default.  ``-no-kernel-info-end-lto`` disables that
behavior so you can position ``kernel-info`` explicitly:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info \
      -Xoffload-linker --lto-newpm-passes='lto<O2>'

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \
      -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info \
      -passes='lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -no-kernel-info-end-lto \
      -passes='module(kernel-info),lto<O2>'