==========
KernelInfo
==========

.. contents::
   :local:

Introduction
============

This LLVM IR pass reports various statistics for codes compiled for GPUs.  The
goal of these statistics is to help identify bad code patterns and ways to
mitigate them.  The pass operates at the LLVM IR level so that it can, in
theory, support any LLVM-based compiler for programming languages supporting
GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks.  Example ``opt`` and ``clang``
command lines appear in the next section.

Remarks include summary statistics (e.g., total size of static allocas) and
individual occurrences (e.g., source location of each alloca).  Examples of the
output appear in tests in `llvm/test/Analysis/KernelInfo`.

Example Command Lines
=====================

To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info

To analyze specified LLVM IR, perhaps previously generated by something like
``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

.. code-block:: shell

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes=kernel-info

When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still
runs at the end of LTO by default.  ``-no-kernel-info-end-lto`` disables that
behavior so you can position ``kernel-info`` explicitly:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info \
      -Xoffload-linker --lto-newpm-passes='lto<O2>'

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \
      -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info \
      -passes='lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -no-kernel-info-end-lto \
      -passes='module(kernel-info),lto<O2>'
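
The remarks produced by the commands above can be separated from other
diagnostics with a small filter.  The following is a minimal sketch, not part
of LLVM: ``filter_remarks`` is a hypothetical helper name, and the sketch
assumes only that each remark line contains the text ``remark:`` and that
``opt`` and ``clang`` write remarks to stderr.

.. code-block:: shell

  # Hypothetical helper (not an LLVM tool): read diagnostics on stdin and
  # keep only the lines that contain "remark:".
  filter_remarks() {
    # -F matches the literal string; "|| true" keeps the exit status zero
    # when no remark lines are present.
    grep -F 'remark:' || true
  }

  # Usage sketch, assuming remarks arrive on stderr:
  #   opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
  #       -pass-remarks=kernel-info -passes=kernel-info 2>&1 \
  #     | filter_remarks > kernel-info-remarks.txt

Saving the filtered output to a file makes it easier to compare the remarks
from two pipeline positions, such as the default end-of-LTO run and an explicit
``module(kernel-info)`` placement.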