==========
KernelInfo
==========

.. contents::
   :local:

Introduction
============

This LLVM IR pass reports various statistics for code compiled for GPUs.  The
goal of these statistics is to help identify bad code patterns and ways to
mitigate them.  The pass operates at the LLVM IR level so that it can, in
theory, support any LLVM-based compiler for programming languages supporting
GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks.  Example ``opt`` and ``clang``
command lines appear in the next section.

Remarks include summary statistics (e.g., total size of static allocas) and
individual occurrences (e.g., source location of each alloca).  Examples of the
output appear in tests in ``llvm/test/Analysis/KernelInfo``.

Example Command Lines
=====================

To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info

To analyze specified LLVM IR, perhaps previously generated by something like
``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

.. code-block:: shell

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes=kernel-info

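Because ``opt`` prints remarks on standard error, they can be redirected and
post-processed with ordinary shell tools.  The following is a minimal sketch
and not part of LLVM: the helper names ``kernel_info_remarks`` and
``count_remarks`` are hypothetical, and the sketch assumes that every remark
line contains the text ``remark:``.

.. code-block:: shell

  # Hypothetical helper: run kernel-info on one bitcode file ($1) and send
  # its remarks (printed on stderr) to stdout so they can be piped.
  kernel_info_remarks() {
    opt -disable-output "$1" \
        -pass-remarks=kernel-info -passes=kernel-info 2>&1
  }

  # Hypothetical helper: count the lines on stdin that contain "remark:".
  # Assumes each remark occupies one line with that prefix.  grep exits
  # nonzero when nothing matches, so "|| true" keeps the exit status zero.
  count_remarks() {
    grep -c 'remark:' || true
  }

For example,
``kernel_info_remarks test-openmp-nvptx64-nvidia-cuda-sm_70.bc | count_remarks``
would print the number of remark lines produced for that file.
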
When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still
runs at the end of LTO by default.  ``-no-kernel-info-end-lto`` disables that
behavior so you can position ``kernel-info`` explicitly:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info \
      -Xoffload-linker --lto-newpm-passes='lto<O2>'

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \
      -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info \
      -passes='lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -no-kernel-info-end-lto \
      -passes='module(kernel-info),lto<O2>'