# InstCombine contributor guide

This guide lays out a series of rules that contributions to InstCombine should
follow. **Following these rules will result in much faster PR approvals.**

## Tests

### Precommit tests

Tests for new optimizations or miscompilation fixes should be pre-committed.
This means that you first commit the test with CHECK lines showing the behavior
*without* your change. Your actual change will then only contain CHECK line
diffs relative to that baseline.

This means that pull requests should generally contain two commits: First,
one commit adding new tests with baseline check lines. Second, a commit with
functional changes and test diffs.

If the second commit in your PR does not contain test diffs, you did something
wrong. Either you made a mistake when generating CHECK lines, or your tests are
not actually affected by your patch.

Exceptions: When fixing assertion failures or infinite loops, do not pre-commit
tests.

### Use `update_test_checks.py`

CHECK lines should be generated using the `update_test_checks.py` script. Do
**not** manually edit check lines after using it.

Be sure to use the correct opt binary when using the script. For example, if
your build directory is `build`, then you'll want to run:

```sh
llvm/utils/update_test_checks.py --opt-binary build/bin/opt \
    llvm/test/Transforms/InstCombine/the_test.ll
```

Exceptions: Hand-written CHECK lines are allowed for debuginfo tests.

### General testing considerations

Place all tests relating to a transform into a single file. If you are adding
a regression test for a crash/miscompile in an existing transform, find the
file where the existing tests are located. A good way to do that is to comment
out the transform and see which tests fail.

Make tests minimal. Only test exactly the pattern being transformed. If your
original motivating case is a larger pattern that your fold allows to be
optimized in some non-trivial way, you may add it as well -- however, the bulk
of the test coverage should be minimal.

Give tests short, but meaningful names. Don't call them `@test1`, `@test2` etc.
For example, a test checking multi-use behavior of a fold involving the
addition of two selects might be called `@add_of_selects_multi_use`.

Add representative tests for each test category (discussed below), but don't
test all combinations of everything. If you have multi-use tests, and you have
commuted tests, you shouldn't also add commuted multi-use tests.

Prefer to keep bit-widths for tests low to improve performance of proof checking using alive2. Using `i8` is better than `i128` where possible.

### Add negative tests

Make sure to add tests for which your transform does **not** apply. Start with
one of the test cases that succeeds and then create a sequence of negative
tests, such that **exactly one** different pre-condition of your transform is
not satisfied in each test.

### Add multi-use tests

Add multi-use tests that ensure your transform does not increase instruction
count if some instructions have additional uses. The standard pattern is to
introduce extra uses with function calls:

```llvm
declare void @use(i8)

define i8 @add_mul_const_multi_use(i8 %x) {
  %add = add i8 %x, 1
  call void @use(i8 %add)
  %mul = mul i8 %add, 3
  ret i8 %mul
}
```

Exceptions: For transforms that only produce one instruction, multi-use tests
may be omitted.

### Add commuted tests

If the transform involves commutative operations, add tests with commuted
(swapped) operands.

Make sure that the operand order stays intact in the CHECK lines of your
pre-committed tests. You should not see something like this:

```llvm
; CHECK-NEXT: [[OR:%.*]] = or i8 [[X]], [[Y]]
; ...
%or = or i8 %y, %x
```

If this happens, you may need to change one of the operands to have higher
complexity (include the "thwart" comment in that case):

```llvm
%y2 = mul i8 %y, %y ; thwart complexity-based canonicalization
%or = or i8 %y2, %x
```

### Add vector tests

When possible, it is recommended to add at least one test that uses vectors
instead of scalars.

For patterns that include constants, we distinguish three kinds of tests.
The first are "splat" vectors, where all the vector elements are the same.
These tests *should* usually fold without additional effort.

```llvm
define <2 x i8> @add_mul_const_vec_splat(<2 x i8> %x) {
  %add = add <2 x i8> %x, <i8 1, i8 1>
  %mul = mul <2 x i8> %add, <i8 3, i8 3>
  ret <2 x i8> %mul
}
```

A minor variant is to replace some of the splat elements with poison. These
will often also fold without additional effort.

```llvm
define <2 x i8> @add_mul_const_vec_splat_poison(<2 x i8> %x) {
  %add = add <2 x i8> %x, <i8 1, i8 poison>
  %mul = mul <2 x i8> %add, <i8 3, i8 poison>
  ret <2 x i8> %mul
}
```

Finally, you can have non-splat vectors, where the vector elements are not
the same:

```llvm
define <2 x i8> @add_mul_const_vec_non_splat(<2 x i8> %x) {
  %add = add <2 x i8> %x, <i8 1, i8 5>
  %mul = mul <2 x i8> %add, <i8 3, i8 6>
  ret <2 x i8> %mul
}
```

Non-splat vectors will often not fold by default. You should **not** try to
make them fold, unless doing so does not add **any** additional complexity.
You should still add the test though, even if it does not fold.

### Flag tests

If your transform involves instructions that can have poison-generating flags,
such as `nuw` and `nsw` on `add`, you should test how these interact with the
transform.

If your transform *requires* a certain flag for correctness, make sure to add
negative tests missing the required flag.

If your transform doesn't require flags for correctness, you should have tests
for preservation behavior. If the input instructions have certain flags, are
they preserved in the output instructions, if it is valid to preserve them?
(This depends on the transform. Check with alive2.)

The same also applies to fast-math-flags (FMF). In that case, please always
test specific flags like `nnan`, `nsz` or `reassoc`, rather than the umbrella
`fast` flag.
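
To build intuition for why flag preservation has to be checked per transform, the following standalone C++ sketch models `add nsw` on `i8` by brute force. It is an illustration only and not a replacement for an alive2 proof: the `Val` struct, the helper names, and the example fold `(x +nsw C1) +nsw C2 -> x + (C1 + C2)` are assumptions chosen for this sketch and are not taken from InstCombine.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of an i8 value that may be poison.
struct Val {
  bool IsPoison;
  int8_t Bits;
};

// Plain `add`: wraps modulo 2^8.
int8_t addWrap(int8_t A, int8_t B) {
  return static_cast<int8_t>(
      static_cast<uint8_t>(static_cast<uint8_t>(A) + static_cast<uint8_t>(B)));
}

// `add nsw`: poison if the operand is poison or the signed sum overflows.
Val addNsw(Val A, int8_t B) {
  int Sum = int(A.Bits) + int(B);
  if (A.IsPoison || Sum < INT8_MIN || Sum > INT8_MAX)
    return {true, 0};
  return {false, static_cast<int8_t>(Sum)};
}

// Checks the hypothetical fold (x +nsw C1) +nsw C2 -> x + (C1 + C2), with
// `nsw` either kept on or dropped from the resulting add. The fold is valid
// if the target yields the same value whenever the source is not poison.
bool foldIsValid(int8_t C1, int8_t C2, bool KeepNsw) {
  int8_t C = addWrap(C1, C2);
  for (int I = INT8_MIN; I <= INT8_MAX; ++I) {
    int8_t X = static_cast<int8_t>(I);
    Val Src = addNsw(addNsw({false, X}, C1), C2);
    if (Src.IsPoison)
      continue; // A poison source may be refined to anything.
    Val Tgt = KeepNsw ? addNsw({false, X}, C) : Val{false, addWrap(X, C)};
    if (Tgt.IsPoison || Tgt.Bits != Src.Bits)
      return false;
  }
  return true;
}

// Example checks: dropping the flag is fine, keeping it is not in general.
void checkNswExamples() {
  assert(foldIsValid(100, 100, /*KeepNsw=*/false));
  assert(!foldIsValid(100, 100, /*KeepNsw=*/true));
}
```

In this model, keeping `nsw` fails for `C1 = C2 = 100`: with `x = -128` the source is well-defined, but the combined constant wraps to `-56` and `x +nsw -56` overflows. A flag test should pin down exactly this kind of behavior.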

### Other tests

The test categories mentioned above are non-exhaustive. There may be more tests
to be added, depending on the instructions involved in the transform. Some
examples:

 * For folds involving memory accesses like load/store, check that scalable vectors and non-byte-size types (like i3) are handled correctly. Also check that volatile/atomic are handled.
 * For folds that interact with the bitwidth in some non-trivial way, check an illegal type like i13. Also confirm that the transform is correct for i1.
 * For folds that involve phis, you may want to check that the case of multiple incoming values from one block is handled correctly.

## Proofs

Your pull request description should contain one or more
[alive2 proofs](https://alive2.llvm.org/ce/) for the correctness of the
proposed transform.

### Basics

Proofs are written using LLVM IR, by specifying a `@src` and `@tgt` function.
It is possible to include multiple proofs in a single file by giving the src
and tgt functions matching suffixes.

For example, here is a pair of proofs that both `(x-y)+y` and `(x+y)-y` can
be simplified to `x` ([online](https://alive2.llvm.org/ce/z/MsPPGz)):

```llvm
define i8 @src_add_sub(i8 %x, i8 %y) {
  %add = add i8 %x, %y
  %sub = sub i8 %add, %y
  ret i8 %sub
}

define i8 @tgt_add_sub(i8 %x, i8 %y) {
  ret i8 %x
}


define i8 @src_sub_add(i8 %x, i8 %y) {
  %sub = sub i8 %x, %y
  %add = add i8 %sub, %y
  ret i8 %add
}

define i8 @tgt_sub_add(i8 %x, i8 %y) {
  ret i8 %x
}
```
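
For intuition about what such a proof establishes, the same two identities can be checked exhaustively for `i8` with a few lines of standalone C++. This is only a sketch: the function names are made up for illustration, and unlike alive2 the loops say nothing about `undef`, poison, or other bit widths.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if (x + y) - y == x for all i8 values, using wrapping
// arithmetic like LLVM's plain `add` and `sub`.
bool addSubFoldsToX() {
  for (int X = 0; X < 256; ++X)
    for (int Y = 0; Y < 256; ++Y) {
      uint8_t Add = static_cast<uint8_t>(X + Y);
      uint8_t Sub = static_cast<uint8_t>(Add - Y);
      if (Sub != X)
        return false;
    }
  return true;
}

// Returns true if (x - y) + y == x for all i8 values.
bool subAddFoldsToX() {
  for (int X = 0; X < 256; ++X)
    for (int Y = 0; Y < 256; ++Y) {
      uint8_t Sub = static_cast<uint8_t>(X - Y);
      uint8_t Add = static_cast<uint8_t>(Sub + Y);
      if (Add != X)
        return false;
    }
  return true;
}

// Both identities are expected to hold for every pair of inputs.
void checkProofExamples() {
  assert(addSubFoldsToX());
  assert(subAddFoldsToX());
}
```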

### Use generic values in proofs

Proofs should operate on generic values, rather than specific constants, to the degree that this is possible.

For example, if we want to fold `X s/ C s< X` to `X s> 0`, the following would
be a *bad* proof:

```llvm
; Don't do this!
define i1 @src(i8 %x) {
  %div = sdiv i8 %x, 123
  %cmp = icmp slt i8 %div, %x
  ret i1 %cmp
}

define i1 @tgt(i8 %x) {
  %cmp = icmp sgt i8 %x, 0
  ret i1 %cmp
}
```

This is because it only proves that the transform is correct for the specific
constant 123. Maybe there are some constants for which the transform is
incorrect?

The correct way to write this proof is as follows
([online](https://alive2.llvm.org/ce/z/acjwb6)):

```llvm
define i1 @src(i8 %x, i8 %C) {
  %precond = icmp ne i8 %C, 1
  call void @llvm.assume(i1 %precond)
  %div = sdiv i8 %x, %C
  %cmp = icmp slt i8 %div, %x
  ret i1 %cmp
}

define i1 @tgt(i8 %x, i8 %C) {
  %cmp = icmp sgt i8 %x, 0
  ret i1 %cmp
}
```

Note that the `@llvm.assume` intrinsic is used to specify pre-conditions for
the transform. In this case, the proof will fail unless we specify `C != 1` as
a pre-condition.

It should be emphasized that there is, in general, no expectation that the
IR in the proofs will be transformed by the implemented fold. In the above
example, the transform would only apply if `%C` is actually a constant, but we
need to use non-constants in the proof.
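
The difference between the two proofs can be illustrated with a brute-force C++ sketch that plays the role of alive2 for this one fold on `i8`. The helper names are hypothetical, and inputs on which `sdiv` is undefined are skipped, since the source imposes no constraint there.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `x s/ C s< x` agrees with `x s> 0` for every i8 value x
// on which the sdiv is defined (no division by zero, no INT8_MIN / -1).
bool foldHoldsForConstant(int C) {
  for (int X = INT8_MIN; X <= INT8_MAX; ++X) {
    if (C == 0 || (X == INT8_MIN && C == -1))
      continue; // sdiv is undefined here, so any target is acceptable.
    int Div = X / C; // Truncates toward zero, like sdiv.
    bool Src = Div < X;
    bool Tgt = X > 0;
    if (Src != Tgt)
      return false;
  }
  return true;
}

// Returns true if the fold holds for every i8 constant other than 1.
bool foldHoldsForAllConstantsExceptOne() {
  for (int C = INT8_MIN; C <= INT8_MAX; ++C)
    if (C != 1 && !foldHoldsForConstant(C))
      return false;
  return true;
}

// Checking a single constant passes; only the sweep reveals the C != 1 rule.
void checkGenericProofExample() {
  assert(foldHoldsForConstant(123));
  assert(!foldHoldsForConstant(1));
  assert(foldHoldsForAllConstantsExceptOne());
}
```

Checking only `C = 123` passes, just like the bad proof, while the sweep over all constants exposes `C = 1` as the only failing constant, which is the pre-condition encoded with `@llvm.assume`.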

### Common pre-conditions

Here are some examples of common preconditions.

```llvm
; %x is non-negative:
%nonneg = icmp sgt i8 %x, -1
call void @llvm.assume(i1 %nonneg)

; %x is a power of two:
%ctpop = call i8 @llvm.ctpop.i8(i8 %x)
%pow2 = icmp eq i8 %ctpop, 1
call void @llvm.assume(i1 %pow2)

; %x is a power of two or zero:
%ctpop = call i8 @llvm.ctpop.i8(i8 %x)
%pow2orzero = icmp ult i8 %ctpop, 2
call void @llvm.assume(i1 %pow2orzero)

; Adding %x and %y does not overflow in a signed sense:
%wo = call { i8, i1 } @llvm.sadd.with.overflow.i8(i8 %x, i8 %y)
%ov = extractvalue { i8, i1 } %wo, 1
%ov.not = xor i1 %ov, true
call void @llvm.assume(i1 %ov.not)
```
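
As a cross-check of which inputs these pre-conditions accept, here is a small C++ model of the same predicates on 8-bit values. It is a sketch under the assumption that `ctpop8` mirrors `@llvm.ctpop.i8`; all function names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a population count on i8: the number of set bits.
int ctpop8(uint8_t X) {
  int Count = 0;
  for (; X != 0; X &= static_cast<uint8_t>(X - 1)) // Clears the lowest set bit.
    ++Count;
  return Count;
}

// %x is non-negative: icmp sgt %x, -1
bool isNonNegative(int8_t X) { return X > -1; }

// %x is a power of two: exactly one bit is set.
bool isPowerOfTwo(uint8_t X) { return ctpop8(X) == 1; }

// %x is a power of two or zero: at most one bit is set.
bool isPowerOfTwoOrZero(uint8_t X) { return ctpop8(X) < 2; }

// Adding x and y does not overflow in a signed sense.
bool signedAddDoesNotOverflow(int8_t X, int8_t Y) {
  int Sum = int(X) + int(Y);
  return Sum >= INT8_MIN && Sum <= INT8_MAX;
}

// A few spot checks of the predicates above.
void checkPreconditionExamples() {
  assert(isPowerOfTwo(64) && !isPowerOfTwo(0));
  assert(isPowerOfTwoOrZero(0) && !isPowerOfTwoOrZero(6));
  assert(signedAddDoesNotOverflow(100, 27));
  assert(!signedAddDoesNotOverflow(100, 28));
}
```

Note that the power-of-two predicates compare the population count, not the value itself, against the constant.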

### Timeouts

Alive2 proofs will sometimes produce a timeout with the following message:

```
Alive2 timed out while processing your query.
There are a few things you can try:

- remove extraneous instructions, if any

- reduce variable widths, for example to i16, i8, or i4

- add the --disable-undef-input command line flag, which
  allows Alive2 to assume that arguments to your IR are not
  undef. This is, in general, unsound: it can cause Alive2
  to miss bugs.
```

This is good advice, follow it!

Reducing the bitwidth usually helps. For floating point numbers, you can use
the `half` type for bitwidth reduction purposes. For pointers, you can reduce
the bitwidth by specifying a custom data layout:

```llvm
; For 16-bit pointers
target datalayout = "p:16:16"
```

If reducing the bitwidth does not help, try `-disable-undef-input`. This will
often significantly improve performance, but also implies that the correctness
of the transform with `undef` values is no longer verified. This is usually
fine if the transform does not increase the number of uses of any value.

Finally, it's possible to build alive2 locally and use `-smt-to=<m>` to verify
the proof with a larger timeout. If you don't want to do this (or it still
does not work), please submit the proof you have despite the timeout.

## Implementation

### Real-world usefulness

There is a very large number of transforms that *could* be implemented, but
only a tiny fraction of them are useful for real-world code.

Transforms that do not have real-world usefulness provide *negative* value to
the LLVM project, by taking up valuable reviewer time, increasing code
complexity and increasing compile-time overhead.

We do not require explicit proof of real-world usefulness for every transform
-- in most cases the usefulness is fairly "obvious". However, the question may
come up for complex or unusual folds. Keep this in mind when choosing what you
work on.

In particular, fixes for fuzzer-generated missed optimization reports will
likely be rejected if there is no evidence of real-world usefulness.

### Pick the correct optimization pass

There are a number of passes and utilities in the InstCombine family, and it
is important to pick the right place when implementing a fold.

 * `ConstantFolding`: For folding instructions with constant arguments to a constant. (Mainly relevant for intrinsics.)
 * `ValueTracking`: For analyzing instructions, e.g. for known bits, non-zero, etc. Tests should usually use `-passes=instsimplify`.
 * `InstructionSimplify`: For folds that do not create new instructions (either fold to existing value or constant).
 * `InstCombine`: For folds that create or modify instructions.
 * `AggressiveInstCombine`: For folds that are expensive, or violate InstCombine requirements.
 * `VectorCombine`: For folds of vector operations that require target-dependent cost-modelling.

Sometimes, folds that logically belong in InstSimplify are placed in InstCombine instead, for example because they are too expensive, or because they are structurally simpler to implement in InstCombine.

For example, if a fold produces new instructions in some cases but returns an existing value in others, it may be preferable to keep all cases in InstCombine, rather than trying to split them among InstCombine and InstSimplify.

### Canonicalization and target-independence

InstCombine is a target-independent canonicalization pass. This means that it
tries to bring IR into a "canonical form" that other optimizations (both inside
and outside of InstCombine) can rely on. For this reason, the chosen canonical
form needs to be the same for all targets, and not depend on target-specific
cost modelling.

In many cases, "canonicalization" and "optimization" coincide. For example, if
we convert `x * 2` into `x << 1`, this both makes the IR more canonical
(because there is now only one way to express the same operation, rather than
two) and faster (because shifts will usually have lower latency than
multiplies).

However, there are also canonicalizations that don't serve any direct
optimization purpose. For example, InstCombine will canonicalize non-strict
predicates like `ule` to strict predicates like `ult`. `icmp ule i8 %x, 7`
becomes `icmp ult i8 %x, 8`. This is not an optimization in any meaningful
sense, but it does reduce the number of cases that other transforms need to
handle.
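
Both canonicalizations mentioned above can be sanity-checked for `i8` by brute force. The following C++ sketch uses hypothetical helper names and also shows that rewriting `ule C` as `ult C + 1` would be wrong for the maximum constant, where the increment wraps.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if x * 2 and x << 1 agree for every i8 value.
bool mulByTwoEqualsShlByOne() {
  for (int X = 0; X < 256; ++X)
    if (static_cast<uint8_t>(X * 2) != static_cast<uint8_t>(X << 1))
      return false;
  return true;
}

// Returns true if `x ule C` agrees with `x ult C + 1` for every i8 value x,
// where C + 1 is computed with 8-bit wrapping.
bool uleEqualsUltOfNext(uint8_t C) {
  uint8_t Next = static_cast<uint8_t>(C + 1); // Wraps to 0 for C == 255.
  for (int X = 0; X < 256; ++X)
    if ((X <= C) != (X < Next))
      return false;
  return true;
}

// The rewrite is sound for C == 7 but not for the maximum value 255.
void checkCanonicalizationExamples() {
  assert(mulByTwoEqualsShlByOne());
  assert(uleEqualsUltOfNext(7));
  assert(!uleEqualsUltOfNext(255));
}
```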

If some canonicalization is not profitable for a specific target, then a reverse
transform needs to be added in the backend. Patches to disable specific
InstCombine transforms on certain targets, or to drive them using
target-specific cost-modelling, **will not be accepted**. The only permitted
target-dependence is on DataLayout and TargetLibraryInfo.

The use of TargetTransformInfo is only allowed for hooks for target-specific
intrinsics, such as `TargetTransformInfo::instCombineIntrinsic()`. These are
already inherently target-dependent anyway.

For vector-specific transforms that require cost-modelling, the VectorCombine
pass can be used instead. In very rare circumstances, if there are no other
alternatives, target-dependent transforms may be accepted into
AggressiveInstCombine.

### PatternMatch

Many transforms make use of the matching infrastructure defined in
[PatternMatch.h](https://github.com/llvm/llvm-project/blame/main/llvm/include/llvm/IR/PatternMatch.h).

Here is a typical usage example:

```cpp
// Fold (A - B) + B and B + (A - B) to A.
Value *A, *B;
if (match(V, m_c_Add(m_Sub(m_Value(A), m_Value(B)), m_Deferred(B))))
  return A;
```

And another:

```cpp
// Fold A + C1 == C2 to A == C2 - C1
Value *A;
const APInt *C1, *C2;
ICmpInst::Predicate Pred;
if (match(V, m_ICmp(Pred, m_Add(m_Value(A), m_APInt(C1)), m_APInt(C2))) &&
    ICmpInst::isEquality(Pred))
  return Builder.CreateICmp(Pred, A,
                            ConstantInt::get(A->getType(), *C2 - *C1));
```
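
The arithmetic behind the second fold can be double-checked with an exhaustive C++ sketch over 8-bit values, using wrapping arithmetic as `APInt` does. The function name is hypothetical; the sketch checks only that `A + C1 == C2` holds exactly when `A == C2 - C1`, not the matcher code itself.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `A + C1 == C2` and `A == C2 - C1` agree for all 8-bit
// values of A, C1 and C2, with both sides computed modulo 2^8.
bool equalityFoldHolds() {
  for (int A = 0; A < 256; ++A)
    for (int C1 = 0; C1 < 256; ++C1)
      for (int C2 = 0; C2 < 256; ++C2) {
        bool Src = static_cast<uint8_t>(A + C1) == C2;
        bool Tgt = A == static_cast<uint8_t>(C2 - C1);
        if (Src != Tgt)
          return false;
      }
  return true;
}

// The identity is expected to hold for every combination of inputs.
void checkEqualityFold() { assert(equalityFoldHolds()); }
```

Because the same equivalence holds with both sides negated, the identity also covers the `ne` predicate accepted by `ICmpInst::isEquality()`.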

Some common matchers are:

 * `m_Value(A)`: Match any value and write it into `Value *A`.
 * `m_Specific(A)`: Check that the operand equals A. Use this if A is
   assigned **outside** the pattern.
 * `m_Deferred(A)`: Check that the operand equals A. Use this if A is
   assigned **inside** the pattern, for example via `m_Value(A)`.
 * `m_APInt(C)`: Match a scalar integer constant or splat vector constant into
   `const APInt *C`. Does not permit undef/poison values.
 * `m_ImmConstant(C)`: Match any non-constant-expression constant into
   `Constant *C`.
 * `m_Constant(C)`: Match any constant into `Constant *C`. Don't use this unless
   you know what you're doing.
 * `m_Add(M1, M2)`, `m_Sub(M1, M2)`, etc: Match an add/sub/etc where the first
   operand matches M1 and the second M2.
 * `m_c_Add(M1, M2)`, etc: Match an add commutatively. The operands must match
   either M1 and M2 or M2 and M1. Most instruction matchers have a commutative
   variant.
 * `m_ICmp(Pred, M1, M2)` and `m_c_ICmp(Pred, M1, M2)`: Match an icmp, writing
   the predicate into `ICmpInst::Predicate Pred`. If the commutative version
   is used, and the operands match in order M2, M1, then `Pred` will be the
   swapped predicate.
 * `m_OneUse(M)`: Check that the value only has one use, and also matches M.
   For example `m_OneUse(m_Add(...))`. See the next section for more
   information.

See the header for the full list of available matchers.

### InstCombine APIs

InstCombine transforms are handled by `visitXYZ()` methods, where XYZ
corresponds to the root instruction of your transform. If the outermost
instruction of the pattern you are matching is an icmp, the fold will be
located somewhere inside `visitICmpInst()`.

The return value of the visit method is an instruction. You can either return
a new instruction, in which case it will be inserted before the old one, and
uses of the old one will be replaced by it. Or you can return the original
instruction to indicate that *some* kind of change has been made. Finally, a
nullptr return value indicates that no change occurred.

For example, if your transform produces a single new icmp instruction, you could
write the following:

```cpp
if (...)
  return new ICmpInst(Pred, X, Y);
```

In this case the main InstCombine loop takes care of inserting the instruction
and replacing uses of the old instruction.

Alternatively, you can also write it like this:

```cpp
if (...)
  return replaceInstUsesWith(OrigI, Builder.CreateICmp(Pred, X, Y));
```

In this case `IRBuilder` will insert the instruction and `replaceInstUsesWith()`
will replace the uses of the old instruction, and return it to indicate that
a change occurred.

Both forms are equivalent, and you can use whichever is more convenient in
context. For example, it's common that folds are inside helper functions that
return `Value *` and then `replaceInstUsesWith()` is invoked on the result of
that helper.

InstCombine makes use of a worklist, which needs to be correctly updated during
transforms. This usually happens automatically, but there are some things to
keep in mind:

  * Don't use the `Value::replaceAllUsesWith()` API. Use InstCombine's
    `replaceInstUsesWith()` helper instead.
  * Don't use the `Instruction::eraseFromParent()` API. Use InstCombine's
    `eraseInstFromFunction()` helper instead. (Explicitly erasing instructions
    is usually not necessary, as side-effect free instructions without users
    are automatically removed.)
  * Apart from the "directly return an instruction" pattern above, use IRBuilder
    to create all instructions. Do not manually create and insert them.
  * When replacing operands or uses of instructions, use `replaceOperand()`
    and `replaceUse()` instead of `setOperand()`.

### Multi-use handling

Transforms should usually not increase the total number of instructions. This
is not a hard requirement: For example, it is usually worthwhile to replace a
single division instruction with multiple other instructions.

For example, if you have a transform that replaces two instructions with two
other instructions, this is (usually) only profitable if *both* the original
instructions can be removed. To ensure that both instructions are removed, you
need to add a one-use check for the inner instruction.

One-use checks can be performed using the `m_OneUse()` matcher, or the
`V->hasOneUse()` method.

### Generalization

Transforms can both be too specific (only handling some odd subset of patterns,
leading to unexpected optimization cliffs) and too general (introducing
complexity to handle cases with no real-world relevance). The right level of
generality is quite subjective, so this section only provides some broad
guidelines.

 * Avoid transforms that are hardcoded to specific constants. Try to figure
   out what the general rule for arbitrary constants is.
 * Add handling for conjugate patterns. For example, if you implement a fold
   for `icmp eq`, you almost certainly also want to support `icmp ne`, with the
   inverse result. Similarly, if you implement a pattern for `and` of `icmp`s,
   you should also handle the de-Morgan conjugate using `or`.
 * Handle non-splat vector constants if doing so is free, but do not add
   handling for them if it adds any additional complexity to the code.
 * Do not handle non-canonical patterns, unless there is a specific motivation
   to do so. For example, it may sometimes be worthwhile to handle a pattern
   that would normally be converted into a different canonical form, but can
   still occur in multi-use scenarios. This is fine to do if there is specific
   real-world motivation, but you should not go out of your way to do this
   otherwise.
 * Sometimes the motivating pattern uses a constant value with certain
   properties, but the fold can be generalized to non-constant values by making
   use of ValueTracking queries. Whether this makes sense depends on the case,
   but it's usually a good idea to only handle the constant pattern first, and
   then generalize later if it seems useful.

## Guidelines for reviewers

 * Do not ask new contributors to implement non-splat vector support in code
   reviews. If you think non-splat vector support for a fold is both
   worthwhile and policy-compliant (that is, the handling would not result in
   any appreciable increase in code complexity), implement it yourself in a
   follow-up patch.