xref: /llvm-project/llvm/docs/InstCombineContributorGuide.md (revision f16e234f1126f6646609b6918a37c8eb32c8fb3e)
16898147bSNikita Popov# InstCombine contributor guide
26898147bSNikita Popov
36898147bSNikita PopovThis guide lays out a series of rules that contributions to InstCombine should
46898147bSNikita Popovfollow. **Following these rules will results in much faster PR approvals.**
56898147bSNikita Popov
66898147bSNikita Popov## Tests
76898147bSNikita Popov
86898147bSNikita Popov### Precommit tests
96898147bSNikita Popov
106898147bSNikita PopovTests for new optimizations or miscompilation fixes should be pre-committed.
116898147bSNikita PopovThis means that you first commit the test with CHECK lines showing the behavior
126898147bSNikita Popov*without* your change. Your actual change will then only contain CHECK line
136898147bSNikita Popovdiffs relative to that baseline.
146898147bSNikita Popov
156898147bSNikita PopovThis means that pull requests should generally contain two commits: First,
166898147bSNikita Popovone commit adding new tests with baseline check lines. Second, a commit with
176898147bSNikita Popovfunctional changes and test diffs.
186898147bSNikita Popov
196898147bSNikita PopovIf the second commit in your PR does not contain test diffs, you did something
206898147bSNikita Popovwrong. Either you made a mistake when generating CHECK lines, or your tests are
216898147bSNikita Popovnot actually affected by your patch.
226898147bSNikita Popov
236898147bSNikita PopovExceptions: When fixing assertion failures or infinite loops, do not pre-commit
246898147bSNikita Popovtests.
256898147bSNikita Popov
266898147bSNikita Popov### Use `update_test_checks.py`
276898147bSNikita Popov
286898147bSNikita PopovCHECK lines should be generated using the `update_test_checks.py` script. Do
296898147bSNikita Popov**not** manually edit check lines after using it.
306898147bSNikita Popov
316898147bSNikita PopovBe sure to use the correct opt binary when using the script. For example, if
326898147bSNikita Popovyour build directory is `build`, then you'll want to run:
336898147bSNikita Popov
346898147bSNikita Popov```sh
356898147bSNikita Popovllvm/utils/update_test_checks.py --opt-binary build/bin/opt \
366898147bSNikita Popov    llvm/test/Transforms/InstCombine/the_test.ll
376898147bSNikita Popov```
386898147bSNikita Popov
396898147bSNikita PopovExceptions: Hand-written CHECK lines are allowed for debuginfo tests.
406898147bSNikita Popov
416898147bSNikita Popov### General testing considerations
426898147bSNikita Popov
436898147bSNikita PopovPlace all tests relating to a transform into a single file. If you are adding
446898147bSNikita Popova regression test for a crash/miscompile in an existing transform, find the
456898147bSNikita Popovfile where the existing tests are located. A good way to do that is to comment
466898147bSNikita Popovout the transform and see which tests fail.
476898147bSNikita Popov
486898147bSNikita PopovMake tests minimal. Only test exactly the pattern being transformed. If your
496898147bSNikita Popovoriginal motivating case is a larger pattern that your fold enables to
506898147bSNikita Popovoptimize in some non-trivial way, you may add it as well -- however, the bulk
516898147bSNikita Popovof the test coverage should be minimal.
526898147bSNikita Popov
536898147bSNikita PopovGive tests short, but meaningful names. Don't call them `@test1`, `@test2` etc.
546898147bSNikita PopovFor example, a test checking multi-use behavior of a fold involving the
556898147bSNikita Popovaddition of two selects might be called `@add_of_selects_multi_use`.
566898147bSNikita Popov
576898147bSNikita PopovAdd representative tests for each test category (discussed below), but don't
586898147bSNikita Popovtest all combinations of everything. If you have multi-use tests, and you have
596898147bSNikita Popovcommuted tests, you shouldn't also add commuted multi-use tests.
606898147bSNikita Popov
616898147bSNikita PopovPrefer to keep bit-widths for tests low to improve performance of proof checking using alive2. Using `i8` is better than `i128` where possible.
626898147bSNikita Popov
636898147bSNikita Popov### Add negative tests
646898147bSNikita Popov
656898147bSNikita PopovMake sure to add tests for which your transform does **not** apply. Start with
666898147bSNikita Popovone of the test cases that succeeds and then create a sequence of negative
676898147bSNikita Popovtests, such that **exactly one** different pre-condition of your transform is
686898147bSNikita Popovnot satisfied in each test.
696898147bSNikita Popov
706898147bSNikita Popov### Add multi-use tests
716898147bSNikita Popov
726898147bSNikita PopovAdd multi-use tests that ensures your transform does not increase instruction
736898147bSNikita Popovcount if some instructions have additional uses. The standard pattern is to
746898147bSNikita Popovintroduce extra uses with function calls:
756898147bSNikita Popov
766898147bSNikita Popov```llvm
776898147bSNikita Popovdeclare void @use(i8)
786898147bSNikita Popov
796898147bSNikita Popovdefine i8 @add_mul_const_multi_use(i8 %x) {
806898147bSNikita Popov  %add = add i8 %x, 1
816898147bSNikita Popov  call void @use(i8 %add)
826898147bSNikita Popov  %mul = mul i8 %add, 3
836898147bSNikita Popov  ret i8 %mul
846898147bSNikita Popov}
856898147bSNikita Popov```
866898147bSNikita Popov
876898147bSNikita PopovExceptions: For transform that only produce one instruction, multi-use tests
886898147bSNikita Popovmay be omitted.
896898147bSNikita Popov
906898147bSNikita Popov### Add commuted tests
916898147bSNikita Popov
926898147bSNikita PopovIf the transform involves commutative operations, add tests with commuted
936898147bSNikita Popov(swapped) operands.
946898147bSNikita Popov
956898147bSNikita PopovMake sure that the operand order stays intact in the CHECK lines of your
966898147bSNikita Popovpre-commited tests. You should not see something like this:
976898147bSNikita Popov
986898147bSNikita Popov```llvm
996898147bSNikita Popov; CHECK-NEXT: [[OR:%.*]] = or i8 [[X]], [[Y]]
1006898147bSNikita Popov; ...
1016898147bSNikita Popov%or = or i8 %y, %x
1026898147bSNikita Popov```
1036898147bSNikita Popov
1046898147bSNikita PopovIf this happens, you may need to change one of the operands to have higher
1056898147bSNikita Popovcomplexity (include the "thwart" comment in that case):
1066898147bSNikita Popov
1076898147bSNikita Popov```llvm
1086898147bSNikita Popov%y2 = mul i8 %y, %y ; thwart complexity-based canonicalization
1096898147bSNikita Popov%or = or i8 %y, %x
1106898147bSNikita Popov```
1116898147bSNikita Popov
1126898147bSNikita Popov### Add vector tests
1136898147bSNikita Popov
1146898147bSNikita PopovWhen possible, it is recommended to add at least one test that uses vectors
1156898147bSNikita Popovinstead of scalars.
1166898147bSNikita Popov
1176898147bSNikita PopovFor patterns that include constants, we distinguish three kinds of tests.
1186898147bSNikita PopovThe first are "splat" vectors, where all the vector elements are the same.
1196898147bSNikita PopovThese tests *should* usually fold without additional effort.
1206898147bSNikita Popov
1216898147bSNikita Popov```llvm
1226898147bSNikita Popovdefine <2 x i8> @add_mul_const_vec_splat(<2 x i8> %x) {
1236898147bSNikita Popov  %add = add <2 x i8> %x, <i8 1, i8 1>
1246898147bSNikita Popov  %mul = mul <2 x i8> %add, <i8 3, i8 3>
1256898147bSNikita Popov  ret <2 x i8> %mul
1266898147bSNikita Popov}
1276898147bSNikita Popov```
1286898147bSNikita Popov
1296898147bSNikita PopovA minor variant is to replace some of the splat elements with poison. These
1306898147bSNikita Popovwill often also fold without additional effort.
1316898147bSNikita Popov
1326898147bSNikita Popov```llvm
1336898147bSNikita Popovdefine <2 x i8> @add_mul_const_vec_splat_poison(<2 x i8> %x) {
1346898147bSNikita Popov  %add = add <2 x i8> %x, <i8 1, i8 poison>
1356898147bSNikita Popov  %mul = mul <2 x i8> %add, <i8 3, i8 poison>
1366898147bSNikita Popov  ret <2 x i8> %mul
1376898147bSNikita Popov}
1386898147bSNikita Popov```
1396898147bSNikita Popov
1406898147bSNikita PopovFinally, you can have non-splat vectors, where the vector elements are not
1416898147bSNikita Popovthe same:
1426898147bSNikita Popov
1436898147bSNikita Popov```llvm
1446898147bSNikita Popovdefine <2 x i8> @add_mul_const_vec_non_splat(<2 x i8> %x) {
1456898147bSNikita Popov  %add = add <2 x i8> %x, <i8 1, i8 5>
1466898147bSNikita Popov  %mul = mul <2 x i8> %add, <i8 3, i8 6>
1476898147bSNikita Popov  ret <2 x i8> %mul
1486898147bSNikita Popov}
1496898147bSNikita Popov```
1506898147bSNikita Popov
1516898147bSNikita PopovNon-splat vectors will often not fold by default. You should **not** try to
1526898147bSNikita Popovmake them fold, unless doing so does not add **any** additional complexity.
1536898147bSNikita PopovYou should still add the test though, even if it does not fold.
1546898147bSNikita Popov
1556898147bSNikita Popov### Flag tests
1566898147bSNikita Popov
1576898147bSNikita PopovIf your transform involves instructions that can have poison-generating flags,
1586898147bSNikita Popovsuch as `nuw` and `nsw` on `add`, you should test how these interact with the
1596898147bSNikita Popovtransform.
1606898147bSNikita Popov
1616898147bSNikita PopovIf your transform *requires* a certain flag for correctness, make sure to add
1626898147bSNikita Popovnegative tests missing the required flag.
1636898147bSNikita Popov
1646898147bSNikita PopovIf your transform doesn't require flags for correctness, you should have tests
1656898147bSNikita Popovfor preservation behavior. If the input instructions have certain flags, are
1666898147bSNikita Popovthey preserved in the output instructions, if it is valid to preserve them?
1676898147bSNikita Popov(This depends on the transform. Check with alive2.)
1686898147bSNikita Popov
1696898147bSNikita PopovThe same also applies to fast-math-flags (FMF). In that case, please always
1706898147bSNikita Popovtest specific flags like `nnan`, `nsz` or `reassoc`, rather than the umbrella
1716898147bSNikita Popov`fast` flag.
1726898147bSNikita Popov
1736898147bSNikita Popov### Other tests
1746898147bSNikita Popov
1756898147bSNikita PopovThe test categories mentioned above are non-exhaustive. There may be more tests
1766898147bSNikita Popovto be added, depending on the instructions involved in the transform. Some
1776898147bSNikita Popovexamples:
1786898147bSNikita Popov
1796898147bSNikita Popov * For folds involving memory accesses like load/store, check that scalable vectors and non-byte-size types (like i3) are handled correctly. Also check that volatile/atomic are handled.
1806898147bSNikita Popov * For folds that interact with the bitwidth in some non-trivial way, check an illegal type like i13. Also confirm that the transform is correct for i1.
1816898147bSNikita Popov * For folds that involve phis, you may want to check that the case of multiple incoming values from one block is handled correctly.
1826898147bSNikita Popov
1836898147bSNikita Popov## Proofs
1846898147bSNikita Popov
1856898147bSNikita PopovYour pull request description should contain one or more
1866898147bSNikita Popov[alive2 proofs](https://alive2.llvm.org/ce/) for the correctness of the
1876898147bSNikita Popovproposed transform.
1886898147bSNikita Popov
1896898147bSNikita Popov### Basics
1906898147bSNikita Popov
1916898147bSNikita PopovProofs are written using LLVM IR, by specifying a `@src` and `@tgt` function.
1926898147bSNikita PopovIt is possible to include multiple proofs in a single file by giving the src
1936898147bSNikita Popovand tgt functions matching suffixes.
1946898147bSNikita Popov
1956898147bSNikita PopovFor example, here is a pair of proofs that both `(x-y)+y` and `(x+y)-y` can
1966898147bSNikita Popovbe simplified to `x` ([online](https://alive2.llvm.org/ce/z/MsPPGz)):
1976898147bSNikita Popov
1986898147bSNikita Popov```llvm
1996898147bSNikita Popovdefine i8 @src_add_sub(i8 %x, i8 %y) {
2006898147bSNikita Popov  %add = add i8 %x, %y
2016898147bSNikita Popov  %sub = sub i8 %add, %y
2026898147bSNikita Popov  ret i8 %sub
2036898147bSNikita Popov}
2046898147bSNikita Popov
2056898147bSNikita Popovdefine i8 @tgt_add_sub(i8 %x, i8 %y) {
2066898147bSNikita Popov  ret i8 %x
2076898147bSNikita Popov}
2086898147bSNikita Popov
2096898147bSNikita Popov
2106898147bSNikita Popovdefine i8 @src_sub_add(i8 %x, i8 %y) {
2116898147bSNikita Popov  %sub = sub i8 %x, %y
2126898147bSNikita Popov  %add = add i8 %sub, %y
2136898147bSNikita Popov  ret i8 %add
2146898147bSNikita Popov}
2156898147bSNikita Popov
2166898147bSNikita Popovdefine i8 @tgt_sub_add(i8 %x, i8 %y) {
2176898147bSNikita Popov  ret i8 %x
2186898147bSNikita Popov}
2196898147bSNikita Popov```
2206898147bSNikita Popov
2216898147bSNikita Popov### Use generic values in proofs
2226898147bSNikita Popov
2236898147bSNikita PopovProofs should operate on generic values, rather than specific constants, to the degree that this is possible.
2246898147bSNikita Popov
2256898147bSNikita PopovFor example, if we want to fold `X s/ C s< X` to `X s> 0`, the following would
2266898147bSNikita Popovbe a *bad* proof:
2276898147bSNikita Popov
2286898147bSNikita Popov```llvm
2296898147bSNikita Popov; Don't do this!
2306898147bSNikita Popovdefine i1 @src(i8 %x) {
2316898147bSNikita Popov  %div = sdiv i8 %x, 123
2326898147bSNikita Popov  %cmp = icmp slt i8 %div, %x
2336898147bSNikita Popov  ret i1 %cmp
2346898147bSNikita Popov}
2356898147bSNikita Popov
2366898147bSNikita Popovdefine i1 @tgt(i8 %x) {
2376898147bSNikita Popov  %cmp = icmp sgt i8 %x, 0
2386898147bSNikita Popov  ret i1 %cmp
2396898147bSNikita Popov}
2406898147bSNikita Popov```
2416898147bSNikita Popov
2426898147bSNikita PopovThis is because it only proves that the transform is correct for the specific
2436898147bSNikita Popovconstant 123. Maybe there are some constants for which the transform is
2446898147bSNikita Popovincorrect?
2456898147bSNikita Popov
2466898147bSNikita PopovThe correct way to write this proof is as follows
2476898147bSNikita Popov([online](https://alive2.llvm.org/ce/z/acjwb6)):
2486898147bSNikita Popov
2496898147bSNikita Popov```llvm
2506898147bSNikita Popovdefine i1 @src(i8 %x, i8 %C) {
2516898147bSNikita Popov  %precond = icmp ne i8 %C, 1
2526898147bSNikita Popov  call void @llvm.assume(i1 %precond)
2536898147bSNikita Popov  %div = sdiv i8 %x, %C
2546898147bSNikita Popov  %cmp = icmp slt i8 %div, %x
2556898147bSNikita Popov  ret i1 %cmp
2566898147bSNikita Popov}
2576898147bSNikita Popov
2586898147bSNikita Popovdefine i1 @tgt(i8 %x, i8 %C) {
2596898147bSNikita Popov  %cmp = icmp sgt i8 %x, 0
2606898147bSNikita Popov  ret i1 %cmp
2616898147bSNikita Popov}
2626898147bSNikita Popov```
2636898147bSNikita Popov
2646898147bSNikita PopovNote that the `@llvm.assume` intrinsic is used to specify pre-conditions for
2656898147bSNikita Popovthe transform. In this case, the proof will fail unless we specify `C != 1` as
2666898147bSNikita Popova pre-condition.
2676898147bSNikita Popov
2686898147bSNikita PopovIt should be emphasized that there is, in general, no expectation that the
2696898147bSNikita PopovIR in the proofs will be transformed by the implemented fold. In the above
2706898147bSNikita Popovexample, the transform would only apply if `%C` is actually a constant, but we
2716898147bSNikita Popovneed to use non-constants in the proof.
2726898147bSNikita Popov
2736898147bSNikita Popov### Common pre-conditions
2746898147bSNikita Popov
2756898147bSNikita PopovHere are some examples of common preconditions.
2766898147bSNikita Popov
2776898147bSNikita Popov```llvm
2786898147bSNikita Popov; %x is non-negative:
2796898147bSNikita Popov%nonneg = icmp sgt i8 %x, -1
2806898147bSNikita Popovcall void @llvm.assume(i1 %nonneg)
2816898147bSNikita Popov
2826898147bSNikita Popov; %x is a power of two:
2836898147bSNikita Popov%ctpop = call i8 @llvm.ctpop.i8(i8 %x)
2846898147bSNikita Popov%pow2 = icmp eq i8 %x, 1
2856898147bSNikita Popovcall void @llvm.assume(i1 %pow2)
2866898147bSNikita Popov
2876898147bSNikita Popov; %x is a power of two or zero:
2886898147bSNikita Popov%ctpop = call i8 @llvm.ctpop.i8(i8 %x)
2896898147bSNikita Popov%pow2orzero = icmp ult i8 %x, 2
2906898147bSNikita Popovcall void @llvm.assume(i1 %pow2orzero)
2916898147bSNikita Popov
2926898147bSNikita Popov; Adding %x and %y does not overflow in a signed sense:
2936898147bSNikita Popov%wo = call { i8, i1 } @llvm.sadd.with.overflow(i8 %x, i8 %y)
2946898147bSNikita Popov%ov = extractvalue { i8, i1 } %wo, 1
2956898147bSNikita Popov%ov.not = xor i1 %ov, true
2966898147bSNikita Popovcall void @llvm.assume(i1 %ov.not)
2976898147bSNikita Popov```
2986898147bSNikita Popov
2996898147bSNikita Popov### Timeouts
3006898147bSNikita Popov
3016898147bSNikita PopovAlive2 proofs will sometimes produce a timeout with the following message:
3026898147bSNikita Popov
3036898147bSNikita Popov```
3046898147bSNikita PopovAlive2 timed out while processing your query.
3056898147bSNikita PopovThere are a few things you can try:
3066898147bSNikita Popov
3076898147bSNikita Popov- remove extraneous instructions, if any
3086898147bSNikita Popov
3096898147bSNikita Popov- reduce variable widths, for example to i16, i8, or i4
3106898147bSNikita Popov
3116898147bSNikita Popov- add the --disable-undef-input command line flag, which
3126898147bSNikita Popov  allows Alive2 to assume that arguments to your IR are not
3136898147bSNikita Popov  undef. This is, in general, unsound: it can cause Alive2
3146898147bSNikita Popov  to miss bugs.
3156898147bSNikita Popov```
3166898147bSNikita Popov
3176898147bSNikita PopovThis is good advice, follow it!
3186898147bSNikita Popov
3196898147bSNikita PopovReducing the bitwidth usually helps. For floating point numbers, you can use
3206898147bSNikita Popovthe `half` type for bitwidth reduction purposes. For pointers, you can reduce
3216898147bSNikita Popovthe bitwidth by specifying a custom data layout:
3226898147bSNikita Popov
3236898147bSNikita Popov```llvm
3246898147bSNikita Popov; For 16-bit pointers
3256898147bSNikita Popovtarget datalayout = "p:16:16"
3266898147bSNikita Popov```
3276898147bSNikita Popov
3286898147bSNikita PopovIf reducing the bitwidth does not help, try `-disable-undef-input`. This will
3296898147bSNikita Popovoften significantly improve performance, but also implies that the correctness
3306898147bSNikita Popovof the transform with `undef` values is no longer verified. This is usually
3316898147bSNikita Popovfine if the transform does not increase the number of uses of any value.
3326898147bSNikita Popov
3336898147bSNikita PopovFinally, it's possible to build alive2 locally and use `-smt-to=<m>` to verify
3346898147bSNikita Popovthe proof with a larger timeout. If you don't want to do this (or it still
3356898147bSNikita Popovdoes not work), please submit the proof you have despite the timeout.
3366898147bSNikita Popov
3376898147bSNikita Popov## Implementation
3386898147bSNikita Popov
3396898147bSNikita Popov### Real-world usefulness
3406898147bSNikita Popov
3416898147bSNikita PopovThere is a very large number of transforms that *could* be implemented, but
3426898147bSNikita Popovonly a tiny fraction of them are useful for real-world code.
3436898147bSNikita Popov
3446898147bSNikita PopovTransforms that do not have real-world usefulness provide *negative* value to
3456898147bSNikita Popovthe LLVM project, by taking up valuable reviewer time, increasing code
3466898147bSNikita Popovcomplexity and increasing compile-time overhead.
3476898147bSNikita Popov
3486898147bSNikita PopovWe do not require explicit proof of real-world usefulness for every transform
3496898147bSNikita Popov-- in most cases the usefulness is fairly "obvious". However, the question may
3506898147bSNikita Popovcome up for complex or unusual folds. Keep this in mind when chosing what you
3516898147bSNikita Popovwork on.
3526898147bSNikita Popov
3536898147bSNikita PopovIn particular, fixes for fuzzer-generated missed optimization reports will
3546898147bSNikita Popovlikely be rejected if there is no evidence of real-world usefulness.
3556898147bSNikita Popov
3566898147bSNikita Popov### Pick the correct optimization pass
3576898147bSNikita Popov
3586898147bSNikita PopovThere are a number of passes and utilities in the InstCombine family, and it
3596898147bSNikita Popovis important to pick the right place when implementing a fold.
3606898147bSNikita Popov
3616898147bSNikita Popov * `ConstantFolding`: For folding instructions with constant arguments to a constant. (Mainly relevant for intrinsics.)
3626898147bSNikita Popov * `ValueTracking`: For analyzing instructions, e.g. for known bits, non-zero, etc. Tests should usually use `-passes=instsimplify`.
3636898147bSNikita Popov * `InstructionSimplify`: For folds that do not create new instructions (either fold to existing value or constant).
3646898147bSNikita Popov * `InstCombine`: For folds that create or modify instructions.
3656898147bSNikita Popov * `AggressiveInstCombine`: For folds that are expensive, or violate InstCombine requirements.
3666898147bSNikita Popov * `VectorCombine`: For folds of vector operations that require target-dependent cost-modelling.
3676898147bSNikita Popov
3686898147bSNikita PopovSometimes, folds that logically belong in InstSimplify are placed in InstCombine instead, for example because they are too expensive, or because they are structurally simpler to implement in InstCombine.
3696898147bSNikita Popov
3706898147bSNikita PopovFor example, if a fold produces new instructions in some cases but returns an existing value in others, it may be preferable to keep all cases in InstCombine, rather than trying to split them among InstCombine and InstSimplify.
3716898147bSNikita Popov
3726898147bSNikita Popov### Canonicalization and target-independence
3736898147bSNikita Popov
3746898147bSNikita PopovInstCombine is a target-independent canonicalization pass. This means that it
3756898147bSNikita Popovtries to bring IR into a "canonical form" that other optimizations (both inside
3766898147bSNikita Popovand outside of InstCombine) can rely on. For this reason, the chosen canonical
3776898147bSNikita Popovform needs to be the same for all targets, and not depend on target-specific
3786898147bSNikita Popovcost modelling.
3796898147bSNikita Popov
3806898147bSNikita PopovIn many cases, "canonicalization" and "optimization" coincide. For example, if
3816898147bSNikita Popovwe convert `x * 2` into `x << 1`, this both makes the IR more canonical
3826898147bSNikita Popov(because there is now only one way to express the same operation, rather than
3836898147bSNikita Popovtwo) and faster (because shifts will usually have lower latency than
3846898147bSNikita Popovmultiplies).
3856898147bSNikita Popov
3866898147bSNikita PopovHowever, there are also canonicalizations that don't serve any direct
3876898147bSNikita Popovoptimization purpose. For example, InstCombine will canonicalize non-strict
3886898147bSNikita Popovpredicates like `ule` to strict predicates like `ult`. `icmp ule i8 %x, 7`
3896898147bSNikita Popovbecomes `icmp ult i8 %x, 8`. This is not an optimization in any meaningful
3906898147bSNikita Popovsense, but it does reduce the number of cases that other transforms need to
3916898147bSNikita Popovhandle.
3926898147bSNikita Popov
3936898147bSNikita PopovIf some canonicalization is not profitable for a specific target, then a reverse
3946898147bSNikita Popovtransform needs to be added in the backend. Patches to disable specific
3956898147bSNikita PopovInstCombine transforms on certain targets, or to drive them using
3966898147bSNikita Popovtarget-specific cost-modelling, **will not be accepted**. The only permitted
3976898147bSNikita Popovtarget-dependence is on DataLayout and TargetLibraryInfo.
3986898147bSNikita Popov
3996898147bSNikita PopovThe use of TargetTransformInfo is only allowed for hooks for target-specific
4006898147bSNikita Popovintrinsics, such as `TargetTransformInfo::instCombineIntrinsic()`. These are
4016898147bSNikita Popovalready inherently target-dependent anyway.
4026898147bSNikita Popov
4036898147bSNikita PopovFor vector-specific transforms that require cost-modelling, the VectorCombine
4046898147bSNikita Popovpass can be used instead. In very rare circumstances, if there are no other
4056898147bSNikita Popovalternatives, target-dependent transforms may be accepted into
4066898147bSNikita PopovAggressiveInstCombine.
4076898147bSNikita Popov
4086898147bSNikita Popov### PatternMatch
4096898147bSNikita Popov
4106898147bSNikita PopovMany transforms make use of the matching infrastructure defined in
4116898147bSNikita Popov[PatternMatch.h](https://github.com/llvm/llvm-project/blame/main/llvm/include/llvm/IR/PatternMatch.h).
4126898147bSNikita Popov
4136898147bSNikita PopovHere is a typical usage example:
4146898147bSNikita Popov
4156898147bSNikita Popov```
4166898147bSNikita Popov// Fold (A - B) + B and B + (A - B) to A.
4176898147bSNikita PopovValue *A, *B;
4186898147bSNikita Popovif (match(V, m_c_Add(m_Sub(m_Value(A), m_Value(B)), m_Deferred(B))))
4196898147bSNikita Popov  return A;
4206898147bSNikita Popov```
4216898147bSNikita Popov
4226898147bSNikita PopovAnd another:
4236898147bSNikita Popov
4246898147bSNikita Popov```
4256898147bSNikita Popov// Fold A + C1 == C2 to A == C1+C2
4266898147bSNikita PopovValue *A;
4276898147bSNikita Popovif (match(V, m_ICmp(Pred, m_Add(m_Value(A), m_APInt(C1)), m_APInt(C2))) &&
4286898147bSNikita Popov    ICmpInst::isEquality(Pred))
4296898147bSNikita Popov  return Builder.CreateICmp(Pred, A,
4306898147bSNikita Popov                            ConstantInt::get(A->getType(), *C1 + *C2));
4316898147bSNikita Popov```
4326898147bSNikita Popov
4336898147bSNikita PopovSome common matchers are:
4346898147bSNikita Popov
4356898147bSNikita Popov * `m_Value(A)`: Match any value and write it into `Value *A`.
4366898147bSNikita Popov * `m_Specific(A)`: Check that the operand equals A. Use this if A is
4376898147bSNikita Popov   assigned **outside** the pattern.
4386898147bSNikita Popov * `m_Deferred(A)`: Check that the operand equals A. Use this if A is
4396898147bSNikita Popov   assigned **inside** the pattern, for example via `m_Value(A)`.
4406898147bSNikita Popov * `m_APInt(C)`: Match a scalar integer constant or splat vector constant into
4416898147bSNikita Popov   `const APInt *C`. Does not permit undef/poison values.
4426898147bSNikita Popov * `m_ImmConstant(C)`: Match any non-constant-expression constant into
4436898147bSNikita Popov   `Constant *C`.
4446898147bSNikita Popov * `m_Constant(C)`: Match any constant into `Constant *C`. Don't use this unless
4456898147bSNikita Popov   you know what you're doing.
4466898147bSNikita Popov * `m_Add(M1, M2)`, `m_Sub(M1, M2)`, etc: Match an add/sub/etc where the first
4476898147bSNikita Popov   operand matches M1 and the second M2.
4486898147bSNikita Popov * `m_c_Add(M1, M2)`, etc: Match an add commutatively. The operands must match
4496898147bSNikita Popov   either M1 and M2 or M2 and M1. Most instruction matchers have a commutative
4506898147bSNikita Popov   variant.
4516898147bSNikita Popov * `m_ICmp(Pred, M1, M2)` and `m_c_ICmp(Pred, M1, M2)`: Match an icmp, writing
4526898147bSNikita Popov   the predicate into `IcmpInst::Predicate Pred`. If the commutative version
4536898147bSNikita Popov   is used, and the operands match in order M2, M1, then `Pred` will be the
4546898147bSNikita Popov   swapped predicate.
4556898147bSNikita Popov * `m_OneUse(M)`: Check that the value only has one use, and also matches M.
4566898147bSNikita Popov   For example `m_OneUse(m_Add(...))`. See the next section for more
4576898147bSNikita Popov   information.
4586898147bSNikita Popov
4596898147bSNikita PopovSee the header for the full list of available matchers.
4606898147bSNikita Popov
4616898147bSNikita Popov### InstCombine APIs
4626898147bSNikita Popov
4636898147bSNikita PopovInstCombine transforms are handled by `visitXYZ()` methods, where XYZ
4646898147bSNikita Popovcorresponds to the root instruction of your transform. If the outermost
4656898147bSNikita Popovinstruction of the pattern you are matching is an icmp, the fold will be
4666898147bSNikita Popovlocated somewhere inside `visitICmpInst()`.
4676898147bSNikita Popov
4686898147bSNikita PopovThe return value of the visit method is an instruction. You can either return
4696898147bSNikita Popova new instruction, in which case it will be inserted before the old one, and
4706898147bSNikita Popovuses of the old one will be replaced by it. Or you can return the original
4716898147bSNikita Popovinstruction to indicate that *some* kind of change has been made. Finally, a
4726898147bSNikita Popovnullptr return value indicates that no change occurred.
4736898147bSNikita Popov
4746898147bSNikita PopovFor example, if your transform produces a single new icmp instruction, you could
4756898147bSNikita Popovwrite the following:
4766898147bSNikita Popov
4776898147bSNikita Popov```
4786898147bSNikita Popovif (...)
4796898147bSNikita Popov  return new ICmpInst(Pred, X, Y);
4806898147bSNikita Popov```
4816898147bSNikita Popov
4826898147bSNikita PopovIn this case the main InstCombine loop takes care of inserting the instruction
4836898147bSNikita Popovand replacing uses of the old instruction.
4846898147bSNikita Popov
4856898147bSNikita PopovAlternatively, you can also write it like this:
4866898147bSNikita Popov
4876898147bSNikita Popov```
4886898147bSNikita Popovif (...)
4896898147bSNikita Popov  return replaceInstUsesWith(OrigI, Builder.CreateICmp(Pred, X, Y));
4906898147bSNikita Popov```
4916898147bSNikita Popov
4926898147bSNikita PopovIn this case `IRBuilder` will insert the instruction and `replaceInstUsesWith()`
4936898147bSNikita Popovwill replace the uses of the old instruction, and return it to indicate that
4946898147bSNikita Popova change occurred.
4956898147bSNikita Popov
4966898147bSNikita PopovBoth forms are equivalent, and you can use whichever is more convenient in
4976898147bSNikita Popovcontext. For example, it's common that folds are inside helper functions that
4986898147bSNikita Popovreturn `Value *` and then `replaceInstUsesWith()` is invoked on the result of
4996898147bSNikita Popovthat helper.
5006898147bSNikita Popov
5016898147bSNikita PopovInstCombine makes use of a worklist, which needs to be correctly updated during
5026898147bSNikita Popovtransforms. This usually happens automatically, but there are some things to
5036898147bSNikita Popovkeep in mind:
5046898147bSNikita Popov
5056898147bSNikita Popov  * Don't use the `Value::replaceAllUsesWith()` API. Use InstCombine's
5066898147bSNikita Popov    `replaceInstUsesWith()` helper instead.
5076898147bSNikita Popov  * Don't use the `Instruction::eraseFromParent()` API. Use InstCombine's
5086898147bSNikita Popov    `eraseInstFromFunction()` helper instead. (Explicitly erasing instruction
5096898147bSNikita Popov    is usually not necessary, as side-effect free instructions without users
5106898147bSNikita Popov    are automatically removed.)
5116898147bSNikita Popov  * Apart from the "directly return an instruction" pattern above, use IRBUilder
5126898147bSNikita Popov    to create all instruction. Do not manually create and insert them.
5136898147bSNikita Popov  * When replacing operands or uses of instructions, use `replaceOperand()`
5146898147bSNikita Popov    and `replaceUse()` instead of `setOperand()`.
5156898147bSNikita Popov
5166898147bSNikita Popov### Multi-use handling
5176898147bSNikita Popov
5186898147bSNikita PopovTransforms should usually not increase the total number of instructions. This
5196898147bSNikita Popovis not a hard requirement: For example, it is usually worthwhile to replace a
5206898147bSNikita Popovsingle division instruction with multiple other instructions.
5216898147bSNikita Popov
5226898147bSNikita PopovFor example, if you have a transform that replaces two instructions, with two
5236898147bSNikita Popovother instructions, this is (usually) only profitable if *both* the original
5246898147bSNikita Popovinstructions can be removed. To ensure that both instructions are removed, you
5256898147bSNikita Popovneed to add a one-use check for the inner instruction.
5266898147bSNikita Popov
5276898147bSNikita PopovOne-use checks can be performed using the `m_OneUse()` matcher, or the
5286898147bSNikita Popov`V->hasOneUse()` method.
5296898147bSNikita Popov
5306898147bSNikita Popov### Generalization
5316898147bSNikita Popov
5326898147bSNikita PopovTransforms can both be too specific (only handling some odd subset of patterns,
5336898147bSNikita Popovleading to unexpected optimization cliffs) and too general (introducing
5346898147bSNikita Popovcomplexity to handle cases with no real-world relevance). The right level of
5356898147bSNikita Popovgenerality is quite subjective, so this section only provides some broad
5366898147bSNikita Popovguidelines.
5376898147bSNikita Popov
5386898147bSNikita Popov * Avoid transforms that are hardcoded to specific constants. Try to figure
5396898147bSNikita Popov   out what the general rule for arbitrary constants is.
5406898147bSNikita Popov * Add handling for conjugate patterns. For example, if you implement a fold
5416898147bSNikita Popov   for `icmp eq`, you almost certainly also want to support `icmp ne`, with the
5426898147bSNikita Popov   inverse result. Similarly, if you implement a pattern for `and` of `icmp`s,
5436898147bSNikita Popov   you should also handle the de-Morgan conjugate using `or`.
5446898147bSNikita Popov * Handle non-splat vector constants if doing so is free, but do not add
5456898147bSNikita Popov   handling for them if it adds any additional complexity to the code.
5466898147bSNikita Popov * Do not handle non-canonical patterns, unless there is a specific motivation
5476898147bSNikita Popov   to do so. For example, it may sometimes be worthwhile to handle a pattern
5486898147bSNikita Popov   that would normally be converted into a different canonical form, but can
5496898147bSNikita Popov   still occur in multi-use scenarios. This is fine to do if there is specific
5506898147bSNikita Popov   real-world motivation, but you should not go out of your way to do this
5516898147bSNikita Popov   otherwise.
5526898147bSNikita Popov * Sometimes the motivating pattern uses a constant value with certain
5536898147bSNikita Popov   properties, but the fold can be generalized to non-constant values by making
5546898147bSNikita Popov   use of ValueTracking queries. Whether this makes sense depends on the case,
5556898147bSNikita Popov   but it's usually a good idea to only handle the constant pattern first, and
5566898147bSNikita Popov   then generalize later if it seems useful.
557*f16e234fSNikita Popov
558*f16e234fSNikita Popov## Guidelines for reviewers
559*f16e234fSNikita Popov
560*f16e234fSNikita Popov * Do not ask new contributors to implement non-splat vector support in code
561*f16e234fSNikita Popov   reviews. If you think non-splat vector support for a fold is both
562*f16e234fSNikita Popov   worthwhile and policy-compliant (that is, the handling would not result in
563*f16e234fSNikita Popov   any appreciable increase in code complexity), implement it yourself in a
564*f16e234fSNikita Popov   follow-up patch.
565