16898147bSNikita Popov# InstCombine contributor guide 26898147bSNikita Popov 36898147bSNikita PopovThis guide lays out a series of rules that contributions to InstCombine should 46898147bSNikita Popovfollow. **Following these rules will results in much faster PR approvals.** 56898147bSNikita Popov 66898147bSNikita Popov## Tests 76898147bSNikita Popov 86898147bSNikita Popov### Precommit tests 96898147bSNikita Popov 106898147bSNikita PopovTests for new optimizations or miscompilation fixes should be pre-committed. 116898147bSNikita PopovThis means that you first commit the test with CHECK lines showing the behavior 126898147bSNikita Popov*without* your change. Your actual change will then only contain CHECK line 136898147bSNikita Popovdiffs relative to that baseline. 146898147bSNikita Popov 156898147bSNikita PopovThis means that pull requests should generally contain two commits: First, 166898147bSNikita Popovone commit adding new tests with baseline check lines. Second, a commit with 176898147bSNikita Popovfunctional changes and test diffs. 186898147bSNikita Popov 196898147bSNikita PopovIf the second commit in your PR does not contain test diffs, you did something 206898147bSNikita Popovwrong. Either you made a mistake when generating CHECK lines, or your tests are 216898147bSNikita Popovnot actually affected by your patch. 226898147bSNikita Popov 236898147bSNikita PopovExceptions: When fixing assertion failures or infinite loops, do not pre-commit 246898147bSNikita Popovtests. 256898147bSNikita Popov 266898147bSNikita Popov### Use `update_test_checks.py` 276898147bSNikita Popov 286898147bSNikita PopovCHECK lines should be generated using the `update_test_checks.py` script. Do 296898147bSNikita Popov**not** manually edit check lines after using it. 306898147bSNikita Popov 316898147bSNikita PopovBe sure to use the correct opt binary when using the script. For example, if 326898147bSNikita Popovyour build directory is `build`, then you'll want to run: 336898147bSNikita Popov 346898147bSNikita Popov```sh 356898147bSNikita Popovllvm/utils/update_test_checks.py --opt-binary build/bin/opt \ 366898147bSNikita Popov llvm/test/Transforms/InstCombine/the_test.ll 376898147bSNikita Popov``` 386898147bSNikita Popov 396898147bSNikita PopovExceptions: Hand-written CHECK lines are allowed for debuginfo tests. 406898147bSNikita Popov 416898147bSNikita Popov### General testing considerations 426898147bSNikita Popov 436898147bSNikita PopovPlace all tests relating to a transform into a single file. If you are adding 446898147bSNikita Popova regression test for a crash/miscompile in an existing transform, find the 456898147bSNikita Popovfile where the existing tests are located. A good way to do that is to comment 466898147bSNikita Popovout the transform and see which tests fail. 476898147bSNikita Popov 486898147bSNikita PopovMake tests minimal. Only test exactly the pattern being transformed. If your 496898147bSNikita Popovoriginal motivating case is a larger pattern that your fold enables to 506898147bSNikita Popovoptimize in some non-trivial way, you may add it as well -- however, the bulk 516898147bSNikita Popovof the test coverage should be minimal. 526898147bSNikita Popov 536898147bSNikita PopovGive tests short, but meaningful names. Don't call them `@test1`, `@test2` etc. 546898147bSNikita PopovFor example, a test checking multi-use behavior of a fold involving the 556898147bSNikita Popovaddition of two selects might be called `@add_of_selects_multi_use`. 566898147bSNikita Popov 576898147bSNikita PopovAdd representative tests for each test category (discussed below), but don't 586898147bSNikita Popovtest all combinations of everything. If you have multi-use tests, and you have 596898147bSNikita Popovcommuted tests, you shouldn't also add commuted multi-use tests. 606898147bSNikita Popov 616898147bSNikita PopovPrefer to keep bit-widths for tests low to improve performance of proof checking using alive2. Using `i8` is better than `i128` where possible. 626898147bSNikita Popov 636898147bSNikita Popov### Add negative tests 646898147bSNikita Popov 656898147bSNikita PopovMake sure to add tests for which your transform does **not** apply. Start with 666898147bSNikita Popovone of the test cases that succeeds and then create a sequence of negative 676898147bSNikita Popovtests, such that **exactly one** different pre-condition of your transform is 686898147bSNikita Popovnot satisfied in each test. 696898147bSNikita Popov 706898147bSNikita Popov### Add multi-use tests 716898147bSNikita Popov 726898147bSNikita PopovAdd multi-use tests that ensures your transform does not increase instruction 736898147bSNikita Popovcount if some instructions have additional uses. The standard pattern is to 746898147bSNikita Popovintroduce extra uses with function calls: 756898147bSNikita Popov 766898147bSNikita Popov```llvm 776898147bSNikita Popovdeclare void @use(i8) 786898147bSNikita Popov 796898147bSNikita Popovdefine i8 @add_mul_const_multi_use(i8 %x) { 806898147bSNikita Popov %add = add i8 %x, 1 816898147bSNikita Popov call void @use(i8 %add) 826898147bSNikita Popov %mul = mul i8 %add, 3 836898147bSNikita Popov ret i8 %mul 846898147bSNikita Popov} 856898147bSNikita Popov``` 866898147bSNikita Popov 876898147bSNikita PopovExceptions: For transform that only produce one instruction, multi-use tests 886898147bSNikita Popovmay be omitted. 896898147bSNikita Popov 906898147bSNikita Popov### Add commuted tests 916898147bSNikita Popov 926898147bSNikita PopovIf the transform involves commutative operations, add tests with commuted 936898147bSNikita Popov(swapped) operands. 946898147bSNikita Popov 956898147bSNikita PopovMake sure that the operand order stays intact in the CHECK lines of your 966898147bSNikita Popovpre-commited tests. You should not see something like this: 976898147bSNikita Popov 986898147bSNikita Popov```llvm 996898147bSNikita Popov; CHECK-NEXT: [[OR:%.*]] = or i8 [[X]], [[Y]] 1006898147bSNikita Popov; ... 1016898147bSNikita Popov%or = or i8 %y, %x 1026898147bSNikita Popov``` 1036898147bSNikita Popov 1046898147bSNikita PopovIf this happens, you may need to change one of the operands to have higher 1056898147bSNikita Popovcomplexity (include the "thwart" comment in that case): 1066898147bSNikita Popov 1076898147bSNikita Popov```llvm 1086898147bSNikita Popov%y2 = mul i8 %y, %y ; thwart complexity-based canonicalization 1096898147bSNikita Popov%or = or i8 %y, %x 1106898147bSNikita Popov``` 1116898147bSNikita Popov 1126898147bSNikita Popov### Add vector tests 1136898147bSNikita Popov 1146898147bSNikita PopovWhen possible, it is recommended to add at least one test that uses vectors 1156898147bSNikita Popovinstead of scalars. 1166898147bSNikita Popov 1176898147bSNikita PopovFor patterns that include constants, we distinguish three kinds of tests. 1186898147bSNikita PopovThe first are "splat" vectors, where all the vector elements are the same. 1196898147bSNikita PopovThese tests *should* usually fold without additional effort. 1206898147bSNikita Popov 1216898147bSNikita Popov```llvm 1226898147bSNikita Popovdefine <2 x i8> @add_mul_const_vec_splat(<2 x i8> %x) { 1236898147bSNikita Popov %add = add <2 x i8> %x, <i8 1, i8 1> 1246898147bSNikita Popov %mul = mul <2 x i8> %add, <i8 3, i8 3> 1256898147bSNikita Popov ret <2 x i8> %mul 1266898147bSNikita Popov} 1276898147bSNikita Popov``` 1286898147bSNikita Popov 1296898147bSNikita PopovA minor variant is to replace some of the splat elements with poison. These 1306898147bSNikita Popovwill often also fold without additional effort. 1316898147bSNikita Popov 1326898147bSNikita Popov```llvm 1336898147bSNikita Popovdefine <2 x i8> @add_mul_const_vec_splat_poison(<2 x i8> %x) { 1346898147bSNikita Popov %add = add <2 x i8> %x, <i8 1, i8 poison> 1356898147bSNikita Popov %mul = mul <2 x i8> %add, <i8 3, i8 poison> 1366898147bSNikita Popov ret <2 x i8> %mul 1376898147bSNikita Popov} 1386898147bSNikita Popov``` 1396898147bSNikita Popov 1406898147bSNikita PopovFinally, you can have non-splat vectors, where the vector elements are not 1416898147bSNikita Popovthe same: 1426898147bSNikita Popov 1436898147bSNikita Popov```llvm 1446898147bSNikita Popovdefine <2 x i8> @add_mul_const_vec_non_splat(<2 x i8> %x) { 1456898147bSNikita Popov %add = add <2 x i8> %x, <i8 1, i8 5> 1466898147bSNikita Popov %mul = mul <2 x i8> %add, <i8 3, i8 6> 1476898147bSNikita Popov ret <2 x i8> %mul 1486898147bSNikita Popov} 1496898147bSNikita Popov``` 1506898147bSNikita Popov 1516898147bSNikita PopovNon-splat vectors will often not fold by default. You should **not** try to 1526898147bSNikita Popovmake them fold, unless doing so does not add **any** additional complexity. 1536898147bSNikita PopovYou should still add the test though, even if it does not fold. 1546898147bSNikita Popov 1556898147bSNikita Popov### Flag tests 1566898147bSNikita Popov 1576898147bSNikita PopovIf your transform involves instructions that can have poison-generating flags, 1586898147bSNikita Popovsuch as `nuw` and `nsw` on `add`, you should test how these interact with the 1596898147bSNikita Popovtransform. 1606898147bSNikita Popov 1616898147bSNikita PopovIf your transform *requires* a certain flag for correctness, make sure to add 1626898147bSNikita Popovnegative tests missing the required flag. 1636898147bSNikita Popov 1646898147bSNikita PopovIf your transform doesn't require flags for correctness, you should have tests 1656898147bSNikita Popovfor preservation behavior. If the input instructions have certain flags, are 1666898147bSNikita Popovthey preserved in the output instructions, if it is valid to preserve them? 1676898147bSNikita Popov(This depends on the transform. Check with alive2.) 1686898147bSNikita Popov 1696898147bSNikita PopovThe same also applies to fast-math-flags (FMF). In that case, please always 1706898147bSNikita Popovtest specific flags like `nnan`, `nsz` or `reassoc`, rather than the umbrella 1716898147bSNikita Popov`fast` flag. 1726898147bSNikita Popov 1736898147bSNikita Popov### Other tests 1746898147bSNikita Popov 1756898147bSNikita PopovThe test categories mentioned above are non-exhaustive. There may be more tests 1766898147bSNikita Popovto be added, depending on the instructions involved in the transform. Some 1776898147bSNikita Popovexamples: 1786898147bSNikita Popov 1796898147bSNikita Popov * For folds involving memory accesses like load/store, check that scalable vectors and non-byte-size types (like i3) are handled correctly. Also check that volatile/atomic are handled. 1806898147bSNikita Popov * For folds that interact with the bitwidth in some non-trivial way, check an illegal type like i13. Also confirm that the transform is correct for i1. 1816898147bSNikita Popov * For folds that involve phis, you may want to check that the case of multiple incoming values from one block is handled correctly. 1826898147bSNikita Popov 1836898147bSNikita Popov## Proofs 1846898147bSNikita Popov 1856898147bSNikita PopovYour pull request description should contain one or more 1866898147bSNikita Popov[alive2 proofs](https://alive2.llvm.org/ce/) for the correctness of the 1876898147bSNikita Popovproposed transform. 1886898147bSNikita Popov 1896898147bSNikita Popov### Basics 1906898147bSNikita Popov 1916898147bSNikita PopovProofs are written using LLVM IR, by specifying a `@src` and `@tgt` function. 1926898147bSNikita PopovIt is possible to include multiple proofs in a single file by giving the src 1936898147bSNikita Popovand tgt functions matching suffixes. 1946898147bSNikita Popov 1956898147bSNikita PopovFor example, here is a pair of proofs that both `(x-y)+y` and `(x+y)-y` can 1966898147bSNikita Popovbe simplified to `x` ([online](https://alive2.llvm.org/ce/z/MsPPGz)): 1976898147bSNikita Popov 1986898147bSNikita Popov```llvm 1996898147bSNikita Popovdefine i8 @src_add_sub(i8 %x, i8 %y) { 2006898147bSNikita Popov %add = add i8 %x, %y 2016898147bSNikita Popov %sub = sub i8 %add, %y 2026898147bSNikita Popov ret i8 %sub 2036898147bSNikita Popov} 2046898147bSNikita Popov 2056898147bSNikita Popovdefine i8 @tgt_add_sub(i8 %x, i8 %y) { 2066898147bSNikita Popov ret i8 %x 2076898147bSNikita Popov} 2086898147bSNikita Popov 2096898147bSNikita Popov 2106898147bSNikita Popovdefine i8 @src_sub_add(i8 %x, i8 %y) { 2116898147bSNikita Popov %sub = sub i8 %x, %y 2126898147bSNikita Popov %add = add i8 %sub, %y 2136898147bSNikita Popov ret i8 %add 2146898147bSNikita Popov} 2156898147bSNikita Popov 2166898147bSNikita Popovdefine i8 @tgt_sub_add(i8 %x, i8 %y) { 2176898147bSNikita Popov ret i8 %x 2186898147bSNikita Popov} 2196898147bSNikita Popov``` 2206898147bSNikita Popov 2216898147bSNikita Popov### Use generic values in proofs 2226898147bSNikita Popov 2236898147bSNikita PopovProofs should operate on generic values, rather than specific constants, to the degree that this is possible. 2246898147bSNikita Popov 2256898147bSNikita PopovFor example, if we want to fold `X s/ C s< X` to `X s> 0`, the following would 2266898147bSNikita Popovbe a *bad* proof: 2276898147bSNikita Popov 2286898147bSNikita Popov```llvm 2296898147bSNikita Popov; Don't do this! 2306898147bSNikita Popovdefine i1 @src(i8 %x) { 2316898147bSNikita Popov %div = sdiv i8 %x, 123 2326898147bSNikita Popov %cmp = icmp slt i8 %div, %x 2336898147bSNikita Popov ret i1 %cmp 2346898147bSNikita Popov} 2356898147bSNikita Popov 2366898147bSNikita Popovdefine i1 @tgt(i8 %x) { 2376898147bSNikita Popov %cmp = icmp sgt i8 %x, 0 2386898147bSNikita Popov ret i1 %cmp 2396898147bSNikita Popov} 2406898147bSNikita Popov``` 2416898147bSNikita Popov 2426898147bSNikita PopovThis is because it only proves that the transform is correct for the specific 2436898147bSNikita Popovconstant 123. Maybe there are some constants for which the transform is 2446898147bSNikita Popovincorrect? 2456898147bSNikita Popov 2466898147bSNikita PopovThe correct way to write this proof is as follows 2476898147bSNikita Popov([online](https://alive2.llvm.org/ce/z/acjwb6)): 2486898147bSNikita Popov 2496898147bSNikita Popov```llvm 2506898147bSNikita Popovdefine i1 @src(i8 %x, i8 %C) { 2516898147bSNikita Popov %precond = icmp ne i8 %C, 1 2526898147bSNikita Popov call void @llvm.assume(i1 %precond) 2536898147bSNikita Popov %div = sdiv i8 %x, %C 2546898147bSNikita Popov %cmp = icmp slt i8 %div, %x 2556898147bSNikita Popov ret i1 %cmp 2566898147bSNikita Popov} 2576898147bSNikita Popov 2586898147bSNikita Popovdefine i1 @tgt(i8 %x, i8 %C) { 2596898147bSNikita Popov %cmp = icmp sgt i8 %x, 0 2606898147bSNikita Popov ret i1 %cmp 2616898147bSNikita Popov} 2626898147bSNikita Popov``` 2636898147bSNikita Popov 2646898147bSNikita PopovNote that the `@llvm.assume` intrinsic is used to specify pre-conditions for 2656898147bSNikita Popovthe transform. In this case, the proof will fail unless we specify `C != 1` as 2666898147bSNikita Popova pre-condition. 2676898147bSNikita Popov 2686898147bSNikita PopovIt should be emphasized that there is, in general, no expectation that the 2696898147bSNikita PopovIR in the proofs will be transformed by the implemented fold. In the above 2706898147bSNikita Popovexample, the transform would only apply if `%C` is actually a constant, but we 2716898147bSNikita Popovneed to use non-constants in the proof. 2726898147bSNikita Popov 2736898147bSNikita Popov### Common pre-conditions 2746898147bSNikita Popov 2756898147bSNikita PopovHere are some examples of common preconditions. 2766898147bSNikita Popov 2776898147bSNikita Popov```llvm 2786898147bSNikita Popov; %x is non-negative: 2796898147bSNikita Popov%nonneg = icmp sgt i8 %x, -1 2806898147bSNikita Popovcall void @llvm.assume(i1 %nonneg) 2816898147bSNikita Popov 2826898147bSNikita Popov; %x is a power of two: 2836898147bSNikita Popov%ctpop = call i8 @llvm.ctpop.i8(i8 %x) 2846898147bSNikita Popov%pow2 = icmp eq i8 %x, 1 2856898147bSNikita Popovcall void @llvm.assume(i1 %pow2) 2866898147bSNikita Popov 2876898147bSNikita Popov; %x is a power of two or zero: 2886898147bSNikita Popov%ctpop = call i8 @llvm.ctpop.i8(i8 %x) 2896898147bSNikita Popov%pow2orzero = icmp ult i8 %x, 2 2906898147bSNikita Popovcall void @llvm.assume(i1 %pow2orzero) 2916898147bSNikita Popov 2926898147bSNikita Popov; Adding %x and %y does not overflow in a signed sense: 2936898147bSNikita Popov%wo = call { i8, i1 } @llvm.sadd.with.overflow(i8 %x, i8 %y) 2946898147bSNikita Popov%ov = extractvalue { i8, i1 } %wo, 1 2956898147bSNikita Popov%ov.not = xor i1 %ov, true 2966898147bSNikita Popovcall void @llvm.assume(i1 %ov.not) 2976898147bSNikita Popov``` 2986898147bSNikita Popov 2996898147bSNikita Popov### Timeouts 3006898147bSNikita Popov 3016898147bSNikita PopovAlive2 proofs will sometimes produce a timeout with the following message: 3026898147bSNikita Popov 3036898147bSNikita Popov``` 3046898147bSNikita PopovAlive2 timed out while processing your query. 3056898147bSNikita PopovThere are a few things you can try: 3066898147bSNikita Popov 3076898147bSNikita Popov- remove extraneous instructions, if any 3086898147bSNikita Popov 3096898147bSNikita Popov- reduce variable widths, for example to i16, i8, or i4 3106898147bSNikita Popov 3116898147bSNikita Popov- add the --disable-undef-input command line flag, which 3126898147bSNikita Popov allows Alive2 to assume that arguments to your IR are not 3136898147bSNikita Popov undef. This is, in general, unsound: it can cause Alive2 3146898147bSNikita Popov to miss bugs. 3156898147bSNikita Popov``` 3166898147bSNikita Popov 3176898147bSNikita PopovThis is good advice, follow it! 3186898147bSNikita Popov 3196898147bSNikita PopovReducing the bitwidth usually helps. For floating point numbers, you can use 3206898147bSNikita Popovthe `half` type for bitwidth reduction purposes. For pointers, you can reduce 3216898147bSNikita Popovthe bitwidth by specifying a custom data layout: 3226898147bSNikita Popov 3236898147bSNikita Popov```llvm 3246898147bSNikita Popov; For 16-bit pointers 3256898147bSNikita Popovtarget datalayout = "p:16:16" 3266898147bSNikita Popov``` 3276898147bSNikita Popov 3286898147bSNikita PopovIf reducing the bitwidth does not help, try `-disable-undef-input`. This will 3296898147bSNikita Popovoften significantly improve performance, but also implies that the correctness 3306898147bSNikita Popovof the transform with `undef` values is no longer verified. This is usually 3316898147bSNikita Popovfine if the transform does not increase the number of uses of any value. 3326898147bSNikita Popov 3336898147bSNikita PopovFinally, it's possible to build alive2 locally and use `-smt-to=<m>` to verify 3346898147bSNikita Popovthe proof with a larger timeout. If you don't want to do this (or it still 3356898147bSNikita Popovdoes not work), please submit the proof you have despite the timeout. 3366898147bSNikita Popov 3376898147bSNikita Popov## Implementation 3386898147bSNikita Popov 3396898147bSNikita Popov### Real-world usefulness 3406898147bSNikita Popov 3416898147bSNikita PopovThere is a very large number of transforms that *could* be implemented, but 3426898147bSNikita Popovonly a tiny fraction of them are useful for real-world code. 3436898147bSNikita Popov 3446898147bSNikita PopovTransforms that do not have real-world usefulness provide *negative* value to 3456898147bSNikita Popovthe LLVM project, by taking up valuable reviewer time, increasing code 3466898147bSNikita Popovcomplexity and increasing compile-time overhead. 3476898147bSNikita Popov 3486898147bSNikita PopovWe do not require explicit proof of real-world usefulness for every transform 3496898147bSNikita Popov-- in most cases the usefulness is fairly "obvious". However, the question may 3506898147bSNikita Popovcome up for complex or unusual folds. Keep this in mind when chosing what you 3516898147bSNikita Popovwork on. 3526898147bSNikita Popov 3536898147bSNikita PopovIn particular, fixes for fuzzer-generated missed optimization reports will 3546898147bSNikita Popovlikely be rejected if there is no evidence of real-world usefulness. 3556898147bSNikita Popov 3566898147bSNikita Popov### Pick the correct optimization pass 3576898147bSNikita Popov 3586898147bSNikita PopovThere are a number of passes and utilities in the InstCombine family, and it 3596898147bSNikita Popovis important to pick the right place when implementing a fold. 3606898147bSNikita Popov 3616898147bSNikita Popov * `ConstantFolding`: For folding instructions with constant arguments to a constant. (Mainly relevant for intrinsics.) 3626898147bSNikita Popov * `ValueTracking`: For analyzing instructions, e.g. for known bits, non-zero, etc. Tests should usually use `-passes=instsimplify`. 3636898147bSNikita Popov * `InstructionSimplify`: For folds that do not create new instructions (either fold to existing value or constant). 3646898147bSNikita Popov * `InstCombine`: For folds that create or modify instructions. 3656898147bSNikita Popov * `AggressiveInstCombine`: For folds that are expensive, or violate InstCombine requirements. 3666898147bSNikita Popov * `VectorCombine`: For folds of vector operations that require target-dependent cost-modelling. 3676898147bSNikita Popov 3686898147bSNikita PopovSometimes, folds that logically belong in InstSimplify are placed in InstCombine instead, for example because they are too expensive, or because they are structurally simpler to implement in InstCombine. 3696898147bSNikita Popov 3706898147bSNikita PopovFor example, if a fold produces new instructions in some cases but returns an existing value in others, it may be preferable to keep all cases in InstCombine, rather than trying to split them among InstCombine and InstSimplify. 3716898147bSNikita Popov 3726898147bSNikita Popov### Canonicalization and target-independence 3736898147bSNikita Popov 3746898147bSNikita PopovInstCombine is a target-independent canonicalization pass. This means that it 3756898147bSNikita Popovtries to bring IR into a "canonical form" that other optimizations (both inside 3766898147bSNikita Popovand outside of InstCombine) can rely on. For this reason, the chosen canonical 3776898147bSNikita Popovform needs to be the same for all targets, and not depend on target-specific 3786898147bSNikita Popovcost modelling. 3796898147bSNikita Popov 3806898147bSNikita PopovIn many cases, "canonicalization" and "optimization" coincide. For example, if 3816898147bSNikita Popovwe convert `x * 2` into `x << 1`, this both makes the IR more canonical 3826898147bSNikita Popov(because there is now only one way to express the same operation, rather than 3836898147bSNikita Popovtwo) and faster (because shifts will usually have lower latency than 3846898147bSNikita Popovmultiplies). 3856898147bSNikita Popov 3866898147bSNikita PopovHowever, there are also canonicalizations that don't serve any direct 3876898147bSNikita Popovoptimization purpose. For example, InstCombine will canonicalize non-strict 3886898147bSNikita Popovpredicates like `ule` to strict predicates like `ult`. `icmp ule i8 %x, 7` 3896898147bSNikita Popovbecomes `icmp ult i8 %x, 8`. This is not an optimization in any meaningful 3906898147bSNikita Popovsense, but it does reduce the number of cases that other transforms need to 3916898147bSNikita Popovhandle. 3926898147bSNikita Popov 3936898147bSNikita PopovIf some canonicalization is not profitable for a specific target, then a reverse 3946898147bSNikita Popovtransform needs to be added in the backend. Patches to disable specific 3956898147bSNikita PopovInstCombine transforms on certain targets, or to drive them using 3966898147bSNikita Popovtarget-specific cost-modelling, **will not be accepted**. The only permitted 3976898147bSNikita Popovtarget-dependence is on DataLayout and TargetLibraryInfo. 3986898147bSNikita Popov 3996898147bSNikita PopovThe use of TargetTransformInfo is only allowed for hooks for target-specific 4006898147bSNikita Popovintrinsics, such as `TargetTransformInfo::instCombineIntrinsic()`. These are 4016898147bSNikita Popovalready inherently target-dependent anyway. 4026898147bSNikita Popov 4036898147bSNikita PopovFor vector-specific transforms that require cost-modelling, the VectorCombine 4046898147bSNikita Popovpass can be used instead. In very rare circumstances, if there are no other 4056898147bSNikita Popovalternatives, target-dependent transforms may be accepted into 4066898147bSNikita PopovAggressiveInstCombine. 4076898147bSNikita Popov 4086898147bSNikita Popov### PatternMatch 4096898147bSNikita Popov 4106898147bSNikita PopovMany transforms make use of the matching infrastructure defined in 4116898147bSNikita Popov[PatternMatch.h](https://github.com/llvm/llvm-project/blame/main/llvm/include/llvm/IR/PatternMatch.h). 4126898147bSNikita Popov 4136898147bSNikita PopovHere is a typical usage example: 4146898147bSNikita Popov 4156898147bSNikita Popov``` 4166898147bSNikita Popov// Fold (A - B) + B and B + (A - B) to A. 4176898147bSNikita PopovValue *A, *B; 4186898147bSNikita Popovif (match(V, m_c_Add(m_Sub(m_Value(A), m_Value(B)), m_Deferred(B)))) 4196898147bSNikita Popov return A; 4206898147bSNikita Popov``` 4216898147bSNikita Popov 4226898147bSNikita PopovAnd another: 4236898147bSNikita Popov 4246898147bSNikita Popov``` 4256898147bSNikita Popov// Fold A + C1 == C2 to A == C1+C2 4266898147bSNikita PopovValue *A; 4276898147bSNikita Popovif (match(V, m_ICmp(Pred, m_Add(m_Value(A), m_APInt(C1)), m_APInt(C2))) && 4286898147bSNikita Popov ICmpInst::isEquality(Pred)) 4296898147bSNikita Popov return Builder.CreateICmp(Pred, A, 4306898147bSNikita Popov ConstantInt::get(A->getType(), *C1 + *C2)); 4316898147bSNikita Popov``` 4326898147bSNikita Popov 4336898147bSNikita PopovSome common matchers are: 4346898147bSNikita Popov 4356898147bSNikita Popov * `m_Value(A)`: Match any value and write it into `Value *A`. 4366898147bSNikita Popov * `m_Specific(A)`: Check that the operand equals A. Use this if A is 4376898147bSNikita Popov assigned **outside** the pattern. 4386898147bSNikita Popov * `m_Deferred(A)`: Check that the operand equals A. Use this if A is 4396898147bSNikita Popov assigned **inside** the pattern, for example via `m_Value(A)`. 4406898147bSNikita Popov * `m_APInt(C)`: Match a scalar integer constant or splat vector constant into 4416898147bSNikita Popov `const APInt *C`. Does not permit undef/poison values. 4426898147bSNikita Popov * `m_ImmConstant(C)`: Match any non-constant-expression constant into 4436898147bSNikita Popov `Constant *C`. 4446898147bSNikita Popov * `m_Constant(C)`: Match any constant into `Constant *C`. Don't use this unless 4456898147bSNikita Popov you know what you're doing. 4466898147bSNikita Popov * `m_Add(M1, M2)`, `m_Sub(M1, M2)`, etc: Match an add/sub/etc where the first 4476898147bSNikita Popov operand matches M1 and the second M2. 4486898147bSNikita Popov * `m_c_Add(M1, M2)`, etc: Match an add commutatively. The operands must match 4496898147bSNikita Popov either M1 and M2 or M2 and M1. Most instruction matchers have a commutative 4506898147bSNikita Popov variant. 4516898147bSNikita Popov * `m_ICmp(Pred, M1, M2)` and `m_c_ICmp(Pred, M1, M2)`: Match an icmp, writing 4526898147bSNikita Popov the predicate into `IcmpInst::Predicate Pred`. If the commutative version 4536898147bSNikita Popov is used, and the operands match in order M2, M1, then `Pred` will be the 4546898147bSNikita Popov swapped predicate. 4556898147bSNikita Popov * `m_OneUse(M)`: Check that the value only has one use, and also matches M. 4566898147bSNikita Popov For example `m_OneUse(m_Add(...))`. See the next section for more 4576898147bSNikita Popov information. 4586898147bSNikita Popov 4596898147bSNikita PopovSee the header for the full list of available matchers. 4606898147bSNikita Popov 4616898147bSNikita Popov### InstCombine APIs 4626898147bSNikita Popov 4636898147bSNikita PopovInstCombine transforms are handled by `visitXYZ()` methods, where XYZ 4646898147bSNikita Popovcorresponds to the root instruction of your transform. If the outermost 4656898147bSNikita Popovinstruction of the pattern you are matching is an icmp, the fold will be 4666898147bSNikita Popovlocated somewhere inside `visitICmpInst()`. 4676898147bSNikita Popov 4686898147bSNikita PopovThe return value of the visit method is an instruction. You can either return 4696898147bSNikita Popova new instruction, in which case it will be inserted before the old one, and 4706898147bSNikita Popovuses of the old one will be replaced by it. Or you can return the original 4716898147bSNikita Popovinstruction to indicate that *some* kind of change has been made. Finally, a 4726898147bSNikita Popovnullptr return value indicates that no change occurred. 4736898147bSNikita Popov 4746898147bSNikita PopovFor example, if your transform produces a single new icmp instruction, you could 4756898147bSNikita Popovwrite the following: 4766898147bSNikita Popov 4776898147bSNikita Popov``` 4786898147bSNikita Popovif (...) 4796898147bSNikita Popov return new ICmpInst(Pred, X, Y); 4806898147bSNikita Popov``` 4816898147bSNikita Popov 4826898147bSNikita PopovIn this case the main InstCombine loop takes care of inserting the instruction 4836898147bSNikita Popovand replacing uses of the old instruction. 4846898147bSNikita Popov 4856898147bSNikita PopovAlternatively, you can also write it like this: 4866898147bSNikita Popov 4876898147bSNikita Popov``` 4886898147bSNikita Popovif (...) 4896898147bSNikita Popov return replaceInstUsesWith(OrigI, Builder.CreateICmp(Pred, X, Y)); 4906898147bSNikita Popov``` 4916898147bSNikita Popov 4926898147bSNikita PopovIn this case `IRBuilder` will insert the instruction and `replaceInstUsesWith()` 4936898147bSNikita Popovwill replace the uses of the old instruction, and return it to indicate that 4946898147bSNikita Popova change occurred. 4956898147bSNikita Popov 4966898147bSNikita PopovBoth forms are equivalent, and you can use whichever is more convenient in 4976898147bSNikita Popovcontext. For example, it's common that folds are inside helper functions that 4986898147bSNikita Popovreturn `Value *` and then `replaceInstUsesWith()` is invoked on the result of 4996898147bSNikita Popovthat helper. 5006898147bSNikita Popov 5016898147bSNikita PopovInstCombine makes use of a worklist, which needs to be correctly updated during 5026898147bSNikita Popovtransforms. This usually happens automatically, but there are some things to 5036898147bSNikita Popovkeep in mind: 5046898147bSNikita Popov 5056898147bSNikita Popov * Don't use the `Value::replaceAllUsesWith()` API. Use InstCombine's 5066898147bSNikita Popov `replaceInstUsesWith()` helper instead. 5076898147bSNikita Popov * Don't use the `Instruction::eraseFromParent()` API. Use InstCombine's 5086898147bSNikita Popov `eraseInstFromFunction()` helper instead. (Explicitly erasing instruction 5096898147bSNikita Popov is usually not necessary, as side-effect free instructions without users 5106898147bSNikita Popov are automatically removed.) 5116898147bSNikita Popov * Apart from the "directly return an instruction" pattern above, use IRBUilder 5126898147bSNikita Popov to create all instruction. Do not manually create and insert them. 5136898147bSNikita Popov * When replacing operands or uses of instructions, use `replaceOperand()` 5146898147bSNikita Popov and `replaceUse()` instead of `setOperand()`. 5156898147bSNikita Popov 5166898147bSNikita Popov### Multi-use handling 5176898147bSNikita Popov 5186898147bSNikita PopovTransforms should usually not increase the total number of instructions. This 5196898147bSNikita Popovis not a hard requirement: For example, it is usually worthwhile to replace a 5206898147bSNikita Popovsingle division instruction with multiple other instructions. 5216898147bSNikita Popov 5226898147bSNikita PopovFor example, if you have a transform that replaces two instructions, with two 5236898147bSNikita Popovother instructions, this is (usually) only profitable if *both* the original 5246898147bSNikita Popovinstructions can be removed. To ensure that both instructions are removed, you 5256898147bSNikita Popovneed to add a one-use check for the inner instruction. 5266898147bSNikita Popov 5276898147bSNikita PopovOne-use checks can be performed using the `m_OneUse()` matcher, or the 5286898147bSNikita Popov`V->hasOneUse()` method. 5296898147bSNikita Popov 5306898147bSNikita Popov### Generalization 5316898147bSNikita Popov 5326898147bSNikita PopovTransforms can both be too specific (only handling some odd subset of patterns, 5336898147bSNikita Popovleading to unexpected optimization cliffs) and too general (introducing 5346898147bSNikita Popovcomplexity to handle cases with no real-world relevance). The right level of 5356898147bSNikita Popovgenerality is quite subjective, so this section only provides some broad 5366898147bSNikita Popovguidelines. 5376898147bSNikita Popov 5386898147bSNikita Popov * Avoid transforms that are hardcoded to specific constants. Try to figure 5396898147bSNikita Popov out what the general rule for arbitrary constants is. 5406898147bSNikita Popov * Add handling for conjugate patterns. For example, if you implement a fold 5416898147bSNikita Popov for `icmp eq`, you almost certainly also want to support `icmp ne`, with the 5426898147bSNikita Popov inverse result. Similarly, if you implement a pattern for `and` of `icmp`s, 5436898147bSNikita Popov you should also handle the de-Morgan conjugate using `or`. 5446898147bSNikita Popov * Handle non-splat vector constants if doing so is free, but do not add 5456898147bSNikita Popov handling for them if it adds any additional complexity to the code. 5466898147bSNikita Popov * Do not handle non-canonical patterns, unless there is a specific motivation 5476898147bSNikita Popov to do so. For example, it may sometimes be worthwhile to handle a pattern 5486898147bSNikita Popov that would normally be converted into a different canonical form, but can 5496898147bSNikita Popov still occur in multi-use scenarios. This is fine to do if there is specific 5506898147bSNikita Popov real-world motivation, but you should not go out of your way to do this 5516898147bSNikita Popov otherwise. 5526898147bSNikita Popov * Sometimes the motivating pattern uses a constant value with certain 5536898147bSNikita Popov properties, but the fold can be generalized to non-constant values by making 5546898147bSNikita Popov use of ValueTracking queries. Whether this makes sense depends on the case, 5556898147bSNikita Popov but it's usually a good idea to only handle the constant pattern first, and 5566898147bSNikita Popov then generalize later if it seems useful. 557*f16e234fSNikita Popov 558*f16e234fSNikita Popov## Guidelines for reviewers 559*f16e234fSNikita Popov 560*f16e234fSNikita Popov * Do not ask new contributors to implement non-splat vector support in code 561*f16e234fSNikita Popov reviews. If you think non-splat vector support for a fold is both 562*f16e234fSNikita Popov worthwhile and policy-compliant (that is, the handling would not result in 563*f16e234fSNikita Popov any appreciable increase in code complexity), implement it yourself in a 564*f16e234fSNikita Popov follow-up patch. 565