#
0c63ec53 |
| 29-Jan-2025 |
Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com> |
[NFC][SIWholeQuadMode] Remove redundant arguments (#124930)
|
#
2e43f392 |
| 29-Jan-2025 |
Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com> |
[NFC][SIWholeQuadMode] Perform less lookups (#124927)
|
Revision tags: llvmorg-21-init |
|
#
f811482a |
| 19-Jan-2025 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] SIWholeQuadMode: Ensure earliest WQM entry point for PS (#123266)
Ensure shaders running WQM (PS) enter at the earliest point irrespective
of WQM marking.
|
Revision tags: llvmorg-19.1.7 |
|
#
40fa7f5e |
| 14-Jan-2025 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU] Fix computed kill mask (#122736)
Replace S_XOR with S_ANDN2 when computing the kill mask in demote/kill
lowering. This has the effect of AND'ing demote/kill condition with exec
which is n
[AMDGPU] Fix computed kill mask (#122736)
Replace S_XOR with S_ANDN2 when computing the kill mask in demote/kill
lowering. This has the effect of AND'ing demote/kill condition with exec
which is needed for proper live mask update.
The S_XOR is inadequate because it may return true for lane with exec=0.
This patch fixes an image corruption in game.
I think the issue went unnoticed because demote/kill condition is often
naturally dependent on exec, so AND'ing with exec is usually not
required.
show more ...
|
Revision tags: llvmorg-19.1.6 |
|
#
1562b70e |
| 13-Dec-2024 |
paperchalice <liujunchang97@outlook.com> |
Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting
Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See
https://github.com/llvm/llvm-project/blob/6c7e5827eda26990e872eb7c3f0d7866ee3c3171/llvm/include/llvm/Support/GenericDomTree.h#L684-L687
show more ...
|
#
553058f8 |
| 11-Dec-2024 |
paperchalice <liujunchang97@outlook.com> |
Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512)
Reverts llvm/llvm-project#115111 Causes #119511
|
#
79047fac |
| 11-Dec-2024 |
paperchalice <liujunchang97@outlook.com> |
[DomTreeUpdater] Move critical edge splitting code to updater (#115111)
Support critical edge splitting in dominator tree updater. Continue the
work in #100856.
Compile time check:
https://llvm
[DomTreeUpdater] Move critical edge splitting code to updater (#115111)
Support critical edge splitting in dominator tree updater. Continue the
work in #100856.
Compile time check:
https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au
show more ...
|
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
8d13e7b8 |
| 03-Oct-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
33562085 |
| 13-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.
The problem was a
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.
The problem was a conflict with
https://github.com/llvm/llvm-project/commit/e55d6f5ea2656bf842973d8bee86c3ace31bc865
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).
...if only we had a merge queue.
show more ...
|
#
7792b4ae |
| 12-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173
si-init-whole-wave.mir crashes on some buildbots (although it passed
bo
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173
si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.
show more ...
|
#
703ebca8 |
| 12-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.
The bui
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.
The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.
show more ...
|
#
e55d6f5e |
| 11-Sep-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889)
Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifyi
[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889)
Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.
show more ...
|
#
01967e26 |
| 11-Sep-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Shrink a live interval instead of recomputing it. NFCI. (#108171)
|
#
c7a7767f |
| 10-Sep-2024 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)
Breaks bots, see #105822.
Reverts llvm/llvm-project#105822
|
#
44556e64 |
| 10-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may conta
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.
As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):
```
entry:
%func_exec = call i1 @llvm.amdgcn.init.whole.wave()
br i1 %func_exec, label %func, label %tail
func:
; ... stuff that should run with the actual EXEC mask
br label %tail
tail:
; ... stuff that runs with all the lanes enabled;
; can contain more than one basic block
```
It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).
The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).
show more ...
|
#
1d44ecb9 |
| 08-Sep-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove unnecessary untieRegOperand (#107695)
As far as I can tell, V_SET_INACTIVE has never had tied operands.
|
#
16cda01d |
| 05-Sep-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] V_SET_INACTIVE optimizations (#98864)
Optimize V_SET_INACTIVE by allow it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically b
[AMDGPU] V_SET_INACTIVE optimizations (#98864)
Optimize V_SET_INACTIVE by allow it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically be lower to V_CNDMASK.
Some cases require use of exec manipulation V_MOV as previous code.
GFX9 sees slight instruction count increase in edge cases due to
smaller constant bus.
Additionally avoid introducing exec manipulation and V_MOVs where
a source of V_SET_INACTIVE is the destination.
This is a common pattern as WWM register pre-allocation often
assigns the same register.
show more ...
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2 |
|
#
3611c0b7 |
| 01-Aug-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] SIWholeQuadMode: avoid execz effects in exact regions (#101157)
Exact mode regions within WQM may have EXEC=0 in divergent control flow.
This occurs if a branch is only taken by helper lan
[AMDGPU] SIWholeQuadMode: avoid execz effects in exact regions (#101157)
Exact mode regions within WQM may have EXEC=0 in divergent control flow.
This occurs if a branch is only taken by helper lanes and an instruction
requiring WQM disabling is encountered.
The current code extends the exact region as far as possible; however,
this can result in it including instructions with unwanted side effects
at EXEC=0.
In particular readfirstlane combined with scalar loads can produce
invalid memory accesses in this circumstance.
Workaround this by shrinking exact regions to only the instructions
requiring WQM disabling when unwanted side effects are present.
Eventually we should branch over these regions when EXEC=0, but this
requires visibility of CFG/divergence information not currently
available.
show more ...
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
8d28a410 |
| 17-Jul-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Remove SIWholeQuadMode pass early exit (#98450)
Merge the code bypass elements from the early exit into the main pass
execution flow.
|
#
5e338f1f |
| 17-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC.
|
#
36984536 |
| 15-Jul-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] SIWholeQuadMode: remove unnecessary map access (NFCI)
|
#
d4e46f0e |
| 11-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fix machine verification failure from INIT_EXEC lowering (#98333)
Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in
[AMDGPU] Fix machine verification failure from INIT_EXEC lowering (#98333)
Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in #94452.
show more ...
|
#
6a907699 |
| 11-Jul-2024 |
Nikita Popov <npopov@redhat.com> |
Revert "[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)"
This reverts commit c5e5088033fed170068d818c54af6862e449b545.
Causes large compile-time regressions.
|
#
c5e50880 |
| 11-Jul-2024 |
paperchalice <liujunchang97@outlook.com> |
[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)
Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlo
[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)
Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlock::SplitCriticalEdge`.
- Use `MachineDomTreeUpdater` in passes which call
`MachineBasicBlock::SplitCriticalEdge` and preserve
`MachineDominatorTreeWrapperPass` or CFG analyses.
Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related
methods in 2014. Now we have SemiNCA based dominator tree in 2017 and
dominator tree updater, the solution adopted here seems a bit outdated.
show more ...
|
#
abde52aa |
| 10-Jul-2024 |
paperchalice <liujunchang97@outlook.com> |
[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use
[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
show more ...
|