History log of /llvm-project/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp (Results 1 – 25 of 128)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 0c63ec53 29-Jan-2025 Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>

[NFC][SIWholeQuadMode] Remove redundant arguments (#124930)


# 2e43f392 29-Jan-2025 Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>

[NFC][SIWholeQuadMode] Perform less lookups (#124927)


Revision tags: llvmorg-21-init
# f811482a 19-Jan-2025 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] SIWholeQuadMode: Ensure earliest WQM entry point for PS (#123266)

Ensure shaders running WQM (PS) enter at the earliest point irrespective
of WQM marking.


Revision tags: llvmorg-19.1.7
# 40fa7f5e 14-Jan-2025 Piotr Sobczak <piotr.sobczak@amd.com>

[AMDGPU] Fix computed kill mask (#122736)

Replace S_XOR with S_ANDN2 when computing the kill mask in demote/kill
lowering. This has the effect of AND'ing demote/kill condition with exec
which is n

[AMDGPU] Fix computed kill mask (#122736)

Replace S_XOR with S_ANDN2 when computing the kill mask in demote/kill
lowering. This has the effect of AND'ing demote/kill condition with exec
which is needed for proper live mask update.

The S_XOR is inadequate because it may return true for lane with exec=0.

This patch fixes an image corruption in game.

I think the issue went unnoticed because demote/kill condition is often
naturally dependent on exec, so AND'ing with exec is usually not
required.

show more ...


Revision tags: llvmorg-19.1.6
# 1562b70e 13-Dec-2024 paperchalice <liujunchang97@outlook.com>

Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)

This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting

Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)

This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See

https://github.com/llvm/llvm-project/blob/6c7e5827eda26990e872eb7c3f0d7866ee3c3171/llvm/include/llvm/Support/GenericDomTree.h#L684-L687

show more ...


# 553058f8 11-Dec-2024 paperchalice <liujunchang97@outlook.com>

Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512)

Reverts llvm/llvm-project#115111 Causes #119511


# 79047fac 11-Dec-2024 paperchalice <liujunchang97@outlook.com>

[DomTreeUpdater] Move critical edge splitting code to updater (#115111)

Support critical edge splitting in dominator tree updater. Continue the
work in #100856.

Compile time check:
https://llvm

[DomTreeUpdater] Move critical edge splitting code to updater (#115111)

Support critical edge splitting in dominator tree updater. Continue the
work in #100856.

Compile time check:
https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au

show more ...


Revision tags: llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2
# 8d13e7b8 03-Oct-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Qualify auto. NFC. (#110878)

Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)


Revision tags: llvmorg-19.1.1, llvmorg-19.1.0
# 33562085 13-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)

This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.

The problem was a

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)

This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.

The problem was a conflict with
https://github.com/llvm/llvm-project/commit/e55d6f5ea2656bf842973d8bee86c3ace31bc865
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).

...if only we had a merge queue.

show more ...


# 7792b4ae 12-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)

Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
bo

Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)

Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.

show more ...


# 703ebca8 12-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)

This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.

The bui

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)

This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.

The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.

show more ...


# e55d6f5e 11-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889)

Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifyi

[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889)

Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.

show more ...


# 01967e26 11-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Shrink a live interval instead of recomputing it. NFCI. (#108171)


# c7a7767f 10-Sep-2024 Vitaly Buka <vitalybuka@google.com>

Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)

Breaks bots, see #105822.

Reverts llvm/llvm-project#105822


# 44556e64 10-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)

This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may conta

[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)

This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.

As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):

```
entry:
%func_exec = call i1 @llvm.amdgcn.init.whole.wave()
br i1 %func_exec, label %func, label %tail

func:
; ... stuff that should run with the actual EXEC mask
br label %tail

tail:
; ... stuff that runs with all the lanes enabled;
; can contain more than one basic block
```

It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).

The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).

show more ...


# 1d44ecb9 08-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Remove unnecessary untieRegOperand (#107695)

As far as I can tell, V_SET_INACTIVE has never had tied operands.


# 16cda01d 05-Sep-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] V_SET_INACTIVE optimizations (#98864)

Optimize V_SET_INACTIVE by allow it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically b

[AMDGPU] V_SET_INACTIVE optimizations (#98864)

Optimize V_SET_INACTIVE by allow it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically be lower to V_CNDMASK.
Some cases require use of exec manipulation V_MOV as previous code.
GFX9 sees slight instruction count increase in edge cases due to
smaller constant bus.

Additionally avoid introducing exec manipulation and V_MOVs where
a source of V_SET_INACTIVE is the destination.
This is a common pattern as WWM register pre-allocation often
assigns the same register.

show more ...


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# 3611c0b7 01-Aug-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] SIWholeQuadMode: avoid execz effects in exact regions (#101157)

Exact mode regions within WQM may have EXEC=0 in divergent control flow.
This occurs if a branch is only taken by helper lan

[AMDGPU] SIWholeQuadMode: avoid execz effects in exact regions (#101157)

Exact mode regions within WQM may have EXEC=0 in divergent control flow.
This occurs if a branch is only taken by helper lanes and an instruction
requiring WQM disabling is encountered.

The current code extends the exact region as far as possible; however,
this can result in it including instructions with unwanted side effects
at EXEC=0.
In particular readfirstlane combined with scalar loads can produce
invalid memory accesses in this circumstance.

Workaround this by shrinking exact regions to only the instructions
requiring WQM disabling when unwanted side effects are present.
Eventually we should branch over these regions when EXEC=0, but this
requires visibility of CFG/divergence information not currently
available.

show more ...


Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
# 8d28a410 17-Jul-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Remove SIWholeQuadMode pass early exit (#98450)

Merge the code bypass elements from the early exit into the main pass
execution flow.


# 5e338f1f 17-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC.


# 36984536 15-Jul-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] SIWholeQuadMode: remove unnecessary map access (NFCI)


# d4e46f0e 11-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Fix machine verification failure from INIT_EXEC lowering (#98333)

Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in

[AMDGPU] Fix machine verification failure from INIT_EXEC lowering (#98333)

Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in #94452.

show more ...


# 6a907699 11-Jul-2024 Nikita Popov <npopov@redhat.com>

Revert "[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)"

This reverts commit c5e5088033fed170068d818c54af6862e449b545.

Causes large compile-time regressions.


# c5e50880 11-Jul-2024 paperchalice <liujunchang97@outlook.com>

[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)

Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlo

[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)

Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlock::SplitCriticalEdge`.
- Use `MachineDomTreeUpdater` in passes which call
`MachineBasicBlock::SplitCriticalEdge` and preserve
`MachineDominatorTreeWrapperPass` or CFG analyses.

Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related
methods in 2014. Now we have SemiNCA based dominator tree in 2017 and
dominator tree updater, the solution adopted here seems a bit outdated.

show more ...


# abde52aa 10-Jul-2024 paperchalice <liujunchang97@outlook.com>

[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)

- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use

[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)

- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.

This would be the last analysis required by `PHIElimination`.

show more ...


123456