|
Revision tags: llvmorg-21-init |
|
| #
5a81a559 |
| 27-Jan-2025 |
David Green <david.green@arm.com> |
[GISel] Explicitly disable BF16 tablegen patterns. (#124113)
We currently have an issue where bf16 patters can be used to match fp16
types, as GISel does not know about the difference between the t
[GISel] Explicitly disable BF16 tablegen patterns. (#124113)
We currently have an issue where bf16 patters can be used to match fp16
types, as GISel does not know about the difference between the two. This
patch explicitly disables them to make sure that they are never used.
The opposite can also happen too, where fp16 patterns are used for
operators that should be bf16. So this also changes any operations with
bf16 types to now cause a fallback to SDAG.
The pass setup for GISel has been slightly adjusted to make sure that a
verify pass does not get added between AMD-SDAG and SIFixSGPRCopiesPass,
which otherwise can cause verifier issues when falling back.
show more ...
|
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3 |
|
| #
7b4c8b35 |
| 16-Oct-2024 |
Brox Chen <guochen2@amd.com> |
[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031)
Modify VOP3 profile and pesudo, and add encoding info for VOP3 True16
including DPP and DPP8 in true16 and fake16 format.
This patch
[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031)
Modify VOP3 profile and pesudo, and add encoding info for VOP3 True16
including DPP and DPP8 in true16 and fake16 format.
This patch applies true16/fake16 changes and asm/dasm changes to
V_ADD_NC_U16
V_ADD_NC_I16
V_SUB_NC_U16
V_SUB_NC_I16
show more ...
|
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1 |
|
| #
a8203291 |
| 26-Jul-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
[AMDGPU] Remove -wavefrontsize32 and -wavefrontsize64 from GFX10+ tests (NFC) (#100711)
They are no longer needed after the patch: [AMDGPU] Remove wavefrontsize
feature from GFX10: https://github.c
[AMDGPU] Remove -wavefrontsize32 and -wavefrontsize64 from GFX10+ tests (NFC) (#100711)
They are no longer needed after the patch: [AMDGPU] Remove wavefrontsize
feature from GFX10: https://github.com/llvm/llvm-project/pull/98400
The exception is when "target-features" are set to "+wavefrontsize32" or
"+wavefrontsize64", we still need to remove a wavefrontsize feature
before add a different one to make sure only one of them are present.
show more ...
|
|
Revision tags: llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
| #
be36812f |
| 21-Feb-2024 |
David Majnemer <david.majnemer@gmail.com> |
[TargetLowering] Be more efficient in fp -> bf16 NaN conversions
We can avoid masking completely as it is OK (and probably preferable) to bring over some of the existant NaN payload.
|
| #
9eff001d |
| 21-Feb-2024 |
David Majnemer <david.majnemer@gmail.com> |
[TargetLowering] Correctly yield NaN from FP_TO_BF16
We didn't set the exponent field, resulting in tiny numbers instead of NaNs.
|
| #
cc13f3ba |
| 21-Feb-2024 |
David Majnemer <david.majnemer@gmail.com> |
Correctly round FP -> BF16 when SDAG expands such nodes (#82399)
We did something pretty naive:
- round FP64 -> BF16 by first rounding to FP32
- skip FP32 -> BF16 rounding entirely
- taking the t
Correctly round FP -> BF16 when SDAG expands such nodes (#82399)
We did something pretty naive:
- round FP64 -> BF16 by first rounding to FP32
- skip FP32 -> BF16 rounding entirely
- taking the top 16 bits of a FP32 which will turn some NaNs into
infinities
Let's do this in a more principled way by rounding types with more
precision than FP32 to FP32 using round-inexact-to-odd which will negate
double rounding issues.
show more ...
|
|
Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
| #
9e9907f1 |
| 17-Jan-2024 |
Fangrui Song <i@maskray.me> |
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
show more ...
|
| #
460ffcdd |
| 04-Jan-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make bf16/v2bf16 legal types (#76215)
There are some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
la
AMDGPU: Make bf16/v2bf16 legal types (#76215)
There are some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.
Depends #76213 #76214
show more ...
|
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2 |
|
| #
fab28e0e |
| 22-Sep-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
Reapply "[AMDGPU] Introduce real and keep fake True16 instructions."
Reverts 6cb3866b1ce9d835402e414049478cea82427cf1.
Analysis of failures on buildbots with expensive checks enabled showed that th
Reapply "[AMDGPU] Introduce real and keep fake True16 instructions."
Reverts 6cb3866b1ce9d835402e414049478cea82427cf1.
Analysis of failures on buildbots with expensive checks enabled showed that the problem was triggered by changes in another commit, 469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug addressed in #67245.
show more ...
|
| #
6cb3866b |
| 22-Sep-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
Revert "[AMDGPU] Introduce real and keep fake True16 instructions."
This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to failures on expensive checks.
|
| #
0f864c7b |
| 22-Sep-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU] Introduce real and keep fake True16 instructions.
The existing fake True16 instructions using 32-bit VGPRs are supposed to co-exist with real ones until all the necessary True16 functionali
[AMDGPU] Introduce real and keep fake True16 instructions.
The existing fake True16 instructions using 32-bit VGPRs are supposed to co-exist with real ones until all the necessary True16 functionality is implemented and relevant tests are updated.
Reviewed By: arsenm, Joe_Nash
Differential Revision: https://reviews.llvm.org/D156101
show more ...
|
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5 |
|
| #
26dc2844 |
| 30-May-2023 |
Diana Picus <Diana-Magda.Picus@amd.com> |
[AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions
Lower formal arguments and returns for functions with the `amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions:
* Put `inreg`
[AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions
Lower formal arguments and returns for functions with the `amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions:
* Put `inreg` arguments into SGPRs, starting at s0, and other arguments into VGPRs, starting at v8. No arguments should end up on the stack, if we don't have enough registers we should error out.
* Lower the return (which is always void) as an S_ENDPGM.
* Set the ScratchRSrc register to s48:51, as described in the docs.
* Set the SP to s32, matching amdgpu_gfx. This might be revisited in a future patch.
Differential Revision: https://reviews.llvm.org/D153517
show more ...
|