Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
4583f6d3 |
| 08-Jan-2025 |
Alex MacLean <amaclean@nvidia.com> |
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way of specifying an NVPTX kernel than using the metadata, which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time.
This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
ed8019d9 |
| 18-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[Target] Remove unused includes (NFC) (#116577)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3 |
|
#
020fa868 |
| 22-Oct-2024 |
Artem Belevich <tra@google.com> |
[NVPTX] mangle symbols in debug info to conform to PTX restrictions. (#113216)
Until now, debug info printed symbol names as-is, which
resulted in invalid PTX when the symbols contained characters that are
invalid for PTX, e.g. `__PRETTY_FUNCTION.something`.
Debug info is somewhat disconnected from the symbols themselves, so the
regular "NVPTXAssignValidGlobalNames" pass can't easily fix them.
As a "plan B", this patch catches the printout of debug symbols and fixes
them as needed. One gotcha is that the same code path is used to print
the names of debug info sections. Those section names start with
'.debug'. The dot in those names is nominally illegal in PTX, but
debug section names with a dot are accepted as a special case. The
downside of this change is that if someone ever has a `.debug*` symbol
that needs to be referred to from the debug info, that label will be
passed through as-is, and will still produce broken PTX output. If/when
we run into a case where we need it to work, we could consider only
passing through specific debug section names, or add a mechanism
allowing us to tell section names apart from regular symbols.
Fixes #58491
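
The pass-through behaviour described in this commit can be sketched in plain C++. This is only an illustrative model, not the LLVM implementation: the helper name `sanitizeDebugSymbol` is hypothetical, and the choice of `.` and `@` as the rewritten characters and `_$_` as the replacement is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper, not LLVM code: rewrite characters that PTX identifiers
// do not allow, but leave anything that looks like a `.debug*` name untouched.
std::string sanitizeDebugSymbol(std::string_view name) {
  // Debug section names such as `.debug_info` are accepted as a special case,
  // so names with this prefix are passed through as-is.
  if (name.substr(0, 6) == ".debug")
    return std::string(name);

  std::string out;
  out.reserve(name.size());
  for (char c : name) {
    // Assumption for this sketch: '.' and '@' are the offending characters
    // and "_$_" is the replacement sequence.
    if (c == '.' || c == '@')
      out += "_$_";
    else
      out += c;
  }
  return out;
}
```

Under these assumptions, `__PRETTY_FUNCTION.something` is rewritten while `.debug_info` is left alone. The sketch also reproduces the downside noted in the message: a regular symbol that happens to start with `.debug` would pass through unchanged.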
|
Revision tags: llvmorg-19.1.2 |
|
#
9220f645 |
| 13-Oct-2024 |
Kazu Hirata <kazu@google.com> |
[NVPTX] Avoid repeated hash lookups (NFC) (#112117)
|
Revision tags: llvmorg-19.1.1 |
|
#
9bc26e9e |
| 25-Sep-2024 |
Alex MacLean <amaclean@nvidia.com> |
[NVPTX] Support !"cluster_dim_{x,y,z}" metadata (#109548)
Add support for !"cluster_dim_{x,y,z}" metadata to allow specifying
cluster dimensions on a kernel function in llvm.
If any of these metadata entries are present, the `.explicitcluster` PTX
directive is used and the specified dimensions are lowered with the
`.reqnctapercluster` directive. For more details see:
[PTX ISA: 11.7. Cluster Dimension Directives]
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-dimension-directives)
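
The lowering rule above (any cluster dimension present implies `.explicitcluster` plus a `.reqnctapercluster` line) can be modelled with a small standalone sketch. The `ClusterDims` struct and `emitClusterDirectives` function are hypothetical names, and filling a missing dimension with 1 is an assumption of this example rather than something the commit message states.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical model of the three optional metadata entries.
struct ClusterDims {
  std::optional<unsigned> x, y, z;
};

// Returns the PTX directives for a kernel, or an empty string when no
// cluster_dim_{x,y,z} entry is present.
std::string emitClusterDirectives(const ClusterDims &d) {
  if (!d.x && !d.y && !d.z)
    return "";

  std::ostringstream os;
  os << ".explicitcluster\n";
  // Assumption: a dimension that was not specified is emitted as 1.
  os << ".reqnctapercluster " << d.x.value_or(1) << ", " << d.y.value_or(1)
     << ", " << d.z.value_or(1) << "\n";
  return os.str();
}
```

Using `std::optional` keeps "not specified" distinct from any concrete value, so the presence check does not depend on a sentinel.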
|
#
489acb24 |
| 25-Sep-2024 |
Alex MacLean <amaclean@nvidia.com> |
[NVPTX][NFC] Refactor utilities to use std::optional (#109883)
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
9fa7c05a |
| 30-Jun-2024 |
Akshay Deodhar <adeodhar@nvidia.com> |
[NVPTX] Improved support for grid_constant (#97112)
- Supports escaped grid_constant pointers less conservatively. Casts
uses inside Calls, PtrToInts, and Stores where the pointer is a _value
operand_ to the generic address space immediately before the escape, while
keeping other uses in the param address space.
- Related to: https://github.com/llvm/llvm-project/pull/96125
|
#
687d6fbf |
| 24-Jun-2024 |
Akshay Deodhar <adeodhar@nvidia.com> |
[NVPTX] Basic support for "grid_constant" (#96125)
- Adds a helper function for checking whether an argument is a
[grid_constant](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties).
- Adds support for cvta.param using changes from
https://github.com/llvm/llvm-project/pull/95289
- Supports escaped grid_constant pointers conservatively, by casting all
uses to the generic address space with cvta.param.
|
Revision tags: llvmorg-18.1.8 |
|
#
435addbf |
| 06-Jun-2024 |
Alex MacLean <amaclean@nvidia.com> |
[NVPTX] Revamp NVVMIntrRange pass (#94422)
Revamp the NVVMIntrRange pass making the following updates:
- Use range attributes over range metadata. This is what instcombine has
moved to for ranges on intrinsics in
https://github.com/llvm/llvm-project/pull/88776 and it seems a bit
cleaner.
- Consider the `!"maxntid{x,y,z}"` and `!"reqntid{x,y,z}"` function
metadata when adding ranges for `tid` sreg intrinsics. This can allow
for smaller ranges and more optimization.
- When range attributes are already present, use the intersection of the
old and new range. This complements the metadata change by allowing
ranges to be shrunk when an intrinsic is in a function which is inlined
into a kernel with metadata. While we don't call this more than once
yet, we should consider adding a second call after inlining, once this
has had a chance to soak for a while and no issues have arisen.
I've also re-enabled this pass in the TM; it was disabled years ago due
to "numerical discrepancies" (https://reviews.llvm.org/D96166). In our
testing we haven't seen any issues with adding ranges to intrinsics, and
I cannot find any further info about what issues were encountered.
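
The "intersection of the old and new range" step can be illustrated with a toy model. The sketch below uses a hypothetical `HalfOpenRange` type over unsigned values and ignores wrapped ranges, so it is a simplification of what a real range attribute can express; `tidRange` is likewise a hypothetical helper that derives `[0, maxntid)` from a thread-count bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, non-wrapping range [lo, hi).
struct HalfOpenRange {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Intersect two ranges; an empty result is normalised to lo == hi.
HalfOpenRange intersect(HalfOpenRange a, HalfOpenRange b) {
  HalfOpenRange r{std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
  if (r.lo > r.hi)
    r.hi = r.lo;
  return r;
}

// A thread-index intrinsic is bounded by the number of threads in that
// dimension, so a bound of N gives the range [0, N).
HalfOpenRange tidRange(std::uint64_t maxntid) { return {0, maxntid}; }
```

For instance, if an intrinsic already carries the range for a bound of 1024 and is later seen inside a kernel whose bound is 256, intersecting the two keeps the tighter `[0, 256)`. Because intersection can only shrink a range, running such a step more than once is safe in this model.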
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
8da3a8f5 |
| 17-May-2024 |
Alex MacLean <amaclean@nvidia.com> |
[NVPTX] fixup support for over-aligned parameters (#92457)
This extends the NVPTX support for over-aligned parameters and return
values in a few related ways:
- Support for `alignstack` attribute, as an alternative to legacy nvvm
`!"align"` metadata entries. While we still maintain the legacy support,
long term it might be nice to auto-upgrade to `alignstack`.
- Check the alignment info when emitting the parameter list to prevent a
mismatch between alignment of caller and callee, which would previously
cause a fatal error for `ptxas`.
- Check the alignment info when emitting loads for parameters,
potentially enabling better vectorization.
|
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2 |
|
#
3f8d4a8e |
| 29-Sep-2023 |
Jakub Chlanda <jakub@codeplay.com> |
Reland [NVPTX] Add support for maxclusterrank in launch_bounds (#66496) (#67667)
This reverts commit 0afbcb20fd908f8bf9073697423da097be7db592.
|
#
0afbcb20 |
| 27-Sep-2023 |
Sam McCall <sam.mccall@gmail.com> |
Revert "[NVPTX] Add support for maxclusterrank in launch_bounds (#66496)"
This reverts commit dfab31b41b4988b6dc8129840eba68f0c36c0f13.
SemaDeclAttr.cpp cannot depend on Basic's private headers (lib/Basic/Targets/NVPTX.h)
|
#
dfab31b4 |
| 27-Sep-2023 |
Jakub Chlanda <jakub@codeplay.com> |
[NVPTX] Add support for maxclusterrank in launch_bounds (#66496)
Since SM_90, CUDA supports specifying an additional argument to the
launch_bounds attribute, maxBlocksPerCluster, to express the maximum
number of CTAs that can be part of the cluster. See:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-dimension-directives-maxclusterrank
and
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds
for details.
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0 |
|
#
0a7a9260 |
| 08-Sep-2023 |
Thomas <thomas.raoux@openai.com> |
[NVPTX] Make i16x2 a native type and add supported vec instructions (#65799)
Recommit https://github.com/llvm/llvm-project/pull/65432 with a minor bug
fix for bitcasts.
|
#
b3a14cac |
| 08-Sep-2023 |
Dmitri Gribenko <gribozavr@gmail.com> |
Revert "[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432)"
This reverts commit db5d845c73ee2d64f1a5bab3fc72edece9e3a7ba.
As per PR discussion "Looks like we've missed lowering of bitcasts between v2f16 and v2i16 and it breaks XLA."
|
#
db5d845c |
| 07-Sep-2023 |
Thomas <thomas.raoux@openai.com> |
[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432)
On sm_90, some instructions now support i16x2, which allows the hardware to
execute add, min and max instructions more efficiently.
In order to support that we need to make i16x2 a native type in the
backend. This does the necessary changes to make i16x2 a native type and
adds support for the instructions natively supporting i16x2.
This caused a negative test in nvptx slp to start passing. Changed the
test to a positive one as the IR is correctly vectorized.
|
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
fa023e0f |
| 28-Dec-2022 |
Pavel Kopyl <pavelkopyl@gmail.com> |
[NVPTX] Emit .noreturn directive
Differential Revision: https://reviews.llvm.org/D140238
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2 |
|
#
940fa35e |
| 23-Sep-2022 |
Luke Drummond <luke.drummond@codeplay.com> |
[NVPTX] Fix a segfault for bitcasted calls with byval params
`getFunctionParamOptimizedAlign` was being passed a null function argument when getting the callee of a bitcasted function symbol. This is because `CallBase::getCalledFunction` does not look through bitcasts.
There is already code to handle this case in `NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into an NVPTX util.
The alignment computation now gracefully handles computing alignment of virtual functions with a check for null.
|
Revision tags: llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
ede60037 |
| 29-Jun-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a588030d8d4004f5d7e443afe46245e9a92 due to Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other than error categories, to be checked in more detail and reapplied separately.)
Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
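
The second technique mentioned, moving the static variable into the only function that uses it behind a getter, is the standard C++ function-local static idiom. A minimal sketch in plain C++ follows; `Cache`, `getCache`, and `lookupOrInsert` are hypothetical names chosen for illustration and do not come from the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

using Cache = std::map<std::string, int>;

// The object is constructed the first time control passes through the
// declaration; since C++11 that initialisation is thread-safe.
Cache &getCache() {
  static Cache TheCache;
  return TheCache;
}

// Returns the cached value for `key`, inserting `value` if it was absent.
int lookupOrInsert(const std::string &key, int value) {
  return getCache().try_emplace(key, value).first->second;
}
```

As with a lazily constructed global, nothing is built until first use. One difference to keep in mind is that a function-local static is destroyed during normal program exit rather than at an explicit shutdown call.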
|
#
e9ce1a58 |
| 10-Jul-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f062457c928c18a88c612f39d9e168f65a85.
Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
|
#
e6f1f062 |
| 29-Jun-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
9d745828 |
| 08-Jan-2022 |
Kazu Hirata <kazu@google.com> |
[Target] use range-based for loops (NFC)
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
#
adcd0268 |
| 28-Jan-2020 |
Benjamin Kramer <benny.kra@googlemail.com> |
Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
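
Since the message compares the change to `std::string_view`, the same rule can be shown with the standard type alone. This sketch deliberately uses `std::string_view` rather than `llvm::StringRef` so that it stays self-contained; `toOwned` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Copying a view into an owning string allocates, so the conversion has to
// be spelled out at the call site.
std::string toOwned(std::string_view ref) {
  // std::string s = ref;  // does not compile: the constructor is explicit
  return std::string(ref); // explicit conversion makes the copy visible
}
```

Making the copy explicit is what tends to surface the "minor inefficiencies" mentioned above: each place that silently materialised a `std::string` becomes visible in the source.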
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2 |
|
#
3d5360a4 |
| 07-Aug-2019 |
Benjamin Kramer <benny.kra@googlemail.com> |
Replace llvm::MutexGuard/UniqueLock with their standard equivalents
All supported platforms have <mutex> now, so we don't need our own copies any longer. No functionality change intended.
llvm-svn: 368149
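
For readers unfamiliar with the standard equivalents, the sketch below shows `std::lock_guard` and `std::unique_lock` from `<mutex>` in a small class. The `Counter` class is hypothetical and is not taken from the patch; it only illustrates the two RAII lock types.

```cpp
#include <cassert>
#include <mutex>

class Counter {
  std::mutex Lock;
  int Value = 0;

public:
  // std::lock_guard holds the mutex for the whole scope and cannot be
  // released early.
  void increment() {
    std::lock_guard<std::mutex> Guard(Lock);
    ++Value;
  }

  // std::unique_lock also locks on construction but may be unlocked before
  // the end of the scope.
  int take() {
    std::unique_lock<std::mutex> Guard(Lock);
    int Result = Value;
    Value = 0;
    Guard.unlock();
    return Result;
  }
};
```

`std::lock_guard` is the lighter choice when the lock should simply cover a scope, while `std::unique_lock` is the one to reach for when the lock must be released early or handed to a condition variable.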
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init |
|
#
7ba838d2 |
| 12-Jul-2019 |
Bryant Wong <llvm-commits@xorshift.org> |
Test commit. NFC.
Formatting fix.
llvm-svn: 365878
|