LowerMemIntrinsics.cpp - OpenGrok history log for /llvm-project/llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init
# 6292a808	24-Jan-2025	Jeremy Morse <jeremy.morse@sony.com>	[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNo [NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to getFirstNonPHI use the iterator-returning version. This patch changes a bunch of call-sites calling getFirstNonPHI to use getFirstNonPHIIt, which returns an iterator. All these call sites are where it's obviously safe to fetch the iterator then dereference it. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer getFirstNonPHI, but not before adding concise documentation of what considerations are needed (very few). --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com> show more ...
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 298127dc	15-Nov-2024	Alex Bradbury <asb@igalia.com>	Reapply [IR] Initial introduction of llvm.experimental.memset_pattern (#97583) Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the test case. Supersedes the draft PR #94992, tak Reapply [IR] Initial introduction of llvm.experimental.memset_pattern (#97583) Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the test case. Supersedes the draft PR #94992, taking a different approach following feedback: * Lower in PreISelIntrinsicLowering * Don't require that the number of bytes to set is a compile-time constant * Define llvm.memset_pattern rather than llvm.memset_pattern.inline As discussed in the [RFC thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496), the intent is that the intrinsic will be lowered to loops, a sequence of stores, or libcalls depending on the expected cost and availability of libcalls on the target. Right now, there's just a single lowering path that aims to handle all cases. My intent would be to follow up with additional PRs that add additional optimisations when possible (e.g. when libcalls are available, when arguments are known to be constant etc). show more ...
# 0fb8fac5	15-Nov-2024	Alex Bradbury <asb@igalia.com>	Revert "[IR] Initial introduction of llvm.experimental.memset_pattern (#97583)" This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3. Recent scheduling changes means tests need to be re-gen Revert "[IR] Initial introduction of llvm.experimental.memset_pattern (#97583)" This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3. Recent scheduling changes means tests need to be re-generated. Reverting to green while I do that. show more ...
# 7ff3a9ac	15-Nov-2024	Alex Bradbury <asb@igalia.com>	[IR] Initial introduction of llvm.experimental.memset_pattern (#97583) Supersedes the draft PR #94992, taking a different approach following feedback: * Lower in PreISelIntrinsicLowering * Don't [IR] Initial introduction of llvm.experimental.memset_pattern (#97583) Supersedes the draft PR #94992, taking a different approach following feedback: * Lower in PreISelIntrinsicLowering * Don't require that the number of bytes to set is a compile-time constant * Define llvm.memset_pattern rather than llvm.memset_pattern.inline As discussed in the [RFC thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496), the intent is that the intrinsic will be lowered to loops, a sequence of stores, or libcalls depending on the expected cost and availability of libcalls on the target. Right now, there's just a single lowering path that aims to handle all cases. My intent would be to follow up with additional PRs that add additional optimisations when possible (e.g. when libcalls are available, when arguments are known to be constant etc). show more ...
Revision tags: llvmorg-19.1.3
# 4c697f70	22-Oct-2024	Fabian Ritter <fabian.ritter@amd.com>	[LowerMemIntrinsics] Use i8 GEPs in memcpy/memmove lowering (#112707) The IR lowering of memcpy/memmove intrinsics uses a target-specific type for its load/store operations. So far, the loaded and [LowerMemIntrinsics] Use i8 GEPs in memcpy/memmove lowering (#112707) The IR lowering of memcpy/memmove intrinsics uses a target-specific type for its load/store operations. So far, the loaded and stored addresses are computed with GEPs based on this type. That is wrong if the allocation size of the type differs from its store size: The width of the accesses is determined by the store size, while the GEP stride is determined by the allocation size. If the allocation size is greater than the store size, some bytes are not copied/moved. This patch changes the GEPs to use i8 addressing, with offsets based on the type's store size. The correctness of the lowering therefore no longer depends on the type's allocation size. This is in support of PR #112332, which allows adjusting the memcpy loop lowering type through a command line argument in the AMDGPU backend. show more ...
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# edf46f36	03-Aug-2024	Florian Hahn <flo@fhahn.com>	[SCEV] Use const SCEV * explicitly in more places. Use const SCEV * explicitly in more places to prepare for https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
# 9e462b7e	29-Jul-2024	Fabian Ritter <fabian.ritter@amd.com>	[LowerMemIntrinsics][NFC] Use Align in TTI::getMemcpyLoopLoweringType (#100984) ...and also in TTI::getMemcpyLoopResidualLoweringType.
Revision tags: llvmorg-19.1.0-rc1
# 92a06546	26-Jul-2024	Fabian Ritter <fabian.ritter@amd.com>	[LowerMemIntrinsics] Lower llvm.memmove to wide memory accesses (#100122) So far, the IR-level lowering of llvm.memmove intrinsics generates loops that copy each byte individually. This can be wast [LowerMemIntrinsics] Lower llvm.memmove to wide memory accesses (#100122) So far, the IR-level lowering of llvm.memmove intrinsics generates loops that copy each byte individually. This can be wasteful for targets that provide wider memory access operations. This patch makes the memmove lowering more similar to the lowering of memcpy with unknown length. TargetTransformInfo::getMemcpyLoopLoweringType() is queried for an adequate type for the memory accesses, and if it is wider than a single byte, the greatest multiple of the type's size that is less than or equal to the length is copied with corresponding wide memory accesses. A residual loop with byte-wise accesses (or a sequence of suitable memory accesses in case the length is statically known) is introduced for the remaining bytes. For memmove, this construct is required in two variants: one for copying forward and one for copying backwards, to handle overlapping memory ranges. For the backwards case, the residual code still covers the bytes at the end of the copied region and is therefore executed before the wide main loop. This implementation choice is based on the assumption that we are more likely to encounter memory ranges whose start aligns with the access width than ones whose end does. In microbenchmarks on gfx1030 (AMDGPU), this change yields speedups up to 16x for memmoves with variable or large constant lengths. Part of SWDEV-455845. show more ...
Revision tags: llvmorg-20-init
# cbc96b9e	12-Jul-2024	Fabian Ritter <fabian.ritter@amd.com>	Reapply "[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy" (#98482) Reverts llvm/llvm-project#98295, which reverted llvm/llvm-project#97998 The failure in the Reapply "[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy" (#98482) Reverts llvm/llvm-project#98295, which reverted llvm/llvm-project#97998 The failure in the "InOneWeekend" test of the HIP test suite on clang-hip-vega20 (https://lab.llvm.org/buildbot/#/builders/123/builds/1498) seems to be unrelated; I observed it (and a similar failure for the "TheNextWeek" test in the same suite) intermittently on my system, with and without the patch applied. (It occurred in 2 out of 50 repeated runs without the patch and in 1 out of 50 runs with the patch.) show more ...
# 17316a59	10-Jul-2024	Fabian Ritter <fabian.ritter@amd.com>	Revert "[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy" (#98295) Reverts llvm/llvm-project#97998 This seems to cause a buildbot failure on clang-hip-vega20, in Revert "[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy" (#98295) Reverts llvm/llvm-project#97998 This seems to cause a buildbot failure on clang-hip-vega20, in the HIP test-suite, need to investigate. show more ...
# 6c84bba2	10-Jul-2024	Fabian Ritter <fabian.ritter@amd.com>	[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy (#97998) Memcpy intrinsics with statically unknown loop sizes are lowered with two load/store loops: one with ac [LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy (#97998) Memcpy intrinsics with statically unknown loop sizes are lowered with two load/store loops: one with access widths specified by the target, and a residual loop that copies remaining bytes individually. As the residual loop operates byte-wise, its accesses are only 1-aligned. However, we currently use the alignment that is optimal for the first loop in both, which is unsound. With this patch, we use the correct alignment in the residual loop. The lowering of memcpy with a static size already handles alignments for the residual correctly. show more ...
# d37e7ec2	03-Jul-2024	Fabian Ritter <fabian.ritter@amd.com>	[LowerMemIntrinsics] Respect the volatile argument of llvm.memmove (#97545) So far, we ignored if a memmove intrinsic is volatile when lowering it to loops in the IR. This change generates volatile [LowerMemIntrinsics] Respect the volatile argument of llvm.memmove (#97545) So far, we ignored if a memmove intrinsic is volatile when lowering it to loops in the IR. This change generates volatile loads and stores in this case (similar to how memcpy is handled) and adds tests for volatile memmoves and memcpys. show more ...
# 9df71d76	28-Jun-2024	Nikita Popov <npopov@redhat.com>	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, re [IR] Add getDataLayout() helpers to Function and GlobalValue (#96919) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern. show more ...
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1
# 6b62a913	04-Mar-2024	Jeremy Morse <jeremy.morse@sony.com>	[RemoveDIs] Reapply 3fda50d3915, insert instructions using iterators I'd reverted this in 6c7805d5d1 after a bad stage. Original commit messsage follows: [NFC][RemoveDIs] Bulk update utilities to i [RemoveDIs] Reapply 3fda50d3915, insert instructions using iterators I'd reverted this in 6c7805d5d1 after a bad stage. Original commit messsage follows: [NFC][RemoveDIs] Bulk update utilities to insert with iterators As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. I've also switched DemotePHIToStack to take an optional iterator: it needs to take an iterator, and having a no-insert-location behaviour appears to be important. The constructors for ICmpInst and FCmpInst have been updated too. They're the only instructions that take block _references_ rather than pointers for certain calls, and a future patch is going to make use of default-null block insertion locations. All of this should be NFC. show more ...
# 6c7805d5	29-Feb-2024	Jeremy Morse <jeremy.morse@sony.com>	Revert "[NFC][RemoveDIs] Bulk update utilities to insert with iterators" This reverts commit 3fda50d3915b2163a54a37b602be7783a89dd808. Apparently I've missed a hunk while staging this; will back ou Revert "[NFC][RemoveDIs] Bulk update utilities to insert with iterators" This reverts commit 3fda50d3915b2163a54a37b602be7783a89dd808. Apparently I've missed a hunk while staging this; will back out for now. Picked up here: https://lab.llvm.org/buildbot/#/builders/139/builds/60429/steps/6/logs/stdio show more ...
# 3fda50d3	29-Feb-2024	Jeremy Morse <jeremy.morse@sony.com>	[NFC][RemoveDIs] Bulk update utilities to insert with iterators As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carr [NFC][RemoveDIs] Bulk update utilities to insert with iterators As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. I've also switched DemotePHIToStack to take an optional iterator: it needs to take an iterator, and having a no-insert-location behaviour appears to be important. The constructors for ICmpInst and FCmpInst have been updated too. They're the only instructions that take block _references_ rather than pointers for certain calls, and a future patch is going to make use of default-null block insertion locations. All of this should be NFC. show more ...
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
# 1e36d92b	12-Feb-2024	Pierre van Houtryve <pierre.vanhoutryve@amd.com>	[LowerMemIntrinsics] Avoid udiv/urem when type size is a power of 2 (#81238) See #64620 - does not fix the issue but improves the generated code a bit.
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6
# f432a004	15-Nov-2023	Youngsuk Kim <youngsuk.kim@hpe.com>	[llvm] Remove no-op ptr-to-ptr bitcasts (NFC) Opaque ptr cleanup effort (NFC).
Revision tags: llvmorg-17.0.5, llvmorg-17.0.4
# 4c60c0cb	25-Oct-2023	Youngsuk Kim <youngsuk.kim@hpe.com>	[LowerMemIntrinsics] Remove no-op ptr-to-ptr bitcasts (NFC) Remove ptr-to-ptr bitcasts, which are unnecessary with opaque pointers enabled. Opaque pointer clean-up effort. NFC.
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6
# 06962403	10-Jun-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	LowerMemIntrinsics: Check address space aliasing for memmove expansion For cases where we cannot insert an addrspacecast, we can still expand like a memcpy if we know the address spaces cannot alias LowerMemIntrinsics: Check address space aliasing for memmove expansion For cases where we cannot insert an addrspacecast, we can still expand like a memcpy if we know the address spaces cannot alias. Normally non-aliasing memmoves are optimized to memcpy, but we cannot rely on that for lowering. If a target has aliasing address spaces that cannot be casted between, we still have to give up lowering this. show more ...
# ee19fabc	10-Jun-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	LowerMemIntrinsics: Handle inserting addrspacecast for memmove lowering We're missing a trivial non-AA way to check for non-aliasing address spaces.
# 6d2e5c34	10-Jun-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	LowerMemIntrinsics: Skip memmove with different address spaces This is a quick fix for an assert when the source and dest have different address spaces. The pointer compare needs to have matching ty LowerMemIntrinsics: Skip memmove with different address spaces This is a quick fix for an assert when the source and dest have different address spaces. The pointer compare needs to have matching types, but we can't generically introduce addrspacecast and we don't know if the address spaces alias. show more ...
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 86fe4dfd	02-Dec-2022	Krzysztof Parzyszek <kparzysz@quicinc.com>	TargetTransformInfo: convert Optional to std::optional Recommit: added missing "#include <cstdint>".
# 4e12d183	02-Dec-2022	Krzysztof Parzyszek <kparzysz@quicinc.com>	Revert "TargetTransformInfo: convert Optional to std::optional" This reverts commit b83711248cb12639e7ef7303cfbb4452b4067e85. Some buildbots are failing.
# b8371124	02-Dec-2022	Krzysztof Parzyszek <kparzysz@quicinc.com>	TargetTransformInfo: convert Optional to std::optional
12