Revision tags: llvmorg-16.0.0-rc3
# 7e6e636f | 16-Feb-2023 | Kazu Hirata <kazu@google.com>
Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.
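A minimal illustration of the swap (the call site below is hypothetical, not taken from the patch): both forms truncate a wider argument to 32 bits, but the replacement names that width explicitly at the call site.

```cpp
#include <cstdint>
#include "llvm/ADT/bit.h"             // llvm::has_single_bit
#include "llvm/Support/MathExtras.h"  // llvm::isPowerOf2_32

// Hypothetical call site: the argument is 64 bits wide, so the 32-bit
// truncation matters; has_single_bit<uint32_t> spells it out.
bool isPow2Low32(uint64_t Mask) {
  // return llvm::isPowerOf2_32(Mask);          // before: truncation is implicit
  return llvm::has_single_bit<uint32_t>(Mask);  // after: width is explicit
}
```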
# 08533f8b | 14-Feb-2023 | Jake Egan <jakeegan10@gmail.com>
Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation"
These commits are causing a test-suite build failure on AIX. Revert for now for time to investigate. https://lab.ll
Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation"
These commits are causing a test-suite build failure on AIX. Revert for now for time to investigate. https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio
This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.
show more ...
# c5085c91 | 14-Feb-2023 | Jay Foad <jay.foad@amd.com>
[CodeGen] Trivial simplification of some getRegisterType calls. NFC.
# 4c722668 | 09-Feb-2023 | Alex Richardson <alexrichardson@google.com>
Fix call to deprecated API in bd87a2449da0c82e63cebdf9c131c54a5472e3a7
# bd87a244 | 09-Feb-2023 | Alex Richardson <alexrichardson@google.com>
[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation
This function was added for ARM targets, but aligning global/stack pointer arguments passed to memcpy/memmove/memset can improve code size and performance for all targets that don't have fast unaligned accesses. This adds a generic implementation that adjusts the alignment to pointer size if unaligned accesses are slow. Review D134168 suggests that this significantly improves performance on synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134282
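The heuristic reduces to a small predicate. Below is a self-contained sketch of that logic under illustrative names and thresholds (`TargetInfo` and `shouldAlignPointerArg` are inventions for this example, not the committed TargetLowering hook):

```cpp
#include <cstdint>

// Assumption: the target tells us whether misaligned accesses are slow
// and how wide its pointers are (e.g. 4 bytes on RV32).
struct TargetInfo {
  bool UnalignedIsSlow;
  uint32_t PointerBytes;
};

// Bump the preferred alignment of a mem* pointer argument up to the
// pointer width when misaligned accesses are slow; otherwise leave it.
bool shouldAlignPointerArg(const TargetInfo &TI, uint32_t CurrentAlign,
                           uint32_t &PrefAlign) {
  if (!TI.UnalignedIsSlow || CurrentAlign >= TI.PointerBytes)
    return false; // alignment is free, or already aligned enough
  PrefAlign = TI.PointerBytes;
  return true;
}
```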
Revision tags: llvmorg-16.0.0-rc2
# 62c7f035 | 07-Feb-2023 | Archibald Elliott <archibald.elliott@arm.com>
[NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.
# 526966d0 | 29-Jan-2023 | Kazu Hirata <kazu@google.com>
Use llvm::bit_ceil (NFC)
Note that:
std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);
is equivalent to:
std::bit_ceil(X)
even for input 0.
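The claimed equivalence, including the zero case, can be spot-checked with C++20's `std::bit_ceil` (`llvm::NextPowerOf2(0)` is `0 + 1 == 1`, matching `bit_ceil(0) == 1`):

```cpp
#include <bit>

// bit_ceil rounds up to the next power of two and maps 0 to 1, which is
// exactly what the has_single_bit/NextPowerOf2 ternary produced.
static_assert(std::bit_ceil(0u) == 1u);
static_assert(std::bit_ceil(1u) == 1u); // powers of two are unchanged
static_assert(std::bit_ceil(5u) == 8u); // everything else rounds up
```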
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# e70ae0f4 | 07-Jan-2023 | Matt Arsenault <Matthew.Arsenault@amd.com>
DAG/GlobalISel: Fix broken/redundant setting of MODereferenceable
This was incorrectly setting dereferenceable on unaligned operands. getLoadMemOperandFlags did the dereferenceability check without alignment, and then both paths went on to check isDereferenceableAndAlignedPointer. Make getLoadMemOperandFlags check isDereferenceableAndAlignedPointer, and remove the second call.
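A sketch of the corrected flag computation, assuming the `isDereferenceableAndAlignedPointer` helper from llvm/Analysis/Loads.h (the function below is illustrative, not the committed `getLoadMemOperandFlags`):

```cpp
#include "llvm/Analysis/Loads.h"            // isDereferenceableAndAlignedPointer
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"

// Illustrative: only set MODereferenceable once alignment has been taken
// into account, so unaligned operands no longer pick up the flag.
llvm::MachineMemOperand::Flags loadFlagsSketch(const llvm::LoadInst &LI,
                                               const llvm::DataLayout &DL) {
  auto Flags = llvm::MachineMemOperand::MOLoad;
  if (llvm::isDereferenceableAndAlignedPointer(
          LI.getPointerOperand(), LI.getType(), LI.getAlign(), DL))
    Flags |= llvm::MachineMemOperand::MODereferenceable;
  return Flags;
}
```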
# 48f5d77e | 11-Jan-2023 | Guillaume Chatelet <gchatelet@google.com>
[NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()
This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.
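The rename in one line (a sketch; `TS` stands for an arbitrary TypeSize value):

```cpp
#include <cstdint>
#include "llvm/Support/TypeSize.h"

// Same value, clearer name: for scalable vectors the size is only known
// as a minimum, which "known min value" states directly.
uint64_t minSize(llvm::TypeSize TS) {
  return TS.getKnownMinValue(); // was: TS.getKnownMinSize()
}
```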
# 16facf1c | 31-Dec-2022 | Roman Lebedev <lebedev.ri@gmail.com>
[DAGCombiner][TLI] Do not fuse bitcast to <1 x ?> into a load/store of a vector
Single-element vectors are legalized by splitting, so the memory operations would also get scalarized. While we do have some support to reconstruct scalarized loads, we clearly don't catch everything.
The comment for the affected AArch64 store suggests that having two stores was the desired outcome in the first place.
This was showing up as a source of *many* regressions with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.
# 603e8490 | 30-Dec-2022 | Roman Lebedev <lebedev.ri@gmail.com>
[NFC][TLI] Move `isLoadBitCastBeneficial()` implementation into source file
... so any change to it does not cause 700 source files to be recompiled.
# 89f36dd8 | 01-Dec-2022 | Freddy Ye <freddy.ye@intel.com>
[X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528, this implementation is very similar to ExpandLargeDivRem: it expands 'fptoui .. to', 'fptosi .. to', 'uitofp .. to', and 'sitofp .. to' instructions with a bitwidth above a threshold into auto-generated functions. This is useful for targets like x86_64 that cannot lower fp conversions with more than 128 bits. The expanded code follows the IR generated from `compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`, etc.
Corner cases:
1. fp16: compiler-rt has no related builtins, so the implementation mainly uses the fp32 <-> fp16 lib calls.
2. fp80: this pass is soft-fp emulation and no fp80 instructions can help here, so this usage is discouraged; for now, the implementation uses fp128 as the temporary conversion type and inserts fptrunc/fpext at the top/end of the function.
3. bf16: clang's frontend currently doesn't support bf16 arithmetic (convert to int, float, +, -, *, ...), so this patch doesn't consider bf16 for now.
4. unsigned FPToI: both default hardware behavior and libgcc ignore the "returns 0 for negative input" spec, and this pass follows that old behavior. See this example: https://gcc.godbolt.org/z/bnv3jqW1M
The end-to-end tests are uploaded at https://reviews.llvm.org/D138261
Reviewed By: LuoYuanke, mgehre-amd
Differential Revision: https://reviews.llvm.org/D137241
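For a feel of what such auto-generated functions compute, here is a self-contained sketch of a double to 128-bit signed conversion in the spirit of `fixdfti.c`. This illustrates the bit-level algorithm only, not the pass's actual output; overflow (and NaN) is saturated here for definiteness, whereas the real helpers leave it undefined.

```cpp
#include <cstdint>
#include <cstring>

// Decode the IEEE-754 double by hand and shift the significand into
// place, the way a soft-float fptosi helper does.
__int128 fptosiSketch(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  const bool Neg = Bits >> 63;
  const int Exp = static_cast<int>((Bits >> 52) & 0x7FF) - 1023; // unbias
  if (Exp < 0)
    return 0; // |D| < 1 truncates to 0
  const __int128 Max = ~(unsigned __int128)0 >> 1; // 2^127 - 1
  if (Exp >= 127)
    return Neg ? -Max - 1 : Max; // out of range: saturate (illustrative)
  const uint64_t Sig = (Bits & ((1ULL << 52) - 1)) | (1ULL << 52);
  const unsigned __int128 Mag =
      Exp < 52 ? (unsigned __int128)(Sig >> (52 - Exp))  // drop fraction bits
               : (unsigned __int128)Sig << (Exp - 52);   // scale up
  return Neg ? -(__int128)Mag : (__int128)Mag;
}
```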
Revision tags: llvmorg-15.0.6
# b39b76f2 | 22-Nov-2022 | Phoebe Wang <phoebe.wang@intel.com>
[X86] Allow no X87 on 32-bit
This patch is an alternative to D100091. It solves the problems in `f80` type lowering.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D137946
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# bcaf31ec | 21-Apr-2022 | Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
[AMDGPU] Allow finer grain control of an unaligned access speed
A target can report whether a misaligned access is 'fast', as defined by the target, or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed.
A target can still define it as it wants, and the direct translation of the current code uses 0 and 1 for the current false and true. This makes the change an NFC.
A subsequent patch will start using an actual speed value in the load/store vectorizer to check whether a vectorized access will be not just fast, but also no slower than before.
Differential Revision: https://reviews.llvm.org/D124217
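A self-contained sketch of the new contract (names and speed grades here are illustrative stand-ins, not the LLVM signature): the query still answers "is this allowed?", and now reports a speed through the out-parameter, with 0 translating the old `false` and any nonzero value the old `true`.

```cpp
// Callers that only care about fast-vs-slow keep testing for nonzero,
// while new callers can compare the speeds of two candidate accesses.
bool allowsMisalignedAccess(unsigned AlignBytes, unsigned *Speed) {
  if (AlignBytes == 0)
    return false;                       // not legal at all
  *Speed = (AlignBytes >= 4) ? 2 : 1;   // hypothetical speed grades
  return true;
}
```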
# 1121eca6 | 16-Sep-2022 | Craig Topper <craig.topper@sifive.com>
[VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB.
I want to default all VP operations to Expand. These two were blocking because VE doesn't support them and the tests were expecting them to fail a specific way; using Expand caused them to fail differently. It seemed better to emulate them using operations that are supported.
@simoll mentioned on Discord that VE has some expansion downstream. Not sure if it's done like this or in the VE target.
Reviewed By: frasercrmck, efocht
Differential Revision: https://reviews.llvm.org/D133514
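The expansion rests on the remainder identity; here is a scalar spot-check (the VP form applies the same div/mul/sub recipe per lane, under the mask and explicit vector length):

```cpp
#include <cassert>

// srem rebuilt from sdiv + mul + sub, mirroring VP_SDIV/VP_MUL/VP_SUB.
int sremExpanded(int A, int B) { return A - (A / B) * B; }

int main() {
  assert(sremExpanded(7, 3) == 7 % 3);
  assert(sremExpanded(-7, 3) == -7 % 3); // C division truncates, like sdiv
}
```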
# c1502425 | 12-Sep-2022 | Matthias Gehre <matthias.gehre@xilinx.com>
Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove the new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager.
Differential Revision: https://reviews.llvm.org/D133691
# 5e96cea1 | 07-Sep-2022 | Joe Loser <joeloser@fastmail.com>
[llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
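The replacement in miniature (the array contents are made up):

```cpp
#include <cstddef>
#include <iterator> // std::size

static const char *Names[] = {"add", "sub", "mul"};
// C++17: std::size handles C-style arrays directly.
constexpr std::size_t NumNames = std::size(Names); // was: llvm::array_lengthof(Names)
```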
# c349d7f4 | 02-Sep-2022 | Benjamin Kramer <benny.kra@googlemail.com>
[SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path
The main difference is that this preserves intermediate rounding steps, which the other route doesn't. This aligns bfloat16 more with half floats, which use this path on most targets.
I didn't understand the difference between these softening approaches when I first added the bfloat lowerings; it would be nice if we only had one of them.
Based on @pengfei 's D131502
Differential Revision: https://reviews.llvm.org/D133207
# 7ed3d813 | 17-Aug-2022 | Daniil Fukalov <1671137+dfukalov@users.noreply.github.com>
[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TargetLowering had the last two InstructionCost-related members, `getTypeLegalizationCost()` and `getScalingFactorCost()`, while all other costs are processed in TTI. For example, it is awkward to use other TTI members from these two functions when they are overridden in a target.
Minor refactoring: `getTypeLegalizationCost()` no longer needs a DataLayout parameter - it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723
# de9d80c1 | 08-Aug-2022 | Fangrui Song <i@maskray.me>
[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
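What the mechanical change looks like at a call site (the switch below is made up):

```cpp
int classify(int Kind) {
  switch (Kind) {
  case 0:
    // ... set-up work shared with case 1 ...
    [[fallthrough]]; // was: LLVM_FALLTHROUGH;
  case 1:
    return 1;
  default:
    return 0;
  }
}
```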
# 65246d3e | 27-Jul-2022 | Amara Emerson <amara@apple.com>
Use hasNItemsOrLess() in MRI::hasAtMostUserInstrs().
# 19cdd190 | 26-Jul-2022 | Amara Emerson <amara@apple.com>
[AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT.
This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of materializing a specific constant in code size. Doing so prevents us from sinking constants which require multiple instructions to generate into use blocks.
Code size savings on CTMark -Os:
Program                             size.__text
                                    before      after       diff
ClamAV/clamscan                     381940.00   382052.00    0.0%
lencod/lencod                       428408.00   428428.00    0.0%
SPASS/SPASS                         411868.00   411876.00    0.0%
kimwitu++/kc                        449944.00   449944.00    0.0%
Bullet/bullet                       463588.00   463556.00   -0.0%
sqlite3/sqlite3                     284696.00   284668.00   -0.0%
consumer-typeset/consumer-typeset   414492.00   414424.00   -0.0%
7zip/7zip-benchmark                 595244.00   594972.00   -0.0%
mafft/pairlocalalign                247512.00   247368.00   -0.1%
tramp3d-v4/tramp3d-v4               372884.00   372044.00   -0.2%
Geomean difference                                          -0.0%
Differential Revision: https://reviews.llvm.org/D130554
# 9e6d1f4b | 17-Jul-2022 | Kazu Hirata <kazu@google.com>
[CodeGen] Qualify auto variables in for loops (NFC)
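The shape of the cleanup (a made-up loop; the point is that pointer-ness and constness become visible in the loop header):

```cpp
#include <vector>

void visitAll(const std::vector<int *> &Ptrs) {
  for (const auto *P : Ptrs) { // was: for (auto P : Ptrs)
    (void)P;
  }
}
```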
# ac2ad3b7 | 15-Jun-2022 | Paul Robinson <paul.robinson@sony.com>
[PS5] Support sin+cos->sincos optimization
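The source pattern the optimization targets, sketched (the function is made up): a sin and a cos of the same argument can be fused into one `sincos` runtime call on targets whose libm provides it.

```cpp
#include <cmath>

// Two libm calls on the same angle; with the optimization enabled the
// backend can emit a single sincos() call for the pair instead.
void polarToXY(double R, double Theta, double &X, double &Y) {
  X = R * std::cos(Theta);
  Y = R * std::sin(Theta);
}
```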
# 8bc0bb95 | 07-Jun-2022 | Benjamin Kramer <benny.kra@googlemail.com>
Add a conversion from double to bf16
This introduces a new compiler-rt function `__truncdfbf2`.
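To make the flavor of such a helper concrete, here is the classic round-to-nearest-even float to bfloat16 truncation. This is illustrative only: the actual `__truncdfbf2` narrows from double and has to get double rounding right, so this is a simplified cousin, not its source.

```cpp
#include <cstdint>
#include <cstring>

uint16_t truncToBF16(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  if ((Bits & 0x7FFFFFFFu) > 0x7F800000u)   // NaN: force a quiet NaN payload
    return static_cast<uint16_t>((Bits >> 16) | 0x0040);
  Bits += 0x7FFFu + ((Bits >> 16) & 1);     // round to nearest, ties to even
  return static_cast<uint16_t>(Bits >> 16); // sign, exponent, top 7 mantissa bits
}
```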