History log of /llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp (Results 76 – 100 of 500)
Revision Date Author Comments
Revision tags: llvmorg-16.0.0-rc3
# 7e6e636f 16-Feb-2023 Kazu Hirata <kazu@google.com>

Use llvm::has_single_bit<uint32_t> (NFC)

This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
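
A minimal sketch of why the explicit `uint32_t` matters, assuming the point of the change is to preserve the truncation that a 32-bit helper performs on a wider argument. `hasSingleBit32` and `hasSingleBit64` are hand-written stand-ins for illustration, not LLVM functions:

```cpp
#include <cassert>
#include <cstdint>

// Hand-written single-bit tests (stand-ins, not the LLVM helpers).
constexpr bool hasSingleBit32(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }
constexpr bool hasSingleBit64(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// A 64-bit value with bit 32 and bit 0 set.
constexpr uint64_t Wide = 0x100000001ULL;

// Checked at full width, two bits are set, so this is not a power of two.
static_assert(!hasSingleBit64(Wide));
// Truncated to 32 bits, only bit 0 survives, so the 32-bit check succeeds.
static_assert(hasSingleBit32(static_cast<uint32_t>(Wide)));

// Runtime version of the same comparison.
bool truncationChangesAnswer(uint64_t V) {
  return hasSingleBit32(static_cast<uint32_t>(V)) != hasSingleBit64(V);
}
```

Under this reading, naming the 32-bit type at the call site keeps the old result for wide inputs, which is consistent with the NFC label.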


# 08533f8b 14-Feb-2023 Jake Egan <jakeegan10@gmail.com>

Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation"

These commits are causing a test-suite build failure on AIX. Revert for now to allow time to investigate.
https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio

This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.


# c5085c91 14-Feb-2023 Jay Foad <jay.foad@amd.com>

[CodeGen] Trivial simplification of some getRegisterType calls. NFC.


# 4c722668 09-Feb-2023 Alex Richardson <alexrichardson@google.com>

Fix call to deprecated API in bd87a2449da0c82e63cebdf9c131c54a5472e3a7


# bd87a244 09-Feb-2023 Alex Richardson <alexrichardson@google.com>

[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation

This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.
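
The rule described above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the code from the patch: `TargetSketch`, its fields, and `preferredPointerArgAlign` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of a target; not an LLVM type.
struct TargetSketch {
  unsigned PointerSizeBytes; // e.g. 4 on a 32-bit target
  bool FastUnalignedAccess;  // true if misaligned accesses are cheap
};

// Alignment that a global/stack object passed to a memcpy-like call would be
// raised to in this sketch: pointer size when unaligned accesses are slow,
// otherwise unchanged.
constexpr unsigned preferredPointerArgAlign(const TargetSketch &T,
                                            unsigned CurrentAlign) {
  if (T.FastUnalignedAccess)
    return CurrentAlign;
  return std::max(CurrentAlign, T.PointerSizeBytes);
}

// Two hypothetical targets used for illustration.
constexpr TargetSketch SlowUnaligned32{4, false};
constexpr TargetSketch FastUnaligned64{8, true};

static_assert(preferredPointerArgAlign(SlowUnaligned32, 1) == 4);
static_assert(preferredPointerArgAlign(FastUnaligned64, 1) == 1);
```

Alignment is only ever raised here, so an object that is already more strictly aligned keeps its alignment.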

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134282


Revision tags: llvmorg-16.0.0-rc2
# 62c7f035 07-Feb-2023 Archibald Elliott <archibald.elliott@arm.com>

[NFC][TargetParser] Remove llvm/ADT/Triple.h

I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.


# 526966d0 29-Jan-2023 Kazu Hirata <kazu@google.com>

Use llvm::bit_ceil (NFC)

Note that:

std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);

is equivalent to:

std::bit_ceil(X)

even for input 0.
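
The equivalence can be checked with a small standalone sketch. To keep it buildable as C++17 it uses hand-written stand-ins rather than the real functions: `hasSingleBit`, `nextPowerOf2` (smallest power of two strictly greater than its input) and `bitCeilRef` (smallest power of two not less than its input) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for has_single_bit: true for exact powers of two.
constexpr bool hasSingleBit(uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }

// Stand-in for NextPowerOf2: smallest power of two strictly greater than A.
constexpr uint64_t nextPowerOf2(uint64_t A) {
  A |= A >> 1;
  A |= A >> 2;
  A |= A >> 4;
  A |= A >> 8;
  A |= A >> 16;
  A |= A >> 32;
  return A + 1;
}

// Reference for bit_ceil: smallest power of two that is >= X (1 for X == 0).
// Only meaningful while the result fits in 64 bits.
constexpr uint64_t bitCeilRef(uint64_t X) {
  uint64_t P = 1;
  while (P < X)
    P <<= 1;
  return P;
}

// The expression being replaced.
constexpr uint64_t oldForm(uint64_t X) {
  return hasSingleBit(X) ? X : nextPowerOf2(X);
}

static_assert(oldForm(0) == 1 && bitCeilRef(0) == 1); // the input-0 case
static_assert(oldForm(5) == 8 && bitCeilRef(5) == 8);
static_assert(oldForm(8) == 8 && bitCeilRef(8) == 8);

// Compares both forms for every input in [0, Limit].
bool formsAgreeUpTo(uint64_t Limit) {
  for (uint64_t X = 0; X <= Limit; ++X)
    if (oldForm(X) != bitCeilRef(X))
      return false;
  return true;
}
```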


Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# e70ae0f4 07-Jan-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

DAG/GlobalISel: Fix broken/redundant setting of MODereferenceable

This was incorrectly setting dereferenceable on unaligned
operands. getLoadMemOperandFlags does the dereferenceability
check without alignment, and then both paths went on to check
isDereferenceableAndAlignedPointer. Make getLoadMemOperandFlags check
isDereferenceableAndAlignedPointer, and remove the second call.


# 48f5d77e 11-Jan-2023 Guillaume Chatelet <gchatelet@google.com>

[NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()

This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.


# 16facf1c 31-Dec-2022 Roman Lebedev <lebedev.ri@gmail.com>

[DAGCombiner][TLI] Do not fuse bitcast to <1 x ?> into a load/store of a vector

Single-element vectors are legalized by splitting,
so the memory operations would also get scalarized.
While we do have some support to reconstruct scalarized loads,
we clearly don't catch everything.

The comment for the affected AArch64 store suggests that
having two stores was the desired outcome in the first place.

This was showing as a source of *many* regressions
with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.


# 603e8490 30-Dec-2022 Roman Lebedev <lebedev.ri@gmail.com>

[NFC][TLI] Move `isLoadBitCastBeneficial()` implementation into source file

... so any change to it does not cause 700 source files to be recompiled.


# 89f36dd8 01-Dec-2022 Freddy Ye <freddy.ye@intel.com>

[X86] Add ExpandLargeFpConvert Pass and enable for X86

As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp conversions with more
than 128 bits. The expanded nodes are based on the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
etc.

Corner cases:
1. For fp16: no related builtins have been added to compiler-rt, so I
mainly used the fp32 <-> fp16 lib calls to implement it.
2. For fp80: this pass is soft fp emulation, and no fp80 instructions can
help with this problem. I recommend that users deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as the clang FE currently doesn't support bf16 arithmetic operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: both default hardware behavior and libgcc
ignore the "returns 0 for negative input" spec, so this pass follows the old
behavior and also ignores it for unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241


Revision tags: llvmorg-15.0.6
# b39b76f2 22-Nov-2022 Phoebe Wang <phoebe.wang@intel.com>

[X86] Allow no X87 on 32-bit

This patch is an alternative to D100091. It solves the problems in `f80` type lowering.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D137946


Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# bcaf31ec 21-Apr-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Allow finer grain control of an unaligned access speed

A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

A subsequent patch will start using the actual speed value in
the load/store vectorizer to check that a vectorized access is
not just fast, but also no slower than before.
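
A rough sketch of the idea, with hypothetical names throughout (`speedFromLegacyFast` and `vectorAccessNotSlower` are not LLVM functions, and the speed values are arbitrary). It assumes only what the message states: 0 and 1 stand for the old false and true, and larger values mean faster.

```cpp
#include <cassert>

// Direct translation of the old boolean 'Fast' result into a speed value.
constexpr unsigned speedFromLegacyFast(bool Fast) { return Fast ? 1u : 0u; }

// Vectorizer-style check: accept the vectorized access only if it is fast
// at all (non-zero) and at least as fast as the access it replaces.
constexpr bool vectorAccessNotSlower(unsigned ScalarSpeed,
                                     unsigned VectorSpeed) {
  return VectorSpeed != 0 && VectorSpeed >= ScalarSpeed;
}

// With only 0 and 1 in play, the check reduces to the old boolean test.
static_assert(vectorAccessNotSlower(speedFromLegacyFast(true),
                                    speedFromLegacyFast(true)));
static_assert(!vectorAccessNotSlower(speedFromLegacyFast(true),
                                     speedFromLegacyFast(false)));
```

Once a target reports values above 1, the same comparison can reject a vectorized access that is "fast" in the boolean sense yet slower than the original.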

Differential Revision: https://reviews.llvm.org/D124217


# 1121eca6 16-Sep-2022 Craig Topper <craig.topper@sifive.com>

[VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB.

I want to default all VP operations to Expand. These 2 were blocking
because VE doesn't support them and the tests were expecting them
to fail a specific way. Using Expand caused them to fail differently.

Seemed better to emulate them using operations that are supported.
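
The arithmetic behind such an expansion is the identity rem = a - (a / b) * b. The sketch below shows it on plain integers and on a lane-wise loop; it is an illustration only, not the SelectionDAG code. `vpSRemSketch`, its mask and explicit-length parameters, and the choice to leave disabled lanes at 0 are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Remainder built only from divide, multiply and subtract.
// B must be non-zero, and INT32_MIN / -1 must be avoided by the caller.
constexpr int32_t sremViaDivMulSub(int32_t A, int32_t B) {
  return A - (A / B) * B;
}
constexpr uint32_t uremViaDivMulSub(uint32_t A, uint32_t B) {
  return A - (A / B) * B;
}

static_assert(sremViaDivMulSub(7, 3) == 7 % 3);
static_assert(sremViaDivMulSub(-7, 3) == -7 % 3);
static_assert(uremViaDivMulSub(7u, 3u) == 7u % 3u);

// Lane-wise version: only lanes below EVL whose mask bit is set are computed.
// A, B and Mask are assumed to have the same length; disabled lanes stay 0
// here purely for illustration.
std::vector<int32_t> vpSRemSketch(const std::vector<int32_t> &A,
                                  const std::vector<int32_t> &B,
                                  const std::vector<bool> &Mask,
                                  std::size_t EVL) {
  std::vector<int32_t> Result(A.size(), 0);
  for (std::size_t I = 0; I < EVL && I < A.size(); ++I)
    if (Mask[I])
      Result[I] = sremViaDivMulSub(A[I], B[I]);
  return Result;
}
```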

@simoll mentioned on Discord that VE has some expansion downstream. Not
sure if it's done like this or in the VE target.

Reviewed By: frasercrmck, efocht

Differential Revision: https://reviews.llvm.org/D133514


# c1502425 12-Sep-2022 Matthias Gehre <matthias.gehre@xilinx.com>

Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth

Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691


# 5e96cea1 07-Sep-2022 Joe Loser <joeloser@fastmail.com>

[llvm] Use std::size instead of llvm::array_lengthof

LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.
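
A short illustration of the replacement on a C-style array. `OpcodeTable`, `sumTable` and `arrayLengthOf` are hypothetical; the last is a stand-in written in the spirit of the old helper, not its actual definition.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical lookup table.
static const int OpcodeTable[] = {10, 20, 30, 40};

// Stand-in for a pre-C++17 array-length helper.
template <typename T, std::size_t N>
constexpr std::size_t arrayLengthOf(T (&)[N]) {
  return N;
}

// std::size from <iterator> handles C-style arrays directly in C++17.
static_assert(std::size(OpcodeTable) == 4);
static_assert(arrayLengthOf(OpcodeTable) == std::size(OpcodeTable));

// Typical call site: loop bound taken from std::size.
int sumTable() {
  int Sum = 0;
  for (std::size_t I = 0; I != std::size(OpcodeTable); ++I)
    Sum += OpcodeTable[I];
  return Sum;
}
```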

Differential Revision: https://reviews.llvm.org/D133429


# c349d7f4 02-Sep-2022 Benjamin Kramer <benny.kra@googlemail.com>

[SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path

The main difference is that this preserves intermediate rounding steps,
which the other route doesn't. This aligns bfloat16 more with half
floats, which use this path on most targets.

I didn't understand what the difference was between these softening
approaches when I first added bfloat lowerings; it would be nice if we only
had one of them.

Based on @pengfei 's D131502

Differential Revision: https://reviews.llvm.org/D133207


# 7ed3d813 17-Aug-2022 Daniil Fukalov <1671137+dfukalov@users.noreply.github.com>

[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.

TargetLowering had the last two InstructionCost-related members,
`getTypeLegalizationCost()` and `getScalingFactorCost()`, but all other costs
are processed in TTI.

E.g. it is inconvenient to use other TTI members in these two functions
when they are overridden in a target.

Minor refactoring: `getTypeLegalizationCost()` no longer needs a DataLayout
parameter; it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723


# de9d80c1 08-Aug-2022 Fangrui Song <i@maskray.me>

[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC

With C++17 there is no Clang pedantic warning or MSVC C5051.
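
For reference, the standard attribute is written as its own statement immediately before the next `case` label. `cumulativeWeight` below is a made-up function used only to show the spelling:

```cpp
#include <cassert>

// Each level also includes the contribution of every lower level, so the
// cases deliberately fall through.
int cumulativeWeight(int Level) {
  int Weight = 0;
  switch (Level) {
  case 3:
    Weight += 100;
    [[fallthrough]];
  case 2:
    Weight += 10;
    [[fallthrough]];
  case 1:
    Weight += 1;
    break;
  default:
    break;
  }
  return Weight;
}
```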


# 65246d3e 27-Jul-2022 Amara Emerson <amara@apple.com>

Use hasNItemsOrLess() in MRI::hasAtMostUserInstrs().


# 19cdd190 26-Jul-2022 Amara Emerson <amara@apple.com>

[AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT.

This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of
materializing a specific constant in code size. Doing so prevents us from
sinking constants which require multiple instructions to generate into
use blocks.

Code size savings on CTMark -Os:
Program                            size.__text
                                   before      after       diff
ClamAV/clamscan                    381940.00   382052.00    0.0%
lencod/lencod                      428408.00   428428.00    0.0%
SPASS/SPASS                        411868.00   411876.00    0.0%
kimwitu++/kc                       449944.00   449944.00    0.0%
Bullet/bullet                      463588.00   463556.00   -0.0%
sqlite3/sqlite3                    284696.00   284668.00   -0.0%
consumer-typeset/consumer-typeset  414492.00   414424.00   -0.0%
7zip/7zip-benchmark                595244.00   594972.00   -0.0%
mafft/pairlocalalign               247512.00   247368.00   -0.1%
tramp3d-v4/tramp3d-v4              372884.00   372044.00   -0.2%
Geomean difference                                         -0.0%

Differential Revision: https://reviews.llvm.org/D130554


# 9e6d1f4b 17-Jul-2022 Kazu Hirata <kazu@google.com>

[CodeGen] Qualify auto variables in for loops (NFC)


# ac2ad3b7 15-Jun-2022 Paul Robinson <paul.robinson@sony.com>

[PS5] Support sin+cos->sincos optimization


# 8bc0bb95 07-Jun-2022 Benjamin Kramer <benny.kra@googlemail.com>

Add a conversion from double to bf16

This introduces a new compiler-rt function `__truncdfbf2`.

