History log of /llvm-project/llvm/lib/Target/X86/X86Subtarget.cpp (Results 126 – 150 of 516)
Revision Date Author Comments
Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3
# 6bda14b3 06-Jun-2017 Chandler Carruth <chandlerc@gmail.com>

Sort the remaining #include lines in include/... and lib/....

I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787


Revision tags: llvmorg-4.0.1-rc2
# 7bf27f03 25-May-2017 Oren Ben Simhon <oren.ben.simhon@intel.com>

[X86] Adding vpopcntd and vpopcntq instructions

AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).

Differential Revision: https://reviews.llvm.org/D33169

llvm-svn: 303858


# a1b2db79 19-May-2017 Daniel Sanders <daniel_l_sanders@apple.com>

[globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates.

Summary:
This causes them to be re-computed more often than necessary but resolves
objections that were raised post-commit on r301750.

Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls

Reviewed By: qcolombet

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32861

llvm-svn: 303418


# 2ea271b5 18-May-2017 Lama Saba <lama.saba@intel.com>

[X86] Replace slow LEA instructions in X86

According to Intel's Optimization Reference Manual for SNB+:
" For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
dispatch via port 1:
- LEA that has all three source operands: base, index, and offset
- LEA that uses base and index registers where the base is EBP, RBP, or R13
- LEA that uses RIP relative addressing mode
- LEA that uses 16-bit addressing mode "
This patch currently handles the first 2 cases only.
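The two handled cases can be sketched as a small predicate. This is only an illustration of the rule quoted above, not the code from the patch: the `Reg` enum, the `LeaAddress` struct, and the function names are hypothetical and do not correspond to LLVM's own types.

```cpp
#include <cstdint>

// Hypothetical register tags for illustration only (not LLVM's register enum).
enum class Reg { None, RAX, RBX, RCX, EBP, RBP, R13 };

// Hypothetical description of an LEA address: base + index + offset.
struct LeaAddress {
  Reg Base;
  Reg Index;
  std::int64_t Offset;
};

// Bases that the quoted manual text singles out as slow when combined with an index.
constexpr bool isSlowBase(Reg R) {
  return R == Reg::EBP || R == Reg::RBP || R == Reg::R13;
}

// True for the first two cases in the list: all three operands present,
// or base and index present with EBP/RBP/R13 as the base.
constexpr bool isSlowLea(const LeaAddress &A) {
  return A.Base != Reg::None && A.Index != Reg::None &&
         (A.Offset != 0 || isSlowBase(A.Base));
}
```

The RIP-relative and 16-bit addressing cases are left out on purpose, matching the statement that only the first two cases are handled.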

Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303333


# 51de0330 04-May-2017 Oren Ben Simhon <oren.ben.simhon@intel.com>

[X86] Disabling PLT in Regcall CC Functions

According to psABI, PLT stub clobbers XMM8-XMM15.
In Regcall calling convention those registers are used for passing parameters.
Thus we need to prevent lazy binding in Regcall.

Differential Revision: https://reviews.llvm.org/D32430

llvm-svn: 302124


# 99b925bd 03-May-2017 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][LWP] Add llvm support for LWP instructions (reapplied).

This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).

Reapplied - this time without changing line endings of existing files.

Differential Revision: https://reviews.llvm.org/D32769

llvm-svn: 302041


# a271c543 03-May-2017 Simon Pilgrim <llvm-dev@redking.me.uk>

Revert rL302028 due to accidental line ending changes.

llvm-svn: 302038


# b2e0464f 03-May-2017 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][LWP] Add llvm support for LWP instructions.

This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).

Differential Revision: https://reviews.llvm.org/D32769

llvm-svn: 302028


# 9bb6931c 01-May-2017 Tim Northover <tnorthover@apple.com>

X86: initialize a few subtarget variables.

Otherwise an indeterminate value gets read, causing a bunch of UBSan failures.

llvm-svn: 301819


# e9fdba39 29-Apr-2017 Daniel Sanders <daniel_l_sanders@apple.com>

[globalisel][tablegen] Compute available feature bits correctly.

Summary:
Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().

Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.

Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: rovka

Subscribers: kristof.beyls, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32491

llvm-svn: 301750


Revision tags: llvmorg-4.0.1-rc1
# 203fc177 21-Apr-2017 Clement Courbet <courbet@google.com>

Rename FastString flag.

llvm-svn: 300959


# 1ce3b82d 21-Apr-2017 Clement Courbet <courbet@google.com>

X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies
when the subtarget has fast strings.

This has two advantages:
- Speed is improved. For example, on Haswell throughput improvements increase
linearly with size from 256 to 512 bytes, after which they plateau:
(e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
- Code is much smaller (no need to handle boundaries).

llvm-svn: 300957


Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4
# 4f977517 03-Mar-2017 Amjad Aboud <amjad.aboud@intel.com>

[X86] Generate VZEROUPPER for Skylake-avx512.
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.

Differential Revision: https://reviews.llvm.org/D29874

llvm-svn: 296859


Revision tags: llvmorg-4.0.0-rc3
# d88389aa 21-Feb-2017 Craig Topper <craig.topper@gmail.com>

[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs

Summary:
Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency.

This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.
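The identity behind this change can be modeled in scalar code: a double-precision shift left whose two inputs are the same register is a rotate left. The sketch below is a software model under that assumption, with hypothetical function names; it is not the instruction selection code from the patch.

```cpp
#include <cstdint>

// Software model of 32-bit SHLD: shift Dst left by Count, filling the
// vacated low bits from the high bits of Src. The count is masked to 5 bits.
constexpr std::uint32_t shld32(std::uint32_t Dst, std::uint32_t Src, unsigned Count) {
  return (Count & 31u) == 0
             ? Dst
             : (Dst << (Count & 31u)) | (Src >> (32u - (Count & 31u)));
}

// Rotate left expressed as SHLD with both inputs taken from the same value.
constexpr std::uint32_t rotl32ViaShld(std::uint32_t X, unsigned Count) {
  return shld32(X, X, Count);
}
```

For example, `rotl32ViaShld(0x12345678u, 8)` moves the top byte to the bottom and yields `0x34567812`.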

Reviewers: zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30181

llvm-svn: 295697


# 3cac7635 09-Feb-2017 Craig Topper <craig.topper@gmail.com>

[X86] Remove the HLE feature flag.

We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.

llvm-svn: 294562


# 50f3d145 09-Feb-2017 Craig Topper <craig.topper@gmail.com>

[X86] Clzero intrinsic and its addition under znver1

This patch does the following.

1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1)
3. Adds the clzero feature under znver1 architecture.
4. The custom inserter is added in Lowering.
5. A testcase is added to check the intrinsic.
6. The clzero instruction is added to assembler test.
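Step 2 amounts to a single bit test. A minimal sketch, assuming the caller has already executed CPUID with function 8000_0008 and passes in the resulting EBX value; the function name is hypothetical, and the CPUID call itself is left out because it is compiler and platform specific.

```cpp
#include <cstdint>

// Returns true when bit 0 of EBX from CPUID function 8000_0008 is set,
// which is the check the commit message describes for clzero.
// The EBX value is supplied by the caller (assumption of this sketch).
constexpr bool hasClzero(std::uint32_t Cpuid80000008Ebx) {
  return (Cpuid80000008Ebx & 1u) != 0;
}
```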

Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me.

Differential revision: https://reviews.llvm.org/D29385

llvm-svn: 294558


Revision tags: llvmorg-4.0.0-rc2
# fbd13c5c 02-Feb-2017 Eugene Zelenko <eugene.zelenko@gmail.com>

[X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).

llvm-svn: 293949


# dc5e5836 02-Feb-2017 Peter Collingbourne <peter@pcc.me.uk>

X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).

Differential Revision: https://reviews.llvm.org/D28689

llvm-svn: 293844


Revision tags: llvmorg-4.0.0-rc1
# 270dd41f 17-Jan-2017 Joerg Sonnenberger <joerg@bec.de>

Remove an overeager assert from r288844.

llvm-svn: 292244


# 235c275b 08-Dec-2016 Peter Collingbourne <peter@pcc.me.uk>

IR, X86: Understand !absolute_symbol metadata on global variables.

Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.

As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html

Differential Revision: https://reviews.llvm.org/D25878

llvm-svn: 289087


Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3
# 8bc7e4da 06-Dec-2016 Zvi Rackover <zvi.rackover@intel.com>

[X86] Prefer reduced width multiplication over pmulld on Silvermont

Summary:
Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 [instruction/cycles].
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].
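A scalar model of one lane shows why the cheaper instructions can stand in for pmulld. It assumes both 32-bit operands are known to fit in 16 unsigned bits; the exact conditions the patch checks are not stated here, so treat that as an assumption of this sketch. The function names are hypothetical.

```cpp
#include <cstdint>

// Low 16 bits of a 16x16 product (what PMULLW computes per lane).
constexpr std::uint16_t mulLo16(std::uint16_t A, std::uint16_t B) {
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(A) * B);
}

// High 16 bits of an unsigned 16x16 product (what PMULHUW computes per lane).
constexpr std::uint16_t mulHiU16(std::uint16_t A, std::uint16_t B) {
  return static_cast<std::uint16_t>((static_cast<std::uint32_t>(A) * B) >> 16);
}

// Recombine the two halves into the full 32-bit product, which is the role
// the unpack (interleave) instructions play in the vector expansion.
constexpr std::uint32_t mul32ViaHalves(std::uint16_t A, std::uint16_t B) {
  return (static_cast<std::uint32_t>(mulHiU16(A, B)) << 16) | mulLo16(A, B);
}
```

With the throughput figures quoted above, two multiplies at 1/2 plus the unpacks can still be cheaper than one pmulld at 1/11.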

Fixes pr31202.

Analysis of this issue was done by Fahana Aleen.

Reviewers: wmi, delena, mkuper

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D27203

llvm-svn: 288844


Revision tags: llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1
# 76dbf265 15-Nov-2016 Zvi Rackover <zvi.rackover@intel.com>

[X86][GlobalISel] Add minimal call lowering support to the IRTranslator

Summary:
Add basic functionality to support call lowering for X86.
Currently only supports functions which return void and take zero arguments.
Inspired by commit 286573.

Reviewers: ab, qcolombet, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26593

llvm-svn: 286935


# b6d652ad 14-Oct-2016 Pierre Gousseau <pierregousseau14@gmail.com>

[X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.

This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
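The equivalence can be checked with a scalar model for 32-bit values. It relies on lzcnt returning the full bit width (32) for a zero input, so bit 5 of the count is set only when the input is zero. The helper names are hypothetical and the loop is a portable stand-in for the instruction.

```cpp
#include <cstdint>

// Portable model of 32-bit lzcnt: counts leading zero bits, returning 32 for 0.
constexpr unsigned lzcnt32(std::uint32_t X) {
  unsigned N = 0;
  for (std::uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Original form: zext(or(setcc(eq, x, 0), setcc(eq, y, 0))).
constexpr unsigned eitherIsZeroSetcc(std::uint32_t X, std::uint32_t Y) {
  return (X == 0 || Y == 0) ? 1u : 0u;
}

// Transformed form: srl(or(ctlz(x), ctlz(y)), log2(32)), with log2(32) == 5.
constexpr unsigned eitherIsZeroLzcnt(std::uint32_t X, std::uint32_t Y) {
  return (lzcnt32(X) | lzcnt32(Y)) >> 5;
}
```

Counts for nonzero inputs lie in 0..31, so ORing them can never set bit 5; the shifted result is 1 exactly when at least one input is zero.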

Differential Revision: https://reviews.llvm.org/D23446

llvm-svn: 284248


Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2
# f679530b 04-Aug-2016 Nikolai Bozhenov <nikolai.bozhenov@intel.com>

[X86] Heuristic to selectively build Newton-Raphson SQRT estimation

On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.
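For reference, the software path being weighed against hardware SQRT looks roughly like this: start from a reciprocal square root estimate, apply one Newton-Raphson step, then multiply by x. This is a scalar sketch with hypothetical names; the initial estimate is passed in as a parameter to stand in for the RSQRT instruction, and the number of refinement steps used by the compiler is not stated here.

```cpp
// One Newton-Raphson step for y ~= 1/sqrt(x): y1 = y0 * (1.5 - 0.5 * x * y0^2).
constexpr float refineRsqrt(float X, float Y0) {
  return Y0 * (1.5f - 0.5f * X * Y0 * Y0);
}

// sqrt(x) recovered as x * (1/sqrt(x)), using the refined estimate.
// Y0 models the rough value an RSQRT instruction would return (assumption).
constexpr float sqrtViaRsqrt(float X, float Y0) {
  return X * refineRsqrt(X, Y0);
}
```

Starting from a rough estimate of 0.49 for 1/sqrt(4), a single step already lands within about 0.002 of 2. The dependent chain of multiplies is what makes this longer in latency than a scalar hardware SQRT on big cores, as the message argues.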

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725


Revision tags: llvmorg-3.9.0-rc1
# db6bd021 30-Jun-2016 Rafael Espindola <rafael.espindola@gmail.com>

Delete unused includes. NFC.

llvm-svn: 274225

