Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3
# 6bda14b3 | 06-Jun-2017 | Chandler Carruth <chandlerc@gmail.com>
Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
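For reference, this kind of include-sorting can now be reproduced with clang-format alone; a minimal sketch, assuming a clang-format recent enough to carry the LLVM include-ordering rules (file path hypothetical):

    clang-format -i -style=llvm lib/Target/X86/X86Subtarget.cpp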
Revision tags: llvmorg-4.0.1-rc2
# 7bf27f03 | 25-May-2017 | Oren Ben Simhon <oren.ben.simhon@intel.com>
[X86] Adding vpopcntd and vpopcntq instructions
AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic-based instructions (vpopcntd and vpopcntq).
Differential Revision: https://reviews.llvm.org/D33169
llvm-svn: 303858
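For a sense of how these instructions surface to users, a minimal sketch using the standard AVX-512 intrinsic header (the C intrinsic names follow the usual immintrin.h convention and are not part of this LLVM-side patch):

    #include <immintrin.h>

    // Per-lane population count; compile with -mavx512vpopcntdq -mavx512f.
    __m512i popcnt_d(__m512i v) { return _mm512_popcnt_epi32(v); } // vpopcntd
    __m512i popcnt_q(__m512i v) { return _mm512_popcnt_epi64(v); } // vpopcntq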
# a1b2db79 | 19-May-2017 | Daniel Sanders <daniel_l_sanders@apple.com>
[globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates.
Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750.
Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls
Reviewed By: qcolombet
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32861
llvm-svn: 303418
# 2ea271b5 | 18-May-2017 | Lama Saba <lama.saba@intel.com>
[X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+: "For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1:
- LEA that has all three source operands: base, index, and offset
- LEA that uses base and index registers where the base is EBP, RBP, or R13
- LEA that uses RIP relative addressing mode
- LEA that uses 16-bit addressing mode"
This patch currently handles the first 2 cases only.
Differential Revision: https://reviews.llvm.org/D32277
llvm-svn: 303333
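A hand-written illustration of the first case (Intel syntax; a sketch only, not the pass's literal output):

    ; slow on SNB+: base + index + offset (3-cycle latency, port 1 only)
    lea rax, [rbx + rcx + 16]
    ; replacement: fast two-operand LEA plus an ADD
    lea rax, [rbx + rcx]
    add rax, 16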
# 51de0330 | 04-May-2017 | Oren Ben Simhon <oren.ben.simhon@intel.com>
[X86] Disabling PLT in Regcall CC Functions
According to psABI, PLT stub clobbers XMM8-XMM15. In Regcall calling convention those registers are used for passing parameters. Thus we need to prevent lazy binding in Regcall.
Differential Revision: https://reviews.llvm.org/D32430
llvm-svn: 302124
# 99b925bd | 03-May-2017 | Simon Pilgrim <llvm-dev@redking.me.uk>
[X86][LWP] Add llvm support for LWP instructions (reapplied).
This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
Reapplied - this time without changing line endings of existing files.
Differential Revision: https://reviews.llvm.org/D32769
llvm-svn: 302041
# a271c543 | 03-May-2017 | Simon Pilgrim <llvm-dev@redking.me.uk>
Revert rL302028 due to accidental line ending changes.
llvm-svn: 302038
# b2e0464f | 03-May-2017 | Simon Pilgrim <llvm-dev@redking.me.uk>
[X86][LWP] Add llvm support for LWP instructions.
This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
Differential Revision: https://reviews.llvm.org/D32769
llvm-svn: 302028
# 9bb6931c | 01-May-2017 | Tim Northover <tnorthover@apple.com>
X86: initialize a few subtarget variables.
Otherwise an indeterminate value gets read, causing a bunch of UBSan failures.
llvm-svn: 301819
# e9fdba39 | 29-Apr-2017 | Daniel Sanders <daniel_l_sanders@apple.com>
[globalisel][tablegen] Compute available feature bits correctly.
Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl().
Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes.
Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32491
llvm-svn: 301750
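A minimal TableGen sketch of the two recomputation frequencies this commit distinguishes (the condition strings are illustrative; the real predicate definitions live in the target's .td files):

    // RecomputePerFunction defaults to 0: evaluated once per module,
    // in getSubtargetImpl().
    def HasAVX : Predicate<"Subtarget->hasAVX()">;

    // Evaluated once per function, in selectImpl().
    let RecomputePerFunction = 1 in
    def OptForSize : Predicate<"MF->getFunction()->optForSize()">;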
Revision tags: llvmorg-4.0.1-rc1
# 203fc177 | 21-Apr-2017 | Clement Courbet <courbet@google.com>
Rename FastString flag.
llvm-svn: 300959
# 1ce3b82d | 21-Apr-2017 | Clement Courbet <courbet@google.com>
X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies when the subtarget has fast strings.
This has two advantages:
- Speed is improved. For example, on Haswell throughput improvements increase linearly with size from 256 to 512 bytes, after which they plateau (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
- Code is much smaller (no need to handle boundaries).
llvm-svn: 300957
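For intuition, REP MOVSB performs the entire copy with the byte count in RCX, so there is no remainder handling; a hedged GNU inline-asm sketch of the pattern (the backend emits this during memcpy lowering in SelectionDAG, not via inline asm):

    #include <stddef.h>

    // Copy n bytes via a single REP MOVSB.
    static inline void rep_movsb(void *dst, const void *src, size_t n) {
      __asm__ volatile("rep movsb"
                       : "+D"(dst), "+S"(src), "+c"(n)
                       :
                       : "memory");
    }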
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4
# 4f977517 | 03-Mar-2017 | Amjad Aboud <amjad.aboud@intel.com>
[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.
Differential Revision: https://reviews.llvm.org/D29874
llvm-svn: 296859
Revision tags: llvmorg-4.0.0-rc3
# d88389aa | 21-Feb-2017 | Craig Topper <craig.topper@gmail.com>
[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs
Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency.
This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.
Reviewers: zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30181
llvm-svn: 295697
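Concretely (Intel syntax, illustrative): with both of SHLD's register inputs tied to the same register, the bits shifted in from the "second" operand are the register's own top bits, so

    shld rax, rax, 13

computes exactly the same result as

    rol rax, 13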
# 3cac7635 | 09-Feb-2017 | Craig Topper <craig.topper@gmail.com>
[X86] Remove the HLE feature flag.
We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.
llvm-svn: 294562
# 50f3d145 | 09-Feb-2017 | Craig Topper <craig.topper@gmail.com>
[X86] Clzero intrinsic and its addition under znver1
This patch does the following.
1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
2. Identifies clzero feature using cpuid info. (Function: 8000_0008, checks if EBX[0]=1)
3. Adds the clzero feature under znver1 architecture.
4. The custom inserter is added in Lowering.
5. A testcase is added to check the intrinsic.
6. The clzero instruction is added to assembler test.
Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me.
Differential revision: https://reviews.llvm.org/D29385
llvm-svn: 294558
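A minimal usage sketch of the builtin being wired up here (compile with -mclzero; CLZERO zeroes the whole 64-byte cache line containing the given address, so the buffer is allocated cache-line-aligned):

    #include <stdlib.h>

    int main(void) {
      void *line = aligned_alloc(64, 64); // one 64-byte cache line
      if (!line) return 1;
      __builtin_ia32_clzero(line);        // lowers to the clzero instruction
      free(line);
      return 0;
    }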
Revision tags: llvmorg-4.0.0-rc2
# fbd13c5c | 02-Feb-2017 | Eugene Zelenko <eugene.zelenko@gmail.com>
[X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293949
# dc5e5836 | 02-Feb-2017 | Peter Collingbourne <peter@pcc.me.uk>
X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).
Differential Revision: https://reviews.llvm.org/D28689
llvm-svn: 293844
Revision tags: llvmorg-4.0.0-rc1
# 270dd41f | 17-Jan-2017 | Joerg Sonnenberger <joerg@bec.de>
Remove an overeager assert from r288844.
llvm-svn: 292244
# 235c275b | 08-Dec-2016 | Peter Collingbourne <peter@pcc.me.uk>
IR, X86: Understand !absolute_symbol metadata on global variables.
Summary: Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal.
As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html
Differential Revision: https://reviews.llvm.org/D25878
llvm-svn: 289087
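The metadata shape, for reference (this mirrors the LangRef-style example; the two operands give the half-open address range):

    @a = external global i8, !absolute_symbol !0 ; address known to lie in [0, 256)
    !0 = !{i64 0, i64 256}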
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3
# 8bc7e4da | 06-Dec-2016 | Zvi Rackover <zvi.rackover@intel.com>
[X86] Prefer reduced width multiplication over pmulld on Silvermont
Summary: Prefer expansions such as: pmullw, pmulhw, unpacklwd, unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].
Fixes pr31202.
Analysis of this issue was done by Fahana Aleen.
Reviewers: wmi, delena, mkuper
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27203
llvm-svn: 288844
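Why the narrower multiplies suffice, in scalar terms: when the 32-bit lanes are really sign/zero-extended 16-bit values, the full product can be reassembled from the low and high halves that the cheap multiplies produce; a rough scalar sketch (the real transform works on whole vectors and uses the unpack instructions to interleave the halves):

    #include <stdint.h>

    // pmullw lane: low 16 bits of the product; pmulhuw lane: high 16 bits;
    // unpacklwd/unpackhwd interleave them into 32-bit results.
    uint32_t mul16x16(uint16_t a, uint16_t b) {
      uint32_t lo = ((uint32_t)a * b) & 0xFFFF;
      uint32_t hi = ((uint32_t)a * b) >> 16;
      return (hi << 16) | lo;
    }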
Revision tags: llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1
# 76dbf265 | 15-Nov-2016 | Zvi Rackover <zvi.rackover@intel.com>
[X86][GlobalISel] Add minimal call lowering support to the IRTranslator
Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573.
Reviewers: ab, qcolombet, t.p.northover
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26593
llvm-svn: 286935
# b6d652ad | 14-Oct-2016 | Pierre Gousseau <pierregousseau14@gmail.com>
[X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.
This change adds transformations such as:
  zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
to:
  srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
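The arithmetic behind the transform, for 32-bit values: ctlz(x) lies in [0, 31] for non-zero x and equals 32 exactly when x == 0, so after OR-ing the two counts, shifting right by log2(32) = 5 extracts "either input was zero" without branches or flag merges. A source-level sketch of a pattern that now triggers it:

    // On btver2 this can lower to lzcnt, lzcnt, or, shr.
    int either_is_zero(unsigned x, unsigned y) {
      return (x == 0) | (y == 0);
    }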
Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2
# f679530b | 04-Aug-2016 | Nikolai Bozhenov <nikolai.bozhenov@intel.com>
[X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation.
The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized.
Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR.
Differential Revision: https://reviews.llvm.org/D21379
llvm-svn: 277725
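For reference, the Newton-Raphson alternative being weighed here: starting from a hardware rsqrt estimate r0 ≈ 1/sqrt(a), one refinement step gives

    sqrt(a) ≈ a * r0 * (1.5 - 0.5 * a * r0 * r0)

i.e. one rsqrt plus a few multiplies, which favors throughput-bound vector code but loses to the short-latency scalar SQRT instruction on big cores.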
Revision tags: llvmorg-3.9.0-rc1
# db6bd021 | 30-Jun-2016 | Rafael Espindola <rafael.espindola@gmail.com>
Delete unused includes. NFC.
llvm-svn: 274225