History log of /llvm-project/llvm/lib/Support/Parallel.cpp (Results 1 – 25 of 36)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1
# b84d773f 21-Sep-2024 Fangrui Song <i@maskray.me>

[Parallel] Revert sequential task changes

https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify
`execute` and the old `spawn` without argument. However, sequential
tasks might be executed by any worker thread (non-deterministic),
leading to non-deterministic output for ld.lld -z nocombreloc (see
https://reviews.llvm.org/D133003).

In addition, the extra member variables have overhead.
This sequential task has only been used for lld parallel relocation
scanning.

This patch restores the behavior before https://reviews.llvm.org/D148728 .

Fix #105958

Pull Request: https://github.com/llvm/llvm-project/pull/109084



Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3
# a3cba6b5 14-Aug-2024 Fangrui Song <i@maskray.me>

[Support] ThreadPoolExecutor: remove unused default argument


# 9487cf97 14-Aug-2024 Fangrui Song <i@maskray.me>

[Support] Restrict ManagedStatic ThreadPoolExecutor to Windows

https://reviews.llvm.org/D70447 switched to `ManagedStatic` to work
around race conditions in MSVC runtimes and the MinGW runtime.
However, this workaround is not suitable for other platforms (#66974).
lld::fatal calls exitLld(1), which calls `llvm_shutdown` to destroy
`ManagedStatic` objects. The worker threads will finish and invoke TLS
destructors (glibc `__call_tls_dtors`). This can lead to race conditions
if other threads attempt to access TLS objects that have already been
destroyed.

While lld's early exit mechanism needs more work, I believe Parallel.cpp
should avoid this pitfall as well.

Pull Request: https://github.com/llvm/llvm-project/pull/102989



Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4
# 62624a4b 12-Apr-2024 Harrison,Hao <57025411+TSWorld1314@users.noreply.github.com>

[Support] Fix the issue where the character being saved in Unicode causes a warning to be treated as an error in Visual Studio 2022. (#88513)

Fix the issue where the character being saved in Unicode causes a
warning to be treated as an error in Visual Studio 2022.

Screenshot: https://github.com/llvm/llvm-project/assets/57025411/07353525-6520-4b74-b4f5-5b3f5afc47e1



Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4
# 6ab43f9b 04-May-2023 Alexey Lapshin <a.v.lapshin@mail.ru>

[Support] Add PerThreadBumpPtrAllocator class.

PerThreadBumpPtrAllocator allows separating allocations by thread id,
which makes allocations race-free. This is possible because the
ThreadPoolExecutor class creates threads, keeps them until its
destructor is called, and assigns ids to them. Thus
PerThreadBumpPtrAllocator should be used only with threads created by
ThreadPoolExecutor. This allocator is useful when a thread-safe
BumpPtrAllocator is needed.

Reviewed By: MaskRay, dexonsmith, andrewng

Differential Revision: https://reviews.llvm.org/D142318
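
A minimal usage sketch (editorial illustration, not from the patch; it
assumes the header llvm/Support/PerThreadBumpPtrAllocator.h and an
Allocate(Size, Alignment) member, and that the loop runs under the
parallel executor):

#include "llvm/Support/Parallel.h"
#include "llvm/Support/PerThreadBumpPtrAllocator.h"

void allocateInParallel() {
  llvm::parallel::PerThreadBumpPtrAllocator Alloc;
  // parallelFor tasks run on ThreadPoolExecutor threads, so each task
  // reaches its own BumpPtrAllocator slot (selected by thread index)
  // and no locking is needed.
  llvm::parallelFor(0, 1024, [&](size_t I) {
    void *P = Alloc.Allocate(/*Size=*/16, /*Alignment=*/8);
    (void)P;
    (void)I;
  });
}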



# 629177a8 04-May-2023 Alexey Lapshin <a.v.lapshin@mail.ru>

Fix test bot breakage from 06b617064a997574df409c7d846b6f6b492f5124

This addresses the issue found by: https://lab.llvm.org/buildbot#builders/178/builds/4571


Revision tags: llvmorg-16.0.3
# 06b61706 21-Apr-2023 Alexey Lapshin <a.v.lapshin@mail.ru>

[Support][Parallel] Change check for nested TaskGroups.

This patch changes the check for nested TaskGroups so that it allows
parallel execution of TaskGroups. The following pattern would not run
in parallel with the previous check:

std::function<void()> Fn = [&]() {
  parallel::TaskGroup tg;

  tg.spawn([&]() { });
};

ThreadPool Pool;

Pool.async(Fn);
Pool.async(Fn);

Pool.wait();

One of the TaskGroups would run sequentially, because the previous
check counted the overall number of TaskGroups. Two TaskGroups that
are not nested can run in parallel, but the previous check prevented
this.

This patch also avoids parallel mode for TaskGroup in the
parallel::strategy.ThreadsRequested == 1 case.

This patch is a follow-up to the discussion in D142318.

Differential Revision: https://reviews.llvm.org/D148984



# 85c2768c 20-Apr-2023 Alexey Lapshin <a.v.lapshin@mail.ru>

[Support][Parallel] Initialize threadIndex and add assertion checking its usage.

This patch adds a check that threadIndex is used only with threads
created by ThreadPoolExecutor. This helps catch two types of errors:

1. If a thread is not created by ThreadPoolExecutor, its index may
clash with the index of another thread. Using threadIndex in that case
may lead to a data race.

2. The index of the main thread (threadIndex == 0) currently clashes
with the index of thread0 in ThreadPoolExecutor's threads. That may
lead to a data race if the main thread and thread0 execute
concurrently.

This patch allows executing tasks on the main thread only when
parallel::strategy.ThreadsRequested == 1. In all other cases,
assertions check that threadIndex != UINT_MAX (i.e. that the task
is executed on a thread created by ThreadPoolExecutor).

Differential Revision: https://reviews.llvm.org/D148916
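
The intended contract, as a hedged sketch (simplified; the real
accessor and thread_local live in Parallel.h/Parallel.cpp and differ
in detail):

#include <cassert>
#include <climits>

// threadIndex starts as UINT_MAX and is only assigned a real index by
// ThreadPoolExecutor worker threads (or treated as 0 for the main
// thread when parallel::strategy.ThreadsRequested == 1).
extern thread_local unsigned threadIndex;

inline unsigned getThreadIndex() {
  assert(threadIndex != UINT_MAX &&
         "getThreadIndex() must run on a ThreadPoolExecutor thread");
  return threadIndex;
}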



Revision tags: llvmorg-16.0.2
# fea8c073 18-Apr-2023 Alexey Lapshin <a.v.lapshin@mail.ru>

[Support][Parallel] Add sequential mode to TaskGroup::spawn().

This patch allows specifying that some of the tasks should be done in
sequential order. It makes it possible to avoid a conditional for
separating sequential tasks:

TaskGroup tg;
for () {
  if (condition)            ==>    tg.spawn([](){ fn(); }, condition)
    fn();
  else
    tg.spawn([](){ fn(); });
}

It also prevents execution on the main thread, which allows adding the
checks for the getThreadIndex() function discussed in D142318.

The patch also replaces std::stack with std::deque in the
ThreadPoolExecutor to get natural execution order in the
(parallel::strategy.ThreadsRequested == 1) case.

Differential Revision: https://reviews.llvm.org/D148728



Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4
# 1ddc41c3 22-Feb-2023 Vitaly Buka <vitalybuka@google.com>

[clang-format] Fix format of my last patch


Revision tags: llvmorg-16.0.0-rc3
# 1b43650c 22-Feb-2023 Vitaly Buka <vitalybuka@google.com>

[Support] Fix data race with "safe libc++"

Otherwise NFC.


Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# 10a796a0 24-Jan-2023 Alexey Lapshin <a.v.lapshin@mail.ru>

[Support] Avoid using main thread for llvm::parallelFor().

llvm::parallelFor() uses threads created by ThreadPoolExecutor as well as the main thread.
The index for the main thread matches the index of the first thread created by ThreadPoolExecutor,
so getThreadIndex returns the same value for different threads.
To avoid thread index clashing, do not use the main thread for llvm::parallelFor():

parallel::TaskGroup TG;
for (; Begin + TaskSize < End; Begin += TaskSize) {
  TG.spawn([=, &Fn] {
    for (size_t I = Begin, E = Begin + TaskSize; I != E; ++I)
      Fn(I);
  });
}
for (; Begin != End; ++Begin)   <<<< executed by the main thread
  Fn(Begin);                    <<<<
return;                         <<<<

Differential Revision: https://reviews.llvm.org/D142317



Revision tags: llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1
# e280940b 13-Sep-2022 Martin Storsjö <martin@martin.st>

[Support] Access threadIndex via a wrapper function

On Unix platforms, this wrapper function is inline, so it should
expand to the same direct access to the thread-local variable. On
Windows, it's a non-inline function within Parallel.cpp, allowing the
thread_local variable to be made static.

Windows native TLS doesn't support direct access to thread-local
variables in a different DLL, and GCC/binutils on Windows occasionally
have problems with non-static thread-local variables too.

This fixes mingw dylib builds with native TLS after
e6aebff67426fa0f9779a0c19d6188a043bf15e7.

At the same time, move the whole thread local variable within
#if LLVM_ENABLE_THREADS
to fix builds without threading support.

Differential Revision: https://reviews.llvm.org/D133759
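
A hedged sketch of the split described above (simplified; the real
declarations are in Parallel.h and Parallel.cpp):

#ifdef _WIN32
// Non-inline on Windows: the thread_local can then stay static inside
// Parallel.cpp, since native TLS can't be referenced across DLLs.
unsigned getThreadIndex(); // defined in Parallel.cpp
#else
extern thread_local unsigned threadIndex;
inline unsigned getThreadIndex() { return threadIndex; }
#endif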



# e6aebff6 12-Sep-2022 Fangrui Song <i@maskray.me>

[ELF] Parallelize relocation scanning

* Change `Symbol::flags` to a `std::atomic<uint16_t>`
* Add `llvm::parallel::threadIndex` as a thread-local non-negative integer
* Add `relocsVec` to part.relaDyn and part.relrDyn so that relative relocations can be added without a mutex
* Arbitrarily change -z nocombreloc to move relative relocations to the end. Disable parallelism for deterministic output.

MIPS and PPC64 use global states for relocation scanning. Keep serial scanning.

Speed-up with mimalloc and --threads=8 on an Intel Skylake machine:

* clang (Release): 1.27x as fast
* clang (Debug): 1.06x as fast
* chrome (default): 1.05x as fast
* scylladb (default): 1.04x as fast

Speed-up with glibc malloc and --threads=16 on a ThunderX2 (AArch64):

* clang (Release): 1.31x as fast
* scylladb (default): 1.06x as fast

Reviewed By: andrewng

Differential Revision: https://reviews.llvm.org/D133003
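
A hedged sketch of the mutex-free sharding pattern relocsVec relies on
(Reloc is a hypothetical stand-in, and the sketch uses the later
getThreadIndex() spelling; lld's real code is structured differently):

#include "llvm/Support/Parallel.h"
#include <cstdint>
#include <vector>

struct Reloc { uint64_t Offset; }; // hypothetical stand-in

// RelocsVec must hold one shard per worker thread, e.g. sized from
// llvm::parallel::strategy.compute_thread_count().
void scanRelocs(std::vector<std::vector<Reloc>> &RelocsVec, size_t N) {
  llvm::parallelFor(0, N, [&](size_t I) {
    // Each task appends only to its own thread's shard, so no mutex
    // is needed; shards are concatenated after the parallel loop.
    RelocsVec[llvm::parallel::getThreadIndex()].push_back({I * 8});
  });
}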



Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3
# 3b4d8009 24-Aug-2022 Fangrui Song <i@maskray.me>

[ELF] Parallelize writes of different OutputSections

We currently process one OutputSection at a time and for each OutputSection
write contained input sections in parallel. This strategy does not leverage
multi-threading well. Instead, parallelize writes of different OutputSections.

The default TaskSize for parallelFor often leads to inferior sharding.
We prepare the tasks in the caller instead.

* Move llvm::parallel::detail::TaskGroup to llvm::parallel::TaskGroup
* Add llvm::parallel::TaskGroup::execute.
* Change writeSections to declare TaskGroup and pass it to writeTo.

Speed-up with --threads=8:

* clang -DCMAKE_BUILD_TYPE=Release: 1.11x as fast
* clang -DCMAKE_BUILD_TYPE=Debug: 1.10x as fast
* chrome -DCMAKE_BUILD_TYPE=Release: 1.04x as fast
* scylladb build/release: 1.09x as fast

On M1, many benchmarks are a small fraction of a percentage faster. Mozilla showed the largest difference with the patch being about 1.03x as fast.

Differential Revision: https://reviews.llvm.org/D131247
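
A hedged sketch of caller-prepared sharding with the new TaskGroup
(OutputSec, writeTo, and Buf are hypothetical placeholders for the lld
side):

#include "llvm/Support/Parallel.h"
#include <cstdint>
#include <vector>

struct OutputSec { // hypothetical placeholder for lld's OutputSection
  uint64_t Offset = 0;
  void writeTo(uint8_t *Dest) { (void)Dest; }
};

void writeAll(std::vector<OutputSec> &Sections, uint8_t *Buf) {
  // Tasks are shaped by the caller (one per OutputSection) instead of
  // relying on parallelFor's default TaskSize; ~TaskGroup waits for
  // all spawned tasks to finish.
  llvm::parallel::TaskGroup Tg;
  for (OutputSec &Sec : Sections)
    Tg.spawn([&Sec, Buf] { Sec.writeTo(Buf + Sec.Offset); });
}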



Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# 7effcbda 19-Jun-2022 Nico Weber <thakis@chromium.org>

Rename parallelForEachN to just parallelFor

Patch created by running:

rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'

No behavior change.

Differential Revision: https://reviews.llvm.org/D128140



Revision tags: llvmorg-14.0.5
# 4290ef54 25-May-2022 Andrew Ng <andrew.ng@sony.com>

[Support] Reduce allocations in parallelForEach with move

Differential Revision: https://reviews.llvm.org/D126458


Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init
# 8e382ae9 23-Jan-2022 Fangrui Song <i@maskray.me>

[Support] Simplify parallelForEach{,N}

* Merge parallel_for_each into parallelForEach (this removes 1 `Fn(...)` call)
* Change parallelForEach to use parallelForEachN
* Move parallelForEachN into Parallel.cpp

My x86-64 `lld` executable is 100KiB smaller.
No noticeable difference in performance.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D117510
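
A hedged sketch of the merged shape (not the exact source; the name
myParallelForEach is used to avoid clashing with the real template in
Parallel.h):

#include "llvm/Support/Parallel.h"
#include <cstddef>

template <class IterTy, class FuncTy>
void myParallelForEach(IterTy Begin, IterTy End, FuncTy Fn) {
  // The iterator form forwards to the index-based parallelForEachN,
  // which does the sharding in one place.
  llvm::parallelForEachN(0, static_cast<size_t>(End - Begin),
                         [&](size_t I) { Fn(Begin[I]); });
}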

show more ...


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4
# 7b25fa8c 18-Sep-2021 Alexandre Ganea <alexandre.ganea@ubisoft.com>

[Support] Attempt to fix deadlock in ThreadGroup

This is an attempt to fix the situation described by https://reviews.llvm.org/D104207#2826290 and PR41508.
See the sequence of operations leading to the bug in https://reviews.llvm.org/D104207#3004689.

We ensure that the Latch is completely "free" before decrementing the number of TaskGroupInstances.

Differential revision: https://reviews.llvm.org/D109914
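
A hedged sketch of the ordering the fix enforces (the Latch and
counter here are simplified stand-ins for the ones in
Parallel.h/Parallel.cpp):

#include <atomic>
#include <condition_variable>
#include <mutex>

static std::atomic<int> TaskGroupInstances; // simplified stand-in

class Latch {
  std::mutex Mu;
  std::condition_variable Cv;
  unsigned Count = 0;

public:
  void inc() { std::lock_guard<std::mutex> G(Mu); ++Count; }
  void dec() {
    std::lock_guard<std::mutex> G(Mu);
    if (--Count == 0)
      Cv.notify_all();
  }
  void sync() {
    std::unique_lock<std::mutex> G(Mu);
    Cv.wait(G, [this] { return Count == 0; });
  }
};

struct TaskGroup {
  Latch L;
  ~TaskGroup() {
    // Drain the latch first: every worker must be completely done with
    // it before TaskGroupInstances drops and another TaskGroup is
    // allowed to run tasks in parallel.
    L.sync();
    --TaskGroupInstances;
  }
};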



Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init
# 4137ab62 08-Jul-2020 Fangrui Song <maskray@google.com>

[Support] Define llvm::parallel::strategy for -DLLVM_ENABLE_THREADS=off builds after D76885


Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5
# eb4663d8 17-Mar-2020 Fangrui Song <maskray@google.com>

[lld][COFF][ELF][WebAssembly] Replace --[no-]threads /threads[:no] with --threads={1,2,...} /threads:{1,2,...}

--no-threads is a name copied from gold.
gold has --no-thread, --thread-count and several other --thread-count-*.

There is a need to customize the number of threads (e.g. running
several lld processes concurrently or customizing the number of LTO
threads). Having a single --threads=N is a straightforward replacement
for gold's --no-threads + --thread-count.

--no-threads is rarely used, so just delete --no-threads instead of
keeping it around for compatibility for a while.

If --threads= is specified (ELF,wasm; COFF /threads: is similar),
--thinlto-jobs= defaults to --threads=,
otherwise all available hardware threads are used.

There is currently no way to override a --threads={1,2,...}. Whether
we should support --threads=all is still under debate.

Reviewed By: rnk, aganea

Differential Revision: https://reviews.llvm.org/D76885
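
Example invocations (flag spellings per the message; the input files
are hypothetical):

ld.lld --threads=8 a.o b.o        # ELF; wasm-ld spells it the same
lld-link /threads:8 a.obj b.obj   # COFF
ld.lld --threads=2 a.o b.o        # --thinlto-jobs= also defaults to 2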

show more ...


Revision tags: llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# 8404aeb5 14-Feb-2020 Alexandre Ganea <alexandre.ganea@ubisoft.com>

[Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups

The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU: a hyper-thread if SMT is active, a core otherwise. There's a limit of 32 processors on older 32-bit versions of Windows, which was later raised to 64 processors with 64-bit versions of Windows. This limit comes from the affinity mask, whose size historically matches sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost; only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std::thread::hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means using all possible hardware CPU (SMT) threads. Providing a ThreadCount above the maximum number of threads has no effect; the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
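
A hedged usage sketch of the strategy API this patch introduced
(llvm/Support/Threading.h and ThreadPool.h as of this commit;
signatures and class names may have moved since):

#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"

void runOnAllProcessors() {
  // ThreadCount = 0 means "use all hardware (SMT) threads", spanning
  // all processor groups on Windows. For one thread per physical
  // core, use llvm::heavyweight_hardware_concurrency() instead.
  llvm::ThreadPoolStrategy S = llvm::hardware_concurrency(0);
  llvm::ThreadPool Pool(S);
  Pool.async([] { /* work */ });
  Pool.wait();
}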

show more ...


Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# 564481ae 16-Mar-2019 Andrew Ng <anng.sw@gmail.com>

[Support] ThreadPoolExecutor fixes for Windows/MinGW

Changed ThreadPoolExecutor to no longer use detached threads and instead
to join threads on destruction. This is to prevent intermittent crashing
on Windows when doing a normal full exit, e.g. via exit().

Changed ThreadPoolExecutor to be a ManagedStatic so that it can be
stopped on llvm_shutdown(). Without this, it would only be stopped in
the destructor when doing a full exit. This is required to avoid
intermittent crashing on Windows due to a race condition between the
ThreadPoolExecutor starting up threads and the process doing a fast
exit, e.g. via _exit().

The Windows crashes appear to only occur with the MSVC static runtimes
and are more frequent with the debug static runtime.

These changes also prevent intermittent deadlocks on exit with the MinGW
runtime.

Differential Revision: https://reviews.llvm.org/D70447



# d4960032 10-Oct-2019 Nico Weber <nicolasweber@gmx.de>

win: Move Parallel.h off concrt to cross-platform code

r179397 added Parallel.h and implemented it in terms of concrt in 2013.

In 2015, a cross-platform implementation of the functions appeared and
has been in use everywhere but on Windows (r232419). r246219 hints that
<thread> had issues in MSVC2013, but r296906 suggests they've been
fixed now that we require 2015+.

So remove the concrt code. It's less code, and it sounds like concrt has
conceptual and performance issues, see PR41198.

I built blink_core.dll in a debug component build with full symbols and
in a release component build without any symbols. I couldn't measure a
performance difference for linking blink_core.dll before and after this
patch.

Differential Revision: https://reviews.llvm.org/D68820

llvm-svn: 374421



# f6a62909 25-Apr-2019 Fangrui Song <maskray@google.com>

Parallel: only allow the first TaskGroup to run tasks parallelly

Summary:
Concurrent (e.g. nested) llvm::parallel::for_each() calls may lead to
deadlocks. See PR35788 (fixed by rLLD322041) and PR41508 (fixed by
D60757).

When parallel_for_each() is about to return, in ~Latch() called by
~TaskGroup(), a thread (in the default executor) may block in
Latch::sync() waiting for Count to become zero. If all threads in the
default executor are blocked, it is a deadlock.

To fix this, force serial execution if the current TaskGroup is not the
first one. For a nested llvm::parallel::for_each(), this parallelizes
the outermost loop and serializes inner loops.

Differential Revision: https://reviews.llvm.org/D61115

llvm-svn: 359182
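
A hedged illustration of the nesting this guards against (using the
current parallelForEach spelling rather than the parallel::for_each of
the time):

#include "llvm/Support/Parallel.h"
#include <vector>

void bumpAll(std::vector<std::vector<int>> &Rows) {
  // The outer loop owns the first TaskGroup and runs in parallel; the
  // inner call sees an existing TaskGroup and degrades to serial
  // execution instead of deadlocking the shared executor.
  llvm::parallelForEach(Rows, [](std::vector<int> &Row) {
    llvm::parallelForEach(Row, [](int &V) { ++V; });
  });
}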


