Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
666a3f4e |
| 11-Sep-2024 |
Joseph Huber <huberjn@outlook.com> |
[libc] Stub TLS functions on the GPU temporarily (#108267)
Summary: There's an extern weak symbol for this, we should just factor these into a more common interface. Stub them temporarily to make the bots happy. PTXAS does not handle extern weak.
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3 |
|
#
5c13f9ae |
| 18-Aug-2024 |
Joseph Huber <huberjn@outlook.com> |
[libc] Add single threaded kernel attributes to AMDGPU startup utility (#104651)
Summary: I fixed the errors here recently so I can actually use these. This shouldn't impact much, just should hopefully make the code generated slightly better.
|
#
b7c7dbd4 |
| 11-Aug-2024 |
Schrodinger ZHU Yifan <yifanzhu@rochester.edu> |
Revert "libc: Remove `extern "C"` from main declarations" (#102827)
Reverts llvm/llvm-project#102825
|
#
1b71c471 |
| 11-Aug-2024 |
David Blaikie <dblaikie@gmail.com> |
libc: Remove `extern "C"` from main declarations (#102825)
This is invalid in C++, and clang recently started warning on it as of
#101853
|
#
1a92cc5a |
| 08-Aug-2024 |
Joseph Huber <huberjn@outlook.com> |
[libc] Implement 'getenv' on the GPU target (#102376)
Summary: This patch implements 'getenv'. I was torn on how to implement this, since realistically we only have access to this environment pointer in the "loader" interface. An alternative would be to use an RPC call every time, but I think that's overkill for what this will be used for. A better solution is just to emit a common `DataEnvironment` that contains all of the host visible resources to initialize. Right now this is the `env_ptr`, `clock_freq`, and `rpc_client`.
I did this by making the `app.h` interface that Linux uses more general, could possibly move that into a separate patch, but I figured it's easier to see with the usage.
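A minimal sketch of the idea in this commit, assuming the loader copies an environment pointer into device-visible memory before `main` runs. The struct layout, the `app_env` global, and `sketch_getenv` are hypothetical names for illustration; only `DataEnvironment`, `env_ptr`, and `clock_freq` appear in the commit message, and the RPC client is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the shared structure named in the commit.
// Only two of the three fields are modelled; the RPC client is omitted.
struct DataEnvironment {
  char **env_ptr;      // Null-terminated array of "NAME=value" strings.
  uint64_t clock_freq; // Filled in by the loader; unused in this sketch.
};

// Assumption: the loader writes this once before any lookup happens.
static DataEnvironment app_env = {nullptr, 0};

// Looks a variable up with a linear scan, so no RPC round trip is needed.
static const char *sketch_getenv(const char *name) {
  assert(name != nullptr);
  if (app_env.env_ptr == nullptr)
    return nullptr;
  size_t len = std::strlen(name);
  for (char **it = app_env.env_ptr; *it != nullptr; ++it) {
    // Match the full name followed by '=', not just a shared prefix.
    if (std::strncmp(*it, name, len) == 0 && (*it)[len] == '=')
      return *it + len + 1;
  }
  return nullptr;
}
```

The trade-off matches the one described in the message: the environment is fixed at launch, which costs nothing at lookup time compared with asking the host on every call.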
|
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
5ff3ff33 |
| 12-Jul-2024 |
Petr Hosek <phosek@google.com> |
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597)
This is a part of #97655.
|
#
ce9035f5 |
| 12-Jul-2024 |
Mehdi Amini <joker.eph@gmail.com> |
Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593)
Reverts llvm/llvm-project#98075
bots are broken
|
#
3f30effe |
| 11-Jul-2024 |
Petr Hosek <phosek@google.com> |
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075)
This is a part of #97655.
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
#
0352d5ee |
| 23-Feb-2024 |
Joseph Huber <huberjn@outlook.com> |
[libc][NFC] Remove redundant external clock symbol for AMDGPU (#82794)
Summary: The AMDGPU target needs an external clock symbol so the driver can set the frequency with the correct value. This was left over from the previous implementation and I forgot to remove it when actually implementing the timing utilities.
|
Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5 |
|
#
dc30fa6a |
| 09-Nov-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc][fix] Call GPU destructors in the correct order
Summary: I was mistakenly iterating the list backwards. Regular semantics puts both arrays in priority order but the destructors are called backwards.
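The ordering rule in this message can be sketched as follows, assuming both callback arrays are already sorted by priority. The function names and the `Callback` alias are hypothetical and are not taken from the actual startup code.

```cpp
#include <cassert>

using Callback = void (*)();

// Constructors run front to back over their array.
static void call_init_array(Callback *begin, Callback *end) {
  assert(begin <= end);
  for (Callback *it = begin; it != end; ++it)
    (*it)();
}

// Destructors walk their own array back to front.
static void call_fini_array(Callback *begin, Callback *end) {
  assert(begin <= end);
  for (Callback *it = end; it != begin;)
    (*--it)();
}
```

Both loops take the same half-open range, so the only difference between them is the direction of traversal.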
|
Revision tags: llvmorg-17.0.4 |
|
#
f3505320 |
| 19-Oct-2023 |
alfredfo <98554039+alfredfo@users.noreply.github.com> |
[libc] Fix accidental LIBC_NAMESPACE_clock_freq (#69620)
See-also: https://github.com/llvm/llvm-project/pull/69548
|
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2 |
|
#
b6bc9d72 |
| 26-Sep-2023 |
Guillaume Chatelet <gchatelet@google.com> |
[libc] Mass replace enclosing namespace (#67032)
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
|
#
59896c16 |
| 21-Sep-2023 |
Joseph Huber <35342157+jhuber6@users.noreply.github.com> |
[libc] Remove the 'rpc_reset' routine from the RPC implementation (#66700)
Summary: This patch removes the `rpc_reset` function. This was previously used to initialize the RPC client on the device by setting up the pointers to communicate with the server. The purpose of this was to make it easier to initialize the device for testing. However, this prevented us from enforcing an invariant that the buffers are all read-only from the client side.
The expected way to initialize the server is now to copy it from the host runtime. This will allow us to maintain that the RPC client is in the constant address space on the GPU, potentially through inference, and improving caching behaviour.
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
d3aabeb7 |
| 21-Jul-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc] Treat the locks array as a bitfield
Currently we keep an internal buffer of device memory that is used to indicate ownership of a port. Since we only use this as a single bit we can simply turn this into a bitfield. I did this manually rather than having a separate type as we need very special handling of the masks used to interact with the locks.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155511
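A minimal host-side sketch of one-bit-per-port locking, assuming 32-bit lock words and using `std::atomic` in place of the GPU atomics. The names `try_lock` and `unlock` are hypothetical, and the lane-mask handling that the commit calls "very special" is deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint32_t BITS_PER_WORD = 32;

// Tries to take the lock for `port` by setting its bit. The lock is ours
// only if the bit was clear before the atomic OR.
static bool try_lock(std::atomic<uint32_t> *locks, uint32_t port) {
  uint32_t mask = 1u << (port % BITS_PER_WORD);
  uint32_t old =
      locks[port / BITS_PER_WORD].fetch_or(mask, std::memory_order_acquire);
  return (old & mask) == 0;
}

// Releases the lock for `port` by clearing its bit.
static void unlock(std::atomic<uint32_t> *locks, uint32_t port) {
  uint32_t mask = 1u << (port % BITS_PER_WORD);
  locks[port / BITS_PER_WORD].fetch_and(~mask, std::memory_order_release);
}
```

Compared with one word per port, this stores 32 locks in each word, at the cost of neighbouring ports contending on the same memory location.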
|
#
979fb950 |
| 19-Jul-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
Revert "[libc] Treat the locks array as a bitfield"
Summary: This caused test failures on the gfx90a buildbot. This works on my gfx1030 and the Nvidia buildbots, so we'll need to investigate what is going wrong here. For now revert it to get the bots green.
This reverts commit 05abcc579244b68162b847a6780d27b22bd58f74.
|
#
05abcc57 |
| 18-Jul-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc] Treat the locks array as a bitfield
Currently we keep an internal buffer of device memory that is used to indicate ownership of a port. Since we only use this as a single bit we can simply turn this into a bitfield. I did this manually rather than having a separate type as we need very special handling of the masks used to interact with the locks.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155511
|
#
5db39796 |
| 04-Jul-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc] Support timing information in libc tests
This patch adds the necessary support to provide timing information in `libc` tests. This is useful for determining how long each test takes. We can also use this as a test basis for providing more fine-grained timing when implementing things on the GPU.
The main difficulty with this is the fact that the AMDGPU fixed frequency clock operates at an unknown frequency. We need to read this on a per-card basis from the driver and then copy it in. NVPTX on the other hand has a fixed clock at a resolution of 1ns. I have also increased the resolution of the print-outs as the majority of these are below a millisecond for me.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154446
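The conversion this implies can be sketched as below, assuming the driver-reported frequency is in Hz and has already been copied to the device. `ticks_to_ns` is a hypothetical helper, not a function from the patch.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t NS_PER_SEC = 1000000000ull;

// Converts a tick count from a fixed-frequency clock into nanoseconds.
// Whole seconds and the remainder are handled separately so that large
// tick counts do not overflow the multiplication. The remainder term is
// only safe for frequencies up to roughly 18 GHz.
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t clock_freq_hz) {
  assert(clock_freq_hz != 0);
  uint64_t seconds = ticks / clock_freq_hz;
  uint64_t remainder = ticks % clock_freq_hz;
  return seconds * NS_PER_SEC + remainder * NS_PER_SEC / clock_freq_hz;
}
```

With a 1 GHz frequency the function is the identity, which corresponds to the fixed 1ns resolution the message describes for NVPTX.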
|
#
964a535b |
| 19-Jun-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc] Remove flexible array and replace with a template
Currently the implementation of the RPC interface requires a flexible struct. This caused problems when compiling the RPC server with GCC as would be required if trying to export the RPC server interface. This required that we either move to the `x[1]` workaround or make it a template parameter. While just using `x[1]` would be much less noisy, this is technically undefined behavior. For this reason I elected to use templates.
The downside to using templates is that the server code must now be able to handle multiple different types at runtime. I was unable to find a good solution that didn't rely on type erasure so I simply branch off of the given value.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153304
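A rough sketch of the trade-off described here: the array length becomes a template parameter, and the server picks an instantiation with a runtime branch. `Payload`, `handle`, `dispatch`, and the supported widths 1, 32, and 64 are assumptions made for illustration, not the actual RPC types.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-size replacement for a trailing flexible array member.
template <uint32_t LaneSize> struct Payload {
  uint64_t slot[LaneSize];
};

// Typed handler; summing the slots stands in for real server work.
template <uint32_t LaneSize> static uint64_t handle(const void *raw) {
  const auto *payload = static_cast<const Payload<LaneSize> *>(raw);
  uint64_t sum = 0;
  for (uint32_t i = 0; i < LaneSize; ++i)
    sum += payload->slot[i];
  return sum;
}

// The width is only known at runtime, so branch to the matching
// instantiation instead of erasing the type.
static uint64_t dispatch(uint32_t lane_size, const void *raw) {
  assert(raw != nullptr);
  switch (lane_size) {
  case 1:
    return handle<1>(raw);
  case 32:
    return handle<32>(raw);
  case 64:
    return handle<64>(raw);
  default:
    return 0; // Unsupported width in this sketch.
  }
}
```

Each supported width costs one extra instantiation of the handler, which is the noise the message weighs against the undefined behavior of the `x[1]` workaround.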
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4 |
|
#
30093d6b |
| 11-May-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc][obvious] Fix undefined variable after name change
I forgot that we still used these variables in the loaders.
Differential Revision: https://reviews.llvm.org/D150362
|
#
bbeae142 |
| 11-May-2023 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[libc][rpc] Allocate a single block of shared memory instead of three
Allows moving the pointer swap between server and client into reset. Single allocation simplifies whatever allocates the client/server, currently the libc loaders.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150337
|
#
f497611f |
| 10-May-2023 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[libc][rpc] Allocate locks array within process
Replaces the globals currently used. Worth changing to a bitmap before allowing runtime number of ports >> 64. One bit per port is likely to be cheap enough that sizing for the worst case is always fine, otherwise in the future we can change to dynamically allocating it.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150309
|
Revision tags: llvmorg-16.0.3 |
|
#
aea866c1 |
| 01-May-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc] Support concurrent RPC port access on the GPU
Previously we used a single port to implement the RPC. This was sufficient for single threaded tests but can potentially cause deadlocks when using multiple threads. The reason for this is that GPUs make no forward progress guarantees. Therefore one group of threads waiting on another group of threads can spin forever because there is no guarantee that the other threads will continue executing. The typical workaround for this is to allocate enough memory that a sufficiently large number of work groups can make progress. As long as this number is somewhat close to the amount of total concurrency we can obtain reliable execution around a shared resource.
This patch enables using multiple ports by widening the arrays to a predetermined size and indexing into them. Empty ports are currently obtained via a trivial linear scan. This should be improved in the future for performance reasons. Portions of D148191 were applied to achieve parallel support.
Depends on D149581
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D149598
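A minimal host-side sketch of claiming a free port by scanning a fixed-size array from the start, assuming one lock word per port and a hypothetical `PORT_COUNT` of 8. `PortTable`, `try_open`, and `close_port` are illustrative names only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint32_t PORT_COUNT = 8; // Predetermined size, chosen arbitrarily.

struct PortTable {
  std::atomic<uint32_t> lock[PORT_COUNT]; // 0 = free, 1 = owned.
};

// Scans from the first port and claims the first free one. Returns the
// port index, or -1 if every port is currently owned.
static int try_open(PortTable &table) {
  for (uint32_t i = 0; i < PORT_COUNT; ++i) {
    if (table.lock[i].exchange(1, std::memory_order_acquire) == 0)
      return static_cast<int>(i);
  }
  return -1;
}

// Gives a previously opened port back.
static void close_port(PortTable &table, int port) {
  assert(port >= 0 && static_cast<uint32_t>(port) < PORT_COUNT);
  table.lock[port].store(0, std::memory_order_release);
}
```

Every caller starts scanning at index 0, so low-numbered ports see the most contention, which is one reason a scan like this is a performance concern.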
|
#
901266da |
| 01-May-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc] Change GPU startup and loader to use multiple kernels
The GPU has a different execution model to standard `_start` implementations. On the GPU, all threads are active at the start of a kernel. In order to correctly initialize and call the constructors we want single threaded semantics. Previously, this was done using a makeshift global barrier with atomics. However, it should be easier to simply put the portions of the code that must be single threaded in separate kernels and then call those with only one thread. Generally, mixing global state between kernel launches makes optimizations more difficult, similarly to calling a function outside of the TU, but for testing it is better to be correct.
Depends on D149527 D148943
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D149581
|
#
507edb52 |
| 04-May-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc] Enable multiple threads to use RPC on the GPU
The execution model of the GPU expects that groups of threads will execute in lock-step in SIMD fashion. It's both important for performance and correctness that we treat this as the smallest possible granularity for an RPC operation. Thus, we map multiple threads to a single larger buffer and ship that across the wire.
This patch makes the necessary changes to support executing the RPC on the GPU with multiple threads. This requires some workarounds to mimic the model when handling the protocol from the CPU. I'm not completely happy with some of the workarounds required, but I think it should work.
Uses some of the implementation details from D148191.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D148943
|
#
1b823abe |
| 27-Apr-2023 |
Joseph Huber <jhuber6@vols.utk.edu> |
[libc] Add support for global ctors / dtors for AMDGPU
This patch makes the necessary changes to support calling global constructors and destructors on the GPU. The patch in D149340 allows the `lld` linker to create the symbols pointing us to these globals. These should be executed by a single thread, which is more difficult on the GPU because all threads are active. I chose to use an atomic counter to sync every thread on the GPU. This is very slow if you use more than a few thousand threads, but for testing purposes it should be sufficient.
Depends on D149340 D149363
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D149398
|