50445dff | 20-Apr-2023 | Joseph Huber <jhuber6@vols.utk.edu>
[libc] Add more utility functions for the GPU
This patch adds extra intrinsics for the GPU. Some of these are unused for now but will be used later. They are currently used to update the `RPC` handling: right now, every thread can update the RPC client, which isn't correct. This patch adds the code necessary to allow a single thread to perform the write while the others wait.
Feedback is welcome for the naming of these functions. I'm copying the OpenMP nomenclature where we call an AMD `wavefront` or NVIDIA `warp` a `lane`.
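A minimal host-side sketch of the "one thread writes, the rest wait" idea, assuming the writer is chosen as the lowest active lane in the warp/wavefront's execution mask. `first_lane`, `SharedState`, `lane_body`, and `run_warp` are hypothetical names, not the intrinsics added by this patch, and the lanes run serially here instead of in lockstep.

```cpp
#include <cstdint>

// Hypothetical helper: index of the lowest set bit in a nonzero lane mask,
// i.e. the "first" active lane. On a GPU this would come from the hardware
// execution mask plus a find-first-set intrinsic.
int first_lane(uint64_t lane_mask) {
  int id = 0;
  while ((lane_mask & 1) == 0) { // precondition: lane_mask != 0
    lane_mask >>= 1;
    ++id;
  }
  return id;
}

// Hypothetical shared state standing in for the RPC client.
struct SharedState {
  int writes = 0;
  uint64_t value = 0;
};

// Body executed by every active lane; only the leader performs the write.
void lane_body(int lane_id, uint64_t active_mask, SharedState &s, uint64_t v) {
  if (lane_id == first_lane(active_mask)) {
    s.value = v;
    ++s.writes;
  }
  // On the GPU the remaining lanes would wait here (a warp-level
  // synchronization) before reading s.value.
}

// Serial stand-in for one warp/wavefront of `lane_size` lanes (32 or 64).
int run_warp(uint64_t active_mask, int lane_size, SharedState &s, uint64_t v) {
  for (int lane = 0; lane < lane_size; ++lane)
    if (active_mask & (uint64_t{1} << lane))
      lane_body(lane, active_mask, s, v);
  return s.writes;
}
```

Choosing the writer from the active mask, rather than hard-coding lane 0, keeps exactly one writer even when lane 0 is inactive, for example after a divergent branch.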
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D148810

Revision tags: llvmorg-16.0.2

d0ff5e40 | 14-Apr-2023 | Joseph Huber <jhuber6@vols.utk.edu>
[libc] Update RPC interface for system utilities on the GPU
This patch reworks the RPC interface to allow more generic memory operations that make better use of the shared buffer. It decomposes the entire RPC interface into opening a port and calling `send` or `recv` on it.
The `send` function sends a single packet the size of the buffer. The `recv` function is paired with a `send` call and consumes its data, so any arbitrary sequence of packets can be exchanged. The only restriction is that the client initiates the exchange with a `send` while the server consumes it with a `recv`.
The operation is driven by two independent state machines that track buffer ownership during loads and stores. We keep track of two so that we can transition between a send state and a recv state without an extra wait. State transitions are observed via bit toggling, e.g.
This interface supports an efficient `send -> ack -> send -> ack -> send` pattern and allows the last send to be ignored without checking the ack.
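A simplified sketch of the ownership handshake: one flag bit per direction, where the client may write the buffer while both bits agree and the server may read it while they differ, and each side hands the buffer over by toggling its own bit. This collapses the patch's two independent state machines into a single one-bit handshake. `Mailbox`, `try_send`, and `try_recv` are hypothetical names, and non-blocking `try_*` variants are used so the exchange can be stepped through deterministically.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical shared mailbox: one payload buffer plus one bit per direction.
struct Mailbox {
  std::atomic<uint32_t> client_out{0}; // written only by the client
  std::atomic<uint32_t> server_out{0}; // written only by the server
  uint64_t buffer = 0;                 // owned by whichever side's turn it is
};

// Client may write while the bits agree (its last packet has been acked).
bool try_send(Mailbox &m, uint64_t value) {
  uint32_t out = m.client_out.load(std::memory_order_relaxed);
  if (m.server_out.load(std::memory_order_acquire) != out)
    return false; // previous packet not yet consumed
  m.buffer = value;
  // Release pairs with the server's acquire, publishing `buffer`.
  m.client_out.store(out ^ 1, std::memory_order_release);
  return true;
}

// Server may read while the bits differ (the client has toggled its bit).
std::optional<uint64_t> try_recv(Mailbox &m) {
  uint32_t out = m.server_out.load(std::memory_order_relaxed);
  if (m.client_out.load(std::memory_order_acquire) == out)
    return std::nullopt; // no new packet
  uint64_t value = m.buffer;
  // Toggling our bit is the ack: ownership returns to the client.
  m.server_out.store(out ^ 1, std::memory_order_release);
  return value;
}
```

Because ownership returns to the client as soon as the server toggles its bit, a `send -> ack -> send` sequence reuses one buffer, and the client never has to wait for the ack of its final `send`.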
A following patch will add more comprehensive testing for this interface. I informally made an RPC call that simply incremented an integer, and it took roughly 10 microseconds to complete.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D148288

Revision tags: llvmorg-16.0.1

6bd4d717 | 17-Mar-2023 | Joseph Huber <jhuber6@vols.utk.edu>
[libc] Add environment variables to GPU libc test for AMDGPU
This patch applies the same copying already done for the `argv` array to the `envp` array. This allows the GPU tests to use environment variables.
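A host-side sketch of such a copy, assuming the loader packs a null-terminated string array (here `envp`, which has the same layout as `argv`) into one contiguous block of pointers followed by the string data, which can then be transferred to device-visible memory. `count_entries` and `pack_string_array` are hypothetical; real loader code would use the target runtime's allocation and copy APIs, and the stored pointers would have to refer to the device copy rather than the host one.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// envp has no explicit count, so count entries up to the terminating nullptr.
std::size_t count_entries(const char *const *arr) {
  std::size_t n = 0;
  while (arr[n])
    ++n;
  return n;
}

// Hypothetical helper: pack `count` C strings into one block laid out as
// [ptr0, ptr1, ..., nullptr][str0\0str1\0...]. The pointers refer into the
// returned block itself.
std::vector<char> pack_string_array(const char *const *arr, std::size_t count) {
  std::size_t ptr_bytes = (count + 1) * sizeof(char *);
  std::size_t str_bytes = 0;
  for (std::size_t i = 0; i < count; ++i)
    str_bytes += std::strlen(arr[i]) + 1;

  std::vector<char> block(ptr_bytes + str_bytes);
  char **ptrs = reinterpret_cast<char **>(block.data());
  char *strs = block.data() + ptr_bytes;
  for (std::size_t i = 0; i < count; ++i) {
    std::size_t len = std::strlen(arr[i]) + 1; // include the terminator
    std::memcpy(strs, arr[i], len);
    ptrs[i] = strs;
    strs += len;
  }
  ptrs[count] = nullptr; // keep the array null-terminated like envp
  return block;
}
```

A single allocation keeps the transfer to one copy and preserves the null-terminated layout the device-side `main` expects for both arrays.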
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146322

Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4

8e4f9b1f | 10-Mar-2023 | Joseph Huber <jhuber6@vols.utk.edu>
[libc] Add initial support for an RPC mechanism for the GPU
This patch adds initial support for an RPC client / server architecture. The GPU is unable to perform several system utilities on its own, so in order to implement features like printing or memory allocation we need to be able to communicate with the executing process. This is done via a buffer of "sharable" memory. That is, a buffer with a unified pointer that both the client and server can use to communicate.
The implementation here is based on Jon Chesterfield's minimal RPC example from his work. We use an `inbox` and an `outbox` to communicate whether there is a pending RPC request and to signal when the work is done. We use a fixed-size buffer for the communication channel so that we can ensure there is enough space for all compute units on the GPU to issue work to any of the ports. Right now the implementation is single-threaded, so there is only a single buffer that is not shared.
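A single-port sketch of the request/response round trip, assuming a one-bit `inbox`/`outbox` pair and a fixed-size payload in the shared buffer. The `Opcode` values, the `Port` layout, and the functions below are hypothetical, and each step is called explicitly so the handshake can be traced without real concurrency.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical request kinds the server knows how to handle.
enum class Opcode : uint32_t { Increment = 1, Double = 2 };

// Hypothetical shared buffer for a single port. The client's outbox is the
// server's inbox and vice versa; the port is idle when the two bits agree.
struct Port {
  std::atomic<uint32_t> client_outbox{0};
  std::atomic<uint32_t> server_outbox{0};
  Opcode opcode = Opcode::Increment;
  uint64_t payload[8] = {}; // fixed-size payload area
};

// Client: write a request and signal it. Fails if a call is still in flight.
bool post_request(Port &p, Opcode op, uint64_t arg) {
  uint32_t out = p.client_outbox.load(std::memory_order_relaxed);
  if (p.server_outbox.load(std::memory_order_acquire) != out)
    return false;
  p.opcode = op;
  p.payload[0] = arg;
  p.client_outbox.store(out ^ 1, std::memory_order_release);
  return true;
}

// Server: if a request is pending, do the work in place and signal completion.
bool serve_one(Port &p) {
  uint32_t out = p.server_outbox.load(std::memory_order_relaxed);
  if (p.client_outbox.load(std::memory_order_acquire) == out)
    return false; // nothing pending
  switch (p.opcode) {
  case Opcode::Increment: p.payload[0] += 1; break;
  case Opcode::Double:    p.payload[0] *= 2; break;
  }
  p.server_outbox.store(out ^ 1, std::memory_order_release);
  return true;
}

// Client: once the bits agree again, the result is in the payload.
// Only meaningful after a successful post_request.
bool try_get_result(const Port &p, uint64_t &result) {
  if (p.server_outbox.load(std::memory_order_acquire) !=
      p.client_outbox.load(std::memory_order_relaxed))
    return false; // server has not finished yet
  result = p.payload[0];
  return true;
}
```

Since the server replies in the same buffer and the port counts as idle only when the bits agree, each port carries at most one outstanding call, which fits the single-buffer, single-threaded scope described here.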
This implementation is still missing several features, such as multi-threaded support and asynchronous calls.
Depends on D145912
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D145913

Revision tags: llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2

fa34b9e0 | 02-Feb-2023 | Joseph Huber <jhuber6@vols.utk.edu>
[libc] Add startup code implementation for GPU targets
This patch introduces startup code for executing `main` on a device compiled for the GPU. We will primarily use this to run standalone integration tests on the GPU. The actual execution of this routine will need to be provided by a `loader` utility to bootstrap execution on the GPU.
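A host-side sketch of how such an entry point could wrap `main`, under two assumptions that the patch text does not state: the loader passes `argc`, `argv`, `envp`, and a return slot, and each GPU thread's exit code is combined with an atomic OR so that any failure is reported. `device_start`, `launch`, and `MainFn` are hypothetical names, the "threads" run serially, and a function pointer stands in for `main` because C++ does not allow a program to call its own `main`.

```cpp
#include <atomic>

// Stand-in signature for the program's main(argc, argv, envp).
using MainFn = int (*)(int, char **, char **);

// Hypothetical device entry point: every thread calls the user's main and
// folds its exit code into one shared slot, so any nonzero result from any
// thread makes the combined result nonzero.
void device_start(MainFn user_main, int argc, char **argv, char **envp,
                  std::atomic<int> &ret) {
  ret.fetch_or(user_main(argc, argv, envp), std::memory_order_relaxed);
}

// Hypothetical loader side: "launch" the entry point on num_threads threads
// (serially here) and report the combined exit code to the host.
int launch(MainFn user_main, int num_threads, int argc, char **argv,
           char **envp) {
  std::atomic<int> ret{0};
  for (int t = 0; t < num_threads; ++t)
    device_start(user_main, argc, argv, envp, ret);
  return ret.load();
}
```

An OR keeps the combination order-independent, which matters on the GPU where threads finish in no particular order; the trade-off is that distinct nonzero codes from different threads get merged.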
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D143212