| 9b8cae4d | 22-Feb-2022 |
Elena Agostini <eagostini@nvidia.com> |
gpudev: use CPU mapping in communication list
rte_gpu_mem_cpu_map() exposes a GPU memory area to the CPU. In gpudev communication list this is useful to store the status flag.
A communication list status flag allocated on GPU memory and mapped for CPU visibility can be updated by CPU and polled by a GPU workload.
The polling operation is more frequent than the CPU update operation. Having the status flag in GPU memory reduces the GPU workload polling latency.
If the CPU mapping feature is not enabled, the status flag resides in CPU memory registered so that it is visible from the GPU.
To facilitate interaction with the status flag, this patch also provides set/get functions for it.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
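The flag handling this patch describes can be sketched from the CPU side as below. A minimal sketch assuming the set/get helpers are rte_gpu_comm_set_status()/rte_gpu_comm_get_status() and the status values RTE_GPU_COMM_LIST_READY/RTE_GPU_COMM_LIST_DONE, as in the gpudev API; error handling is abbreviated.

```c
#include <rte_gpudev.h>

/* Sketch: the CPU marks a communication list item as ready for the GPU,
 * then polls until the GPU workload marks it done. The flag itself lives
 * in GPU memory mapped for CPU visibility, so the GPU-side polling is
 * a local-memory read. Assumes the list was created with
 * rte_gpu_comm_create_list(). */
static int
hand_off_item(struct rte_gpu_comm_list *item)
{
	enum rte_gpu_comm_list_status status;

	/* Update goes through the CPU mapping of GPU memory. */
	if (rte_gpu_comm_set_status(item, RTE_GPU_COMM_LIST_READY) < 0)
		return -1;

	/* Wait for the GPU kernel to flip the flag to DONE. */
	do {
		if (rte_gpu_comm_get_status(item, &status) < 0)
			return -1;
	} while (status != RTE_GPU_COMM_LIST_DONE);

	return 0;
}
```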
|
| c7ebd65c | 08-Nov-2021 |
Elena Agostini <eagostini@nvidia.com> |
gpudev: add communication list
In a heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. When mixing network activity with task processing, there may be a need to establish communication between the CPU and the device in order to synchronize operations.
An example could be a receive-and-process application where CPU is responsible for receiving packets in multiple mbufs and the GPU is responsible for processing the content of those packets.
The purpose of this list is to provide a buffer in CPU memory, visible from the GPU, that can be treated as a circular buffer through which the CPU provides fundamental info about received packets to the GPU.
A possible use-case is described below.
CPU:
- Trigger some task on the GPU
- In a loop:
  - receive a number of packets
  - provide packets info to the GPU

GPU:
- Do some pre-processing
- Wait to receive a new set of packets to be processed
Layout of a communication list would be:
 --------
|   0    | => pkt_list
| status |
| #pkts  |
 --------
|   1    | => pkt_list
| status |
| #pkts  |
 --------
|   2    | => pkt_list
| status |
| #pkts  |
 --------
| ....   | => pkt_list
 --------
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
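The CPU side of the receive-and-process use case above can be sketched as follows. A sketch assuming the gpudev helpers rte_gpu_comm_create_list() and rte_gpu_comm_populate_list_pkts(); gpu_id, port_id, queue_id and the burst/list sizes are illustrative placeholders, and the GPU-side kernel consuming the items is out of scope here.

```c
#include <rte_ethdev.h>
#include <rte_gpudev.h>

#define NB_ITEMS 1024
#define BURST    32

static void
rx_loop(uint16_t gpu_id, uint16_t port_id, uint16_t queue_id)
{
	struct rte_gpu_comm_list *comm_list;
	struct rte_mbuf *mbufs[BURST];
	uint32_t idx = 0;

	/* Circular buffer in CPU memory, visible from the GPU. */
	comm_list = rte_gpu_comm_create_list(gpu_id, NB_ITEMS);
	if (comm_list == NULL)
		return;

	for (;;) {
		uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id,
						  mbufs, BURST);
		if (nb_rx == 0)
			continue;

		/* Store the packets' info in the current item and mark it
		 * ready, so the GPU workload polling it can start. */
		rte_gpu_comm_populate_list_pkts(&comm_list[idx],
						mbufs, nb_rx);
		idx = (idx + 1) % NB_ITEMS;
	}
}
```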
|
| f56160a2 | 08-Nov-2021 |
Elena Agostini <eagostini@nvidia.com> |
gpudev: add communication flag
In a heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. When mixing network activity with task processing, there may be a need to establish communication between the CPU and the device in order to synchronize operations.
The purpose of this flag is to allow the CPU and the GPU to exchange ACKs. A possible use-case is described below.
CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU that the data is ready by updating the communication flag

GPU:
- Do some pre-processing
- Wait for more data from the CPU, polling on the communication flag
- Consume the data prepared by the CPU
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
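The CPU side of this ACK exchange can be sketched as below. A sketch assuming the gpudev flag API (rte_gpu_comm_create_flag(), rte_gpu_comm_set_flag(), rte_gpu_comm_destroy_flag()) with a flag allocated in CPU memory; gpu_id is an illustrative placeholder and the GPU-side polling kernel is out of scope here.

```c
#include <rte_gpudev.h>

static int
signal_data_ready(uint16_t gpu_id)
{
	struct rte_gpu_comm_flag devflag;

	/* Flag in CPU memory, registered so the GPU workload can poll it. */
	if (rte_gpu_comm_create_flag(gpu_id, &devflag,
				     RTE_GPU_COMM_FLAG_CPU) < 0)
		return -1;

	/* ... trigger the GPU task and prepare some data ... */

	/* Tell the polling GPU workload the data is ready. */
	rte_gpu_comm_set_flag(&devflag, 1);

	/* ... */

	return rte_gpu_comm_destroy_flag(&devflag);
}
```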
|
| e818c4e2 | 08-Nov-2021 |
Elena Agostini <eagostini@nvidia.com> |
gpudev: add memory API
In a heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. Such workload distribution can be achieved by sharing some memory.
As a first step, the features focus on memory management. A function allows allocating memory inside the device, or in the main (CPU) memory while making it visible to the device. This memory may be used to store packets or synchronization data.
The next step should focus on GPU processing task control.
Signed-off-by: Elena Agostini <eagostini@nvidia.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
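The two allocation paths described above can be sketched as follows. A sketch assuming the signatures as introduced by this patch (rte_gpu_mem_alloc(), rte_gpu_mem_free(), rte_gpu_mem_register(), rte_gpu_mem_unregister()); note later releases extended rte_gpu_mem_alloc() with an alignment parameter. gpu_id, buf_size and cpu_buf are illustrative placeholders.

```c
#include <rte_gpudev.h>

static int
setup_buffers(int16_t gpu_id, size_t buf_size, void *cpu_buf)
{
	/* Path 1: allocate memory inside the device. */
	void *gpu_buf = rte_gpu_mem_alloc(gpu_id, buf_size);
	if (gpu_buf == NULL)
		return -1;

	/* Path 2: make an existing CPU buffer visible to the device. */
	if (rte_gpu_mem_register(gpu_id, buf_size, cpu_buf) < 0) {
		rte_gpu_mem_free(gpu_id, gpu_buf);
		return -1;
	}

	/* ... use the buffers for packets or synchronization data ... */

	rte_gpu_mem_unregister(gpu_id, cpu_buf);
	return rte_gpu_mem_free(gpu_id, gpu_buf);
}
```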
|
| a9af048a | 08-Nov-2021 |
Thomas Monjalon <thomas@monjalon.net> |
gpudev: support multi-process
The device data shared between processes is moved into a struct allocated in shared memory (a new memzone for all GPUs). The main struct rte_gpu references the shared memory via the pointer mpshared.
The API function rte_gpu_attach() is added to attach a device from a secondary process. The function rte_gpu_allocate() can be used only by the primary process.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
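The allocate/attach split above applies on the driver side roughly as sketched below. A sketch only: it assumes the driver-facing header gpudev_driver.h exposes rte_gpu_allocate() and rte_gpu_attach() taking a device name, and "my_gpu0" is an illustrative placeholder.

```c
#include <rte_eal.h>
#include <gpudev_driver.h> /* driver-side gpudev API */

/* Sketch of a GPU driver probe choosing between the two paths:
 * allocate the shared struct in the primary process, or attach to
 * the already-populated memzone from a secondary process. */
static struct rte_gpu *
probe_gpu(void)
{
	struct rte_gpu *dev;

	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
		dev = rte_gpu_allocate("my_gpu0"); /* creates mpshared data */
	else
		dev = rte_gpu_attach("my_gpu0");   /* finds it in the memzone */

	return dev;
}
```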
|