a2142f3a | 28-Dec-2017 |
GangCao <gang.cao@intel.com> |
bdev/qos: add the QoS rate limiting support on bdev
This patch is to add the basic support of QoS on bdev.
Including two major functionalities:
1. The QoS rate limiting algorithm:
   a. New I/O is always queued first, including under the no-memory condition
   b. Start the QoS I/O operation based on the limit
   c. A poller runs every millisecond to reset the rate limit and send new I/Os down
   d. The rate limit is per millisecond, converted from the user-configurable IOsPerSecond
2. The master thread management:
   a. Add a per-bdev channel_count
   b. Whenever QoS is enabled on a bdev, if the QoS bdev channel is not yet created, create it and assign the QoS thread
   c. When new I/Os come in from different channels (threads), pass them to the QoS bdev channel through a thread event
   d. When the I/Os are completed on the QoS bdev channel, pass them back to their original channel (thread)
   e. Destroy the QoS bdev channel when it is the last bdev channel for this bdev; defer the destruction if the current thread is not the QoS thread
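The per-millisecond budget described in step 1 could be sketched roughly as follows. All names here (`qos_limiter`, `qos_poller_tick`, and so on) are hypothetical simplifications for illustration, not SPDK's actual API:

```c
#include <assert.h>
#include <stddef.h>

struct qos_io {
    struct qos_io *next;
    int submitted;            /* set when the I/O is sent down */
};

struct qos_limiter {
    unsigned ios_per_sec;     /* user-configured limit */
    unsigned ios_per_ms;      /* derived per-millisecond budget */
    unsigned remaining;       /* budget left in the current millisecond */
    struct qos_io *queued;    /* new I/O is always queued first */
};

static void qos_init(struct qos_limiter *q, unsigned ios_per_sec)
{
    q->ios_per_sec = ios_per_sec;
    q->ios_per_ms = ios_per_sec / 1000;  /* the conversion noted above */
    q->remaining = q->ios_per_ms;
    q->queued = NULL;
}

/* New I/O is queued unconditionally, even if budget is available. */
static void qos_queue_io(struct qos_limiter *q, struct qos_io *io)
{
    io->next = q->queued;
    io->submitted = 0;
    q->queued = io;
}

/* Submit queued I/O only while the budget lasts. */
static unsigned qos_submit_within_limit(struct qos_limiter *q)
{
    unsigned sent = 0;
    while (q->queued != NULL && q->remaining > 0) {
        struct qos_io *io = q->queued;
        q->queued = io->next;
        io->submitted = 1;
        q->remaining--;
        sent++;
    }
    return sent;
}

/* The poller fires every millisecond: reset the budget, drain the queue. */
static unsigned qos_poller_tick(struct qos_limiter *q)
{
    q->remaining = q->ios_per_ms;
    return qos_submit_within_limit(q);
}
```

With a limit of 2000 IOs/s the per-millisecond budget is 2, so a third queued I/O waits for the next poller tick.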
Change-Id: Ie4444551d7c3c7de52f6513c9db926628796adb4
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.gerrithub.io/393136
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
|
583a24a4 | 14-Jan-2018 |
Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> |
bdev: share nomem_io data between bdevs built on the same device
When there are two bdevs built on the same io_device, it is possible that one bdev entirely saturates underlying queue, not letting the second bdev issue a single I/O. The second bdev will silently fail any subsequent I/O and append it to the nomem_io list. However, since we resend I/O only from I/O completion callback and there's no outstanding I/O for that bdev (io_outstanding==0), the I/O will never be resent. It'll be stuck in nomem_io forever.
This patch makes the nomem_io list shared between bdevs built on the same device. It is now possible for the I/O completion callback from one bdev to retry sending I/O from another bdev.
The shared bdev data is based on the thread-local bdev_mgmt_channel, so it doesn't need any external synchronization.
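The sharing could be sketched like this; the structures and names below (`shared_resource`, `get_shared`, the fixed table standing in for the thread-local bdev_mgmt_channel) are illustrative assumptions, not SPDK's real layout. Because everything runs on one thread, no locking is needed:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHARED 8

struct shared_resource {
    void *io_device;       /* key: the underlying device */
    int   nomem_count;     /* I/O parked after an ENOMEM failure */
    int   io_outstanding;  /* outstanding I/O across *all* bdevs on it */
};

/* Stand-in for the per-thread bdev_mgmt_channel's resource list. */
static struct shared_resource g_shared[MAX_SHARED];
static int g_shared_used;

/* Find or create the shared resource for an io_device. */
static struct shared_resource *get_shared(void *io_device)
{
    for (int i = 0; i < g_shared_used; i++) {
        if (g_shared[i].io_device == io_device) {
            return &g_shared[i];
        }
    }
    struct shared_resource *s = &g_shared[g_shared_used++];
    s->io_device = io_device;
    s->nomem_count = 0;
    s->io_outstanding = 0;
    return s;
}

/* A completion on any bdev sharing the device retries parked I/O,
 * regardless of which bdev originally hit the ENOMEM condition. */
static int on_io_complete(struct shared_resource *s)
{
    s->io_outstanding--;
    if (s->nomem_count > 0) {
        s->nomem_count--;   /* resend one parked I/O */
        s->io_outstanding++;
        return 1;           /* a retry was issued */
    }
    return 0;
}
```

Because both bdevs resolve to the same shared_resource, the starvation case above cannot occur: the saturating bdev's completions drive retries for the starved one.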
Change-Id: Ia5ac3a1627ce3de4087e43907c329aa7d07ed7c7
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/394658
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ziye Yang <optimistyzy@gmail.com>
|
94bc8cfd | 15-Sep-2017 |
Jim Harris <james.r.harris@intel.com> |
bdev: add ENOMEM handling
At very high queue depths, bdev modules may not have enough internal resources to track all of the incoming I/O. For example, we allocate a finite number of nvme_request objects per allocated queue pair. Currently if these resources are exhausted, the bdev module will return failure (with no indication why) which gets propagated all the way back to the application.
So instead, add SPDK_BDEV_IO_STATUS_NOMEM to allow bdev modules to indicate this type of failure. Also add handling for this status type in the generic bdev layer, involving queuing these I/O for later retry after other I/O on the failing channel have completed.
This does place an expectation on the bdev module that these internal resources are allocated per io_channel. Otherwise we cannot guarantee forward progress solely on reception of completions. For example, without this guarantee, a bdev module could theoretically return ENOMEM even if there were no I/O outstanding for that io_channel. The nvme, aio, rbd, virtio, and null drivers already comply with this expectation; malloc, however, complies only when not using copy offload.
This patch fixes malloc with the copy engine to at least return ENOMEM when no copy descriptors are available. If the condition above occurs, I/O waiting for resources will be failed as part of a subsequent reset, which matches the behavior it has today.
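The queue-and-retry flow in the generic layer could look roughly like this. The status enum, the fixed-capacity stand-in module, and every function name here are hypothetical sketches of the mechanism, not the real SPDK interfaces:

```c
#include <assert.h>
#include <stddef.h>

enum io_status { IO_SUCCESS, IO_NOMEM, IO_FAILED };

struct channel {
    int capacity;      /* per-io_channel resources in the module */
    int inflight;
    int nomem_queued;  /* I/O parked by the generic layer */
};

/* Stand-in bdev module: reports IO_NOMEM when resources run out
 * instead of a bare, unexplained failure. */
static enum io_status module_submit(struct channel *ch)
{
    if (ch->inflight >= ch->capacity) {
        return IO_NOMEM;
    }
    ch->inflight++;
    return IO_SUCCESS;
}

/* Generic layer: NOMEM is queued for retry, never surfaced to the
 * application as a failure. */
static enum io_status bdev_submit(struct channel *ch)
{
    enum io_status rc = module_submit(ch);
    if (rc == IO_NOMEM) {
        ch->nomem_queued++;
        return IO_SUCCESS;  /* caller sees the I/O as accepted */
    }
    return rc;
}

/* On completion, retry parked I/O. Because the module's resources are
 * per io_channel, the slot this completion frees is enough to make
 * forward progress. */
static void bdev_io_complete(struct channel *ch)
{
    ch->inflight--;
    if (ch->nomem_queued > 0) {
        ch->nomem_queued--;
        enum io_status rc = module_submit(ch);
        assert(rc == IO_SUCCESS);
        (void)rc;
    }
}
```

The per-channel resource expectation is what makes the `assert` in the completion path safe: a retry issued right after a completion cannot itself hit NOMEM.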
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iea7cd51a611af8abe882794d0b2361fdbb74e84e
Reviewed-on: https://review.gerrithub.io/378853
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
|
1f935c7a | 12-Sep-2017 |
Jim Harris <james.r.harris@intel.com> |
bdev: properly handle aborted resets
Previously, we naively assumed that a completed reset was the reset in progress, and would unilaterally set reset_in_progress to false.
So change reset_in_progress to a bdev_io pointer instead. If this pointer is not NULL, a reset is in progress. Then, when a reset completes, we only set the reset_in_progress pointer to NULL if we are completing the reset that is actually in progress.
We also were not aborting queued resets when destroying a channel, so that is fixed here too.
The added unit test covers both fixes above: it submits two resets on two different channels, then destroys the second channel. This aborts the second reset and checks that the bdev still sees the first reset as in progress.
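The pointer-based bookkeeping and the unit-test scenario could be sketched like this (minimal hypothetical structures, not SPDK's actual types):

```c
#include <assert.h>
#include <stddef.h>

struct bdev_io { int id; };

struct bdev {
    struct bdev_io *reset_in_progress;  /* NULL means no reset running */
};

/* Only one reset runs at a time; later resets wait in a queue. */
static void reset_start(struct bdev *bdev, struct bdev_io *reset)
{
    if (bdev->reset_in_progress == NULL) {
        bdev->reset_in_progress = reset;
    }
}

/* Only the reset that is actually in progress clears the pointer.
 * A completed-but-aborted reset from another channel leaves it alone,
 * which is exactly the bug the old boolean could not express. */
static void reset_complete(struct bdev *bdev, struct bdev_io *reset)
{
    if (bdev->reset_in_progress == reset) {
        bdev->reset_in_progress = NULL;
    }
}
```

A boolean cannot distinguish "some reset completed" from "the in-progress reset completed"; the pointer comparison can.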
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I61df677cfa272c589ca03cb81753f71b0807a182
Reviewed-on: https://review.gerrithub.io/378199
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
|
ab29d2ce | 12-Sep-2017 |
Jim Harris <james.r.harris@intel.com> |
bdev: improve handling of channel deletion with queued resets
Upper layers are not supposed to put an I/O channel while there is still I/O outstanding on it. This should apply to resets as well.
To better detect this case, do not remove the reset from the channel's queued_reset list until it is ready to be submitted to the bdev module. This ensures:
1) We can detect if a channel is put with a reset outstanding.
2) We do not access freed memory when the channel is destroyed before the reset message can submit the reset I/O.
3) We abort the queued reset if a channel is destroyed.
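The keep-on-list-until-submit idea behind points 2 and 3 could be sketched as follows; the state enum and all names here are illustrative assumptions, not the real implementation:

```c
#include <assert.h>
#include <stddef.h>

enum reset_state { RESET_QUEUED, RESET_SUBMITTED, RESET_ABORTED };

struct reset_io {
    enum reset_state state;
    struct reset_io *next;
};

struct channel {
    struct reset_io *queued_resets;  /* resets not yet handed to the module */
    int destroyed;
};

static void channel_queue_reset(struct channel *ch, struct reset_io *r)
{
    r->state = RESET_QUEUED;
    r->next = ch->queued_resets;
    ch->queued_resets = r;
}

/* A reset leaves queued_resets only at the moment it is actually
 * submitted to the bdev module, not when the submit is scheduled. */
static struct reset_io *channel_submit_reset(struct channel *ch)
{
    struct reset_io *r = ch->queued_resets;
    if (r != NULL && !ch->destroyed) {
        ch->queued_resets = r->next;
        r->state = RESET_SUBMITTED;
    }
    return r;
}

/* Destroying the channel aborts everything still queued, so a deferred
 * submit message can never touch freed channel memory. */
static void channel_destroy(struct channel *ch)
{
    for (struct reset_io *r = ch->queued_resets; r != NULL; r = r->next) {
        r->state = RESET_ABORTED;
    }
    ch->queued_resets = NULL;
    ch->destroyed = 1;
}
```

Keeping the reset on the list until submission is what gives the destroy path something to find and abort.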
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I0c03eee8b3642155c19c2996e25955baac22d406
Reviewed-on: https://review.gerrithub.io/378198
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
|