52a41348 | 02-Dec-2024 |
Jinlong Chen <chenjinlong.cjl@alibaba-inc.com> |
bdev: do not retry nomem I/Os during aborting them
If bdev module reports ENOMEM for the first I/O in an I/O channel, all subsequent I/Os would be queued in the nomem list. In this case, io_outstanding and nomem_threshold would remain 0, allowing nomem I/Os to be resubmitted unconditionally.
A subsequent reset could then trigger nomem I/O resubmission while aborting nomem I/Os, via the following path:
```
bdev_reset_freeze_channel
-> bdev_abort_all_queued_io
-> spdk_bdev_io_complete
-> _bdev_io_handle_no_mem
-> bdev_ch_retry_io
```
Both bdev_abort_all_queued_io and bdev_ch_retry_io modify the nomem_io list in this path. Thus, some I/Os might first be submitted to the underlying device by bdev_ch_retry_io and then get aborted by bdev_abort_all_queued_io, resulting in double completion of these I/Os later.
To fix this, just do not resubmit nomem I/Os when aborting is in progress.
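The guard described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `toy_*` names and the `abort_in_progress` flag are hypothetical stand-ins, not the actual SPDK structures or the exact mechanism used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a bdev channel; every name here is hypothetical. */
struct toy_channel {
    size_t nomem_queued;      /* I/Os parked on the nomem list */
    size_t io_outstanding;    /* I/Os currently owned by the device */
    size_t nomem_threshold;   /* retry only when io_outstanding <= this */
    bool   abort_in_progress; /* set while parked I/Os are being aborted */
    size_t resubmitted;       /* I/Os handed back to the device by retry */
};

/* Retry path: resubmit parked I/Os unless an abort is walking the list. */
static void toy_retry_nomem(struct toy_channel *ch)
{
    assert(ch != NULL);
    if (ch->abort_in_progress || ch->io_outstanding > ch->nomem_threshold) {
        return; /* the guard: never resubmit while aborting */
    }
    ch->resubmitted += ch->nomem_queued;
    ch->nomem_queued = 0;
}

/* Abort path: complete each parked I/O; every completion reaches the retry hook. */
static size_t toy_abort_all_queued(struct toy_channel *ch)
{
    size_t aborted = 0;

    assert(ch != NULL);
    ch->abort_in_progress = true;
    while (ch->nomem_queued > 0) {
        ch->nomem_queued--;
        aborted++;
        toy_retry_nomem(ch); /* models the completion callback path */
    }
    ch->abort_in_progress = false;
    return aborted;
}
```

With `io_outstanding` and `nomem_threshold` both 0, removing the `abort_in_progress` check would let the first completion resubmit the remaining parked I/Os, which is the double-handling the commit message describes.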
Change-Id: I1f66262216885779d1a883ec9250d58a13d8c228 Signed-off-by: Jinlong Chen <chenjinlong.cjl@alibaba-inc.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25522 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <jim.harris@nvidia.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
|
06358c25 | 01-Nov-2024 |
Konrad Sztyber <konrad.sztyber@intel.com> |
bdev/nvme: use poll_group's fd_group to register interrupts
This eliminates the need for nesting epoll instances in the kernel and allows us to skip one epoll_wait() call. It yields a latency improvement of around 5-10%.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com> Change-Id: Idd6ed70d41760566b82246c8af59016fa80a0610 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25469 Reviewed-by: Ben Walker <ben@nvidia.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com> Community-CI: Mellanox Build Bot Reviewed-by: Ankit Kumar <ankit.kumar@samsung.com>
|
43c35d80 | 01-Nov-2024 |
Konrad Sztyber <konrad.sztyber@intel.com> |
util: multi-level fd_group nesting
This patch adds the ability to nest multiple fd_groups into one another. This builds a tree with fds from all fd_groups being registered at root fd_group's epfd. For instance, in the following configuration:
           fgrp0
             |
    fgrp1----+----fgrp2
      |
    fgrp3
fds from all fd_groups will be registered to the epfd of fgrp0. After unnesting fgrp1, the fds of fgrp1 and fgrp3 will be removed from fgrp0's epfd and added to fgrp1's epfd.
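A minimal sketch of the tree relationship, assuming each group only stores a parent pointer. The `toy_fd_group` type and helpers are hypothetical and do not touch epoll; they only show which group's epfd would own the fds before and after unnesting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a nested fd_group: only the parent link matters here. */
struct toy_fd_group {
    const char *name;
    struct toy_fd_group *parent; /* NULL when the group is not nested */
};

/* The root of the tree is the group whose epfd holds every descendant's fds. */
static struct toy_fd_group *toy_fd_group_root(struct toy_fd_group *grp)
{
    assert(grp != NULL);
    while (grp->parent != NULL) {
        grp = grp->parent;
    }
    return grp;
}

/* Unnesting detaches a subtree; the detached group becomes a root itself. */
static void toy_fd_group_unnest(struct toy_fd_group *grp)
{
    assert(grp != NULL);
    grp->parent = NULL;
}
```

In the configuration above, `toy_fd_group_root()` of fgrp3 is fgrp0 until fgrp1 is unnested, after which it is fgrp1, while fgrp2 still resolves to fgrp0.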
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com> Change-Id: I4f586c21fe3db1739bf2010578b20606c53e5e84 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25463 Reviewed-by: Ankit Kumar <ankit.kumar@samsung.com> Reviewed-by: Ben Walker <ben@nvidia.com> Community-CI: Mellanox Build Bot Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com>
|
a2f5e1c2 | 08-Nov-2024 |
Jinlong Chen <chenjinlong.cjl@alibaba-inc.com> |
blob: don't free bs when spdk_bs_destroy/spdk_bs_unload fails
Error handling of spdk_bs_destroy and spdk_bs_unload is confusing. They may or may not free the spdk_blob_store structure on error, depending on when the error happens. Users cannot know whether the structure has been freed after the operation finishes, and are thus unable to handle it correctly.
To fix this problem, we only free the structure when no errors happened. In this way, users can be sure that the structure pointer is still valid after the failed operation. They can then retry the operation or debug the failure.
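The ownership rule can be sketched as follows. This is a synchronous simplification with hypothetical `toy_*` names; the real spdk_bs_unload and spdk_bs_destroy are asynchronous and report their result through a completion callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a blobstore handle. */
struct toy_store {
    bool io_fails; /* simulates an I/O error during unload */
};

/*
 * Frees *storep and sets it to NULL only on success.
 * On failure the structure is left intact so the caller can retry or debug.
 */
static int toy_store_unload(struct toy_store **storep)
{
    assert(storep != NULL && *storep != NULL);
    if ((*storep)->io_fails) {
        return -EIO; /* nothing freed: the pointer stays valid */
    }
    free(*storep);
    *storep = NULL;
    return 0;
}
```

A caller that sees a negative return value can therefore keep using the pointer, for example to retry once the underlying problem is resolved.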
Fixes #3560.
Change-Id: I4f7194ab8fce4f1a408ce3e6500514fd214427d4 Signed-off-by: Jinlong Chen <chenjinlong.cjl@alibaba-inc.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25472 Reviewed-by: Jim Harris <jim.harris@nvidia.com> Reviewed-by: GangCao <gang.cao@intel.com> Reviewed-by: Yankun Li <845245370@qq.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <ben@nvidia.com> Community-CI: Mellanox Build Bot
|
c2471e45 | 24-Sep-2024 |
Alexey Marchuk <alexeymar@nvidia.com> |
nvmf: Clean unassociated_qpairs on connect error
When spdk_nvmf_poll_group_add returns an error, qpair is in uninitialized state and spdk_nvmf_qpair_disconnect handles this state in a special way, i.e. we don't decrement the `group->current_unassociated_qpairs` counter. That is not a critical error but may lead to uneven qpairs distribution.
Signed-off-by: Alexey Marchuk <alexeymar@nvidia.com> Change-Id: If68c4c4c8f3a99a690ba15694b5568940a7e0c21 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25012 Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <jim.harris@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
|
2c140f58 | 18-Jul-2024 |
Alexey Marchuk <alexeymar@nvidia.com> |
nvme/rdma: Support accel sequence
If a request has an accel sequence, we append a copy task with the RDMA memory domain and don't send the capsule until the data_transfer callback is called. In the callback we expect to get a single iov and a memory key, which are sent in the NVMF capsule to the remote target. When network transmission is finished, we finish the data transfer operation. The request is completed in the accel sequence finish_cb. A request which is executing an accel sequence has a special flag; we don't abort such requests. Also, we store the data transfer completion callback and call it in case of network failure. Added tests for this feature.
Signed-off-by: Alexey Marchuk <alexeymar@nvidia.com> Change-Id: I021bd5f268185a5e1b2d77eb098f8daf491aacf9 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/24702 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <ben@nvidia.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
|
4b59d789 | 28-Nov-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
bdev/nvme: Use nbdev always for local nvme_bdev pointer variables
Previously, for the local nvme_bdev pointer, different names, nvme_disk, bdev, and nbdev were used. No special preference but nbdev has been used widely. Let's use nbdev always for local nvme_bdev pointer variables.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I1c076b553587b576305bfbb7b25f97fabb83ce02 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25482 Reviewed-by: Jim Harris <jim.harris@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <ben@nvidia.com> Community-CI: Mellanox Build Bot Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com>
|
04524ea4 | 08-Nov-2024 |
Jinlong Chen <chenjinlong.cjl@alibaba-inc.com> |
blob: fix possible memory leak in bs loading
If I/O errors happen during blobstore loading, the allocated memory for spdk_bs_load_ctx->mask may be leaked.
Change-Id: I7e802dfb1b719b1ba23f70bdb216e7f9cb35357e Signed-off-by: Jinlong Chen <chenjinlong.cjl@alibaba-inc.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25470 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <ben@nvidia.com> Community-CI: Mellanox Build Bot Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com>
|
aac967c0 | 09-Oct-2024 |
Jim Harris <jim.harris@samsung.com> |
lib/nvmf: create pollers for each transport poll group
Having more granular pollers makes it easier to detect and report whether any given poller has pending work in progress.
Signed-off-by: Jim Harris <jim.harris@samsung.com> Change-Id: I640eca2c702ac07eec1b84e3f541564fd0d44a12 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25184 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Ben Walker <ben@nvidia.com>
|
2e10c84c | 15-Nov-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
nvmf: Expose DIF type of namespace to host again
We disabled exposing DIF type of namespaces to host because nvmf library did not support PRACT. https://github.com/spdk/spdk/commit/a438718fc25a5298f0bfc9daf2be0abc885b54be
The nvmf library supports PRACT now.
Let's expose DIF type of namespaces to host again.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I9c30c3c50791ade18fb46c2be807984c934d3788 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25438 Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <jim.harris@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com>
|
38b931b2 | 15-Nov-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
nvmf: Set bdev_ext_io_opts::dif_check_flags_exclude_mask for read/write
Fill opts->dif_check_flags_exclude_mask when nvmf submits read or write I/O to the bdev layer. Furthermore, get the CDW12 value for read I/O because it has the PRACT bit.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: Icfa5b9c9b3ca8a0d6d9b08ba447e06ebe0986443 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/23629 Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <jim.harris@nvidia.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
|
afdec00e | 06-Nov-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
nvmf: Add hide_metadata option to nvmf_subsystem_add_ns
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I518c3cc5fdcb16b41f1e3bda1debf5cb3cc9c47b Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25413 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
|
b09de013 | 06-Nov-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
nvmf: Get metadata config by not bdev but bdev_desc
This is a preparation to enable no_metadata option to each bdev for DIF insert/strip.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I018ccbdba662fa71489ae9664fc9d41bf79cf7d7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25412 Reviewed-by: Jim Harris <jim.harris@nvidia.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
|
ff173863 | 20-Nov-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
ut/bdev: Remove duplication with many stubs among unit test files
Move all duplicated stubs in bdev.c, mt/bdev.c, and part.c unit test files into the new file common_stubs.h in test/common/lib/bdev.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: Ic3d75821bf828e196fa576a18feae90d8bd2ffeb Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25455 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Community-CI: Mellanox Build Bot
|
0836dccd | 30-Sep-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
bdev: Add spdk_dif_ctx and spdk_dif_error into spdk_bdev_io
If the generic bdev layer and the underlying bdev module use T10 DIF APIs of the accel framework for T10 DIF, DIF context and DIF error structures must be persistent while executing the APIs.
As a preparation, embed spdk_dif_ctx structure and spdk_dif_error structure into spdk_bdev_io structure, and initialize both at the start of the I/O submission if the DIF type of the bdev is not disabled or no_metadata option is enabled.
These embedded data structures can be used for other purposes too.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I4e61222034bb33d4dd862692a878bd283b5d32d4 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/23741 Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Community-CI: Mellanox Build Bot
|
fb1630bf | 21-Aug-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
bdev: Use data_block_size for upper layer buffer if hide_metadata is true
Add two helper functions, bdev_desc_get_block_size() and bdev_io_get_block_size().
Then, use these functions for buffers of upper layer.
The following patches will do DIF insert/strip for read/write if no_metadata is true.
bdev_io_update_io_stat() keeps using bdev->blocklen because I/O to the underlying bdev module uses bdev->blocklen regardless of the no_metadata option.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I46cedb846a4362ba75742d4e543df466ef43f112 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/24629 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com> Community-CI: Mellanox Build Bot Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
|
16e5e505 | 20-Nov-2024 |
Shuhei Matsumoto <smatsumoto@nvidia.com> |
bdev: Add spdk_bdev_open_ext_v2() to support per-open options
We will add the DIF insert/strip feature into the generic bdev layer. We want to make the feature option per bdev open.
There exists spdk_bdev_open_async_opts. However it is only for spdk_bdev_open_async(). spdk_bdev_open_async() is not generally usable.
spdk_bdev_open_ext() does not receive any option structure via parameter.
It is not practical to change the existing spdk_bdev_open_ext().
Hence, add a new API spdk_bdev_open_ext_v2() with the spdk_bdev_open_opts structure. There are many examples of the v2 naming in DPDK.
Add hide_metadata option as the first option of the spdk_bdev_open_opts structure.
Last 7 bytes in spdk_bdev_open_opts structure are unused and are not initialized by the caller. To zero out these clearly with minimal user effort, add spdk_bdev_open_opts_init() for initialization.
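The initialization pattern can be sketched like this. The `toy_open_opts` layout, including the explicit `reserved` array and the `size` field, is an assumption made for illustration and may differ from the real spdk_bdev_open_opts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical options structure with explicit trailing reserved bytes. */
struct toy_open_opts {
    size_t  size;          /* size of the structure as seen by the caller */
    bool    hide_metadata; /* first (and only) option in this sketch */
    uint8_t reserved[7];   /* unused bytes that callers would not set */
};

/* Zero the whole structure, then record its size and the defaults. */
static void toy_open_opts_init(struct toy_open_opts *opts, size_t opts_size)
{
    assert(opts != NULL && opts_size <= sizeof(*opts));
    memset(opts, 0, opts_size);
    opts->size = opts_size;
}
```

A caller would invoke `toy_open_opts_init(&opts, sizeof(opts))` first and then set only the fields it cares about, so the unused bytes are always zero.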
opts in spdk_bdev_desc is hot data. Put it into the first cache line of spdk_bdev_desc.
Furthermore, add simple unit test for verification.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I38d93ffbb2becc59e57f9a7163defd5f8f201f07 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/23771 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com> Community-CI: Mellanox Build Bot Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com>
|
6c35d974 | 07-Nov-2024 |
Nathan Claudel <nclaudel@kalrayinc.com> |
lib/nvme: destruct controllers that failed init asynchronously
The controller destroy sequence is as follows:
- Set `CC.SHN` to request shutdown
- Wait for `CSTS.SHST` to be set to `0b10` (Shutdown complete)
- Destroy the associated structs when it's done or after a timeout

To do it, two things should be done:
- First, call `nvme_ctrlr_destruct_async`
- Then, poll `nvme_ctrlr_destruct_poll_async`
However, when a controller fails to initialize on probe, this polling is done synchronously using `nvme_ctrlr_destruct`, which introduces 1ms sleep between each poll.
This is really bad if a controller does not behave as expected and does not set its `CSTS.SHST` in a timely manner because it burdens the probe thread with tons of sleep. If hot-plug is enabled, it makes things even worse because this operation is retried again and again.
Fix this by doing an asynchronous destruct when the controller fails to initialize. Add contexts for this operation on the probe context and poll for controllers destruction in the probe poller function.
Signed-off-by: Nathan Claudel <nclaudel@kalrayinc.com> Change-Id: Ic072a2b7c3351a229d3b6e5c667b71dca2a84b93 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25414 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Vasuki Manikarnike <vasuki.manikarnike@hpe.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ankit Kumar <ankit.kumar@samsung.com> Reviewed-by: Changpeng Liu <changpeliu@tencent.com> Community-CI: Mellanox Build Bot
|
8bbc7b69 | 02-Oct-2024 |
Pawel Baldysiak <pawel.baldysiak@dell.com> |
nvmf: Block ctrlr-only admin cmds if NSID is set
According to NVMe spec rev 1.3d, Section 4, Figure 12: if an admin command does not use the NSID field, the host should set it to 0h. Otherwise, the command should be returned with an INVALID FIELD IN COMMAND error.
With the recent change that added passthrough command caps to nvmf, the SPDK engine forwards all admin commands with NSID set to the bdev, even the incorrectly formed ones. For example, a firmware commit with NSID set to 1.
Validate whether the requested command's opcode uses NSID. If it does not and the request has NSID set, return the appropriate error.
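A minimal sketch of such a check, covering only the Firmware Commit example from the message. The `toy_*` names are hypothetical, and the opcode table is deliberately incomplete rather than a faithful copy of the nvmf implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Admin opcodes and generic status values as defined by the NVMe specification. */
enum {
    TOY_OPC_IDENTIFY        = 0x06,
    TOY_OPC_FIRMWARE_COMMIT = 0x10,
};

enum {
    TOY_SC_SUCCESS       = 0x00,
    TOY_SC_INVALID_FIELD = 0x02,
};

/* True for controller-only commands that do not use the NSID field. */
static bool toy_admin_opc_is_ctrlr_only(uint8_t opc)
{
    switch (opc) {
    case TOY_OPC_FIRMWARE_COMMIT:
        return true;
    default:
        return false; /* unknown to this sketch: let it pass through */
    }
}

/* Reject controller-only admin commands that carry a non-zero NSID. */
static uint8_t toy_check_admin_nsid(uint8_t opc, uint32_t nsid)
{
    if (nsid != 0 && toy_admin_opc_is_ctrlr_only(opc)) {
        return TOY_SC_INVALID_FIELD;
    }
    return TOY_SC_SUCCESS;
}
```

A firmware commit with NSID 1 is rejected here, while the same command with NSID 0 and an Identify with NSID 1 are accepted.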
Fixes issue #3564.
Change-Id: Id2fce050eeaf96ff039073f439d6444ecd55c0b3 Signed-off-by: Pawel Baldysiak <pawel.baldysiak@dell.com> Signed-off-by: Marcin Galecki <marcin.galecki@dell.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25151 Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
|
2dc4a231 | 22-Oct-2024 |
Atul Malakar <a.malakar@samsung.com> |
blob: Add support for variable metadata page size
Currently, SPDK blobstore depends on 4KB atomicity of metadata. With this change, the metadata page size becomes variable, based on the physical block size reported by the underlying device. This enables blobstore to store metadata in IU-sized chunks, not always 4KB.
blobstore.c uses the hardcoded SPDK_BS_PAGE_SIZE (4KB) in many places. To remove this dependency and make the metadata page size variable, phys_blocklen is added to the spdk_bs_dev struct and md_page_size is added to the spdk_bs_super_block struct.
Change-Id: I29d073eb4f4341a94a0675e70492b9186382f97f Signed-off-by: Atul Malakar <a.malakar@samsung.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25130 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <ben@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Reviewed-by: Jim Harris <jim.harris@nvidia.com>
|
60164a9d | 24-Oct-2024 |
Jim Harris <jim.harris@samsung.com> |
test/unit/blob: do not hardcode size of reserved field
bs_test_recover_cluster_count() builds a super block field by field. But it zeroes the reserved field with a hardcoded size, instead of getting the size of the reserved field.
Signed-off-by: Jim Harris <jim.harris@samsung.com> Change-Id: I04be4ec8bf65d8e95b592ca5d262961026b72d95 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25369 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <ben@nvidia.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Reviewed-by: Alliswell <hisunzhenliang@outlook.com>
|
3299bf6d | 23-Oct-2024 |
Jim Harris <jim.harris@samsung.com> |
blob: remove all references to pages as io_units
When blobstore was first created, everything was in terms of 4KB pages - this included metadata page size and the unit for I/O operations.
A bit later, we introduced the concept of "io_unit". If a blobstore was put on a bdev with a 512-byte block size, then the io_unit could be 512 bytes.
But when this happened, we should have changed all of the blobstore code such that remaining "page" references only referred to metadata pages. Instead, we left a bunch of places where we would convert various values to/from a number of 4KB pages, and then to the number of io_units. This made the code quite confusing, since direct conversion to/from io_units would have been much clearer.
This existing problem is exacerbated by the upcoming patch to support variable metadata page sizes. We need things like spdk_bs_get_page_size() to return the size of the metadata pages, which may not be 4KB.
So make all of the changes necessary such that all references to "page" now means "metadata page". This includes removing the spdk_blob_get_num_pages() function, which no longer makes sense.
Signed-off-by: Jim Harris <jim.harris@samsung.com> Change-Id: I66e93e7a4325a3b032bb16edaf657ef12044e1fd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25368 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <ben@nvidia.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
|
82d52845 | 22-Oct-2024 |
Jim Harris <jim.harris@samsung.com> |
test/unit/blob: use io_unit_size in more places
When blobstore was first created, we had the concept of "page size". This size was used both as the block size of blobs (i.e. the units that users could read/write) and the metadata page size. They were both hardcoded to 4096.
"io unit size" was introduced a long time ago. This allowed IO in sizes different than 4096, usually 512. spdk_bs_get_io_unit_size() can be used to get this value.
We are also working on support for metadata page sizes greater than 4096. spdk_bs_get_page_size() will continue to return the size of a metadata page; it may just not always be 4096.
But the unit tests in a lot of places use "page size" when "io unit size" should be used instead. Many of the unit tests use 4096 as the default blocklen, so everything still worked. But let's change these to use io_unit_size correctly now, so that they will still pass when the large IU metadata page sizes are introduced.
Signed-off-by: Jim Harris <jim.harris@samsung.com> Change-Id: I81d9d9b711a20e84bb31181dd167d2f7916500e0 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25315 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Reviewed-by: Ben Walker <ben@nvidia.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
|
1390df9e | 22-Oct-2024 |
Jim Harris <jim.harris@samsung.com> |
test/unit/blob: do not make assumptions about first data cluster
A bunch of blobstore unit tests make assumptions that the first data cluster is cluster index 1 (meaning cluster index 0 contains all of the metadata).
But to support variable metadata page sizes, this assumption is no longer valid. For example, with 16KB metadata page sizes, the metadata will take up more than 1 cluster.
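The arithmetic behind this can be sketched with a helper that rounds the metadata region up to whole clusters. The function name and the numbers in the usage note are hypothetical and chosen only to show the effect; they are not taken from the blobstore code or its tests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of clusters occupied by the metadata region, rounded up.
 * Under the assumption that metadata starts at cluster 0, this is also
 * the index of the first data cluster.
 */
static uint64_t toy_md_clusters(uint64_t md_pages, uint64_t md_page_size,
                                uint64_t cluster_size)
{
    uint64_t md_bytes = md_pages * md_page_size;

    assert(cluster_size > 0);
    return (md_bytes + cluster_size - 1) / cluster_size;
}
```

For instance, with 100 metadata pages and a 1 MiB cluster, 4 KiB pages fit in one cluster, while 16 KiB pages need two, so a test that hardcodes cluster index 1 as the first data cluster would break.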
So improve the unit tests to remove assumptions in this area.
Signed-off-by: Jim Harris <jim.harris@samsung.com> Change-Id: If5112b8bcb7124cadc292dd1fde466cbf3d91cd4 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25314 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Ben Walker <ben@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Community-CI: Mellanox Build Bot
|
1b0a702c | 23-Oct-2024 |
Jim Harris <jim.harris@samsung.com> |
test/unit/blob: remove useless asserts from bs_grow_live_*** unit tests
There are a couple of asserts that do nothing but make assumptions that the metadata only consumes one cluster. Remove them since this assumption will be invalid in upcoming patches which support larger md page sizes.
Signed-off-by: Jim Harris <jim.harris@samsung.com> Change-Id: I86d0fe487939669d01b604d0155188356a12f794 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25313 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <ben@nvidia.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
|