.\"	$NetBSD: disk.9,v 1.30 2009/04/08 12:51:43 joerg Exp $
.\"
.\" Copyright (c) 1995, 1996 Jason R. Thorpe.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed for the NetBSD Project
.\"	by Jason R. Thorpe.
.\" 4. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 14, 2009
.Dt DISK 9
.Os
.Sh NAME
.Nm disk ,
.Nm disk_init ,
.Nm disk_attach ,
.Nm disk_detach ,
.Nm disk_destroy ,
.Nm disk_busy ,
.Nm disk_unbusy ,
.Nm disk_find ,
.Nm disk_blocksize
.Nd generic disk framework
.Sh SYNOPSIS
.In sys/types.h
.In sys/disklabel.h
.In sys/disk.h
.Ft void
.Fn disk_init "struct disk *" "const char *name" "const struct dkdriver *driver"
.Ft void
.Fn disk_attach "struct disk *"
.Ft void
.Fn disk_detach "struct disk *"
.Ft void
.Fn disk_destroy "struct disk *"
.Ft void
.Fn disk_busy "struct disk *"
.Ft void
.Fn disk_unbusy "struct disk *" "long bcount" "int read"
.Ft struct disk *
.Fn disk_find "const char *"
.Ft void
.Fn disk_blocksize "struct disk *" "int blocksize"
.Sh DESCRIPTION
The
.Nx
generic disk framework is designed to provide flexible,
scalable, and consistent handling of disk state and metrics information.
The fundamental component of this framework is the
.Nm disk
structure, which is defined as follows:
.Bd -literal
struct disk {
	TAILQ_ENTRY(disk) dk_link;	/* link in global disklist */
	const char	*dk_name;	/* disk name */
	prop_dictionary_t dk_info;	/* reference to disk-info dictionary */
	int		dk_bopenmask;	/* block devices open */
	int		dk_copenmask;	/* character devices open */
	int		dk_openmask;	/* composite (bopen|copen) */
	int		dk_state;	/* label state   ### */
	int		dk_blkshift;	/* shift to convert DEV_BSIZE to blks */
	int		dk_byteshift;	/* shift to convert bytes to blks */

	/*
	 * Metrics data; note that some metrics may have no meaning
	 * on certain types of disks.
	 */
	struct io_stats	*dk_stats;

	const struct dkdriver *dk_driver;	/* pointer to driver */

	/*
	 * Information required to be the parent of a disk wedge.
	 */
	kmutex_t	dk_rawlock;	/* lock on these fields */
	u_int		dk_rawopens;	/* # of opens of rawvp */
	struct vnode	*dk_rawvp;	/* vnode for the RAW_PART bdev */

	kmutex_t	dk_openlock;	/* lock on these and openmask */
	u_int		dk_nwedges;	/* # of configured wedges */
					/* all wedges on this disk */
	LIST_HEAD(, dkwedge_softc) dk_wedges;

	/*
	 * Disk label information.  Storage for the in-core disk label
	 * must be dynamically allocated, otherwise the size of this
	 * structure becomes machine-dependent.
	 */
	daddr_t		dk_labelsector;	/* sector containing label */
	struct disklabel *dk_label;	/* label */
	struct cpu_disklabel *dk_cpulabel;
};
.Ed
.Pp
The system maintains a global linked list of all disks attached to the
system.
This list, called
.Nm disklist ,
may grow or shrink over time as disks are dynamically added and removed
from the system.
Drivers which currently make use of the detachment
capability of the framework are the
.Nm ccd
and
.Nm vnd
pseudo-device drivers.
.Pp
The following is a brief description of each function in the framework:
.Bl -tag -width ".Fn disk_blocksize"
.It Fn disk_init
Initialize the disk structure.
.It Fn disk_attach
Attach a disk; allocate storage for the disklabel, set the
.Dq attached time
timestamp, insert the disk into the disklist, and increment the
system disk count.
.It Fn disk_detach
Detach a disk; free storage for the disklabel, remove the disk
from the disklist, and decrement the system disk count.
If the count drops below zero, panic.
.It Fn disk_destroy
Release resources used by the disk structure when it is no longer
required.
.It Fn disk_busy
Increment the disk's
.Dq busy counter .
If this counter goes from 0 to 1, set the timestamp corresponding to
this transfer.
.It Fn disk_unbusy
Decrement a disk's busy counter.
If the count drops below zero, panic.
Get the current time, subtract the disk's timestamp from it, and add
the difference to the disk's running total.
Set the disk's timestamp to the current time.
If the provided byte count is greater than 0, add it to the disk's
running total and increment the number of transfers performed by the disk.
The third argument
.Ar read
specifies the direction of I/O;
if non-zero it means reading from the disk,
otherwise it means writing to the disk.
.It Fn disk_find
Return a pointer to the disk structure corresponding to the name provided,
or NULL if the disk does not exist.
.It Fn disk_blocksize
Initialize the
.Fa dk_blkshift
and
.Fa dk_byteshift
members of
.Fa struct disk
with suitable values derived from the supplied physical blocksize.
It is only necessary to call this function if the device's physical blocksize
is not
.Dv DEV_BSIZE .
.El
.Pp
The functions typically called by device drivers are
.Fn disk_init ,
.Fn disk_attach ,
.Fn disk_detach ,
.Fn disk_destroy ,
.Fn disk_busy ,
.Fn disk_unbusy ,
and
.Fn disk_blocksize .
The function
.Fn disk_find
is provided as a utility function.
.Sh DISK IOCTLS
The following ioctls should be implemented by disk drivers:
.Bl -tag -width "xxxxxx"
.It Dv DIOCGDINFO "struct disklabel"
Get disklabel.
.It Dv DIOCSDINFO "struct disklabel"
Set in-memory disklabel.
.It Dv DIOCWDINFO "struct disklabel"
Set in-memory disklabel, and write on-disk disklabel.
.It Dv DIOCGPART "struct partinfo"
Get partition information.
This is used internally.
.It Dv DIOCRFORMAT "struct format_op"
Read format.
.It Dv DIOCWFORMAT "struct format_op"
Write format.
.It Dv DIOCSSTEP "int"
Set step rate.
.It Dv DIOCSRETRIES "int"
Set number of retries.
.It Dv DIOCKLABEL "int"
Specify whether to keep or drop the in-memory disklabel
when the device is closed.
.It Dv DIOCWLABEL "int"
Enable or disable writing to the part of the disk that contains the label.
.It Dv DIOCSBAD "struct dkbad"
Set kernel dkbad.
.It Dv DIOCEJECT "int"
Eject removable disk.
.It Dv DIOCLOCK "int"
Lock or unlock disk pack.
For devices with removable media, locking is intended to prevent
the operator from removing the media.
.It Dv DIOCGDEFLABEL "struct disklabel"
Get default label.
.It Dv DIOCCLRLABEL
Clear disk label.
.It Dv DIOCGCACHE "int"
Get status of disk read and write caches.
The result is a bitmask containing the following values:
.Bl -tag -width DKCACHE_RCHANGE
.It Dv DKCACHE_READ
Read cache enabled.
.It Dv DKCACHE_WRITE
Write(back) cache enabled.
.It Dv DKCACHE_RCHANGE
Read cache enable is changeable.
.It Dv DKCACHE_WCHANGE
Write cache enable is changeable.
.It Dv DKCACHE_SAVE
Cache parameters may be saved, so that they persist across reboots
or device detach/attach cycles.
.El
.It Dv DIOCSCACHE "int"
Set status of disk read and write caches.
The input is a bitmask in the same format as used for
.Dv DIOCGCACHE .
.It Dv DIOCCACHESYNC "int"
Synchronise the disk cache.
This causes information in the disk's write cache (if any)
to be flushed to stable storage.
The argument specifies whether or not to force a flush even if
the kernel believes that there is no outstanding data.
.It Dv DIOCBSLIST "struct disk_badsecinfo"
Get bad sector list.
.It Dv DIOCBSFLUSH
Flush bad sector list.
.It Dv DIOCAWEDGE "struct dkwedge_info"
Add wedge.
.It Dv DIOCGWEDGEINFO "struct dkwedge_info"
Get wedge information.
.It Dv DIOCDWEDGE "struct dkwedge_info"
Delete wedge.
.It Dv DIOCLWEDGES "struct dkwedge_list"
List wedges.
.It Dv DIOCGSTRATEGY "struct disk_strategy"
Get disk buffer queue strategy.
.It Dv DIOCSSTRATEGY "struct disk_strategy"
Set disk buffer queue strategy.
.It Dv DIOCGDISKINFO "struct plistref"
Get disk-info dictionary.
.El
.Sh USING THE FRAMEWORK
This section describes basic use of the framework
and gives example usage of its functions.
The actual implementation of a device driver which uses the framework
may vary.
.Pp
Each device in the system uses a
.Dq softc
structure which contains autoconfiguration and state information for that
device.
In the case of disks, the softc should also contain one instance
of the disk structure, e.g.:
.Bd -literal
struct foo_softc {
	device_t	sc_dev;		/* generic device information */
	struct disk	sc_dk;		/* generic disk information */
	[ . . . more . . . ]
};
.Ed
.Pp
In order for the system to gather metrics data about a disk, the disk must
be registered with the system.
The
.Fn disk_attach
routine performs all of the functions currently required to register a disk
with the system including allocation of disklabel storage space,
recording of the time since boot that the disk was attached, and insertion
into the disklist.
Note that since this function allocates storage space for the disklabel,
it must be called before the disklabel is read from the media or used in
any other way.
Before
.Fn disk_attach
is called, portions of the disk structure must be initialized with
data specific to that disk.
For example, in the
.Dq foo
disk driver, the following would be performed in the autoconfiguration
.Dq attach
routine:
.Bd -literal
void
fooattach(device_t parent, device_t self, void *aux)
{
	struct foo_softc *sc = device_private(self);
	[ . . . ]

	/* Initialize and attach the disk structure. */
	disk_init(\*[Am]sc-\*[Gt]sc_dk, device_xname(self), \*[Am]foodkdriver);
	disk_attach(\*[Am]sc-\*[Gt]sc_dk);

	/* Read geometry and fill in pertinent parts of disklabel. */
	[ . . . ]
	disk_blocksize(\*[Am]sc-\*[Gt]sc_dk, bytes_per_sector);
}
.Ed
.Pp
The
.Nm foodkdriver
above is the disk's
.Dq driver
switch.
This switch currently includes a pointer to the disk's
.Dq strategy
routine.
This switch needs to have global scope and should be initialized as follows:
.Bd -literal
void foostrategy(struct buf *);

const struct dkdriver foodkdriver = {
	.d_strategy = foostrategy,
};
.Ed
.Pp
Once the disk is attached, metrics may be gathered on that disk.
In order to gather metrics data, the driver must tell the framework when
the disk starts and stops operations.
This functionality is provided by the
.Fn disk_busy
and
.Fn disk_unbusy
routines.
The
.Fn disk_busy
routine should be called immediately before a command to the disk is
sent, e.g.:
.Bd -literal
void
foostart(struct foo_softc *sc)
{
	[ . . . ]

	/* Get buffer from drive's transfer queue. */
	[ . . . ]

	/* Build command to send to drive. */
	[ . . . ]

	/* Tell the disk framework we're going busy. */
	disk_busy(\*[Am]sc-\*[Gt]sc_dk);

	/* Send command to the drive. */
	[ . . . ]
}
.Ed
.Pp
When
.Fn disk_busy
is called, a timestamp is taken if the disk's busy counter moves from
0 to 1, indicating the disk has gone from an idle to non-idle state.
Note that
.Fn disk_busy
must be called at
.Fn splbio .
At the end of a transaction, the
.Fn disk_unbusy
routine should be called.
This routine performs some consistency checks,
such as ensuring that the calls to
.Fn disk_busy
and
.Fn disk_unbusy
are balanced.
This routine also performs the actual metrics calculation.
A timestamp is taken, and the difference from the timestamp taken in
.Fn disk_busy
is added to the disk's total running time.
The disk's timestamp is then updated in case there is more than one
pending transfer on the disk.
A byte count is also added to the disk's running total, and if greater than
zero, the number of transfers the disk has performed is incremented.
The third argument
.Ar read
specifies the direction of I/O;
if non-zero it means reading from the disk,
otherwise it means writing to the disk.
.Bd -literal
void
foodone(struct foo_xfer *xfer)
{
	struct foo_softc *sc = (struct foo_softc *)xfer-\*[Gt]xf_softc;
	struct buf *bp = xfer-\*[Gt]xf_buf;
	long nbytes;
	[ . . . ]

	/*
	 * Get number of bytes transferred.  If there is no buf
	 * associated with the xfer, we are being called at the
	 * end of a non-I/O command.
	 */
	if (bp == NULL)
		nbytes = 0;
	else
		nbytes = bp-\*[Gt]b_bcount - bp-\*[Gt]b_resid;

	[ . . . ]

	/* Notify the disk framework that we've completed the transfer. */
	disk_unbusy(\*[Am]sc-\*[Gt]sc_dk, nbytes,
	    bp != NULL ? bp-\*[Gt]b_flags \*[Am] B_READ : 0);

	[ . . . ]
}
.Ed
.Pp
Like
.Fn disk_busy ,
.Fn disk_unbusy
must be called at
.Fn splbio .
.Sh CODE REFERENCES
This section describes places within the
.Nx
source tree where actual
code implementing or using the disk framework can be found.
All pathnames are relative to
.Pa /usr/src .
.Pp
The disk framework itself is implemented within the file
.Pa sys/kern/subr_disk.c .
Data structures and function prototypes for the framework are located in
.Pa sys/sys/disk.h .
.Pp
The
.Nx
machine-independent SCSI disk and CD-ROM drivers use the
disk framework.
They are located in
.Pa sys/dev/scsipi/sd.c
and
.Pa sys/dev/scsipi/cd.c .
.Pp
The
.Nx
.Nm ccd
and
.Nm vnd
drivers use the detachment capability of the framework.
They are located in
.Pa sys/dev/ccd.c
and
.Pa sys/dev/vnd.c .
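.Sh EXAMPLES
The accounting performed by
.Fn disk_busy
and
.Fn disk_unbusy
can be illustrated with a small userland model.
The following is only a sketch under stated assumptions, not the kernel
implementation:
.Vt struct model_disk ,
.Fn model_busy ,
and
.Fn model_unbusy
are hypothetical names, the field layout is invented for illustration,
and the current time is passed in as a microsecond count instead of being
read from a clock.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland model; NOT the kernel's struct disk or io_stats. */
struct model_disk {
	int		busy;		/* outstanding transfers */
	uint64_t	stamp_us;	/* time of last busy/unbusy event */
	uint64_t	busy_us;	/* accumulated non-idle time */
	uint64_t	rbytes, wbytes;	/* bytes read and written */
	uint64_t	rxfer, wxfer;	/* completed reads and writes */
};

/* Models disk_busy(): take a timestamp only on the 0 to 1 transition. */
static void
model_busy(struct model_disk *d, uint64_t now_us)
{
	if (d->busy++ == 0)
		d->stamp_us = now_us;
}

/* Models disk_unbusy(): account elapsed time, then bytes and transfers. */
static void
model_unbusy(struct model_disk *d, uint64_t now_us, long bcount, int read)
{
	/* The real framework panics on unbalanced calls; the model asserts. */
	assert(d->busy > 0);
	d->busy--;

	/* Elapsed time is "now" minus the stored timestamp. */
	d->busy_us += now_us - d->stamp_us;

	/* Restamp, since other transfers may still be pending. */
	d->stamp_us = now_us;

	/* Only a positive byte count is counted as a transfer. */
	if (bcount > 0) {
		if (read) {
			d->rbytes += (uint64_t)bcount;
			d->rxfer++;
		} else {
			d->wbytes += (uint64_t)bcount;
			d->wxfer++;
		}
	}
}
```
.Ed
.Pp
In this model, two overlapping transfers started at 100 and 150
microseconds and completed at 300 and 400 microseconds accumulate
300 microseconds of busy time, because the timestamp is taken once when
the disk leaves the idle state and is then advanced at each completion.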
.Sh SEE ALSO
.Xr ccd 4 ,
.Xr vnd 4 ,
.Xr spl 9
.Sh HISTORY
The
.Nx
generic disk framework appeared in
.Nx 1.2 .
.Sh AUTHORS
The
.Nx
generic disk framework was architected and implemented by
.An Jason R. Thorpe
.Aq thorpej@NetBSD.org .