.\"	$NetBSD: bufferio.9,v 1.18 2019/09/12 21:08:35 sevan Exp $
.\"
.\" Copyright (c) 2015 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Taylor R. Campbell.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd September 12, 2019
.Dt BUFFERIO 9
.Os
.Sh NAME
.Nm BUFFERIO ,
.Nm biodone ,
.Nm biowait ,
.Nm getiobuf ,
.Nm putiobuf ,
.Nm nestiobuf_setup ,
.Nm nestiobuf_done
.Nd block I/O buffer transfers
.Sh SYNOPSIS
.In sys/buf.h
.Ft void
.Fn biodone "buf_t *bp"
.Ft int
.Fn biowait "buf_t *bp"
.Ft buf_t *
.Fn getiobuf "struct vnode *vp" "bool waitok"
.Ft void
.Fn putiobuf "buf_t *bp"
.Ft void
.Fn nestiobuf_setup "buf_t *mbp" "buf_t *bp" "int offset" \
        "size_t size"
.Ft void
.Fn nestiobuf_done "buf_t *mbp" "int donebytes" "int error"
.Sh DESCRIPTION
The
.Nm
subsystem manages block I/O buffer transfers, described by the
.Vt "struct buf"
structure, which serves multiple purposes: it is shared among users of
.Nm ,
users of
.Xr buffercache 9 ,
and block device drivers that execute transfers to physical
disks.
.Sh BLOCK DEVICE USERS
Users of
.Nm
wishing to submit a buffer for block I/O transfer must obtain a
.Vt "struct buf" ,
e.g. via
.Fn getiobuf ,
fill its parameters, and submit it to a block device with
.Xr bdev_strategy 9 ,
usually via
.Xr VOP_STRATEGY 9 .
.Pp
The parameters to an I/O transfer described by
.Fa bp
are specified by the following
.Vt "struct buf"
fields:
.Bl -tag -width 6n -offset abcd
.It Fa bp Ns Li "->b_flags"
Flags specifying the type of transfer.
.Bl -tag -width 6n -compact
.It Dv B_READ
Transfer is read from device.
If not set, transfer is write to device.
.It Dv B_ASYNC
Asynchronous I/O.
Caller must not provide
.Fa bp Ns Li "->b_iodone"
and must not call
.Fn biowait bp .
.El
For legibility, callers should indicate writes by passing the
pseudo-flag
.Dv B_WRITE ,
which is zero.
.It Fa bp Ns Li "->b_data"
Pointer to kernel virtual address of source/target for transfer.
.It Fa bp Ns Li "->b_bcount"
Nonnegative number of bytes requested for transfer.
.It Fa bp Ns Li "->b_blkno"
Block number at which to do transfer.
.It Fa bp Ns Li "->b_iodone"
I/O completion callback.
If a callback is provided,
.Dv B_ASYNC
must not be set in
.Fa bp Ns Li "->b_flags" .
.El
.Pp
Additionally, if the I/O transfer is a write associated with a
.Xr vnode 9
.Fa vp ,
then before the user submits it to a block device, the user must
increment
.Fa vp Ns Li "->v_numoutput" .
The user must not acquire
.Fa vp Ns Ap s
vnode lock between incrementing
.Fa vp Ns Li "->v_numoutput"
and submitting
.Fa bp
to a block device; doing so will likely cause deadlock with the
syncer.
.Pp
Block I/O transfer completion may be notified by the
.Fa bp Ns Li "->b_iodone"
callback, by signalling
.Fn biowait
waiters, or not at all in the
.Dv B_ASYNC
case.
.Bl -dash
.It
If the user sets the
.Fa bp Ns Li "->b_iodone"
callback to a
.Pf non- Dv NULL
function pointer, it will be called in soft interrupt context when the
I/O transfer is complete.
The user
.Em must not
call
.Fn biowait bp
in this case.
.It
If
.Dv B_ASYNC
is set, then the I/O transfer is asynchronous and the user will not be
notified when it is completed.
The user
.Em must not
call
.Fn biowait bp
in this case.
.It
Otherwise, if
.Fa bp Ns Li "->b_iodone"
is
.Dv NULL
and
.Dv B_ASYNC
is not specified, the user may wait for the I/O transfer to complete
with
.Fn biowait bp .
.El
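.Pp
The three cases above can be summarized in a small userland model.
This is only an illustrative sketch, not kernel code: the
.Vt "struct model_buf"
type, the
.Dv MODEL_B_ASYNC
value, and the function names are hypothetical stand-ins for
.Vt "struct buf" ,
.Dv B_ASYNC ,
and the rules described in the text.
.Bd -literal -offset abcd
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; the real B_ASYNC is defined in <sys/buf.h>. */
#define MODEL_B_ASYNC	0x1

/* Hypothetical stand-in for the two struct buf fields that matter here. */
struct model_buf {
	int	b_flags;
	void	(*b_iodone)(struct model_buf *);
};

enum model_notify {
	MODEL_NOTIFY_CALLBACK,	/* b_iodone is called on completion */
	MODEL_NOTIFY_NONE,	/* B_ASYNC: the user is not notified */
	MODEL_NOTIFY_BIOWAIT,	/* the user may call biowait */
	MODEL_NOTIFY_INVALID	/* b_iodone and B_ASYNC both set */
};

/* Classify how completion of bp would be reported to the user. */
static enum model_notify
model_notify_mode(const struct model_buf *bp)
{
	bool async = (bp->b_flags & MODEL_B_ASYNC) != 0;
	bool callback = (bp->b_iodone != NULL);

	if (async && callback)
		return MODEL_NOTIFY_INVALID;
	if (callback)
		return MODEL_NOTIFY_CALLBACK;
	if (async)
		return MODEL_NOTIFY_NONE;
	return MODEL_NOTIFY_BIOWAIT;
}

/* biowait is permitted only with no callback and no B_ASYNC. */
static bool
model_may_biowait(const struct model_buf *bp)
{
	return model_notify_mode(bp) == MODEL_NOTIFY_BIOWAIT;
}

/* Placeholder callback used only to make b_iodone non-NULL. */
static void
model_noop_iodone(struct model_buf *bp)
{
	(void)bp;
}
```
.Ed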
.Pp
Once an I/O transfer has completed, its
.Vt "struct buf"
may be reused, but the user must first clear the
.Dv BO_DONE
flag in
.Fa bp Ns Li "->b_oflags" .
.Sh NESTED I/O TRANSFERS
Sometimes an I/O transfer from a single buffer in memory cannot go to a
single location on a block device: it must be split up into smaller
transfers for each segment of the memory buffer.
.Pp
After initializing the
.Li b_flags ,
.Li b_data ,
and
.Li b_bcount
parameters of an I/O transfer for the buffer, called the
.Em master
buffer, the user can issue smaller transfers for segments of the buffer
using
.Fn nestiobuf_setup .
When nested I/O transfers complete, in any order, they debit from the
amount of work left to be done in the master buffer.
If any segments of the buffer were skipped, the user can report this
with
.Fn nestiobuf_done
to debit the skipped part of the work.
.Pp
The master buffer's I/O transfer is completed once all nested buffers'
I/O transfers have completed and, if any segments were skipped,
.Fn nestiobuf_done
has been called to account for them.
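.Pp
The debit accounting described above can be pictured with a simplified
userland model.
It is a sketch under stated assumptions rather than the kernel
implementation: the
.Vt "struct model_master"
type and the
.Fn model_master_init
and
.Fn model_master_debit
functions are hypothetical, and the way an error is recorded here is an
assumption made for illustration.
.Bd -literal -offset abcd
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the master buffer's accounting state. */
struct model_master {
	size_t	b_bcount;	/* total bytes requested */
	size_t	b_resid;	/* bytes not yet accounted for */
	int	b_error;	/* last nonzero error (assumption) */
	bool	done;		/* set once the master transfer completes */
};

static void
model_master_init(struct model_master *mbp, size_t len)
{
	mbp->b_bcount = len;
	mbp->b_resid = len;
	mbp->b_error = 0;
	mbp->done = false;
}

/*
 * Debit donebytes from the master, either because a nested transfer
 * completed or because the caller reported a skipped segment.
 * The master completes when no work is left.
 */
static void
model_master_debit(struct model_master *mbp, size_t donebytes, int error)
{
	assert(donebytes <= mbp->b_resid);
	mbp->b_resid -= donebytes;
	if (error != 0)
		mbp->b_error = error;
	if (mbp->b_resid == 0)
		mbp->done = true;
}
```
.Ed
.Pp
In this model the order of the debits does not matter: the master is
marked done only by whichever debit brings the remaining count to zero.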
.Pp
For writes associated with a vnode
.Fa vp ,
.Fn nestiobuf_setup
accounts for
.Fa vp Ns Li "->v_numoutput" ,
so the caller is not allowed to acquire
.Fa vp Ns Ap s
vnode lock before submitting the nested I/O transfer to a block
device.
However, the caller is responsible for accounting the master buffer in
.Fa vp Ns Li "->v_numoutput" .
This must be done very carefully because after incrementing
.Fa vp Ns Li "->v_numoutput" ,
the caller is not allowed to acquire
.Fa vp Ns Ap s
vnode lock before either calling
.Fn nestiobuf_done
or submitting the last nested I/O transfer to a block device.
.Pp
For example:
.Bd -literal -offset abcd
struct buf *mbp, *bp;
struct vnode *devvp;	/* Used after the loop for the last segment.  */
size_t skipped = 0;
unsigned i;
int error = 0;

mbp = getiobuf(vp, true);
mbp->b_data = data;
mbp->b_resid = mbp->b_bcount = datalen;
mbp->b_flags = B_WRITE;

KASSERT(0 < nsegs);
KASSERT(datalen == nsegs*segsz);
for (i = 0; i < nsegs; i++) {
	daddr_t blkno;

	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
	error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
	VOP_UNLOCK(vp);
	if (error == 0 && blkno == -1)
		error = EIO;
	if (error) {
		/* Give up early, don't try to handle holes.  */
		skipped += datalen - i*segsz;
		break;
	}

	/* Master buffer comes first, then the nested buffer.  */
	bp = getiobuf(vp, true);
	nestiobuf_setup(mbp, bp, i*segsz, segsz);
	bp->b_blkno = blkno;
	if (i == nsegs - 1)	/* Last segment.  */
		break;
	VOP_STRATEGY(devvp, bp);
}

/*
 * Account v_numoutput for master write.
 * (Must not vn_lock before last VOP_STRATEGY!)
 */
mutex_enter(vp->v_interlock);
vp->v_numoutput++;
mutex_exit(vp->v_interlock);

if (skipped)
	nestiobuf_done(mbp, skipped, error);
else
	VOP_STRATEGY(devvp, bp);
.Ed
.Sh BLOCK DEVICE DRIVERS
Block device drivers implement a
.Sq strategy
method, in the
.Li d_strategy
member of
.Li struct bdevsw
.Pq Xr driver 9 ,
to queue a buffer for disk I/O.
The inputs to the strategy method are:
.Bl -tag -width 6n -offset abcd
.It Fa bp Ns Li "->b_flags"
Flags specifying the type of transfer.
.Bl -tag -width 6n -compact
.It Dv B_READ
Transfer is read from device.
If not set, transfer is write to device.
.El
.It Fa bp Ns Li "->b_data"
Pointer to kernel virtual address of source/target for transfer.
.It Fa bp Ns Li "->b_bcount"
Nonnegative number of bytes requested for transfer.
.It Fa bp Ns Li "->b_blkno"
Block number at which to do transfer, relative to partition start.
.El
.Pp
If the strategy method uses
.Xr bufq 9 ,
it must additionally initialize the following fields before queueing
.Fa bp
with
.Xr bufq_put 9 :
.Bl -tag -width 6n -offset abcd
.It Fa bp Ns Li "->b_rawblkno"
Block number relative to volume start.
.El
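.Pp
The relationship between the partition-relative and volume-relative
block numbers can be sketched as follows.
This is a hypothetical userland model, not driver code:
.Vt "struct model_blk" ,
.Fn model_set_rawblkno ,
and the
.Fa part_start
parameter are illustrative names, and the sketch assumes the partition
start is expressed in the same block units as
.Li b_blkno ,
which a real driver has to ensure itself.
.Bd -literal -offset abcd
```c
#include <stdint.h>

/* Hypothetical stand-in for the block-number fields of struct buf. */
struct model_blk {
	int64_t	b_blkno;	/* relative to partition start */
	int64_t	b_rawblkno;	/* relative to volume start */
};

/*
 * part_start is the partition's first block on the volume.
 * Assumption: it uses the same block units as b_blkno.
 */
static void
model_set_rawblkno(struct model_blk *bp, int64_t part_start)
{
	bp->b_rawblkno = part_start + bp->b_blkno;
}
```
.Ed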
.Pp
When the I/O transfer is complete, whether it succeeded or failed, the
strategy method must:
.Bl -dash
.It
Set
.Fa bp Ns Li "->b_error"
to zero on success, or to an
.Xr errno 2
error code on failure.
.It
Set
.Fa bp Ns Li "->b_resid"
to the number of bytes remaining to transfer, whether on success or
on failure.
If no bytes were transferred, this must be set to
.Fa bp Ns Li "->b_bcount" .
.It
Call
.Fn biodone bp .
.El
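.Pp
The completion steps above can be modelled in userland as follows.
This is an illustrative sketch only: the
.Vt "struct model_xfer"
type and
.Fn model_complete
are hypothetical, and the
.Li done
field merely stands in for the call to
.Fn biodone .
.Bd -literal -offset abcd
```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the completion fields of struct buf. */
struct model_xfer {
	size_t	b_bcount;	/* bytes requested */
	size_t	b_resid;	/* bytes remaining after completion */
	int	b_error;	/* zero on success, errno code on failure */
	int	done;		/* stands in for biodone(bp) */
};

/*
 * Record the outcome of a transfer: set b_error, set b_resid to the
 * bytes left untransferred, then mark the transfer done.
 */
static void
model_complete(struct model_xfer *bp, size_t transferred, int error)
{
	if (transferred > bp->b_bcount)
		transferred = bp->b_bcount;
	bp->b_error = error;
	bp->b_resid = bp->b_bcount - transferred;
	bp->done = 1;
}
```
.Ed
.Pp
Note that a failed transfer that moved no data leaves
.Li b_resid
equal to
.Li b_bcount
in this model, matching the rule stated above.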
.Sh FUNCTIONS
.Bl -tag -width abcd
.It Fn biodone bp
Notify that the I/O transfer described by
.Fa bp
has completed.
.Pp
To be called by a block device driver.
Caller must first set
.Fa bp Ns Li "->b_error"
to zero or an error code and
.Fa bp Ns Li "->b_resid"
to the number of bytes remaining to transfer.
.It Fn biowait bp
Wait for the synchronous I/O transfer described by
.Fa bp
to complete.
Returns the value of
.Fa bp Ns Li "->b_error" .
.Pp
To be called by a user requesting the I/O transfer.
.Pp
Must not be called if
.Fa bp
has a callback or is asynchronous, that is, if
.Fa bp Ns Li "->b_iodone"
is set, or if
.Dv B_ASYNC
is set in
.Fa bp Ns Li "->b_flags" .
.It Fn getiobuf vp waitok
Allocate a
.Vt "struct buf"
for an I/O transfer.
If
.Fa vp
is
.Pf non- Dv NULL ,
the transfer is associated with it.
If
.Fa waitok
is false,
returns
.Dv NULL
if none can be allocated immediately.
.Pp
The resulting
.Vt "struct buf"
pointer must eventually be passed to
.Fn putiobuf
to release it.
Do
.Em not
use
.Xr brelse 9 .
.Pp
The buffer may not be used for an asynchronous I/O transfer, because
there is no way to know when it is completed and may be safely passed
to
.Fn putiobuf .
Asynchronous I/O transfers are allowed only for buffers in the
.Xr buffercache 9 .
.Pp
May sleep if
.Fa waitok
is true.
.It Fn putiobuf bp
Free
.Fa bp ,
which must have been allocated by
.Fn getiobuf .
Either
.Fa bp
must never have been submitted to a block device, or the I/O transfer
must have completed.
.El
.Sh CODE REFERENCES
The
.Nm
subsystem is implemented in
.Pa sys/kern/vfs_bio.c .
.Sh SEE ALSO
.Xr buffercache 9 ,
.Xr bufq 9
.Sh BUGS
The
.Nm
abstraction provides no way to cancel an I/O transfer once it has been
submitted to a block device.
.Pp
The
.Nm
abstraction provides no way to do I/O transfers with non-kernel pages,
e.g. directly to buffers in userland without copying into the kernel
first.
.Pp
The
.Vt "struct buf"
type is all mixed up with the
.Xr buffercache 9 .
.Pp
The
.Nm
abstraction is a totally idiotic API design.
.Pp
The
.Li v_numoutput
accounting required of
.Nm
callers is asinine.