.\"	$NetBSD: kqueue.2,v 1.9 2002/10/13 07:37:39 jdolecek Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Copyright (c) 2001 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Portions of this documentation is derived from text contributed by
.\" Luke Mewburn.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.22 2001/06/27 19:55:57 dd Exp $
.\"
.Dd September 22, 2002
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.Fd #include \*[Lt]sys/types.h\*[Gt]
.Fd #include \*[Lt]sys/event.h\*[Gt]
.Fd #include \*[Lt]sys/time.h\*[Gt]
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "size_t nchanges" "struct kevent *eventlist" "size_t nevents" "const struct timespec *timeout"
.Fn EV_SET "&kev" ident filter flags fflags data udata
.Sh DESCRIPTION
.Fn kqueue
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; a kqueue holds
at most one kevent for any given pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
.Fn kqueue
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
.ig
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
..
.Pp
.Fn kevent
is used to register events with the queue, and return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Va kevent
structures, as defined in
.Aq Pa sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the size of
.Fa changelist .
.Fa eventlist
is a pointer to an array of kevent structures.
.Fa nevents
determines the size of
.Fa eventlist .
If
.Fa timeout
is a
.No non- Ns Dv NULL
pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a
.Dv NULL
pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be
.No non- Ns Dv NULL ,
pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
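.Pp
The three timeout cases described above can be captured in a small helper.
This is a minimal sketch and not part of the interface:
kq_timeout is a hypothetical name, and treating a negative millisecond
count as a request to wait indefinitely is an assumed convention.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: translate a millisecond count into the
 * timeout argument expected by kevent().
 *   ms <  0  -> NULL (wait indefinitely; assumed convention)
 *   ms == 0  -> zero-valued timespec (poll, return immediately)
 *   ms >  0  -> wait at most ms milliseconds
 */
const struct timespec *
kq_timeout(long ms, struct timespec *ts)
{
	if (ms < 0)
		return NULL;
	ts->tv_sec = ms / 1000;
	ts->tv_nsec = (ms % 1000) * 1000000L;
	return ts;
}
```

With such a helper, passing kq_timeout(0, &ts) as the last argument of
kevent would poll the queue without blocking.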
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	uint32_t  filter;	/* filter for event */
	uint32_t  flags;	/* action flags for kqueue */
	uint32_t  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width XXXfilter -offset indent
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It filter
Identifies the kernel filter used to process this event.
There are pre-defined system filters (which are described below), and
other filters may be added by kernel subsystems as necessary.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT -offset indent
.It EV_ADD
Adds the event to the kqueue.
Re-adding an existing event will modify the parameters of the original
event, and not result in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It EV_ERROR
See
.Sx RETURN VALUES
below.
.El
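.Pp
As an illustration of how the action flags combine with EV_SET, the
following sketch registers a one-shot EVFILT_READ event on a pipe and
retrieves it.
The names pipe_read_event and HAVE_KQUEUE are hypothetical; the
preprocessor guard is an assumption added only so that the fragment
still compiles on systems that lack kqueue.

```c
#include <errno.h>
#include <unistd.h>

#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_KQUEUE 1		/* hypothetical guard macro */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

/*
 * Register a one-shot EVFILT_READ event on the read end of a pipe,
 * make it fire by writing one byte, and return the byte count the
 * filter reports in `data`.  Returns -1 on failure; where kqueue is
 * not available, errno is set to ENOSYS.
 */
int
pipe_read_event(void)
{
#ifdef HAVE_KQUEUE
	struct timespec ts = { 1, 0 };	/* give up after one second */
	struct kevent change, event;
	int fds[2], kq, result = -1;

	if (pipe(fds) == -1)
		return -1;
	if ((kq = kqueue()) != -1) {
		/* EV_ADD enables the event; EV_ONESHOT deletes it once retrieved. */
		EV_SET(&change, fds[0], EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, 0);
		/* The change is applied before pending events are read. */
		if (write(fds[1], "x", 1) == 1 &&
		    kevent(kq, &change, 1, &event, 1, &ts) == 1 &&
		    !(event.flags & EV_ERROR))
			result = (int)event.data;
		close(kq);
	}
	close(fds[0]);
	close(fds[1]);
	return result;
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

Because EV_ONESHOT is set, a second call to kevent on the same queue
would not report the event again unless it were re-added with EV_ADD.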
.Ss Filters
Filters are identified by a number.
There are two types of filters: pre-defined filters, which
are described below, and third-party filters that may be added with
.Xr kfilter_register 9
by kernel sub-systems, third-party device drivers, or loadable
kernel modules.
.Pp
As a third-party filter is referenced by a well-known name instead
of a statically assigned number, two
.Xr ioctl 2 Ns s
are supported on the file descriptor returned by
.Fn kqueue
to map a filter name to a filter number, and vice-versa (passing
arguments in a structure described below):
.Bl -tag -width KFILTER_BYFILTER -offset indent
.It KFILTER_BYFILTER
Map
.Va filter
to
.Va name ,
which is of size
.Va len .
.It KFILTER_BYNAME
Map
.Va name
to
.Va filter .
.Va len
is ignored.
.El
.Pp
The following structure is used to pass arguments in and out of the
.Xr ioctl 2 :
.Bd -literal -offset indent
struct kfilter_mapping {
	char	 *name;		/* name to lookup or return */
	size_t	 len;		/* length of name */
	uint32_t filter;	/* filter to lookup or return */
};
.Ed
.Pp
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
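.Pp
The two mapping requests might be exercised together as in the sketch
below, which looks up the name of a filter number and then maps that
name back to a number.
The function name kfilter_round_trip is hypothetical, the header
.Aq Pa sys/ioctl.h
is assumed to be what declares ioctl, and the fallback branch exists
only so that the fragment compiles where these requests are not defined.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#if defined(__NetBSD__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#endif

/*
 * Map filter number `filter` to its name (stored in `name`, which has
 * room for `len` bytes), then map that name back to a number.
 * Returns 0 if the round trip yields the original number, -1 otherwise.
 * Where the KFILTER_* requests are not defined, errno is set to ENOSYS.
 */
int
kfilter_round_trip(unsigned int filter, char *name, size_t len)
{
#ifdef KFILTER_BYFILTER
	struct kfilter_mapping km;
	int kq, rc = -1;

	if ((kq = kqueue()) == -1)
		return -1;

	/* Number to name: the result is written into the caller's buffer. */
	memset(&km, 0, sizeof(km));
	km.name = name;
	km.len = len;
	km.filter = filter;
	if (ioctl(kq, KFILTER_BYFILTER, &km) == 0) {
		/* Name to number: len is ignored for this request. */
		km.filter = ~filter;	/* poison so a stale value cannot match */
		if (ioctl(kq, KFILTER_BYNAME, &km) == 0 && km.filter == filter)
			rc = 0;
	}
	close(kq);
	return rc;
#else
	(void)filter;
	(void)name;
	(void)len;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would typically pass one of the predefined filter numbers, for
example kfilter_round_trip(EVFILT_READ, buf, sizeof(buf)).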
.Pp
The predefined system filters are:
.Bl -tag -width EVFILT_SIGNAL
.It EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Pp
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog (i.e., the number of
connections ready to be accepted with
.Xr accept 2 . )
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets EV_EOF in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set EV_EOF in
.Va flags .
This may be cleared by passing in EV_CLEAR, at which point the
filter will resume waiting for data to become available before
returning.
.El
.It EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, fifos, and ttys,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of EV_CLEAR.
Note that this filter is not supported for vnodes.
.Pp
For sockets, the low water mark and socket error handling are
identical to the EVFILT_READ case.
.It EVFILT_AIO
This is not implemented in
.Nx .
.ig
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to SIGEV_EVENT.
When the aio_* function is called, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the aio_* function.
The filter returns under the same conditions as aio_error.
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.
However, this approach will not work on
architectures with 64-bit pointers, and should be considered deprecated.
..
.It EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It NOTE_DELETE
.Fn unlink
was called on the file referenced by the descriptor.
.It NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It NOTE_EXTEND
The file referenced by the descriptor was extended.
.It NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It NOTE_LINK
The link count on the file changed.
.It NOTE_RENAME
The file referenced by the descriptor was renamed.
.It NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying filesystem was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It NOTE_EXIT
The process has exited.
.It NOTE_FORK
The process has called
.Fn fork .
.It NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or similar call.
.It NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process will return with NOTE_TRACK set in the
.Va fflags
field, while the child process will return with NOTE_CHILD set in
.Va fflags
and the parent PID in
.Va data .
.It NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the current process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as SIG_IGN.
Event notification happens after normal signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.El
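.Pp
To show how a filter takes its request in fflags and reports the
triggering events in the same field, the sketch below watches a scratch
file with EVFILT_VNODE.
The function name vnode_write_event, the scratch path under /tmp, and
the HAVE_KQUEUE guard are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_KQUEUE 1		/* hypothetical guard macro */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

/*
 * Watch a scratch file for NOTE_WRITE and NOTE_DELETE, write one byte
 * to it, and return the fflags value reported by the filter.
 * Returns -1 on failure; where kqueue is not available, errno is set
 * to ENOSYS.
 */
int
vnode_write_event(void)
{
#ifdef HAVE_KQUEUE
	char path[] = "/tmp/kqueue-demo.XXXXXX";	/* assumed scratch location */
	struct timespec ts = { 1, 0 };	/* give up after one second */
	struct kevent change, event;
	int fd, kq, result = -1;

	if ((fd = mkstemp(path)) == -1)
		return -1;
	if ((kq = kqueue()) != -1) {
		/* The events of interest are requested through fflags. */
		EV_SET(&change, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR,
		    NOTE_WRITE | NOTE_DELETE, 0, 0);
		/* Register only (nevents is 0), trigger, then collect. */
		if (kevent(kq, &change, 1, NULL, 0, NULL) == 0 &&
		    write(fd, "x", 1) == 1 &&
		    kevent(kq, NULL, 0, &event, 1, &ts) == 1)
			result = (int)event.fflags;
		close(kq);
	}
	close(fd);
	unlink(path);
	return result;
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

On a system with kqueue the returned value would be expected to have
NOTE_WRITE set, since that is the requested event the write triggers.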
.Sh RETURN VALUES
.Fn kqueue
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of \-1 is
returned and
.Va errno
is set.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv \-1
will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
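.Pp
The two error-reporting paths described above can be handled together,
as in the following sketch, which deliberately registers an invalid
descriptor.
The function name register_bad_descriptor and the HAVE_KQUEUE guard are
hypothetical, and returning ENOSYS where kqueue is missing is an
assumption made so that the fragment compiles everywhere.

```c
#include <errno.h>
#include <unistd.h>

#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_KQUEUE 1		/* hypothetical guard macro */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

/*
 * Try to register EVFILT_READ on descriptor -1 and return the error
 * number that kevent() reports, whichever way it reports it.
 * Returns 0 if no error was reported and ENOSYS without kqueue.
 */
int
register_bad_descriptor(void)
{
#ifdef HAVE_KQUEUE
	struct timespec zero = { 0, 0 };	/* poll, do not block */
	struct kevent kev;
	int kq, n, err;

	if ((kq = kqueue()) == -1)
		return errno;
	/* -1 is never a valid descriptor, so this change cannot succeed. */
	EV_SET(&kev, -1, EVFILT_READ, EV_ADD, 0, 0, 0);
	/* The same array serves as changelist and eventlist. */
	n = kevent(kq, &kev, 1, &kev, 1, &zero);
	if (n == -1)
		err = errno;		/* reported through errno */
	else if (n == 1 && (kev.flags & EV_ERROR))
		err = (int)kev.data;	/* reported in the eventlist */
	else
		err = 0;
	close(kq);
	return err;
#else
	return ENOSYS;
#endif
}
```

Checking EV_ERROR on every returned event before interpreting data is
the conservative choice, since a positive return value from kevent does
not by itself mean that all changes were applied.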
.Sh ERRORS
The
.Fn kqueue
function fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.\" .Xr aio_error 2 ,
.\" .Xr aio_read 2 ,
.\" .Xr aio_return 2 ,
.Xr ioctl 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3 ,
.Xr kfilter_register 9 ,
.Xr knote 9
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1 ,
and then in
.Nx .
This interface is currently available only on the experimental kernel branch
.Li kqueue
in
.Nx .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq jlemon@FreeBSD.org .
.Nx
port and manpage additions were done by
.An Luke Mewburn Aq lukem@NetBSD.org ,
.An Jason Thorpe Aq thorpej@NetBSD.org ,
and
.An Jaromir Dolecek Aq jdolecek@NetBSD.org .