.\"	$NetBSD: kqueue.2,v 1.19 2003/12/09 19:49:53 augustss Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Copyright (c) 2001, 2002, 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Portions of this documentation is derived from text contributed by
.\" Luke Mewburn.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.22 2001/06/27 19:55:57 dd Exp $
.\"
.Dd February 4, 2003
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.In sys/time.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "size_t nchanges" "struct kevent *eventlist" "size_t nevents" "const struct timespec *timeout"
.Fn EV_SET "&kev" ident filter flags fflags data udata
.Sh DESCRIPTION
.Fn kqueue
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; a kqueue may hold
at most one kevent for any given pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
.Fn kqueue
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
.ig
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
..
.Pp
.Fn kevent
is used to register events with the queue, and return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Va kevent
structures, as defined in
.Aq Pa sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the size of
.Fa changelist .
.Fa eventlist
is a pointer to an array of kevent structures.
.Fa nevents
determines the size of
.Fa eventlist .
If
.Fa timeout
is a
.No non- Ns Dv NULL
pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a
.Dv NULL
pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be
.No non- Ns Dv NULL ,
pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	uint32_t  filter;	/* filter for event */
	uint32_t  flags;	/* action flags for kqueue */
	uint32_t  fflags;	/* filter flag value */
	int64_t   data;		/* filter data value */
	intptr_t  udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width XXXfilter -offset indent
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It filter
Identifies the kernel filter used to process this event.
There are pre-defined system filters (which are described below), and
other filters may be added by kernel subsystems as necessary.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT -offset indent
.It EV_ADD
Adds the event to the kqueue.
Re-adding an existing event will modify the parameters of the original
event, and not result in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Ss Filters
Filters are identified by a number.
There are two types of filters: predefined filters, which
are described below, and third-party filters that may be added with
.Xr kfilter_register 9
by kernel sub-systems, third-party device drivers, or loadable
kernel modules.
.Pp
As a third-party filter is referenced by a well-known name instead
of a statically assigned number, two
.Xr ioctl 2 Ns s
are supported on the file descriptor returned by
.Fn kqueue
to map a filter name to a filter number, and vice-versa (passing
arguments in a structure described below):
.Bl -tag -width KFILTER_BYFILTER -offset indent
.It KFILTER_BYFILTER
Map
.Va filter
to
.Va name ,
which is of size
.Va len .
.It KFILTER_BYNAME
Map
.Va name
to
.Va filter .
.Va len
is ignored.
.El
.Pp
The following structure is used to pass arguments in and out of the
.Xr ioctl 2 :
.Bd -literal -offset indent
struct kfilter_mapping {
	char	 *name;		/* name to lookup or return */
	size_t	 len;		/* length of name */
	uint32_t filter;	/* filter to lookup or return */
};
.Ed
.Pp
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Pp
The predefined system filters are:
.Bl -tag -width EVFILT_SIGNAL
.It EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Pp
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog (i.e., the number of
connections ready to be accepted with
.Xr accept 2 . )
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets EV_EOF in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set EV_EOF in
.Va flags .
This may be cleared by passing in EV_CLEAR, at which point the
filter will resume waiting for data to become available before
returning.
.El
.It EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, fifos, and ttys,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of EV_CLEAR.
Note that this filter is not supported for vnodes.
.Pp
For sockets, the low water mark and socket error handling are
identical to the EVFILT_READ case.
.It EVFILT_AIO
This is not implemented in
.Nx .
.ig
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to SIGEV_EVENT.
When the aio_* function is called, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the aio_* function.
The filter returns under the same conditions as aio_error.
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.
However, this approach will not work on
architectures with 64-bit pointers, and should be considered deprecated.
..
.It EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It NOTE_DELETE
.Fn unlink
was called on the file referenced by the descriptor.
.It NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It NOTE_EXTEND
The file referenced by the descriptor was extended.
.It NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It NOTE_LINK
The link count on the file changed.
.It NOTE_RENAME
The file referenced by the descriptor was renamed.
.It NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It NOTE_EXIT
The process has exited.
.It NOTE_FORK
The process has called
.Fn fork .
.It NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or similar call.
.It NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process will return with NOTE_TRACK set in the
.Va fflags
field, while the child process will return with NOTE_CHILD set in
.Va fflags
and the parent PID in
.Va data .
.It NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the current process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as SIG_IGN.
Event notification happens after normal signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.It EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period in milliseconds.
The timer will be periodic unless EV_ONESHOT is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.El
.Sh RETURN VALUES
.Fn kqueue
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of \-1 is
returned and errno is set.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv \-1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh ERRORS
The
.Fn kqueue
function fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.\" .Xr aio_error 2 ,
.\" .Xr aio_read 2 ,
.\" .Xr aio_return 2 ,
.Xr ioctl 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3 ,
.Xr kfilter_register 9 ,
.Xr knote 9
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1 ,
and then in
.Nx 2.0 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq jlemon@FreeBSD.org .
.Nx
port and manpage additions were done by
.An Luke Mewburn Aq lukem@NetBSD.org ,
.An Jason Thorpe Aq thorpej@NetBSD.org ,
and
.An Jaromir Dolecek Aq jdolecek@NetBSD.org .