.\"	$OpenBSD: kqueue.2,v 1.51 2023/08/20 19:52:40 jmc Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.18 2001/02/14 08:48:35 guido Exp $
.\"
.Dd $Mdocdate: August 20 2023 $
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kqueue1 ,
.Nm kevent ,
.Nm EV_SET
.Nd kernel event notification mechanism
.Sh SYNOPSIS
.In sys/types.h
.In sys/event.h
.In sys/time.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "&kev" ident filter flags fflags data udata
.In sys/types.h
.In sys/event.h
.In sys/time.h
.In fcntl.h
.Ft int
.Fn kqueue1 "int flags"
.Sh DESCRIPTION
.Fn kqueue
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed
.Dq filters .
A kevent is identified by the (ident, filter) pair; a kqueue may hold
at most one kevent for any given pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single
.Vt struct kevent .
Calling
.Xr close 2
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
.Fn kqueue
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
Similarly, kqueues cannot be passed across UNIX-domain sockets.
.Pp
The
.Fn kqueue1
function is identical to
.Fn kqueue
except that the close-on-exec flag on the new file descriptor
is determined by the
.Dv O_CLOEXEC
flag
in the
.Fa flags
argument.
.Pp
.Fn kevent
is used to register events with the queue, and return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Vt kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the size of
.Fa changelist .
.Fa eventlist
is a pointer to an array of
.Vt kevent
structures.
.Fa nevents
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is not
.Dv NULL ,
it specifies a maximum interval to wait
for an event, which will be interpreted as a
.Vt struct timespec .
If
.Fa timeout
is
.Dv NULL ,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be
.Pf non- Dv NULL ,
pointing to a zero-valued
.Vt struct timespec .
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
.Vt kevent
structure.
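.Pp
As an illustrative sketch rather than a reference implementation,
the following fragment creates a queue, registers one change built with
.Fn EV_SET ,
and polls with a zero-valued
.Fa timeout ,
reusing one array as both
.Fa changelist
and
.Fa eventlist .
The function name
.Fn poll_pipe_bytes
and the
.Dv HAVE_KQUEUE
guard are assumptions of this example; the guard only keeps the fragment
compilable on systems that lack
.In sys/event.h .
.Bd -literal
```c
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

#ifdef __has_include
#if __has_include(<sys/event.h>)
#include <sys/event.h>
#define HAVE_KQUEUE 1
#endif
#endif

/*
 * Hypothetical helper: report how many bytes are readable on a pipe
 * without blocking.  Returns -1 on error or where kqueue is missing.
 */
int
poll_pipe_bytes(void)
{
#ifdef HAVE_KQUEUE
	struct timespec zero = { 0, 0 };	/* zero-valued: poll only */
	struct kevent kev;
	int fds[2], kq, result = -1;

	if (pipe(fds) == -1)
		return -1;
	kq = kqueue();
	if (kq != -1 && write(fds[1], "hello", 5) == 5) {
		/* One change: watch the read end for readable data. */
		EV_SET(&kev, fds[0], EVFILT_READ, EV_ADD, 0, 0, NULL);
		/* The same array serves as changelist and eventlist. */
		if (kevent(kq, &kev, 1, &kev, 1, &zero) == 1 &&
		    !(kev.flags & EV_ERROR))
			result = (int)kev.data;	/* bytes in the pipe */
	}
	if (kq != -1)
		close(kq);
	close(fds[0]);
	close(fds[1]);
	return result;
#else
	return -1;
#endif
}
```
.Ed
.Pp
A caller would treat a non-negative return value as the byte count reported in
.Fa data
and -1 as failure.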
.Pp
The
.Vt kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t   ident;	/* identifier for this event */
	short	    filter;	/* filter for event */
	u_short	    flags;	/* action flags for kqueue */
	u_int	    fflags;	/* filter flag value */
	int64_t	    data;	/* filter data value */
	void	   *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Vt struct kevent
are:
.Bl -tag -width XXXfilter
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The pre-defined system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Fa flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event will modify the parameters of the original event,
and not result in a duplicate entry.
Adding an event automatically enables it, unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It Dv EV_RECEIPT
Causes
.Fn kevent
to return with
.Dv EV_ERROR
set without draining any pending events after updating events in the kqueue.
When a filter is successfully added, the
.Fa data
field will be zero.
This flag is useful for making bulk changes to a kqueue.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
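.Pp
The bulk-change use of
.Dv EV_RECEIPT
described above might look like the following sketch, which is illustrative
and not taken from the system sources.
It submits one change for an open descriptor and one for a descriptor that
cannot be open, then inspects the receipt returned for each change.
The names
.Fn count_rejected_changes
and
.Dv HAVE_KQUEUE
are invented for this example; the guard only keeps the fragment compilable
on systems that lack
.In sys/event.h .
.Bd -literal
```c
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

#ifdef __has_include
#if __has_include(<sys/event.h>)
#include <sys/event.h>
#define HAVE_KQUEUE 1
#endif
#endif

/*
 * Hypothetical helper: submit two changes with EV_RECEIPT and count
 * the rejected ones.  Returns -1 on error or where kqueue is missing.
 */
int
count_rejected_changes(void)
{
#ifdef HAVE_KQUEUE
	struct timespec zero = { 0, 0 };	/* never block here */
	struct kevent ch[2], res[2];
	int fds[2], i, kq, n, rejected = -1;

	if (pipe(fds) == -1)
		return -1;
	if ((kq = kqueue()) != -1) {
		/* One open descriptor and one that cannot be open. */
		EV_SET(&ch[0], fds[0], EVFILT_READ, EV_ADD | EV_RECEIPT,
		    0, 0, NULL);
		EV_SET(&ch[1], -1, EVFILT_READ, EV_ADD | EV_RECEIPT,
		    0, 0, NULL);
		n = kevent(kq, ch, 2, res, 2, &zero);
		if (n != -1)
			rejected = 0;
		/* Each receipt has EV_ERROR set; data is 0 on success. */
		for (i = 0; i < n; i++)
			if ((res[i].flags & EV_ERROR) && res[i].data != 0)
				rejected++;
		close(kq);
	}
	close(fds[0]);
	close(fds[1]);
	return rejected;
#else
	return -1;
#endif
}
```
.Ed
.Pp
For a rejected change, the system error is carried in the
.Fa data
field of the corresponding receipt, so a caller could report it with
.Xr strerror 3 .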
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Fa fflags
and
.Fa data
fields in the
.Vt kevent
structure.
.Bl -tag -width EVFILT_SIGNAL
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Xr listen 2
return when there is an incoming connection pending.
.Fa data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Fa fflags ,
and specifying the new low water mark in
.Fa data .
On return,
.Fa data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Fa flags ,
and returns the socket error (if any) in
.Fa fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Fa data
contains the offset from current position to end of file,
and may be negative.
If
.Dv NOTE_EOF
is set in
.Fa fflags ,
.Fn kevent
will also return when the file pointer is at the end of file.
The end of file condition is indicated by the presence of
.Dv NOTE_EOF
in
.Fa fflags
on return.
.It "FIFOs, Pipes"
Returns when there is data to read;
.Fa data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Fa flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the filter will resume waiting for data to become
available before returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF device has
.Dq immediate mode
enabled and there is any data to read;
.Fa data
contains the number of bytes available.
.El
.It Dv EVFILT_EXCEPT
Takes a descriptor as the identifier, and returns whenever one of the
specified exceptional conditions has occurred on the descriptor.
Conditions are specified in
.Fa fflags .
Currently, a filter can monitor the reception of out-of-band data
on a socket or pseudo terminal with
.Dv NOTE_OOB .
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, and FIFOs,
.Fa data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for the FIFO case,
this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.\".It Dv EVFILT_AIO
.\"The sigevent portion of the AIO request is filled in, with
.\".Va sigev_notify_kqueue
.\"containing the descriptor of the kqueue that the event should
.\"be attached to,
.\".Va sigev_value
.\"containing the udata value, and
.\".Va sigev_notify
.\"set to
.\".Dv SIGEV_KEVENT .
.\"When the aio_* function is called, the event will be registered
.\"with the specified kqueue, and the
.\".Va ident
.\"argument set to the
.\".Li struct aiocb
.\"returned by the aio_* function.
.\"The filter returns under the same conditions as aio_error.
.\".Pp
.\"Alternatively, a kevent structure may be initialized, with
.\".Va ident
.\"containing the descriptor of the kqueue, and the
.\"address of the kevent structure placed in the
.\".Va aio_lio_opcode
.\"field of the AIO request.
.\"However, this approach will not work on architectures with 64-bit pointers,
.\"and should be considered deprecated.
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Fa fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It Dv NOTE_DELETE
.Xr unlink 2
was called on the file referenced by the descriptor.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
The file referenced by the descriptor was extended.
.It Dv NOTE_TRUNCATE
The file referenced by the descriptor was truncated.
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_LINK
The link count on the file changed.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.El
.Pp
On return,
.Fa fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Fa fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Fa data
in the same format as the status set by
.Xr wait 2 .
.It Dv NOTE_FORK
The process has called
.Xr fork 2 .
.It Dv NOTE_EXEC
The process has executed a new program via
.Xr execve 2
or similar call.
.It Dv NOTE_TRACK
Follow a process across
.Xr fork 2
calls.
The parent process will return with
.Dv NOTE_FORK
set in the
.Fa fflags
field, while the child process will return with
.Dv NOTE_CHILD
set in
.Fa fflags
and the parent PID in
.Fa data .
.It Dv NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Fa fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Xr signal 3
and
.Xr sigaction 2
facilities, and has a lower precedence.
The filter will record all attempts to deliver a signal to a process,
even if the signal has been marked as
.Dv SIG_IGN .
Event notification happens after normal signal delivery processing.
.Fa data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Fa ident .
When adding a timer,
.Fa data
specifies the timeout period in units described below or, if
.Dv NOTE_ABSTIME
is set in
.Va fflags ,
the absolute time at which the timer should fire.
The timer will repeat unless
.Dv EV_ONESHOT
is set in
.Va flags
or
.Dv NOTE_ABSTIME
is set in
.Va fflags .
On return,
.Fa data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets
.Dv EV_CLEAR
in
.Va flags
for periodic timers.
Timers created with
.Dv NOTE_ABSTIME
remain activated on the kqueue once the absolute time has passed unless
.Dv EV_CLEAR
or
.Dv EV_ONESHOT
are also specified.
.Pp
The filter accepts the following flags in the
.Va fflags
argument:
.Bl -tag -width NOTE_MSECONDS
.It Dv NOTE_SECONDS
The timer value in
.Va data
is expressed in seconds.
.It Dv NOTE_MSECONDS
The timer value in
.Va data
is expressed in milliseconds.
.It Dv NOTE_USECONDS
The timer value in
.Va data
is expressed in microseconds.
.It Dv NOTE_NSECONDS
The timer value in
.Va data
is expressed in nanoseconds.
.It Dv NOTE_ABSTIME
The timer value is an absolute time with
.Dv CLOCK_REALTIME
as the reference clock.
.El
.Pp
Note that
.Dv NOTE_SECONDS ,
.Dv NOTE_MSECONDS ,
.Dv NOTE_USECONDS ,
and
.Dv NOTE_NSECONDS
are mutually exclusive; behavior is undefined if more than one is specified.
If a timer value unit is not specified, the default is
.Dv NOTE_MSECONDS .
.Pp
If an existing timer is re-added, the existing timer and related pending events
will be cancelled.
The timer will be re-started using the timeout period
.Fa data .
.It Dv EVFILT_DEVICE
Takes a descriptor as the identifier and the events to watch for in
.Fa fflags ,
and returns when one or more of the requested events occur on the
descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_CHANGE
.It Dv NOTE_CHANGE
A device change event has occurred,
e.g. an HDMI cable has been plugged in to a port.
.El
.Pp
On return,
.Fa fflags
contains the events which triggered the filter.
.El
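.Pp
As one possible illustration of a filter in use, the sketch below arms a
one-shot
.Dv EVFILT_TIMER
with no unit flag, so that
.Fa data
is taken as milliseconds, and then blocks with a
.Dv NULL
.Fa timeout
until the timer fires.
The function name
.Fn wait_for_timer ,
the timer identifier 1, and the
.Dv HAVE_KQUEUE
guard are assumptions of this example; the guard only keeps the fragment
compilable on systems that lack
.In sys/event.h .
.Bd -literal
```c
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

#ifdef __has_include
#if __has_include(<sys/event.h>)
#include <sys/event.h>
#define HAVE_KQUEUE 1
#endif
#endif

/*
 * Hypothetical helper: sleep on a one-shot kqueue timer and return the
 * expiration count.  Returns -1 on error or where kqueue is missing.
 */
int
wait_for_timer(int msec)
{
#ifdef HAVE_KQUEUE
	struct kevent kev;
	int kq, expirations = -1;

	if ((kq = kqueue()) == -1)
		return -1;
	/* ident 1 is arbitrary; no unit flag means milliseconds. */
	EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, msec, NULL);
	/* Register only: nevents is zero, so this returns at once. */
	if (kevent(kq, &kev, 1, NULL, 0, NULL) != -1) {
		/* A NULL timeout waits until the timer event arrives. */
		if (kevent(kq, NULL, 0, &kev, 1, NULL) == 1 &&
		    kev.filter == EVFILT_TIMER)
			expirations = (int)kev.data;
	}
	close(kq);
	return expirations;
#else
	(void)msec;
	return -1;
#endif
}
```
.Ed
.Pp
Because
.Dv EV_ONESHOT
is set, the timer does not repeat and the event is deleted once retrieved.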
.Sh RETURN VALUES
.Fn kqueue
and
.Fn kqueue1
create a new kernel event queue and return a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set to indicate the error.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Fa flags
and the system error in
.Fa data .
Otherwise, -1 will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh ERRORS
The
.Fn kqueue
and
.Fn kqueue1
functions fail if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
In addition,
.Fn kqueue1
fails if:
.Bl -tag -width Er
.It Bq Er EINVAL
.Fa flags
is invalid.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Vt kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.Xr clock_gettime 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr wait 2 ,
.Xr write 2 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1
and have been available since
.Ox 2.9 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq Mt jlemon@FreeBSD.org .