.\" $NetBSD: kqueue.2,v 1.19 2003/12/09 19:49:53 augustss Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Copyright (c) 2001, 2002, 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Portions of this documentation is derived from text contributed by
.\" Luke Mewburn.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.22 2001/06/27 19:55:57 dd Exp $
.\"
.Dd February 4, 2003
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.In sys/time.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "size_t nchanges" "struct kevent *eventlist" "size_t nevents" "const struct timespec *timeout"
.Fn EV_SET "&kev" ident filter flags fflags data udata
.Sh DESCRIPTION
.Fn kqueue
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair;
a given kqueue can hold at most one kevent for each such pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
.Fn kqueue
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
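.Pp
As an illustration of the aggregation described above, the following
sketch registers
.Dv EVFILT_READ
on the read end of a pipe, writes to the pipe twice, and then polls the
kqueue once with a zero-valued timeout.
Both writes trigger the filter, yet only one kevent would be expected
back, with
.Va data
holding the total number of unread bytes.
The helper
.Fn pipe_aggregation_demo
is hypothetical rather than part of the system interface, and the sketch
assumes that
.Dv EVFILT_READ
on a pipe reports the byte count in
.Va data ,
as described under
.Sx Filters
below.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: registers EVFILT_READ on a pipe, writes to the
 * pipe twice, then polls the kqueue once.  Returns the number of
 * kevents retrieved (or -1 on error) and stores the reported byte
 * count in *avail.
 */
int
pipe_aggregation_demo(long *avail)
{
	struct kevent change, events[4];
	struct timespec zero = { 0, 0 };	/* zero timeout: poll */
	int fds[2], kq, n = -1;

	if (pipe(fds) == -1)
		return -1;
	if ((kq = kqueue()) == -1)
		goto out_pipe;

	/* The (ident, filter) pair is (read end of the pipe, EVFILT_READ). */
	EV_SET(&change, fds[0], EVFILT_READ, EV_ADD, 0, 0, 0);
	if (kevent(kq, &change, 1, NULL, 0, &zero) == -1)
		goto out_kq;

	/* Two writes trigger the filter twice... */
	if (write(fds[1], "abc", 3) != 3 || write(fds[1], "de", 2) != 2)
		goto out_kq;

	/* ...but they are aggregated into one kevent with data == 5. */
	n = kevent(kq, NULL, 0, events, 4, &zero);
	if (n == 1)
		*avail = (long)events[0].data;
out_kq:
	close(kq);
out_pipe:
	close(fds[0]);
	close(fds[1]);
	return n;
}
.Ed
.Pp
Because every
.Fn kevent
call in the sketch passes a zero-valued
.Va timespec ,
it never blocks: if no event were pending, the final call would simply
return 0.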
.ig
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
..
.Pp
.Fn kevent
is used to register events with the queue, and return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Va kevent
structures, as defined in
.Aq Pa sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the size of
.Fa changelist .
.Fa eventlist
is a pointer to an array of kevent structures.
.Fa nevents
determines the size of
.Fa eventlist .
If
.Fa timeout
is a
.No non- Ns Dv NULL
pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a
.Dv NULL
pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be
.No non- Ns Dv NULL ,
pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t	ident;		/* identifier for this event */
	uint32_t	filter;		/* filter for event */
	uint32_t	flags;		/* action flags for kqueue */
	uint32_t	fflags;		/* filter flag value */
	int64_t		data;		/* filter data value */
	intptr_t	udata;		/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width XXXfilter -offset indent
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but is often a file descriptor.
.It filter
Identifies the kernel filter used to process this event.
There are pre-defined system filters (which are described below), and
other filters may be added by kernel subsystems as necessary.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT -offset indent
.It EV_ADD
Adds the event to the kqueue.
Re-adding an existing event modifies the parameters of the original
event and does not result in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Ss Filters
Filters are identified by a number.
There are two types of filters: pre-defined filters, which
are described below, and third-party filters that may be added with
.Xr kfilter_register 9
by kernel sub-systems, third-party device drivers, or loadable
kernel modules.
.Pp
As a third-party filter is referenced by a well-known name instead
of a statically assigned number, two
.Xr ioctl 2 Ns s
are supported on the file descriptor returned by
.Fn kqueue
to map a filter name to a filter number, and vice versa (passing
arguments in a structure described below):
.Bl -tag -width KFILTER_BYFILTER -offset indent
.It KFILTER_BYFILTER
Map
.Va filter
to
.Va name ,
which is of size
.Va len .
.It KFILTER_BYNAME
Map
.Va name
to
.Va filter .
.Va len
is ignored.
.El
.Pp
The following structure is used to pass arguments in and out of the
.Xr ioctl 2 :
.Bd -literal -offset indent
struct kfilter_mapping {
	char		*name;		/* name to lookup or return */
	size_t		len;		/* length of name */
	uint32_t	filter;		/* filter to lookup or return */
};
.Ed
.Pp
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Pp
The predefined system filters are:
.Bl -tag -width EVFILT_SIGNAL
.It EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Pp
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog (i.e., the number of
connections ready to be accepted with
.Xr accept 2 ) .
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets EV_EOF in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from the current position to the end of file,
and may be negative.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set EV_EOF in
.Va flags .
This may be cleared by passing in EV_CLEAR, at which point the
filter will resume waiting for data to become available before
returning.
.El
.It EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, fifos, and ttys,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of EV_CLEAR.
Note that this filter is not supported for vnodes.
.Pp
For sockets, the low water mark and socket error handling are
identical to the EVFILT_READ case.
.It EVFILT_AIO
This is not implemented in
.Nx .
.ig
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to SIGEV_EVENT.
When the aio_* function is called, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the aio_* function.
The filter returns under the same conditions as aio_error.
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.
However, this approach will not work on
architectures with 64-bit pointers, and should be considered deprecated.
..
.It EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It NOTE_DELETE
.Fn unlink
was called on the file referenced by the descriptor.
.It NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It NOTE_EXTEND
The file referenced by the descriptor was extended.
.It NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It NOTE_LINK
The link count on the file changed.
.It NOTE_RENAME
The file referenced by the descriptor was renamed.
.It NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It NOTE_EXIT
The process has exited.
.It NOTE_FORK
The process has called
.Fn fork .
.It NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process will return with NOTE_TRACK set in the
.Va fflags
field, while the child process will return with NOTE_CHILD set in
.Va fflags
and the parent PID in
.Va data .
.It NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the current process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as SIG_IGN.
Event notification happens after normal signal delivery processing.
On return,
.Va data
contains the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.It EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period in milliseconds.
The timer will be periodic unless EV_ONESHOT is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.El
.Sh RETURN VALUES
.Fn kqueue
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of \-1 is
returned and
.Va errno
is set.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
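.Pp
The cap imposed by
.Fa nevents
can be illustrated with a short sketch.
It makes three pipes readable, registers each read end with
.Dv EV_ONESHOT
so that a retrieved event is deleted instead of being queued again, and
then drains the kqueue through an
.Fa eventlist
with room for only two entries.
The helper
.Fn nevents_cap_demo
is hypothetical; with three events pending, the first call would be
expected to return 2 and the second call 1, the one remaining event.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: three readable pipes, three EV_ONESHOT
 * registrations, and an eventlist with room for two entries.
 * Stores the results of two successive kevent() calls in *first and
 * *second; returns 0 on success or -1 if setup fails.
 */
int
nevents_cap_demo(int *first, int *second)
{
	struct kevent changes[3], events[2];
	struct timespec zero = { 0, 0 };	/* poll; never block */
	int fds[3][2], kq, i, npipes = 0, rv = -1;

	if ((kq = kqueue()) == -1)
		return -1;
	for (i = 0; i < 3; i++) {
		if (pipe(fds[i]) == -1)
			goto out;
		npipes++;
		/* One unread byte makes the read end ready at once. */
		if (write(fds[i][1], "x", 1) != 1)
			goto out;
		EV_SET(&changes[i], fds[i][0], EVFILT_READ,
		    EV_ADD | EV_ONESHOT, 0, 0, 0);
	}
	if (kevent(kq, changes, 3, NULL, 0, &zero) == -1)
		goto out;

	/* Three events are pending, but at most nevents (2) are returned. */
	*first = kevent(kq, NULL, 0, events, 2, &zero);
	/* The event left on the queue is returned by the next call. */
	*second = kevent(kq, NULL, 0, events, 2, &zero);
	rv = 0;
out:
	for (i = 0; i < npipes; i++) {
		close(fds[i][0]);
		close(fds[i][1]);
	}
	close(kq);
	return rv;
}
.Ed
.Pp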
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv \-1
will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh ERRORS
The
.Fn kqueue
function fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.\" .Xr aio_error 2 ,
.\" .Xr aio_read 2 ,
.\" .Xr aio_return 2 ,
.Xr ioctl 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3 ,
.Xr kfilter_register 9 ,
.Xr knote 9
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1 ,
and then in
.Nx 2.0 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq jlemon@FreeBSD.org .
.Nx
port and manpage additions were done by
.An Luke Mewburn Aq lukem@NetBSD.org ,
.An Jason Thorpe Aq thorpej@NetBSD.org ,
and
.An Jaromir Dolecek Aq jdolecek@NetBSD.org .