.\" $NetBSD: kqueue.2,v 1.9 2002/10/13 07:37:39 jdolecek Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Copyright (c) 2001 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Portions of this documentation is derived from text contributed by
.\" Luke Mewburn.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.22 2001/06/27 19:55:57 dd Exp $
.\"
.Dd September 22, 2002
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.Fd #include \*[Lt]sys/types.h\*[Gt]
.Fd #include \*[Lt]sys/event.h\*[Gt]
.Fd #include \*[Lt]sys/time.h\*[Gt]
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "size_t nchanges" "struct kevent *eventlist" "size_t nevents" "const struct timespec *timeout"
.Fn EV_SET "&kev" ident filter flags fflags data udata
.Sh DESCRIPTION
.Fn kqueue
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may only
be one unique kevent per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
.Fn kqueue
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
.ig
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
..
.Pp
.Fn kevent
is used to register events with the queue, and return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Va kevent
structures, as defined in
.Aq Pa sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the size of
.Fa changelist .
.Fa eventlist
is a pointer to an array of kevent structures.
.Fa nevents
determines the size of
.Fa eventlist .
If
.Fa timeout
is a
.No non- Ns Dv NULL
pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a
.Dv NULL
pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be
.No non- Ns Dv NULL ,
pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	uint32_t  filter;	/* filter for event */
	uint32_t  flags;	/* action flags for kqueue */
	uint32_t  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void     *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width XXXfilter -offset indent
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It filter
Identifies the kernel filter used to process this event.
There are pre-defined system filters (which are described below), and
other filters may be added by kernel subsystems as necessary.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT -offset indent
.It EV_ADD
Adds the event to the kqueue.
Re-adding an existing event will modify the parameters of the original
event, and not result in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Ss Filters
Filters are identified by a number.
There are two types of filters: predefined filters, which
are described below, and third-party filters, which may be added with
.Xr kfilter_register 9
by kernel subsystems, third-party device drivers, or loadable
kernel modules.
.Pp
As a third-party filter is referenced by a well-known name instead
of a statically assigned number, two
.Xr ioctl 2 Ns s
are supported on the file descriptor returned by
.Fn kqueue
to map a filter name to a filter number, and vice-versa (passing
arguments in a structure described below):
.Bl -tag -width KFILTER_BYFILTER -offset indent
.It KFILTER_BYFILTER
Map
.Va filter
to
.Va name ,
which is of size
.Va len .
.It KFILTER_BYNAME
Map
.Va name
to
.Va filter .
.Va len
is ignored.
.El
.Pp
The following structure is used to pass arguments in and out of the
.Xr ioctl 2 :
.Bd -literal -offset indent
struct kfilter_mapping {
	char	 *name;		/* name to lookup or return */
	size_t	 len;		/* length of name */
	uint32_t filter;	/* filter to lookup or return */
};
.Ed
.Pp
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Pp
The predefined system filters are:
.Bl -tag -width EVFILT_SIGNAL
.It EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Pp
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog (i.e., the number of
connections ready to be accepted with
.Xr accept 2 . )
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets EV_EOF in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set EV_EOF in
.Va flags .
This may be cleared by passing in EV_CLEAR, at which point the
filter will resume waiting for data to become available before
returning.
.El
.It EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, fifos, and ttys,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of EV_CLEAR.
Note that this filter is not supported for vnodes.
.Pp
For sockets, the low water mark and socket error handling are
identical to the EVFILT_READ case.
.It EVFILT_AIO
This is not implemented in
.Nx .
.ig
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to SIGEV_EVENT.
When the aio_* function is called, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the aio_* function.
The filter returns under the same conditions as aio_error.
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.
However, this approach will not work on
architectures with 64-bit pointers, and should be considered deprecated.
..
.It EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It NOTE_DELETE
.Fn unlink
was called on the file referenced by the descriptor.
.It NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It NOTE_EXTEND
The file referenced by the descriptor was extended.
.It NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It NOTE_LINK
The link count on the file changed.
.It NOTE_RENAME
The file referenced by the descriptor was renamed.
.It NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying filesystem was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It NOTE_EXIT
The process has exited.
.It NOTE_FORK
The process has called
.Fn fork .
.It NOTE_EXEC
The process has executed a new program via
.Xr execve 2
or similar call.
.It NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process will return with NOTE_TRACK set in the
.Va fflags
field, while the child process will return with NOTE_CHILD set in
.Va fflags
and the parent PID in
.Va data .
.It NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the current process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as SIG_IGN.
Event notification happens after normal signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.El
.Sh RETURN VALUES
.Fn kqueue
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of \-1 is
returned and errno is set.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv \-1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh ERRORS
The
.Fn kqueue
function fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.\" .Xr aio_error 2 ,
.\" .Xr aio_read 2 ,
.\" .Xr aio_return 2 ,
.Xr ioctl 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3 ,
.Xr kfilter_register 9 ,
.Xr knote 9
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1 ,
and then in
.Nx .
This interface is currently only available on the experimental kernel branch
.Li kqueue
in
.Nx .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq jlemon@FreeBSD.org .
.Nx
port and manpage additions were done by
.An Luke Mewburn Aq lukem@NetBSD.org ,
.An Jason Thorpe Aq thorpej@NetBSD.org ,
and
.An Jaromir Dolecek Aq jdolecek@NetBSD.org .
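.Sh EXAMPLES
The
.Fa timeout
argument described in
.Sx DESCRIPTION
is a
.Va timespec
structure, so a caller that works in milliseconds needs a conversion.
The following is a minimal sketch of one, not a definitive implementation.
.Fn kq_timeout_ms
is a hypothetical helper name that is not part of this interface,
and the sketch assumes a non-negative argument.
.Bd -literal -offset indent
```c
#include <time.h>

/*
 * kq_timeout_ms() is a hypothetical helper, not part of the kqueue API.
 * It converts a non-negative millisecond count into the struct timespec
 * form that kevent() takes as its timeout argument.
 * A count of 0 yields a zero-valued structure, which requests a poll.
 */
struct timespec
kq_timeout_ms(long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;			/* whole seconds */
	ts.tv_nsec = (ms % 1000) * 1000000L;	/* remainder, in nanoseconds */
	return ts;
}
```
.Ed
.Pp
The address of the result would be passed as the
.Fa timeout
argument of
.Fn kevent .
A count of 0 produces the zero-valued structure that effects a poll,
while passing
.Dv NULL
instead makes
.Fn kevent
wait indefinitely.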