.\" $OpenBSD: bpf.4,v 1.43 2020/09/30 19:25:40 tb Exp $
.\" $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd $Mdocdate: September 30 2020 $
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf .
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require these includes:
.Pp
.nr nS 1
.In sys/types.h
.In sys/time.h
.In sys/ioctl.h
.In net/bpf.h
.nr nS 0
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.In sys/socket.h
and
.In net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.In net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field while their length in
.Vt u_int
is supplied to the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in
.Vt u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array in
.Vt u_int .
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCFLUSH ,
.Dv BIOCGBLEN ,
.Dv BIOCGDIRFILT ,
.Dv BIOCGDLT ,
.Dv BIOCGDLTLIST ,
.Dv BIOCGETIF ,
.Dv BIOCGHDRCMPLT ,
.Dv BIOCGRSIG ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCGSTATS ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCLOCK ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCVERSION ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Sets or gets the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCSFILDROP Fa "u_int *"
.It Dv BIOCGFILDROP Fa "u_int *"
Sets or gets the
.Dq filter drop
action.
The supported actions for packets matching the filter are:
.Pp
.Bl -tag -width "BPF_FILDROP_CAPTURE" -compact
.It Dv BPF_FILDROP_PASS
Accept and capture
.It Dv BPF_FILDROP_CAPTURE
Drop and capture
.It Dv BPF_FILDROP_DROP
Drop and do not capture
.El
.Pp
Packets matching any filter configured to drop packets will be
reported to the associated interface so that they can be dropped.
The default action is
.Dv BPF_FILDROP_PASS .
.Pp
.It Dv BIOCSDIRFILT Fa "u_int *"
.It Dv BIOCGDIRFILT Fa "u_int *"
Sets or gets the status of the
.Dq direction filter
flag.
If non-zero, packets matching the specified direction (either
.Dv BPF_DIRECTION_IN
or
.Dv BPF_DIRECTION_OUT )
will be ignored.
.El
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv FIONBIO Fa "int *"
Sets or clears non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enables or disables asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Sets or gets the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t bh_caplen;
	u_int32_t bh_datalen;
	u_int16_t bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
points to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen);
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t code;
	u_char jt;
	u_char jf;
	u_int32_t k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
a random number
.Pf ( Dv BPF_RND ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_RND
.Xc
.Sm on
A <- arc4random()
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
.Sh FILES
.Bl -tag -width /dev/bpf -compact
.It Pa /dev/bpf
.Nm
device
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Sh ERRORS
If the
.Xr ioctl 2
call fails,
.Xr errno 2
is set to one of the following values:
.Bl -tag -width Er
.It Bq Er EINVAL
The timeout used in a
.Dv BIOCSRTIMEOUT
request is negative.
.It Bq Er EINVAL
The timeout used in a
.Dv BIOCSRTIMEOUT
request specified a microsecond value less than zero or
greater than or equal to 1 million.
.It Bq Er EOVERFLOW
The timeout used in a
.Dv BIOCSRTIMEOUT
request is too large to be represented by an
.Vt int .
.El
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8 ,
.Xr arc4random 9
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%D January 1993
.%J 1993 Winter USENIX Conference
.%T The BSD Packet Filter: A New Architecture for User-level Packet Capture
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to
.Bx
and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
.An -nosplit
.An Steve McCanne
of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to
.An Van Jacobson .
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.