.\" $OpenBSD: bpf.4,v 1.30 2009/03/01 18:59:50 otto Exp $
.\" $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd $Mdocdate: March 1 2009 $
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
The number of open files can be increased by creating additional
device nodes with the
.Xr MAKEDEV 8
script.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.Aq Pa net/bpf.h .
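.Pp
The DESCRIPTION notes that opening a minor device already in use fails with
.Er EBUSY .
A minimal sketch of the usual response, trying successive device nodes until
one opens, might look like the following.
The function name
.Fn open_first_free
and its parameters are hypothetical and are not part of the
.Nm
interface; the path prefix is passed in by the caller so the loop can be
exercised without a real device.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: try prefix0, prefix1, ... and return the first
 * descriptor that opens.  EBUSY means another process holds that minor
 * device, so the next unit is tried.  Any other error (ENOENT, EACCES,
 * ...) ends the search with errno left as set by open(2).
 */
int
open_first_free(const char *prefix, int max_units, int flags)
{
	char path[64];
	int fd, unit;

	assert(prefix != NULL);
	for (unit = 0; unit < max_units; unit++) {
		snprintf(path, sizeof(path), "%s%d", prefix, unit);
		fd = open(path, flags);
		if (fd != -1)
			return fd;
		if (errno != EBUSY)
			return -1;
	}
	errno = EBUSY;		/* every unit that was tried is in use */
	return -1;
}
```

A caller on a system with
.Nm
devices would presumably pass
.Pa /dev/bpf
as the prefix and then bind the returned descriptor with
.Dv BIOCSETIF .
.Pp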
All commands require these includes:
.Bd -unfilled -offset indent
.Cd #include <sys/types.h>
.Cd #include <sys/time.h>
.Cd #include <sys/ioctl.h>
.Cd #include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Aq Pa sys/socket.h
and
.Aq Pa net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.Aq Pa net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field while their length in
.Vt u_int
is supplied to the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in
.Vt u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array in
.Vt u_int .
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCFLUSH ,
.Dv BIOCGBLEN ,
.Dv BIOCGDIRFILT ,
.Dv BIOCGDLT ,
.Dv BIOCGDLTLIST ,
.Dv BIOCGETIF ,
.Dv BIOCGHDRCMPLT ,
.Dv BIOCGRSIG ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCGSTATS ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCLOCK ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCVERSION ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Sets or gets the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.Aq Pa net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCSFILDROP Fa "u_int *"
.It Dv BIOCGFILDROP Fa "u_int *"
Sets or gets the status of the
.Dq filter drop
flag.
If non-zero, packets matching any filters will be reported to the
associated interface so that they can be dropped.
.Pp
.It Dv BIOCSDIRFILT Fa "u_int *"
.It Dv BIOCGDIRFILT Fa "u_int *"
Sets or gets the status of the
.Dq direction filter
flag.
If non-zero, packets matching the specified direction (either
.Dv BPF_DIRECTION_IN
or
.Dv BPF_DIRECTION_OUT )
will be ignored.
.El
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv SIOCGIFADDR Fa "struct ifreq *"
Returns the address associated with the interface.
.Pp
.It Dv FIONBIO Fa "int *"
Sets or clears non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enables or disables asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Sets or gets the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t bh_caplen;
	u_int32_t bh_datalen;
	u_int16_t bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
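.Pp
As a rough illustration of how the header fields and the padding interact,
the sketch below walks a buffer as it might be returned by
.Xr read 2 ,
stepping from one packet to the next by rounding
.Fa bh_hdrlen
plus
.Fa bh_caplen
up to a word boundary.
All names prefixed with
.Li x_
or
.Li X_
are local stand-ins introduced here so the sketch compiles without
.Aq Pa net/bpf.h ;
the four-byte word size is an assumption, and the function
.Fn walk_packets
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins (assumptions); real programs use <net/bpf.h>. */
#define X_ALIGNMENT	sizeof(uint32_t)
#define X_WORDALIGN(x)	(((x) + (X_ALIGNMENT - 1)) & ~(X_ALIGNMENT - 1))

struct x_timeval {
	uint32_t tv_sec;
	uint32_t tv_usec;
};

struct x_bpf_hdr {
	struct x_timeval bh_tstamp;
	uint32_t bh_caplen;
	uint32_t bh_datalen;
	uint16_t bh_hdrlen;
};

/*
 * Count the packets stored in buf[0..len) and add up their captured
 * lengths.  Stops at the first header that does not fit in the buffer.
 */
size_t
walk_packets(const unsigned char *buf, size_t len, size_t *captured)
{
	struct x_bpf_hdr h;
	size_t off = 0, n = 0;

	assert(buf != NULL && captured != NULL);
	*captured = 0;
	while (off + sizeof(h) <= len) {
		/* Copy the header out so no alignment is assumed here. */
		memcpy(&h, buf + off, sizeof(h));
		if (h.bh_hdrlen == 0 ||
		    (size_t)h.bh_hdrlen + h.bh_caplen > len - off)
			break;
		/* The packet data starts at buf + off + h.bh_hdrlen. */
		*captured += h.bh_caplen;
		n++;
		/* Padding places the next header on a word boundary. */
		off += X_WORDALIGN((size_t)h.bh_hdrlen + h.bh_caplen);
	}
	return n;
}
```

Using
.Fa bh_hdrlen
rather than
.Li sizeof(struct x_bpf_hdr)
to locate the packet data follows the caveat above that the two may differ.
.Pp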
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Aq Pa net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
points to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen);
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t code;
	u_char jt;
	u_char jf;
	u_int32_t k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.Aq Pa net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
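.Pp
The semi-hierarchical encoding can be made concrete with a small sketch
that ORs mode bits into a class and splits an opcode back into its parts.
The numeric values are restated from common
.Aq Pa net/bpf.h
definitions and should be treated as assumptions; the
.Li X_
constants and the
.Li x_
helper functions are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirroring common <net/bpf.h> definitions. */
enum { X_LD = 0x00, X_JMP = 0x05, X_RET = 0x06 };	/* classes */
enum { X_W = 0x00, X_H = 0x08, X_B = 0x10 };		/* operand sizes */
enum { X_ABS = 0x20, X_IND = 0x40 };			/* addressing modes */
enum { X_JEQ = 0x10, X_K = 0x00 };			/* jump op, source */

/* OR the mode and operator bits into the three-bit class. */
uint16_t
x_opcode(unsigned cls, unsigned bits)
{
	assert(cls <= 0x07);		/* the class occupies the low 3 bits */
	assert((bits & 0x07) == 0);	/* other bits must not overlap it */
	return (uint16_t)(cls | bits);
}

/* Split an opcode back into its fields. */
unsigned x_class(uint16_t code) { return code & 0x07; }
unsigned x_size(uint16_t code)  { return code & 0x18; }
unsigned x_mode(uint16_t code)  { return code & 0xe0; }
```

Under these assumed values, a halfword load at a fixed packet offset is
.Li x_opcode(X_LD, X_H | X_ABS) ,
which matches the way the filter examples in this page add class, size and
mode together.
.Pp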
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter
flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
.Sh FILES
.Bl -tag -width /dev/bpf[0-9] -compact
.It Pa /dev/bpf[0-9]
.Nm
devices
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%J "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to BSD and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
Steve McCanne of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to Van Jacobson.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
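.Pp
To make the filter machine semantics easier to follow, here is a minimal
user-space sketch of an interpreter for a subset of the instructions
(absolute and indexed loads,
.Dv BPF_MSH ,
the jumps, and the returns), together with a restatement of the Reverse ARP
filter from EXAMPLES.
It is not the kernel implementation.
The
.Li X_
and
.Li x_
names are hypothetical, the opcode values are restated from common
.Aq Pa net/bpf.h
definitions, and the constants 0x8035, 3 and 42 stand in for
.Dv ETHERTYPE_REVARP ,
.Dv REVARP_REQUEST
and the two structure sizes; all of these are assumptions.
Rejecting a packet on an out-of-range load is likewise a choice made for
this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode bits restated from common <net/bpf.h> values (assumption). */
enum {
	X_LD = 0x00, X_LDX = 0x01, X_JMP = 0x05, X_RET = 0x06,	/* classes */
	X_W = 0x00, X_H = 0x08, X_B = 0x10,			/* sizes */
	X_ABS = 0x20, X_IND = 0x40, X_MSH = 0xa0,		/* modes */
	X_JA = 0x00, X_JEQ = 0x10, X_JGT = 0x20, X_JGE = 0x30, X_JSET = 0x40,
	X_K = 0x00, X_X = 0x08, X_A = 0x10			/* sources */
};

struct x_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };
#define X_STMT(c, k)		{ (uint16_t)(c), 0, 0, (k) }
#define X_JUMP(c, k, t, f)	{ (uint16_t)(c), (t), (f), (k) }

/* P[i:n]: read n bytes at offset i in network byte order. */
static int
x_load(const uint8_t *p, size_t len, size_t i, size_t n, uint32_t *out)
{
	uint32_t v = 0;
	size_t j;

	if (i > len || n > len - i)
		return -1;
	for (j = 0; j < n; j++)
		v = (v << 8) | p[i + j];
	*out = v;
	return 0;
}

/* Run prog over the packet; return the number of bytes to accept. */
uint32_t
x_filter(const struct x_insn *prog, size_t n, const uint8_t *p, size_t len)
{
	uint32_t A = 0, X = 0, v, src;
	size_t i, w;

	assert(prog != NULL && p != NULL);
	for (i = 0; i < n; i++) {		/* i++ is the implicit pc step */
		const struct x_insn *in = &prog[i];

		switch (in->code & 0x07) {	/* instruction class */
		case X_LD:
			if ((in->code & 0xe0) == X_ABS)
				v = in->k;
			else if ((in->code & 0xe0) == X_IND)
				v = X + in->k;
			else
				return 0;	/* mode not covered here */
			w = (in->code & 0x18) == X_W ? 4 :
			    (in->code & 0x18) == X_H ? 2 : 1;
			if (x_load(p, len, v, w, &A) != 0)
				return 0;
			break;
		case X_LDX:
			if (in->code != X_LDX + X_B + X_MSH ||
			    x_load(p, len, in->k, 1, &v) != 0)
				return 0;
			X = 4 * (v & 0xf);
			break;
		case X_JMP:
			src = (in->code & X_X) ? X : in->k;
			switch (in->code & 0xf0) {
			case X_JA:   i += in->k; break;
			case X_JEQ:  i += (A == src) ? in->jt : in->jf; break;
			case X_JGT:  i += (A > src) ? in->jt : in->jf; break;
			case X_JGE:  i += (A >= src) ? in->jt : in->jf; break;
			case X_JSET: i += (A & src) ? in->jt : in->jf; break;
			default:     return 0;
			}
			break;
		case X_RET:
			return (in->code & 0x18) == X_A ? A : in->k;
		default:
			return 0;		/* class not covered here */
		}
	}
	return 0;
}

/* The Reverse ARP filter from EXAMPLES with assumed constants. */
const struct x_insn x_rarp[] = {
	X_STMT(X_LD + X_H + X_ABS, 12),
	X_JUMP(X_JMP + X_JEQ + X_K, 0x8035, 0, 3),
	X_STMT(X_LD + X_H + X_ABS, 20),
	X_JUMP(X_JMP + X_JEQ + X_K, 3, 0, 1),
	X_STMT(X_RET + X_K, 42),
	X_STMT(X_RET + X_K, 0),
};
```

Because the loop increments the index after every instruction, adding
.Fa jt
or
.Fa jf
to it reproduces the
.Dq pc += ...
rules listed under
.Dv BPF_JMP .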