.\" $OpenBSD: bpf.4,v 1.29 2007/05/31 19:19:49 jmc Exp $
.\" $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd $Mdocdate: May 31 2007 $
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
The number of open files can be increased by creating additional
device nodes with the
.Xr MAKEDEV 8
script.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
The packet filter will support any link level protocol that has fixed length
headers.
Currently, only Ethernet, SLIP, and PPP drivers have been modified to
interact with
.Nm bpf .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
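.Pp
The byte-order advice above can be sketched as follows.
This is only an illustration under stated assumptions: the helper
sketch_ether_type and the macro SKETCH_ETHERTYPE_OFF are hypothetical names
that are not part of the
.Nm
interface, and the captured data is assumed to be a plain Ethernet frame
whose type field sits at byte offset 12, as in the filters under
.Sx EXAMPLES .
.Bd -literal -offset indent
```c
#include <arpa/inet.h>	/* ntohs() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed offset of the Ethernet type field (see EXAMPLES). */
#define SKETCH_ETHERTYPE_OFF	12

/*
 * Hypothetical helper: return the Ethernet type of a captured frame
 * in host byte order.  memcpy() is used so that no unaligned 16-bit
 * load is performed on alignment-restricted machines.
 */
static uint16_t
sketch_ether_type(const unsigned char *frame)
{
	uint16_t type;

	assert(frame != NULL);
	memcpy(&type, frame + SKETCH_ETHERTYPE_OFF, sizeof(type));
	return ntohs(type);	/* network order to host order */
}
```
.Ed
.Pp
The caller is expected to have checked that at least 14 bytes were captured
before calling such a helper.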
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.Aq Pa net/bpf.h .
All commands require these includes:
.Bd -unfilled -offset indent
.Cd #include <sys/types.h>
.Cd #include <sys/time.h>
.Cd #include <sys/ioctl.h>
.Cd #include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Aq Pa sys/socket.h
and
.Aq Pa net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EIO
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.Aq Pa net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field, while the size of that array, counted in
.Vt u_int
elements, is supplied in the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length, in
.Vt u_int
elements, of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array, again in
.Vt u_int
elements.
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, to send packets on these interfaces if the descriptor was
opened read-write, and to send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCFLUSH ,
.Dv BIOCGBLEN ,
.Dv BIOCGDIRFILT ,
.Dv BIOCGDLT ,
.Dv BIOCGDLTLIST ,
.Dv BIOCGETIF ,
.Dv BIOCGHDRCMPLT ,
.Dv BIOCGRSIG ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCGSTATS ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCLOCK ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCVERSION ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Set or get the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx Filter machine
for an explanation of the filter language.
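.Pp
As a rough sketch of how the structure above might be filled in, the
fragment below builds a one-instruction program that accepts every packet.
So that it compiles without
.Aq Pa net/bpf.h ,
it declares stand-in types and constants whose names begin with sketch_ or
SKETCH_; these names are hypothetical, and real code would use the
definitions from the header instead.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the <net/bpf.h> types shown in this page (assumption). */
struct sketch_insn {
	uint16_t code;
	unsigned char jt;
	unsigned char jf;
	uint32_t k;
};

struct sketch_program {
	int bf_len;			/* counted in instructions, not bytes */
	struct sketch_insn *bf_insns;
};

#define SKETCH_RET	0x06		/* mirrors BPF_RET */
#define SKETCH_K	0x00		/* mirrors BPF_K */

/* One instruction: return (u_int)-1, i.e. accept the whole packet. */
static struct sketch_insn sketch_accept_all[] = {
	{ SKETCH_RET + SKETCH_K, 0, 0, (uint32_t)-1 },
};

/* Fill in the descriptor that would be handed to BIOCSETF. */
static struct sketch_program
sketch_make_program(struct sketch_insn *insns, size_t ninsns)
{
	struct sketch_program prog;

	assert(insns != NULL && ninsns > 0);
	prog.bf_len = (int)ninsns;
	prog.bf_insns = insns;
	return prog;
}
```
.Ed
.Pp
With the real types, the address of the resulting structure would be the
third argument of the
.Dv BIOCSETF
.Xr ioctl 2 .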
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.Aq Pa net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Set or get the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Set or get the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCGFILDROP Fa "u_int *"
.It Dv BIOCSFILDROP Fa "u_int *"
Get or set the status of the
.Dq filter drop
flag.
If non-zero, packets matching any filters will be reported to the
associated interface so that they can be dropped.
.Pp
.It Dv BIOCGDIRFILT Fa "u_int *"
.It Dv BIOCSDIRFILT Fa "u_int *"
Get or set the status of the
.Dq direction filter
flag.
If non-zero, packets matching the specified direction (either
.Dv BPF_DIRECTION_IN
or
.Dv BPF_DIRECTION_OUT )
will be ignored.
.El
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv SIOCGIFADDR Fa "struct ifreq *"
Returns the address associated with the interface.
.Pp
.It Dv FIONBIO Fa "int *"
Set or clear non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enable or disable asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Set or get the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t bh_caplen;
	u_int32_t bh_datalen;
	u_int16_t bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Aq Pa net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
is a
.Vt "struct bpf_hdr *"
pointing to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t code;
	u_char jt;
	u_char jf;
	u_int32_t k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.Aq Pa net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter
flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
.Sh FILES
.Bl -tag -width /dev/bpf[0-9] -compact
.It Pa /dev/bpf[0-9]
.Nm
devices
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%T "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to BSD and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
Steve McCanne of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to Van Jacobson.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
.Pp
Data link protocols with variable length headers are not currently supported.