1.\" -*- nroff -*-
2.\"
3.\"	$NetBSD: bpf.4,v 1.73 2023/02/11 18:03:25 uwe Exp $
4.\"
5.\" Copyright (c) 1990, 1991, 1992, 1993, 1994
6.\"	The Regents of the University of California.  All rights reserved.
7.\"
8.\" Redistribution and use in source and binary forms, with or without
9.\" modification, are permitted provided that: (1) source code distributions
10.\" retain the above copyright notice and this paragraph in its entirety, (2)
11.\" distributions including binary code include the above copyright notice and
12.\" this paragraph in its entirety in the documentation or other materials
13.\" provided with the distribution, and (3) all advertising materials mentioning
14.\" features or use of this software display the following acknowledgement:
15.\" ``This product includes software developed by the University of California,
16.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
17.\" the University nor the names of its contributors may be used to endorse
18.\" or promote products derived from this software without specific prior
19.\" written permission.
20.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
21.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
22.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
23.\"
24.\" This document is derived in part from the enet man page (enet.4)
25.\" distributed with 4.3BSD Unix.
26.\"
27.Dd November 30, 2022
28.Dt BPF 4
29.Os
30.Sh NAME
31.Nm bpf
32.Nd Berkeley Packet Filter raw network interface
33.Sh SYNOPSIS
34.Cd "pseudo-device bpfilter"
35.Sh DESCRIPTION
36The Berkeley Packet Filter
37provides a raw interface to data link layers in a protocol
38independent fashion.
39All packets on the network, even those destined for other hosts,
40are accessible through this mechanism.
41.Pp
42The packet filter appears as a character special device,
43.Pa /dev/bpf .
44After opening the device, the file descriptor must be bound to a
45specific network interface with the
46.Dv BIOCSETIF
47ioctl.
48A given interface can be shared by multiple listeners, and the filter
49underlying each descriptor will see an identical packet stream.
50.Pp
51Associated with each open instance of a
52.Nm
53file is a user-settable packet filter.
54Whenever a packet is received by an interface,
55all file descriptors listening on that interface apply their filter.
56Each descriptor that accepts the packet receives its own copy.
57.Pp
58Reads from these files return the next group of packets
59that have matched the filter.
60To improve performance, the buffer passed to read must be
61the same size as the buffers used internally by
62.Nm .
63This size is returned by the
64.Dv BIOCGBLEN
65ioctl (see below), and can be set with
66.Dv BIOCSBLEN .
67Note that an individual packet larger than this size is necessarily
68truncated.
69.Pp
70Since packet data is in network byte order, applications should use the
71.Xr byteorder 3
72macros to extract multi-byte values.
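.Pp
As a minimal sketch of that advice, the hypothetical helper below reads the
16-bit type field from a captured Ethernet frame.
The function name and the fixed offset of 12 are illustrative assumptions and
are not part of the
.Nm
interface.
The value is copied out with
.Xr memcpy 3
to avoid an unaligned access and then converted to host order with
.Fn ntohs .
.Bd -literal -offset indent
```c
#include <arpa/inet.h>	/* ntohs() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: offset 12 is where Ethernet keeps its type field. */
static uint16_t
ex_ethertype(const unsigned char *frame, size_t len)
{
	uint16_t raw;

	assert(frame != NULL && len >= 14);	/* need a full Ethernet header */
	memcpy(&raw, frame + 12, sizeof(raw));	/* packet bytes, network order */
	return ntohs(raw);			/* convert to host order */
}
```
.Ed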
73.Pp
74A packet can be sent out on the network by writing to a
75.Nm
76file descriptor.
77The writes are unbuffered, meaning only one packet can be processed per write.
78Currently, only writes to Ethernet-based (including Wi-Fi), SLIP and loopback
79links are supported.
80.Sh IOCTLS
81The
82.Xr ioctl 2
83command codes below are defined in
84.In net/bpf.h .
85All commands require these includes:
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
.Ed
92.Pp
93Additionally,
94.Dv BIOCGETIF
95and
96.Dv BIOCSETIF
97require
98.Pa <net/if.h> .
99.Pp
100The (third) argument to the
101.Xr ioctl 2
102should be a pointer to the type indicated.
103.Bl -tag -width Dv
104.It Dv BIOCGBLEN Pq Vt u_int
105Returns the required buffer length for reads on
106.Nm
107files.
108.It Dv BIOCSBLEN Pq Vt u_int
109Sets the buffer length for reads on
110.Nm
111files.
112The buffer must be set before the file is attached to an interface with
113.Dv BIOCSETIF .
114If the requested buffer size cannot be accommodated, the closest
115allowable size will be set and returned in the argument.
116A read call will result in
117.Er EINVAL
118if it is passed a buffer that is not this size.
119.It Dv BIOCGDLT Pq Vt u_int
120Returns the type of the data link layer underlying the attached interface.
121.Er EINVAL
122is returned if no interface has been specified.
123The device types, prefixed with
124.Ql DLT_ ,
125are defined in
126.In net/bpf.h .
127.It Dv BIOCGDLTLIST Pq Vt struct bpf_dltlist
128Returns an array of the available types of the data link layer
129underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Fa bfl_list
field, and the caller supplies the capacity of that array, counted in
.Vt u_int
elements, in the
.Fa bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
On return, the
.Fa bfl_len
field is updated to the number of
.Vt u_int
elements actually stored in the array.
152If
153.Fa bfl_list
154is
155.Dv NULL ,
156the
157.Fa bfl_len
158field is set to indicate the required length of an array in
159.Vt u_int .
160.It Dv BIOCSDLT Pq Vt u_int
161Changes the type of the data link layer underlying the attached interface.
162.Er EINVAL
163is returned if no interface has been specified or the specified
164type is not available for the interface.
165.It Dv BIOCPROMISC
166Forces the interface into promiscuous mode.
167All packets, not just those destined for the local host, are processed.
168Since more than one file can be listening on a given interface,
169a listener that opened its interface non-promiscuously may receive
170packets promiscuously.
171This problem can be remedied with an appropriate filter.
172.Pp
173The interface remains in promiscuous mode until all files listening
174promiscuously are closed.
175.It Dv BIOCFLUSH
176Flushes the buffer of incoming packets,
177and resets the statistics that are returned by
178.Dv BIOCGSTATS .
179.It Dv BIOCGETIF Pq Vt struct ifreq
180Returns the name of the hardware interface that the file is listening on.
181The name is returned in the
182.Fa ifr_name
183field of
184.Vt ifreq .
185All other fields are undefined.
186.It Dv BIOCSETIF Pq Vt struct ifreq
187Sets the hardware interface associated with the file.
188This command must be performed before any packets can be read.
189The device is indicated by name using the
190.Fa ifr_name
191field of the
192.Vt ifreq .
193Additionally, performs the actions of
194.Dv BIOCFLUSH .
195.It Dv BIOCSRTIMEOUT , BIOCGRTIMEOUT Pq Vt struct timeval
196Sets or gets the
197.Dq Em read timeout
198parameter.
199The
200.Vt timeval
201specifies the length of time to wait before timing
202out on a read request.
203This parameter is initialized to zero by
204.Xr open 2 ,
205indicating no timeout.
206.It Dv BIOCGSTATS Pq Vt struct bpf_stat
207Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	uint64_t bs_recv;
	uint64_t bs_drop;
	uint64_t bs_capt;
	uint64_t bs_padding[13];
};
.Ed
216.Pp
217The fields are:
218.Bl -tag -width Fa -offset indent
219.It Fa bs_recv
220the number of packets received by the descriptor since opened or reset
221.Pq including any buffered since the last read call ;
222.It Fa bs_drop
223the number of packets which were accepted by the filter but dropped by the
224kernel because of buffer overflows
225.Po
226i.e., the application's reads aren't keeping up with the packet traffic
227.Pc ;
228and
229.It Fa bs_capt
230the number of packets accepted by the filter.
231.El
232.It Dv BIOCIMMEDIATE Pq Vt u_int
233Enables or disables
234.Dq Em immediate mode ,
235based on the truth value of the argument.
236When immediate mode is enabled, reads return immediately upon packet
237reception.
238Otherwise, a read will block until either the kernel buffer
239becomes full or a timeout occurs.
240This is useful for programs like
241.Xr rarpd 8 ,
242which must respond to messages in real time.
243The default for a new file is off.
.It Dv BIOCLOCK Pq Dv NULL
Sets the locked flag on the
.Nm
descriptor.
This prevents the execution of ioctl commands which could change the
underlying operating parameters of the device.
248.It Dv BIOCSETF Pq Vt struct bpf_program
249Sets the filter program used by the kernel to discard uninteresting
250packets.
251An array of instructions and its length are passed in using the following structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
258.Pp
259The filter program is pointed to by the
260.Fa bf_insns
261field while its length in units of
262.Vt struct bpf_insn
263is given by the
264.Fa bf_len
265field.
266Also, the actions of
267.Dv BIOCFLUSH
268are performed.
269.Pp
270See section
271.Sx FILTER MACHINE
272for an explanation of the filter language.
273.It Dv BIOCSETWF Pq Vt struct bpf_program
274Sets the write filter program used by the kernel to control what type
275of packets can be written to the interface.
276See the
277.Dv BIOCSETF
278command for more information on the bpf filter program.
279.It Dv BIOCVERSION Pq Vt struct bpf_version
280Returns the major and minor version numbers of the filter language currently
281recognized by the kernel.
282Before installing a filter, applications must check
283that the current version is compatible with the running kernel.
284Version numbers are compatible if the major numbers match and the
285application minor is less than or equal to the kernel minor.
286The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
293.Pp
294The current version numbers are given by
295.Dv BPF_MAJOR_VERSION
296and
297.Dv BPF_MINOR_VERSION
298from
299.In net/bpf.h .
300An incompatible filter
301may result in undefined behavior
302.Po
303most likely, an error returned by
304.Xr ioctl 2
305or haphazard packet matching
306.Pc .
307.It Dv BIOCSRSIG , BIOCGRSIG Pq Vt u_int
308Sets or gets the receive signal.
309This signal will be sent to the process or process group specified by
310.Dv FIOSETOWN .
311It defaults to
312.Dv SIGIO .
313.It Dv BIOCGHDRCMPLT , BIOCSHDRCMPLT Pq Vt u_int
314Sets or gets the status of the
315.Dq header complete
316flag.
317Set to zero if the link level source address should be filled in
318automatically by the interface output routine.
319Set to one if the link level source address will be written,
320as provided, to the wire.
321This flag is initialized to zero by default.
322.It Dv BIOCGSEESENT , BIOCSSEESENT Pq Vt u_int
323These commands are obsolete but left for compatibility.
324Use
325.Dv BIOCSDIRECTION
326and
327.Dv BIOCGDIRECTION
328instead.
329Set or get the flag determining whether locally generated packets on the
330interface should be returned by BPF.
331Set to zero to see only incoming packets on the interface.
332Set to one to see packets originating locally and remotely on the interface.
333This flag is initialized to one by default.
334.It Dv BIOCSDIRECTION , BIOCGDIRECTION Pq Vt u_int
335Set or get the setting determining whether incoming, outgoing, or all packets
336on the interface should be returned by BPF.
337Set to
338.Dv BPF_D_IN
339to see only incoming packets on the interface.
340Set to
341.Dv BPF_D_INOUT
342to see packets originating locally and remotely on the interface.
343Set to
344.Dv BPF_D_OUT
345to see only outgoing packets on the interface.
346This setting is initialized to
347.Dv BPF_D_INOUT
348by default.
349.It Dv BIOCFEEDBACK , BIOCSFEEDBACK , BIOCGFEEDBACK Pq Vt u_int
350Set (or get)
351.Dq packet feedback mode .
352This allows injected packets to be fed back as input to the interface when
353output via the interface is successful.
354The first name is meant for
355.Fx
356compatibility, the two others follow the Get/Set convention.
357.\"When
358.\".Dv BPF_D_INOUT
359.\"direction is set, injected
360Injected
361outgoing packets are not returned by BPF to avoid
362duplication.
363This flag is initialized to zero by default.
364.El
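.Pp
The compatibility rule described under
.Dv BIOCVERSION
can be sketched as a small predicate.
The structure and function below are local stand-ins with hypothetical names,
chosen so that the fragment compiles without
.In net/bpf.h ;
they are an illustration of the rule, not part of the interface.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for struct bpf_version so the sketch compiles anywhere. */
struct ex_bpf_version {
	unsigned short bv_major;
	unsigned short bv_minor;
};

/*
 * Return 1 if a filter built for (app_major, app_minor) may be installed
 * on a kernel reporting *kv: majors equal, application minor not newer.
 */
static int
ex_version_ok(const struct ex_bpf_version *kv,
    unsigned short app_major, unsigned short app_minor)
{
	assert(kv != NULL);
	return kv->bv_major == app_major && app_minor <= kv->bv_minor;
}
```
.Ed
.Pp
In a real program the kernel values would be obtained with the
.Dv BIOCVERSION
ioctl and compared against
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION .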
365.Sh STANDARD IOCTLS
366.Nm
367supports several standard
368.Xr ioctl 2 Ap s
369which allow the user to do async and/or non-blocking I/O to an open
370.Nm bpf
371file descriptor.
372.Bl -tag -width Dv
373.It Dv FIONREAD Pq Vt int
374Returns the number of bytes that are immediately available for reading.
375.It Dv FIONBIO Pq Vt int
376Set or clear non-blocking I/O.
377If arg is non-zero, then doing a
378.Xr read 2
379when no data is available will return \-1 and
380.Va errno
381will be set to
382.Er EAGAIN .
383If arg is zero, non-blocking I/O is disabled.
384Note: setting this
385overrides the timeout set by
386.Dv BIOCSRTIMEOUT .
387.It Dv FIOASYNC Pq Vt int
388Enable or disable async I/O.
389When enabled (arg is non-zero), the process or process group specified by
390.Dv FIOSETOWN
391will start receiving
392.Dv SIGIO Ap s
393when packets arrive.
394Note that you must do an
395.Dv FIOSETOWN
396in order for this to take effect, as
397the system will not default this for you.
398The signal may be changed via
399.Dv BIOCSRSIG .
400.It Dv FIOSETOWN , FIOGETOWN Pq Vt int
401Set or get the process or process group (if negative) that should receive
402.Dv SIGIO
403when packets are available.
404The signal may be changed using
405.Dv BIOCSRSIG
406(see above).
407.El
408.Sh BPF HEADER
409The following structure is prepended to each packet returned by
410.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	uint32_t bh_caplen;
	uint32_t bh_datalen;
	uint16_t bh_hdrlen;
};
.Ed
419.Pp
420The fields, whose values are stored in host order, are:
421.Bl -tag -width Fa -offset indent
422.It Fa bh_tstamp
423The time at which the packet was processed by the packet filter.
424This structure differs from the standard
425.Vt struct timeval
426in that both members are of type
427.Vt long .
428.It Fa bh_caplen
429The length of the captured portion of the packet.
430This is the minimum of
431the truncation amount specified by the filter and the length of the packet.
432.It Fa bh_datalen
433The length of the packet off the wire.
434This value is independent of the truncation amount specified by the filter.
435.It Fa bh_hdrlen
436The length of the BPF header, which may not be equal to
437.Li sizeof(struct bpf_hdr) .
438.El
439.Pp
440The
441.Fa bh_hdrlen
442field exists to account for
443padding between the header and the link level protocol.
444The purpose here is to guarantee proper alignment of the packet
445data structures, which is required on alignment sensitive
446architectures and improves performance on many other architectures.
447The packet filter ensures that the
448.Vt bpf_hdr
449and the
450.Em network layer
451header will be word aligned.
452Suitable precautions must be taken when accessing the link layer
453protocol fields on alignment restricted machines.
454.Po
455This isn't a problem on an Ethernet, since
456the type field is a short falling on an even offset,
457and the addresses are probably accessed in a bytewise fashion
458.Pc .
459.Pp
460Additionally, individual packets are padded so that each starts
461on a word boundary.
462This requires that an application
463has some knowledge of how to get from packet to packet.
464The macro
465.Dv BPF_WORDALIGN
466is defined in
467.In net/bpf.h
468to facilitate this process.
469It rounds up its argument
470to the nearest word aligned value
471.Po
472where a word is
473.Dv BPF_ALIGNMENT
474bytes wide
475.Pc .
476.Pp
477For example, if
478.Va p
479points to the start of a packet, this expression
480will advance it to the next packet:
481.Pp
.Dl p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen))
483.Pp
484For the alignment mechanisms to work properly, the
485buffer passed to
486.Xr read 2
487must itself be word aligned.
488.Xr malloc 3
489will always return an aligned buffer.
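.Pp
The packet-to-packet stepping described in this section can be sketched as a
loop over the bytes returned by one
.Xr read 2 .
The header layout, the alignment macro and all
.Li ex_
names below are local stand-ins, assumed here so that the fragment compiles
without
.In net/bpf.h ;
a real program would use
.Vt struct bpf_hdr
and
.Dv BPF_WORDALIGN
instead.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins; real programs take these from <net/bpf.h>. */
#define EX_ALIGNMENT	sizeof(long)
#define EX_WORDALIGN(x)	(((x) + (EX_ALIGNMENT - 1)) & ~(EX_ALIGNMENT - 1))

struct ex_bpf_hdr {
	long	 bh_tstamp[2];	/* placeholder for struct bpf_timeval */
	uint32_t bh_caplen;
	uint32_t bh_datalen;
	uint16_t bh_hdrlen;
};

/* Count the records in the first nread bytes of a read(2) buffer. */
static size_t
ex_count_packets(const unsigned char *buf, size_t nread)
{
	size_t off = 0, n = 0;
	struct ex_bpf_hdr h;

	assert(buf != NULL);
	while (off + sizeof(h) <= nread) {
		memcpy(&h, buf + off, sizeof(h));	/* alignment-safe copy */
		if (h.bh_hdrlen == 0)
			break;				/* malformed record */
		/* packet data: buf + off + h.bh_hdrlen, h.bh_caplen bytes */
		off += EX_WORDALIGN((size_t)h.bh_hdrlen + h.bh_caplen);
		n++;
	}
	return n;
}
```
.Ed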
490.Sh FILTER MACHINE
491A filter program is an array of instructions, with all branches
492.Em forwardly directed ,
493terminated by a
494.Em return
495instruction.
496Each instruction performs some action on the pseudo-machine state,
497which consists of an accumulator, index register, scratch memory store,
498and implicit program counter.
499.Pp
500The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	uint16_t code;
	u_char	jt;
	u_char	jf;
	uint32_t k;
};
.Ed
509.Pp
510The
511.Fa k
512field is used in different ways by different instructions,
513and the
514.Fa jt
515and
516.Fa jf
517fields are used as offsets
518by the branch instructions.
519The opcodes are encoded in a semi-hierarchical fashion.
520There are eight classes of instructions:
521.Dv BPF_LD ,
522.Dv BPF_LDX ,
523.Dv BPF_ST ,
524.Dv BPF_STX ,
525.Dv BPF_ALU ,
526.Dv BPF_JMP ,
527.Dv BPF_RET ,
528and
529.Dv BPF_MISC .
530Various other mode and
531operator bits are
532.Em or Ap d
533into the class to give the actual instructions.
534The classes and modes are defined in
535.In net/bpf.h .
536.Pp
537Below are the semantics for each defined BPF instruction.
538We use the convention that
539.Ar A
540is the accumulator,
541.Ar X
542is the index register,
543.Ar P
544packet data, and
545.Ar M
546scratch memory store.
547.Sm off
548.Ar P Li \&[ Ar i Li \&: Ar n\^ Li \&]
549.Sm on
550gives the data at byte offset
551.Ar i
552in the packet,
553interpreted as a word
554.Ar ( n No = 4 ) ,
555unsigned halfword
556.Ar ( n No = 2 ) ,
557or unsigned byte
558.Ar ( n No = 1 ) .
559.Sm off
560.Ar M\^ Li \&[ Ar i\^ Li \&]
561.Sm on
562gives the
563.Ar i Ap th
564word in the scratch memory store, which is only
565addressed in word units.
566The memory store is indexed from 0 to
567.Dv BPF_MEMWORDS Ns Li \&-1 .
568.Fa k ,
569.Fa jt ,
570and
571.Fa jf
572are the corresponding fields in the
573instruction definition.
574.Ar len
575refers to the length of the packet.
576.Bl -tag -width indent
577.It Sy BPF_LD
578These instructions copy a value into the accumulator.
579The type of the source operand is specified by an
580.Dq addressing mode
581and can be a constant
582.Sy ( BPF_IMM ) ,
583packet data at a fixed offset
584.Sy ( BPF_ABS ) ,
585packet data at a variable offset
586.Sy ( BPF_IND ) ,
587the packet length
588.Sy ( BPF_LEN ) ,
589or a word in the scratch memory store
590.Sy ( BPF_MEM ) .
591For
592.Sy BPF_IND
593and
594.Sy BPF_ABS ,
595the data size must be specified as a word
596.Sy ( BPF_W ) ,
597halfword
598.Sy ( BPF_H ) ,
599or byte
600.Sy ( BPF_B ) .
601Arithmetic overflow when calculating a variable offset terminates
602the filter program and the packet is ignored.
603The semantics of all the recognized
604.Sy BPF_LD
605instructions follow.
606.\" to make all instruction tables align nicely, use common max width
607.ds max-insn .Sy BPF_LDX + BPF_W + BPF_WWW
608.\"
609.Bl -column "\*[max-insn]" -offset indent
610.It Sy BPF_LD + BPF_W + BPF_ABS Ta A \[<-] P[k:4]
611.It Sy BPF_LD + BPF_H + BPF_ABS Ta A \[<-] P[k:2]
612.It Sy BPF_LD + BPF_B + BPF_ABS Ta A \[<-] P[k:1]
613.It Sy BPF_LD + BPF_W + BPF_IND Ta A \[<-] P[X+k:4]
614.It Sy BPF_LD + BPF_H + BPF_IND Ta A \[<-] P[X+k:2]
615.It Sy BPF_LD + BPF_B + BPF_IND Ta A \[<-] P[X+k:1]
616.It Sy BPF_LD + BPF_W + BPF_LEN Ta A \[<-] len
617.It Sy BPF_LD + BPF_IMM Ta A \[<-] k
618.It Sy BPF_LD + BPF_MEM Ta A \[<-] M[k]
619.El
620.It Sy BPF_LDX
621These instructions load a value into the index register.
622Note that the addressing modes are more restricted than those of
623the accumulator loads, but they include
624.Sy BPF_MSH ,
625a hack for efficiently loading the IP header length.
626.Bl -column "\*[max-insn]" -offset indent
627.It Sy BPF_LDX + BPF_W + BPF_IMM Ta X \[<-] k
628.It Sy BPF_LDX + BPF_W + BPF_MEM Ta X \[<-] M[k]
629.It Sy BPF_LDX + BPF_W + BPF_LEN Ta X \[<-] len
630.It Sy BPF_LDX + BPF_B + BPF_MSH Ta X \[<-] 4*(P[k:1]&0xf)
631.El
632.It Sy BPF_ST
633This instruction stores the accumulator into the scratch memory.
634We do not need an addressing mode since there is only one possibility
635for the destination.
636.Bl -column "\*[max-insn]" -offset indent
637.It Sy BPF_ST Ta M[k] \[<-] A
638.El
639.It Sy BPF_STX
640This instruction stores the index register in the scratch memory store.
641.Bl -column "\*[max-insn]" -offset indent
642.It Sy BPF_STX Ta M[k] \[<-] X
643.El
644.It Sy BPF_ALU
645The alu instructions perform operations between the accumulator and
646index register or constant, and store the result back in the accumulator.
647For binary operations, a source mode is required
648.Sy ( BPF_K
649or
650.Sy BPF_X ) .
651.Bl -column "\*[max-insn]" -offset indent
652.It Sy BPF_ALU + BPF_ADD + BPF_K Ta A \[<-] A + k
653.It Sy BPF_ALU + BPF_SUB + BPF_K Ta A \[<-] A \- k
654.It Sy BPF_ALU + BPF_MUL + BPF_K Ta A \[<-] A * k
655.It Sy BPF_ALU + BPF_DIV + BPF_K Ta A \[<-] A / k
656.It Sy BPF_ALU + BPF_AND + BPF_K Ta A \[<-] A & k
657.It Sy BPF_ALU + BPF_OR + BPF_K Ta A \[<-] A | k
658.It Sy BPF_ALU + BPF_LSH + BPF_K Ta A \[<-] A \[<<] k
659.It Sy BPF_ALU + BPF_RSH + BPF_K Ta A \[<-] A \[>>] k
660.It Sy BPF_ALU + BPF_ADD + BPF_X Ta A \[<-] A + X
661.It Sy BPF_ALU + BPF_SUB + BPF_X Ta A \[<-] A \- X
662.It Sy BPF_ALU + BPF_MUL + BPF_X Ta A \[<-] A * X
663.It Sy BPF_ALU + BPF_DIV + BPF_X Ta A \[<-] A / X
664.It Sy BPF_ALU + BPF_AND + BPF_X Ta A \[<-] A & X
665.It Sy BPF_ALU + BPF_OR + BPF_X Ta A \[<-] A | X
666.It Sy BPF_ALU + BPF_LSH + BPF_X Ta A \[<-] A \[<<] X
667.It Sy BPF_ALU + BPF_RSH + BPF_X Ta A \[<-] A \[>>] X
668.It Sy BPF_ALU + BPF_NEG Ta A \[<-] \-A
669.El
670.It Sy BPF_JMP
671The jump instructions alter flow of control.
672Conditional jumps compare the accumulator against a constant
673.Sy ( BPF_K )
674or the index register
675.Sy ( BPF_X ) .
676If the result is true (or non-zero),
677the true branch is taken, otherwise the false branch is taken.
678Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
679However, the jump always
680.Sy ( BPF_JA )
681opcode uses the 32 bit
682.Fa k
683field as the offset, allowing arbitrarily distant destinations.
684All conditionals use unsigned comparison conventions.
685.Bl -column "\*[max-insn]" -offset indent
686.It Sy BPF_JMP + BPF_JA Ta pc += k
687.It Sy BPF_JMP + BPF_JGT + BPF_K Ta "pc += (A > k) ? jt : jf"
688.It Sy BPF_JMP + BPF_JGE + BPF_K Ta "pc += (A \*[Ge] k) ? jt : jf"
689.It Sy BPF_JMP + BPF_JEQ + BPF_K Ta "pc += (A == k) ? jt : jf"
690.It Sy BPF_JMP + BPF_JSET + BPF_K Ta "pc += (A & k) ? jt : jf"
691.It Sy BPF_JMP + BPF_JGT + BPF_X Ta "pc += (A > X) ? jt : jf"
692.It Sy BPF_JMP + BPF_JGE + BPF_X Ta "pc += (A \*[Ge] X) ? jt : jf"
693.It Sy BPF_JMP + BPF_JEQ + BPF_X Ta "pc += (A == X) ? jt : jf"
694.It Sy BPF_JMP + BPF_JSET + BPF_X Ta "pc += (A & X) ? jt : jf"
695.El
696.It Sy BPF_RET
697The return instructions terminate the filter program and specify the amount
698of packet to accept
699.Pq i.e., they return the truncation amount .
700A return value of zero indicates that the packet should be ignored.
701The return value is either a constant
702.Sy ( BPF_K )
703or the accumulator
704.Sy ( BPF_A ) .
705.Bl -column "\*[max-insn]" -offset indent
706.It Sy BPF_RET + BPF_A Ta accept A bytes
707.It Sy BPF_RET + BPF_K Ta accept k bytes
708.El
709.It Sy BPF_MISC
710The miscellaneous category was created for anything that doesn't
711fit into the above classes, and for any new instructions that might need to
712be added.
713Currently, these are the register transfer instructions
714that copy the index register to the accumulator or vice versa.
715.Bl -column "\*[max-insn]" -offset indent
716.It Sy BPF_MISC + BPF_TAX Ta X \[<-] A
717.It Sy BPF_MISC + BPF_TXA Ta A \[<-] X
718.El
719.Pp
Two further instructions call a
.Dq Em coprocessor ,
if one has been initialized by the kernel component.
723There is no coprocessor by default.
724.Bl -column "\*[max-insn]" -offset indent
725.It Sy BPF_MISC + BPF_COP Ta A \[<-] funcs[k](...)
726.It Sy BPF_MISC + BPF_COPX Ta A \[<-] funcs[X](...)
727.El
728.Pp
729If the coprocessor is not set or the function index is out of range, these
730instructions will abort the program and return zero.
731.El
732.Pp
733The BPF interface provides the following macros to facilitate
734array initializers:
735.Bd -unfilled -offset indent
736.Fn BPF_STMT opcode operand
737.Fn BPF_JUMP opcode operand true_offset false_offset
738.Ed
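.Pp
To make the instruction semantics concrete, the following is a minimal
user-space sketch of an interpreter for three of the instructions listed
above.
It is not the kernel implementation.
All
.Li EX_
and
.Li ex_
names are hypothetical, and the numeric encodings are assumed to match those in
.In net/bpf.h .
The two initializer macros mirror the argument order of
.Fn BPF_STMT
and
.Fn BPF_JUMP .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of a few encodings, assumed to match <net/bpf.h>. */
#define EX_LD	0x00
#define EX_JMP	0x05
#define EX_RET	0x06
#define EX_H	0x08
#define EX_ABS	0x20
#define EX_JEQ	0x10
#define EX_K	0x00

struct ex_insn {
	uint16_t code;
	unsigned char jt, jf;
	uint32_t k;
};

#define EX_STMT(code, k)	 { (uint16_t)(code), 0, 0, (k) }
#define EX_JUMP(code, k, jt, jf) { (uint16_t)(code), (jt), (jf), (k) }

/* Run prog over packet p of length len; return the bytes accepted. */
static uint32_t
ex_run(const struct ex_insn *prog, size_t n, const unsigned char *p,
    size_t len)
{
	uint32_t A = 0;
	size_t pc;

	assert(prog != NULL && p != NULL);
	for (pc = 0; pc < n; pc++) {
		const struct ex_insn *i = &prog[pc];

		switch (i->code) {
		case EX_LD + EX_H + EX_ABS:	/* A <- P[k:2] */
			if ((size_t)i->k + 2 > len)
				return 0;	/* out of bounds: reject */
			A = (uint32_t)p[i->k] << 8 | p[i->k + 1];
			break;
		case EX_JMP + EX_JEQ + EX_K:	/* pc += (A == k) ? jt : jf */
			pc += (A == i->k) ? i->jt : i->jf;
			break;
		case EX_RET + EX_K:		/* accept k bytes */
			return i->k;
		default:
			return 0;		/* not handled in this sketch */
		}
	}
	return 0;				/* fell off the end */
}
```
.Ed
.Pp
Given a four-instruction program that loads the halfword at offset 12,
compares it with a constant using offsets 0 and 1, and then returns either a
non-zero length or zero, the sketch accepts a frame whose type field matches
and rejects any other.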
739.Sh SYSCTLS
740The following sysctls are available when
741.Nm
742is enabled:
743.Bl -tag -width ".Li net.bpf.maxbufsize"
744.It Li net.bpf.maxbufsize
745Sets the maximum buffer size available for
746.Nm
747peers.
748.It Li net.bpf.stats
749Shows
750.Nm
751statistics.
752They can be retrieved with the
753.Xr netstat 1
754utility.
755.It Li net.bpf.peers
756Shows the current
757.Nm
758peers.
759This is only available to the super user and can also be retrieved with the
760.Xr netstat 1
761utility.
762.El
763.Pp
On architectures with
.Xr bpfjit 4
support, an additional sysctl is available:
.Bl -tag -width ".Li net.bpf.jit"
.It Li net.bpf.jit
Toggle
.Em just-in-time
compilation of new filter programs.
In order to enable just-in-time compilation,
the bpfjit kernel module must be loaded.
Changing the value of this sysctl doesn't affect
existing filter programs.
776.El
777.Sh FILES
778.Pa /dev/bpf
779.Sh EXAMPLES
780The following filter is taken from the Reverse ARP Daemon.
781It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
793.Pp
This filter accepts only IP packets between hosts 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
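.Pp
The two constants in that filter are the host addresses written as 32-bit
words, with the first octet most significant.
As a small sketch, the hypothetical helper below (its name is an assumption,
not part of any interface) derives such a constant from a dotted quad.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a dotted quad a.b.c.d as the filter sees it. */
static uint32_t
ex_ipv4_word(unsigned a, unsigned b, unsigned c, unsigned d)
{
	assert(a < 256 && b < 256 && c < 256 && d < 256);
	return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d;
}
```
.Ed
.Pp
For instance, the octets 128, 3, 112 and 15 give 0x8003700f, and replacing
the last octet with 35 gives 0x80037023.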
811.Pp
812Finally, this filter returns only TCP finger packets.
813We must parse the IP header to reach the TCP header.
814The
815.Sy BPF_JSET
816instruction checks that the IP fragment offset is 0 so we are sure
817that we have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
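.Pp
The offsets used by the finger filter can be checked by hand.
The sketch below, written with hypothetical
.Li ex_
names and assuming an untagged 14-byte Ethernet header, reproduces the
.Sy BPF_MSH
computation and the two packet offsets that the
.Sy BPF_IND
loads then read, namely the TCP source and destination ports.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

#define EX_ETHER_HDR_LEN 14	/* assumed: untagged Ethernet */

/* X <- 4*(P[14:1]&0xf): the IP header length in bytes. */
static size_t
ex_ip_hdr_len(const unsigned char *frame, size_t len)
{
	assert(frame != NULL && len > EX_ETHER_HDR_LEN);
	return 4 * (size_t)(frame[EX_ETHER_HDR_LEN] & 0x0f);
}

/* Offset read by BPF_LD+BPF_H+BPF_IND with k = 14: TCP source port. */
static size_t
ex_tcp_sport_off(const unsigned char *frame, size_t len)
{
	return EX_ETHER_HDR_LEN + ex_ip_hdr_len(frame, len);
}

/* Offset read by BPF_LD+BPF_H+BPF_IND with k = 16: TCP destination port. */
static size_t
ex_tcp_dport_off(const unsigned char *frame, size_t len)
{
	return ex_tcp_sport_off(frame, len) + 2;
}
```
.Ed
.Pp
For an IP header without options, whose first byte is 0x45, the header length
is 20 bytes, so the ports are read at offsets 34 and 36.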
835.Sh SEE ALSO
836.Xr ioctl 2 ,
837.Xr read 2 ,
838.Xr select 2 ,
839.Xr signal 3 ,
840.Xr bpfjit 4 ,
841.Xr tcpdump 8
842.Rs
843.%T "The BSD Packet Filter: A New Architecture for User-level Packet Capture"
844.%A S. McCanne
845.%A V. Jacobson
846.%J Proceedings of the 1993 Winter USENIX
847.%C Technical Conference, San Diego, CA
848.Re
849.Sh HISTORY
850The Enet packet filter was created in 1980 by Mike Accetta and
851Rick Rashid at Carnegie-Mellon University.
852Jeffrey Mogul, at Stanford, ported the code to BSD and continued
853its development from 1983 on.
854Since then, it has evolved into the ULTRIX Packet Filter
855at DEC, a STREAMS NIT module under SunOS 4.1, and BPF.
856.Sh AUTHORS
857.An -nosplit
858.An Steven McCanne ,
859of Lawrence Berkeley Laboratory, implemented BPF in Summer 1990.
860The design was in collaboration with
861.An Van Jacobson ,
862also of Lawrence Berkeley Laboratory.
863.Sh BUGS
864The read buffer must be of a fixed size
865.Po
866returned by the
867.Dv BIOCGBLEN
868ioctl
869.Pc .
870.Pp
871A file that does not request promiscuous mode may receive promiscuously
872received packets as a side effect of another file requesting this
873mode on the same hardware interface.
874This could be fixed in the kernel with additional processing overhead.
875However, we favor the model where
876all files must assume that the interface is promiscuous, and if
877so desired, must use a filter to reject foreign packets.
878.\" .Pp
879.\" Under SunOS, if a BPF application reads more than 2^31 bytes of
880.\" data, read will fail in
881.\" .Er EINVAL .
882.\" You can either fix the bug in SunOS,
883.\" or lseek to 0 when read fails for this reason.
884.Pp
885.Dq Em Immediate mode
886and the
887.Dq Em read timeout
888are misguided features.
889This functionality can be emulated with non-blocking mode and
890.Xr select 2 .
891