.\" -*- nroff -*-
.\"
.\"	$NetBSD: bpf.4,v 1.64 2021/10/24 17:46:06 gutteridge Exp $
.\"
.\" Copyright (c) 1990, 1991, 1992, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd October 24, 2021
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter raw network interface
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter
provides a raw interface to data link layers in a protocol
independent fashion.
All packets on the network, even those destined for other hosts,
are accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf .
After opening the device, the file descriptor must be bound to a
specific network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared by multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable packet filter.
Whenever a packet is received by an interface,
all file descriptors listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets
that have matched the filter.
To improve performance, the buffer passed to read must be
the same size as the buffers used internally by
.Nm .
This size is returned by the
.Dv BIOCGBLEN
ioctl (see below), and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily
truncated.
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
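.Pp
As a minimal sketch of this point, the helper below reads a 16-bit field
from raw packet bytes and converts it to host order with
.Fn ntohs
from
.Xr byteorder 3 .
The name
.Fn load_be16
is hypothetical and is not part of the
.Nm
interface.
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a 16-bit network-order field at p. */
static uint16_t
load_be16(const unsigned char *p)
{
	uint16_t v;

	assert(p != NULL);
	/* memcpy avoids an unaligned access on strict-alignment machines. */
	memcpy(&v, p, sizeof(v));
	return ntohs(v);
}
```
.Ed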
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
The writes are unbuffered, meaning only one packet can be processed per write.
Currently, only writes to Ethernet-based (including Wi-Fi) and SLIP
links are supported.
.Sh IOCTLS
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require these includes:
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Pa <net/if.h> .
.Pp
The (third) argument to the
.Xr ioctl 2
should be a pointer to the type indicated.
.Bl -tag -width indent -offset indent
.It Dv BIOCGBLEN ( u_int )
Returns the required buffer length for reads on
.Nm
files.
.It Dv BIOCSBLEN ( u_int )
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.It Dv BIOCGDLT ( u_int )
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.In net/bpf.h .
.It Dv BIOCGDLTLIST ( struct bpf_dltlist )
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field while their length in u_int is supplied to the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of an array in u_int.
.It Dv BIOCSDLT ( u_int )
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface,
a listener that opened its interface non-promiscuously may receive
packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
.It Dv BIOCGETIF ( struct ifreq )
Returns the name of the hardware interface that the file is listening on.
The name is returned in the ifr_name field of
.Fa ifr .
All other fields are undefined.
.It Dv BIOCSETIF ( struct ifreq )
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Dv ifr_name
field of the
.Fa ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT , BIOCGRTIMEOUT ( struct timeval )
Sets or gets the read timeout parameter.
The
.Fa timeval
specifies the length of time to wait before timing
out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS ( struct bpf_stat )
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	uint64_t bs_recv;
	uint64_t bs_drop;
	uint64_t bs_capt;
	uint64_t bs_padding[13];
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv -offset indent
.It Va bs_recv
the number of packets received by the descriptor since opened or reset
(including any buffered since the last read call);
.It Va bs_drop
the number of packets which were accepted by the filter but dropped by the
kernel because of buffer overflows
(i.e., the application's reads aren't keeping up with the packet
traffic); and
.It Va bs_capt
the number of packets accepted by the filter.
.El
.It Dv BIOCIMMEDIATE ( u_int )
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet
reception.
Otherwise, a read will block until either the kernel buffer
becomes full or a timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCLOCK
Sets the locked flag on the
.Nm
descriptor.
This prevents the execution of ioctl commands which could change the
underlying operating parameters of the device.
.It Dv BIOCSETF ( struct bpf_program )
Sets the filter program used by the kernel to discard uninteresting
packets.
An array of instructions and its length are passed in using the following structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Va bf_insns
field while its length in units of
.Sq struct bpf_insn
is given by the
.Va bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sy FILTER MACHINE
for an explanation of the filter language.
.It Dv BIOCSETWF ( struct bpf_program )
Sets the write filter program used by the kernel to control what type
of packets can be written to the interface.
See the
.Dv BIOCSETF
command for more information on the bpf filter program.
.It Dv BIOCVERSION ( struct bpf_version )
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check
that the current version is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the
application minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter
may result in undefined behavior (most likely, an error returned by
.Xr ioctl 2
or haphazard packet matching).
.It Dv BIOCSRSIG , BIOCGRSIG ( u_int )
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.It Dv BIOCGHDRCMPLT , BIOCSHDRCMPLT ( u_int )
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.It Dv BIOCGSEESENT , BIOCSSEESENT ( u_int )
These commands are obsolete but left for compatibility.
Use
.Dv BIOCSDIRECTION
and
.Dv BIOCGDIRECTION
instead.
Set or get the flag determining whether locally generated packets on the
interface should be returned by BPF.
Set to zero to see only incoming packets on the interface.
Set to one to see packets originating locally and remotely on the interface.
This flag is initialized to one by default.
.It Dv BIOCSDIRECTION
.It Dv BIOCGDIRECTION
.Pq Li u_int
Set or get the setting determining whether incoming, outgoing, or all packets
on the interface should be returned by BPF.
Set to
.Dv BPF_D_IN
to see only incoming packets on the interface.
Set to
.Dv BPF_D_INOUT
to see packets originating locally and remotely on the interface.
Set to
.Dv BPF_D_OUT
to see only outgoing packets on the interface.
This setting is initialized to
.Dv BPF_D_INOUT
by default.
.It Dv BIOCFEEDBACK , BIOCSFEEDBACK , BIOCGFEEDBACK ( u_int )
Set (or get)
.Dq packet feedback mode .
This allows injected packets to be fed back as input to the interface when
output via the interface is successful.
The first name is meant for
.Fx
compatibility; the other two follow the Get/Set convention.
.\"When
.\".Dv BPF_D_INOUT
.\"direction is set, injected
Injected
outgoing packets are not returned by BPF to avoid
duplication.
This flag is initialized to zero by default.
.El
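.Pp
The compatibility rule described under
.Dv BIOCVERSION
can be sketched as follows.
This is an illustration only: the structure is a local stand-in that mirrors
.Vt struct bpf_version ,
and the function name
.Fn bpf_version_ok
is hypothetical.
A real program would fill the kernel side from the ioctl and the application
side from
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in that mirrors struct bpf_version. */
struct bpf_ver {
	unsigned short bv_major;
	unsigned short bv_minor;
};

/*
 * Hypothetical helper: majors must match and the application minor
 * must be less than or equal to the kernel minor.
 */
static bool
bpf_version_ok(const struct bpf_ver *app, const struct bpf_ver *kern)
{
	assert(app != NULL && kern != NULL);
	return app->bv_major == kern->bv_major &&
	    app->bv_minor <= kern->bv_minor;
}
```
.Ed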
.Sh STANDARD IOCTLS
.Nm
now supports several standard
.Xr ioctl 2
commands
which allow the user to do asynchronous and/or non-blocking I/O on an open
.Nm
file descriptor.
.Bl -tag -width indent -offset indent
.It Dv FIONREAD ( int )
Returns the number of bytes that are immediately available for reading.
.It Dv FIONBIO ( int )
Set or clear non-blocking I/O.
If arg is non-zero, then doing a
.Xr read 2
when no data is available will return -1 and
.Va errno
will be set to
.Er EAGAIN .
If arg is zero, non-blocking I/O is disabled.
Note: setting this
overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.It Dv FIOASYNC ( int )
Enable or disable asynchronous I/O.
When enabled (arg is non-zero), the process or process group specified by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must do an
.Dv FIOSETOWN
in order for this to take effect, as
the system will not default this for you.
The signal may be changed via
.Dv BIOCSRSIG .
.It Dv FIOSETOWN , FIOGETOWN ( int )
Set or get the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Sh BPF HEADER
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	uint32_t bh_caplen;
	uint32_t bh_datalen;
	uint16_t bh_hdrlen;
};
.Ed
.Pp
The fields, whose values are stored in host order, are:
.Bl -tag -width bh_datalen -offset indent
.It Va bh_tstamp
The time at which the packet was processed by the packet filter.
This structure differs from the standard
.Vt struct timeval
in that both members are of type
.Vt long .
.It Va bh_caplen
The length of the captured portion of the packet.
This is the minimum of
the truncation amount specified by the filter and the length of the packet.
.It Va bh_datalen
The length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Va bh_hdrlen
The length of the BPF header, which may not be equal to
.Em sizeof(struct bpf_hdr) .
.El
.Pp
The
.Va bh_hdrlen
field exists to account for
padding between the header and the link level protocol.
The purpose here is to guarantee proper alignment of the packet
data structures, which is required on alignment sensitive
architectures and improves performance on many other architectures.
The packet filter ensures that the
.Va bpf_hdr
and the
.Em network layer
header will be word aligned.
Suitable precautions must be taken when accessing the link layer
protocol fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since
the type field is a short falling on an even offset,
and the addresses are probably accessed in a bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts
on a word boundary.
This requires that an application
has some knowledge of how to get from packet to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate this process.
It rounds up its argument
to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
.Pp
For example, if
.Sq Va p
points to the start of a packet, this expression
will advance it to the next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)
.Pp
For the alignment mechanisms to work properly, the
buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
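.Pp
The packet-to-packet stepping described above can be sketched as a loop over
one read buffer.
The sketch is written to build without
.In net/bpf.h ,
so the header structure and the alignment macro are local stand-ins, and the
alignment of
.Li sizeof(long)
is an assumption about the value of
.Dv BPF_ALIGNMENT .
The function name
.Fn count_packets
is hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins; real programs take these from <net/bpf.h>. */
#define ALIGNMENT	sizeof(long)	/* assumed BPF_ALIGNMENT */
#define WORDALIGN(x)	(((x) + (ALIGNMENT - 1)) & ~(ALIGNMENT - 1))

struct hdr {				/* mirrors struct bpf_hdr */
	struct { long tv_sec, tv_usec; } bh_tstamp;
	uint32_t bh_caplen;
	uint32_t bh_datalen;
	uint16_t bh_hdrlen;
};

/* Count the packets in buf[0..len) by stepping header to header. */
static size_t
count_packets(const unsigned char *buf, size_t len)
{
	size_t off = 0, n = 0;

	assert(buf != NULL);
	while (off + sizeof(struct hdr) <= len) {
		struct hdr h;

		/* Copy the header out so no alignment is assumed here. */
		memcpy(&h, buf + off, sizeof(h));
		if (h.bh_hdrlen == 0)
			break;		/* malformed record, stop */
		n++;
		/* Packet data starts at buf + off + h.bh_hdrlen. */
		off += WORDALIGN((size_t)h.bh_hdrlen + h.bh_caplen);
	}
	return n;
}
```
.Ed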
.Sh FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a
.Sy return
instruction.
Each instruction performs some action on the pseudo-machine state,
which consists of an accumulator, index register, scratch memory store,
and implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	uint16_t code;
	u_char jt;
	u_char jf;
	uint32_t k;
};
.Ed
.Pp
The
.Va k
field is used in different ways by different instructions,
and the
.Va jt
and
.Va jf
fields are used as offsets
by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC.
Various other mode and
operator bits are or'd into the class to give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
.Pp
Below are the semantics for each defined BPF instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet,
interpreted as a word (n=4),
unsigned halfword (n=2), or unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only
addressed in word units.
The memory store is indexed from 0 to BPF_MEMWORDS-1.
.Va k ,
.Va jt ,
and
.Va jf
are the corresponding fields in the
instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width indent -offset indent
.It Sy BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Sy ( BPF_IMM ) ,
packet data at a fixed offset
.Sy ( BPF_ABS ) ,
packet data at a variable offset
.Sy ( BPF_IND ) ,
the packet length
.Sy ( BPF_LEN ) ,
or a word in the scratch memory store
.Sy ( BPF_MEM ) .
For
.Sy BPF_IND
and
.Sy BPF_ABS ,
the data size must be specified as a word
.Sy ( BPF_W ) ,
halfword
.Sy ( BPF_H ) ,
or byte
.Sy ( BPF_B ) .
Arithmetic overflow when calculating a variable offset terminates
the filter program and the packet is ignored.
The semantics of all the recognized BPF_LD instructions follow.
.Bl -column "BPF_LD_BPF_W_BPF_ABS" "A <- P[k:4]" -offset indent
.It Sy BPF_LD+BPF_W+BPF_ABS Ta A <- P[k:4]
.It Sy BPF_LD+BPF_H+BPF_ABS Ta A <- P[k:2]
.It Sy BPF_LD+BPF_B+BPF_ABS Ta A <- P[k:1]
.It Sy BPF_LD+BPF_W+BPF_IND Ta A <- P[X+k:4]
.It Sy BPF_LD+BPF_H+BPF_IND Ta A <- P[X+k:2]
.It Sy BPF_LD+BPF_B+BPF_IND Ta A <- P[X+k:1]
.It Sy BPF_LD+BPF_W+BPF_LEN Ta A <- len
.It Sy BPF_LD+BPF_IMM Ta A <- k
.It Sy BPF_LD+BPF_MEM Ta A <- M[k]
.El
.It Sy BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of
the accumulator loads, but they include
.Sy BPF_MSH ,
a hack for efficiently loading the IP header length.
.Bl -column "BPF_LDX_BPF_W_BPF_MEM" "X <- k" -offset indent
.It Sy BPF_LDX+BPF_W+BPF_IMM Ta X <- k
.It Sy BPF_LDX+BPF_W+BPF_MEM Ta X <- M[k]
.It Sy BPF_LDX+BPF_W+BPF_LEN Ta X <- len
.It Sy BPF_LDX+BPF_B+BPF_MSH Ta X <- 4*(P[k:1]&0xf)
.El
.It Sy BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility
for the destination.
.Bl -column "BPF_ST" "M[k] <- A" -offset indent
.It Sy BPF_ST Ta M[k] <- A
.El
.It Sy BPF_STX
This instruction stores the index register in the scratch memory store.
.Bl -column "BPF_STX" "M[k] <- X" -offset indent
.It Sy BPF_STX Ta M[k] <- X
.El
.It Sy BPF_ALU
The alu instructions perform operations between the accumulator and
index register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Sy ( BPF_K
or
.Sy BPF_X ) .
.Bl -column "BPF_ALU_BPF_ADD_BPF_K" "A <- A + k" -offset indent
.It Sy BPF_ALU+BPF_ADD+BPF_K Ta A <- A + k
.It Sy BPF_ALU+BPF_SUB+BPF_K Ta A <- A - k
.It Sy BPF_ALU+BPF_MUL+BPF_K Ta A <- A * k
.It Sy BPF_ALU+BPF_DIV+BPF_K Ta A <- A / k
.It Sy BPF_ALU+BPF_AND+BPF_K Ta A <- A & k
.It Sy BPF_ALU+BPF_OR+BPF_K Ta A <- A | k
.It Sy BPF_ALU+BPF_LSH+BPF_K Ta A <- A << k
.It Sy BPF_ALU+BPF_RSH+BPF_K Ta A <- A >> k
.It Sy BPF_ALU+BPF_ADD+BPF_X Ta A <- A + X
.It Sy BPF_ALU+BPF_SUB+BPF_X Ta A <- A - X
.It Sy BPF_ALU+BPF_MUL+BPF_X Ta A <- A * X
.It Sy BPF_ALU+BPF_DIV+BPF_X Ta A <- A / X
.It Sy BPF_ALU+BPF_AND+BPF_X Ta A <- A & X
.It Sy BPF_ALU+BPF_OR+BPF_X Ta A <- A | X
.It Sy BPF_ALU+BPF_LSH+BPF_X Ta A <- A << X
.It Sy BPF_ALU+BPF_RSH+BPF_X Ta A <- A >> X
.It Sy BPF_ALU+BPF_NEG Ta A <- -A
.El
.It Sy BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Sy ( BPF_K )
or the index register
.Sy ( BPF_X ) .
If the result is true (or non-zero),
the true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Sy ( BPF_JA )
opcode uses the 32 bit
.Va k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Bl -column "BPF_JMP+BPF_JSET+BPF_K" "pc += (A \*[Ge] k) ? jt : jf" -offset indent
.It Sy BPF_JMP+BPF_JA Ta pc += k
.It Sy BPF_JMP+BPF_JGT+BPF_K Ta "pc += (A > k) ? jt : jf"
.It Sy BPF_JMP+BPF_JGE+BPF_K Ta "pc += (A \*[Ge] k) ? jt : jf"
.It Sy BPF_JMP+BPF_JEQ+BPF_K Ta "pc += (A == k) ? jt : jf"
.It Sy BPF_JMP+BPF_JSET+BPF_K Ta "pc += (A & k) ? jt : jf"
.It Sy BPF_JMP+BPF_JGT+BPF_X Ta "pc += (A > X) ? jt : jf"
.It Sy BPF_JMP+BPF_JGE+BPF_X Ta "pc += (A \*[Ge] X) ? jt : jf"
.It Sy BPF_JMP+BPF_JEQ+BPF_X Ta "pc += (A == X) ? jt : jf"
.It Sy BPF_JMP+BPF_JSET+BPF_X Ta "pc += (A & X) ? jt : jf"
.El
.It Sy BPF_RET
The return instructions terminate the filter program and specify the amount
of packet to accept (i.e., they return the truncation amount).
A return value of zero indicates that the packet should be ignored.
The return value is either a constant
.Sy ( BPF_K )
or the accumulator
.Sy ( BPF_A ) .
.Bl -column "BPF_RET+BPF_A" "accept A bytes" -offset indent
.It Sy BPF_RET+BPF_A Ta accept A bytes
.It Sy BPF_RET+BPF_K Ta accept k bytes
.El
.It Sy BPF_MISC
The miscellaneous category was created for anything that doesn't
fit into the above classes, and for any new instructions that might need to
be added.
Currently, these are the register transfer instructions
that copy the index register to the accumulator or vice versa.
.Bl -column "BPF_MISC+BPF_TAX" "X <- A" -offset indent
.It Sy BPF_MISC+BPF_TAX Ta X <- A
.It Sy BPF_MISC+BPF_TXA Ta A <- X
.El
.Pp
There are also two instructions that call a
.Dq coprocessor ,
if one has been initialized by the kernel component.
There is no coprocessor by default.
.Bl -column "BPF_MISC+BPF_COPX" "A <- funcs[X](...)" -offset indent
.It Sy BPF_MISC+BPF_COP Ta A <- funcs[k](..)
.It Sy BPF_MISC+BPF_COPX Ta A <- funcs[X](..)
.El
.Pp
If the coprocessor is not set or the function index is out of range, these
instructions will abort the program and return zero.
.El
.Pp
The BPF interface provides the following macros to facilitate
array initializers:
.Bd -unfilled -offset indent
.Fn BPF_STMT opcode operand
.Fn BPF_JUMP opcode operand true_offset false_offset
.Ed
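.Pp
As an illustration of how classes, sizes, and modes are or'd together, the
sketch below builds a four-instruction program with local copies of the two
initializer macros.
The numeric encodings are the classic BPF values, copied here so the sketch
builds without
.In net/bpf.h ;
the structure is a stand-in for
.Vt struct bpf_insn ,
and
.Fn branches_in_range
is a hypothetical helper, not a kernel interface.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of classic BPF encodings; see <net/bpf.h>. */
#define BPF_LD	0x00	/* class: load into A */
#define BPF_JMP	0x05	/* class: jump */
#define BPF_RET	0x06	/* class: return */
#define BPF_H	0x08	/* size: halfword */
#define BPF_ABS	0x20	/* mode: fixed packet offset */
#define BPF_JEQ	0x10	/* operator: equal */
#define BPF_K	0x00	/* source: constant k */

struct insn {		/* stand-in for struct bpf_insn */
	uint16_t code;
	unsigned char jt, jf;
	uint32_t k;
};

#define BPF_STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define BPF_JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/* Accept whole frames with EtherType 0x0800, reject all others. */
static const struct insn ip_only[] = {
	BPF_STMT(BPF_LD + BPF_H + BPF_ABS, 12),
	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0x0800, 0, 1),
	BPF_STMT(BPF_RET + BPF_K, (uint32_t)-1),
	BPF_STMT(BPF_RET + BPF_K, 0),
};

/*
 * Return 1 if every jt/jf target lies inside prog[0..n).
 * Offsets are relative to the instruction after the jump.
 * BPF_JA, which branches on k, is not examined by this sketch.
 */
static int
branches_in_range(const struct insn *prog, size_t n)
{
	assert(prog != NULL);
	for (size_t i = 0; i < n; i++) {
		if ((prog[i].code & 0x07) != BPF_JMP)
			continue;
		if (i + 1 + prog[i].jt >= n || i + 1 + prog[i].jf >= n)
			return 0;
	}
	return 1;
}
```
.Ed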
.Sh SYSCTLS
The following sysctls are available when
.Nm
is enabled:
.Bl -tag -width "XnetXbpfXmaxbufsizeXX"
.It Li net.bpf.maxbufsize
Sets the maximum buffer size available for
.Nm
peers.
.It Li net.bpf.stats
Shows
.Nm
statistics.
They can be retrieved with the
.Xr netstat 1
utility.
.It Li net.bpf.peers
Shows the current
.Nm
peers.
This is only available to the super user and can also be retrieved with the
.Xr netstat 1
utility.
.El
.Pp
On architectures with
.Xr bpfjit 4
support, an additional sysctl is available:
.Bl -tag -width "XnetXbpfXjitXX"
.It Li net.bpf.jit
Toggle
.Sy Just-In-Time
compilation of new filter programs.
In order to enable Just-In-Time compilation,
the bpfjit kernel module must be loaded.
Changing the value of this sysctl doesn't affect
existing filter programs.
.El
.Sh FILES
.Pa /dev/bpf
.Sh EXAMPLES
The following filter is taken from the Reverse ARP Daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
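.Pp
To see how such a program executes, the following user-space sketch
interprets the small subset of instructions that the Reverse ARP filter uses.
It is not the kernel implementation: the encodings are local copies of the
classic BPF values, the structure is a stand-in for
.Vt struct bpf_insn ,
and the name
.Fn run_filter
and the numeric constants in
.Va rarp_filter
(EtherType 0x8035, opcode 3, and 42 bytes for the two headers) are written
out by hand for illustration.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of classic BPF encodings so the sketch builds anywhere. */
#define BPF_LD	0x00
#define BPF_JMP	0x05
#define BPF_RET	0x06
#define BPF_W	0x00
#define BPF_H	0x08
#define BPF_B	0x10
#define BPF_ABS	0x20
#define BPF_JEQ	0x10
#define BPF_K	0x00

struct insn { uint16_t code; unsigned char jt, jf; uint32_t k; };

/* The Reverse ARP filter with its constants written out. */
static const struct insn rarp_filter[] = {
	{ BPF_LD + BPF_H + BPF_ABS, 0, 0, 12 },
	{ BPF_JMP + BPF_JEQ + BPF_K, 0, 3, 0x8035 },
	{ BPF_LD + BPF_H + BPF_ABS, 0, 0, 20 },
	{ BPF_JMP + BPF_JEQ + BPF_K, 0, 1, 3 },
	{ BPF_RET + BPF_K, 0, 0, 42 },
	{ BPF_RET + BPF_K, 0, 0, 0 },
};

/*
 * Run prog over pkt and return the accepted length, or 0 to reject.
 * Only absolute loads, JEQ against a constant, and RET k are handled;
 * anything else, or a load past the end of the packet, rejects.
 */
static uint32_t
run_filter(const struct insn *prog, size_t n, const unsigned char *pkt,
    size_t len)
{
	uint32_t A = 0;

	assert(prog != NULL && pkt != NULL);
	for (size_t pc = 0; pc < n; pc++) {
		const struct insn *i = &prog[pc];

		switch (i->code) {
		case BPF_LD + BPF_B + BPF_ABS:
			if (i->k >= len)
				return 0;
			A = pkt[i->k];
			break;
		case BPF_LD + BPF_H + BPF_ABS:
			if (len < 2 || i->k > len - 2)
				return 0;
			A = (uint32_t)pkt[i->k] << 8 | pkt[i->k + 1];
			break;
		case BPF_LD + BPF_W + BPF_ABS:
			if (len < 4 || i->k > len - 4)
				return 0;
			A = (uint32_t)pkt[i->k] << 24 |
			    (uint32_t)pkt[i->k + 1] << 16 |
			    (uint32_t)pkt[i->k + 2] << 8 | pkt[i->k + 3];
			break;
		case BPF_JMP + BPF_JEQ + BPF_K:
			/* Offset is relative to the next instruction. */
			pc += (A == i->k) ? i->jt : i->jf;
			break;
		case BPF_RET + BPF_K:
			return i->k;
		default:
			return 0;
		}
	}
	return 0;	/* fell off the end without a return */
}
```
.Ed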
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Sy BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure
that we have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
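.Pp
For comparison, the checks made by the finger filter can be written as plain
C over the same byte offsets.
This is a reading aid under the assumption of an Ethernet frame carrying
IPv4, not a replacement for the filter; the name
.Fn is_tcp_finger
is hypothetical, and the constants 0x0800 and 6 stand for
.Dv ETHERTYPE_IP
and
.Dv IPPROTO_TCP .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if pkt is an IPv4/TCP Ethernet frame with fragment
 * offset 0 and source or destination port 79, else 0.
 * Frames too short to hold the fields are rejected.
 */
static int
is_tcp_finger(const unsigned char *pkt, size_t len)
{
	size_t x;		/* IP header length, as BPF_MSH computes */
	unsigned sport, dport;

	assert(pkt != NULL);
	if (len < 14 + 20)
		return 0;
	if ((pkt[12] << 8 | pkt[13]) != 0x0800)	/* P[12:2] */
		return 0;
	if (pkt[23] != 6)			/* P[23:1] */
		return 0;
	if ((pkt[20] << 8 | pkt[21]) & 0x1fff)	/* P[20:2] & 0x1fff */
		return 0;
	x = 4 * (size_t)(pkt[14] & 0xf);	/* X <- 4*(P[14:1]&0xf) */
	if (14 + x + 4 > len)
		return 0;
	sport = pkt[14 + x] << 8 | pkt[14 + x + 1];	/* P[X+14:2] */
	dport = pkt[14 + x + 2] << 8 | pkt[14 + x + 3];	/* P[X+16:2] */
	return sport == 79 || dport == 79;
}
```
.Ed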
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr bpfjit 4 ,
.Xr tcpdump 8
.Rs
.%T "The BSD Packet Filter: A New Architecture for User-level Packet Capture"
.%A S. McCanne
.%A V. Jacobson
.%J Proceedings of the 1993 Winter USENIX
.%C Technical Conference, San Diego, CA
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and
Rick Rashid at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to BSD and continued
its development from 1983 on.
Since then, it has evolved into the ULTRIX Packet Filter
at DEC, a STREAMS NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
.An -nosplit
.An Steven McCanne ,
of Lawrence Berkeley Laboratory, implemented BPF in Summer 1990.
The design was in collaboration with
.An Van Jacobson ,
also of Lawrence Berkeley Laboratory.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this
mode on the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where
all files must assume that the interface is promiscuous, and if
so desired, must use a filter to reject foreign packets.
.Pp
Under SunOS, if a BPF application reads more than 2^31 bytes of
data, read will fail with
.Er EINVAL .
You can either fix the bug in SunOS,
or lseek to 0 when read fails for this reason.
.Pp
.Dq Immediate mode
and the
.Dq read timeout
are misguided features.
This functionality can be emulated with non-blocking mode and
.Xr select 2 .
832.Xr select 2 .
833