.\" -*- nroff -*-
.\"
.\"	$NetBSD: bpf.4,v 1.63 2020/06/12 20:58:43 wiz Exp $
.\"
.\" Copyright (c) 1990, 1991, 1992, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd June 11, 2020
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter raw network interface
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter
provides a raw interface to data link layers in a protocol
independent fashion.
All packets on the network, even those destined for other hosts,
are accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf .
After opening the device, the file descriptor must be bound to a
specific network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared by multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable packet filter.
Whenever a packet is received by an interface,
all file descriptors listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets
that have matched the filter.
To improve performance, the buffer passed to read must be
the same size as the buffers used internally by
.Nm .
This size is returned by the
.Dv BIOCGBLEN
ioctl (see below), and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily
truncated.
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
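.Pp
As a minimal sketch of this advice, the helpers below copy a 16-bit or
32-bit field out of a captured frame and convert it to host order with the
.Xr byteorder 3
functions.
The names
.Fn get_net16
and
.Fn get_net32
are hypothetical and are not part of
.Nm .
.Bd -literal -offset indent
```c
#include <arpa/inet.h>	/* ntohs(), ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the 16-bit field at byte offset off of buf in host byte order.
 * memcpy() is used so that the field need not be aligned.
 * The caller must ensure that off + 2 does not exceed the buffer size.
 */
uint16_t
get_net16(const unsigned char *buf, size_t off)
{
	uint16_t v;

	assert(buf != NULL);
	memcpy(&v, buf + off, sizeof(v));
	return ntohs(v);
}

/* Same for a 32-bit field, for example an IPv4 address. */
uint32_t
get_net32(const unsigned char *buf, size_t off)
{
	uint32_t v;

	assert(buf != NULL);
	memcpy(&v, buf + off, sizeof(v));
	return ntohl(v);
}
```
.Ed
.Pp
On an Ethernet frame, for instance,
.Li get_net16(frame, 12)
would yield the type field in a form that can be compared directly with the
.Dv ETHERTYPE_
constants.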
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
The writes are unbuffered, meaning only one packet can be processed per write.
Currently, only writes to Ethernets and SLIP links are supported.
.Sh IOCTLS
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require these includes:
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Pa <net/if.h> .
.Pp
The (third) argument to the
.Xr ioctl 2
should be a pointer to the type indicated.
.Bl -tag -width indent -offset indent
.It Dv BIOCGBLEN ( u_int )
Returns the required buffer length for reads on
.Nm
files.
.It Dv BIOCSBLEN ( u_int )
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.It Dv BIOCGDLT ( u_int )
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.In net/bpf.h .
.It Dv BIOCGDLTLIST ( struct bpf_dltlist )
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field; the capacity of that array, counted in
.Vt u_int
elements, is supplied in the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the number of
.Vt u_int
elements actually returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to the number of
.Vt u_int
elements an array must have to hold the complete list.
.It Dv BIOCSDLT ( u_int )
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface,
a listener that opened its interface non-promiscuously may receive
packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
.It Dv BIOCGETIF ( struct ifreq )
Returns the name of the hardware interface that the file is listening on.
The name is returned in the ifr_name field of
.Fa ifr .
All other fields are undefined.
.It Dv BIOCSETIF ( struct ifreq )
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Dv ifr_name
field of the
.Fa ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT , BIOCGRTIMEOUT ( struct timeval )
Sets or gets the read timeout parameter.
The
.Fa timeval
specifies the length of time to wait before timing
out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS ( struct bpf_stat )
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	uint64_t bs_recv;
	uint64_t bs_drop;
	uint64_t bs_capt;
	uint64_t bs_padding[13];
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv -offset indent
.It Va bs_recv
the number of packets received by the descriptor since opened or reset
(including any buffered since the last read call);
.It Va bs_drop
the number of packets which were accepted by the filter but dropped by the
kernel because of buffer overflows
(i.e., the application's reads aren't keeping up with the packet
traffic); and
.It Va bs_capt
the number of packets accepted by the filter.
.El
.It Dv BIOCIMMEDIATE ( u_int )
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet
reception.
Otherwise, a read will block until either the kernel buffer
becomes full or a timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCLOCK
Sets the locked flag on the
.Nm
descriptor.
This prevents the execution of ioctl commands which could change the
underlying operating parameters of the device.
.It Dv BIOCSETF ( struct bpf_program )
Sets the filter program used by the kernel to discard uninteresting
packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Va bf_insns
field while its length in units of
.Sq struct bpf_insn
is given by the
.Va bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sy FILTER MACHINE
for an explanation of the filter language.
.It Dv BIOCSETWF ( struct bpf_program )
Sets the write filter program used by the kernel to control what type
of packets can be written to the interface.
See the
.Dv BIOCSETF
command for more information on the bpf filter program.
.It Dv BIOCVERSION ( struct bpf_version )
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check
that the current version is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the
application minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter
may result in undefined behavior (most likely, an error returned by
.Xr ioctl 2
or haphazard packet matching).
.It Dv BIOCSRSIG , BIOCGRSIG ( u_int )
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.It Dv BIOCGHDRCMPLT , BIOCSHDRCMPLT ( u_int )
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.It Dv BIOCGSEESENT , BIOCSSEESENT ( u_int )
These commands are obsolete but left for compatibility.
Use
.Dv BIOCSDIRECTION
and
.Dv BIOCGDIRECTION
instead.
Set or get the flag determining whether locally generated packets on the
interface should be returned by BPF.
Set to zero to see only incoming packets on the interface.
Set to one to see packets originating locally and remotely on the interface.
This flag is initialized to one by default.
.It Dv BIOCSDIRECTION
.It Dv BIOCGDIRECTION
.Pq Li u_int
Set or get the setting determining whether incoming, outgoing, or all packets
on the interface should be returned by BPF.
Set to
.Dv BPF_D_IN
to see only incoming packets on the interface.
Set to
.Dv BPF_D_INOUT
to see packets originating locally and remotely on the interface.
Set to
.Dv BPF_D_OUT
to see only outgoing packets on the interface.
This setting is initialized to
.Dv BPF_D_INOUT
by default.
.It Dv BIOCFEEDBACK , BIOCSFEEDBACK , BIOCGFEEDBACK ( u_int )
Set (or get)
.Dq packet feedback mode .
This allows injected packets to be fed back as input to the interface when
output via the interface is successful.
The first name is meant for
.Fx
compatibility, the two others follow the Get/Set convention.
.\"When
.\".Dv BPF_D_INOUT
.\"direction is set, injected
Injected
outgoing packets are not returned by BPF to avoid
duplication.
This flag is initialized to zero by default.
.El
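.Pp
The compatibility rule given under
.Dv BIOCVERSION
can be written down as a small predicate.
The following is only a sketch:
.Vt struct version_sketch
is a stand-in that mirrors the two fields of
.Vt struct bpf_version
so that the code compiles without
.In net/bpf.h ,
and
.Fn version_compatible
is a hypothetical helper name.
.Bd -literal -offset indent
```c
#include <stdbool.h>

/* Stand-in for struct bpf_version; same two fields. */
struct version_sketch {
	unsigned short bv_major;
	unsigned short bv_minor;
};

/*
 * The major numbers must match and the application's minor number
 * must not exceed the kernel's.
 */
bool
version_compatible(struct version_sketch app, struct version_sketch kern)
{
	return app.bv_major == kern.bv_major &&
	    app.bv_minor <= kern.bv_minor;
}
```
.Ed
.Pp
In a real program the application side would presumably be filled in from
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION ,
and the kernel side from the structure returned by
.Dv BIOCVERSION .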
.Sh STANDARD IOCTLS
.Nm
now supports several standard
.Xr ioctl 2 Ns 's
which allow the user to do async and/or non-blocking I/O to an open
.Nm bpf
file descriptor.
.Bl -tag -width indent -offset indent
.It Dv FIONREAD ( int )
Returns the number of bytes that are immediately available for reading.
.It Dv FIONBIO ( int )
Set or clear non-blocking I/O.
If arg is non-zero, then doing a
.Xr read 2
when no data is available will return -1 and
.Va errno
will be set to
.Er EAGAIN .
If arg is zero, non-blocking I/O is disabled.
Note: setting this
overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.It Dv FIOASYNC ( int )
Enable or disable async I/O.
When enabled (arg is non-zero), the process or process group specified by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must do an
.Dv FIOSETOWN
in order for this to take effect, as
the system will not default this for you.
The signal may be changed via
.Dv BIOCSRSIG .
.It Dv FIOSETOWN , FIOGETOWN ( int )
Set or get the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Sh BPF HEADER
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	uint32_t bh_caplen;
	uint32_t bh_datalen;
	uint16_t bh_hdrlen;
};
.Ed
.Pp
The fields, whose values are stored in host order, are:
.Bl -tag -width bh_datalen -offset indent
.It Va bh_tstamp
The time at which the packet was processed by the packet filter.
This structure differs from the standard
.Vt struct timeval
in that both members are of type
.Vt long .
.It Va bh_caplen
The length of the captured portion of the packet.
This is the minimum of
the truncation amount specified by the filter and the length of the packet.
.It Va bh_datalen
The length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Va bh_hdrlen
The length of the BPF header, which may not be equal to
.Em sizeof(struct bpf_hdr) .
.El
.Pp
The
.Va bh_hdrlen
field exists to account for
padding between the header and the link level protocol.
The purpose here is to guarantee proper alignment of the packet
data structures, which is required on alignment sensitive
architectures and improves performance on many other architectures.
The packet filter ensures that the
.Va bpf_hdr
and the
.Em network layer
header will be word aligned.
Suitable precautions must be taken when accessing the link layer
protocol fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since
the type field is a short falling on an even offset,
and the addresses are probably accessed in a bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts
on a word boundary.
This requires that an application
has some knowledge of how to get from packet to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate this process.
It rounds up its argument
to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
.Pp
For example, if
.Sq Va p
is a
.Vt struct bpf_hdr *
that points to the start of a packet, this expression
will advance it to the next packet:
.Pp
.Dl p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen))
.Pp
For the alignment mechanisms to work properly, the
buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
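.Pp
The packet-to-packet step described above can be sketched as a loop over
one buffer returned by
.Xr read 2 .
To stay compilable without
.In net/bpf.h ,
the sketch uses stand-ins that are assumptions of this example:
.Vt struct sk_hdr
for
.Vt struct bpf_hdr ,
.Dv SK_WORDALIGN
for
.Dv BPF_WORDALIGN
(with a word assumed to be
.Li sizeof(long)
bytes), and the hypothetical function
.Fn walk_buffer .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for BPF_ALIGNMENT and BPF_WORDALIGN. */
#define SK_ALIGNMENT	sizeof(long)
#define SK_WORDALIGN(x)	(((x) + (SK_ALIGNMENT - 1)) & ~(SK_ALIGNMENT - 1))

/* Stand-in for struct bpf_hdr. */
struct sk_hdr {
	long	 ts_sec, ts_usec;	/* stands in for bh_tstamp */
	uint32_t bh_caplen;
	uint32_t bh_datalen;
	uint16_t bh_hdrlen;
};

/*
 * Count the packets in the first nread bytes of buf and add up their
 * captured lengths in *captured.
 */
size_t
walk_buffer(const unsigned char *buf, size_t nread, size_t *captured)
{
	const unsigned char *p = buf;
	const unsigned char *end = buf + nread;
	size_t n = 0;

	assert(buf != NULL && captured != NULL);
	*captured = 0;
	while ((size_t)(end - p) >= sizeof(struct sk_hdr)) {
		struct sk_hdr h;
		size_t step;

		memcpy(&h, p, sizeof(h));
		if (h.bh_hdrlen == 0 ||
		    (size_t)h.bh_hdrlen + h.bh_caplen > (size_t)(end - p))
			break;		/* truncated or corrupt record */

		/* Packet data: h.bh_caplen bytes at p + h.bh_hdrlen. */
		*captured += h.bh_caplen;
		n++;

		/* Skip header, data and padding to the next record. */
		step = SK_WORDALIGN((size_t)h.bh_hdrlen + h.bh_caplen);
		if (step >= (size_t)(end - p))
			break;		/* that was the last record */
		p += step;
	}
	return n;
}
```
.Ed
.Pp
The bounds checks are a precaution added for the sketch; they keep the
loop from running past the data that
.Xr read 2
actually returned.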
.Sh FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a
.Sy return
instruction.
Each instruction performs some action on the pseudo-machine state,
which consists of an accumulator, index register, scratch memory store,
and implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	uint16_t code;
	u_char   jt;
	u_char   jf;
	uint32_t k;
};
.Ed
.Pp
The
.Va k
field is used in different ways by different instructions,
and the
.Va jt
and
.Va jf
fields are used as offsets
by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC.
Various other mode and
operator bits are or'd into the class to give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
.Pp
Below are the semantics for each defined BPF instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet,
interpreted as a word (n=4),
unsigned halfword (n=2), or unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only
addressed in word units.
The memory store is indexed from 0 to BPF_MEMWORDS-1.
.Va k ,
.Va jt ,
and
.Va jf
are the corresponding fields in the
instruction definition.
.Dq len
refers to the length of the packet.
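.Pp
To make the P[i:n] notation concrete, the following sketch evaluates it
over a byte array, combining the bytes most significant first as network
byte order requires.
The function
.Fn pkt_load
is hypothetical, and its handling of out-of-range accesses (returning an
error) is an assumption of this example rather than a description of the
kernel.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Evaluate P[i:n] for n = 1, 2 or 4 over a packet of len bytes.
 * Returns 0 and stores the value in *out on success, or -1 if n is
 * not a supported size or the access falls outside the packet.
 */
int
pkt_load(const uint8_t *pkt, size_t len, size_t i, unsigned n, uint32_t *out)
{
	uint32_t v = 0;
	unsigned j;

	assert(pkt != NULL && out != NULL);
	if ((n != 1 && n != 2 && n != 4) || i > len || n > len - i)
		return -1;
	for (j = 0; j < n; j++)
		v = (v << 8) | pkt[i + j];
	*out = v;
	return 0;
}
```
.Ed
.Pp
With this helper, P[12:2] on an Ethernet frame corresponds to
.Li pkt_load(frame, len, 12, 2, &v) .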
.Bl -tag -width indent -offset indent
.It Sy BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Sy ( BPF_IMM ) ,
packet data at a fixed offset
.Sy ( BPF_ABS ) ,
packet data at a variable offset
.Sy ( BPF_IND ) ,
the packet length
.Sy ( BPF_LEN ) ,
or a word in the scratch memory store
.Sy ( BPF_MEM ) .
For
.Sy BPF_IND
and
.Sy BPF_ABS ,
the data size must be specified as a word
.Sy ( BPF_W ) ,
halfword
.Sy ( BPF_H ) ,
or byte
.Sy ( BPF_B ) .
Arithmetic overflow when calculating a variable offset terminates
the filter program and the packet is ignored.
The semantics of all the recognized BPF_LD instructions follow.
.Bl -column "BPF_LD_BPF_W_BPF_ABS" "A <- P[k:4]" -offset indent
.It Sy BPF_LD+BPF_W+BPF_ABS Ta A <- P[k:4]
.It Sy BPF_LD+BPF_H+BPF_ABS Ta A <- P[k:2]
.It Sy BPF_LD+BPF_B+BPF_ABS Ta A <- P[k:1]
.It Sy BPF_LD+BPF_W+BPF_IND Ta A <- P[X+k:4]
.It Sy BPF_LD+BPF_H+BPF_IND Ta A <- P[X+k:2]
.It Sy BPF_LD+BPF_B+BPF_IND Ta A <- P[X+k:1]
.It Sy BPF_LD+BPF_W+BPF_LEN Ta A <- len
.It Sy BPF_LD+BPF_IMM Ta A <- k
.It Sy BPF_LD+BPF_MEM Ta A <- M[k]
.El
.It Sy BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of
the accumulator loads, but they include
.Sy BPF_MSH ,
a hack for efficiently loading the IP header length.
.Bl -column "BPF_LDX_BPF_W_BPF_MEM" "X <- k" -offset indent
.It Sy BPF_LDX+BPF_W+BPF_IMM Ta X <- k
.It Sy BPF_LDX+BPF_W+BPF_MEM Ta X <- M[k]
.It Sy BPF_LDX+BPF_W+BPF_LEN Ta X <- len
.It Sy BPF_LDX+BPF_B+BPF_MSH Ta X <- 4*(P[k:1]&0xf)
.El
.It Sy BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility
for the destination.
.Bl -column "BPF_ST" "M[k] <- A" -offset indent
.It Sy BPF_ST Ta M[k] <- A
.El
.It Sy BPF_STX
This instruction stores the index register in the scratch memory store.
.Bl -column "BPF_STX" "M[k] <- X" -offset indent
.It Sy BPF_STX Ta M[k] <- X
.El
.It Sy BPF_ALU
The alu instructions perform operations between the accumulator and
index register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Sy ( BPF_K
or
.Sy BPF_X ) .
.Bl -column "BPF_ALU_BPF_ADD_BPF_K" "A <- A + k" -offset indent
.It Sy BPF_ALU+BPF_ADD+BPF_K Ta A <- A + k
.It Sy BPF_ALU+BPF_SUB+BPF_K Ta A <- A - k
.It Sy BPF_ALU+BPF_MUL+BPF_K Ta A <- A * k
.It Sy BPF_ALU+BPF_DIV+BPF_K Ta A <- A / k
.It Sy BPF_ALU+BPF_AND+BPF_K Ta A <- A & k
.It Sy BPF_ALU+BPF_OR+BPF_K Ta A <- A | k
.It Sy BPF_ALU+BPF_LSH+BPF_K Ta A <- A << k
.It Sy BPF_ALU+BPF_RSH+BPF_K Ta A <- A >> k
.It Sy BPF_ALU+BPF_ADD+BPF_X Ta A <- A + X
.It Sy BPF_ALU+BPF_SUB+BPF_X Ta A <- A - X
.It Sy BPF_ALU+BPF_MUL+BPF_X Ta A <- A * X
.It Sy BPF_ALU+BPF_DIV+BPF_X Ta A <- A / X
.It Sy BPF_ALU+BPF_AND+BPF_X Ta A <- A & X
.It Sy BPF_ALU+BPF_OR+BPF_X Ta A <- A | X
.It Sy BPF_ALU+BPF_LSH+BPF_X Ta A <- A << X
.It Sy BPF_ALU+BPF_RSH+BPF_X Ta A <- A >> X
.It Sy BPF_ALU+BPF_NEG Ta A <- -A
.El
.It Sy BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Sy ( BPF_K )
or the index register
.Sy ( BPF_X ) .
If the result is true (or non-zero),
the true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Sy ( BPF_JA )
opcode uses the 32 bit
.Va k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Bl -column "BPF_JMP+BPF_JSET+BPF_K" "pc += (A \*[Ge] k) ? jt : jf" -offset indent
.It Sy BPF_JMP+BPF_JA Ta pc += k
.It Sy BPF_JMP+BPF_JGT+BPF_K Ta "pc += (A > k) ? jt : jf"
.It Sy BPF_JMP+BPF_JGE+BPF_K Ta "pc += (A \*[Ge] k) ? jt : jf"
.It Sy BPF_JMP+BPF_JEQ+BPF_K Ta "pc += (A == k) ? jt : jf"
.It Sy BPF_JMP+BPF_JSET+BPF_K Ta "pc += (A & k) ? jt : jf"
.It Sy BPF_JMP+BPF_JGT+BPF_X Ta "pc += (A > X) ? jt : jf"
.It Sy BPF_JMP+BPF_JGE+BPF_X Ta "pc += (A \*[Ge] X) ? jt : jf"
.It Sy BPF_JMP+BPF_JEQ+BPF_X Ta "pc += (A == X) ? jt : jf"
.It Sy BPF_JMP+BPF_JSET+BPF_X Ta "pc += (A & X) ? jt : jf"
.El
.It Sy BPF_RET
The return instructions terminate the filter program and specify the amount
of packet to accept (i.e., they return the truncation amount).
A return value of zero indicates that the packet should be ignored.
The return value is either a constant
.Sy ( BPF_K )
or the accumulator
.Sy ( BPF_A ) .
.Bl -column "BPF_RET+BPF_A" "accept A bytes" -offset indent
.It Sy BPF_RET+BPF_A Ta accept A bytes
.It Sy BPF_RET+BPF_K Ta accept k bytes
.El
.It Sy BPF_MISC
The miscellaneous category was created for anything that doesn't
fit into the above classes, and for any new instructions that might need to
be added.
Currently, these are the register transfer instructions
that copy the index register to the accumulator or vice versa.
.Bl -column "BPF_MISC+BPF_TAX" "X <- A" -offset indent
.It Sy BPF_MISC+BPF_TAX Ta X <- A
.It Sy BPF_MISC+BPF_TXA Ta A <- X
.El
.Pp
In addition, two instructions call a
.Dq coprocessor
function, provided that a coprocessor has been initialized by the kernel
component.
There is no coprocessor by default.
.Bl -column "BPF_MISC+BPF_COPX" "A <- funcs[X](...)" -offset indent
.It Sy BPF_MISC+BPF_COP Ta A <- funcs[k](..)
.It Sy BPF_MISC+BPF_COPX Ta A <- funcs[X](..)
.El
.Pp
If the coprocessor is not set or the function index is out of range, these
instructions will abort the program and return zero.
.El
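.Pp
The way the accumulator, the index register and the program counter
interact can be illustrated with a toy interpreter for a handful of the
instructions above.
This is a sketch only and not the kernel implementation: the
.Dv SK_
opcodes, the
.Vt struct sk_insn
layout and the function
.Fn sk_filter
are made up for the example, the scratch memory store is omitted, and
rejecting a packet on an out-of-range load is an assumption.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative opcodes; real programs use the encodings in <net/bpf.h>. */
enum sk_op {
	SK_LDH_ABS, SK_LDB_ABS, SK_LDH_IND, SK_LDX_MSH,
	SK_JEQ_K, SK_JSET_K, SK_RET_K
};

struct sk_insn {
	enum sk_op code;
	uint8_t	 jt, jf;
	uint32_t k;
};

/*
 * Run the n instructions of prog over the packet p of len bytes and
 * return the number of bytes to accept (0 means reject).
 */
uint32_t
sk_filter(const struct sk_insn *prog, size_t n, const uint8_t *p, size_t len)
{
	uint32_t A = 0, X = 0;
	size_t pc = 0;

	assert(prog != NULL && p != NULL);
	while (pc < n) {
		const struct sk_insn *in = &prog[pc++];
		uint64_t off = in->k;	/* wide enough for X + k */

		switch (in->code) {
		case SK_LDH_IND:		/* A <- P[X+k:2] */
			off += X;
			/* FALLTHROUGH */
		case SK_LDH_ABS:		/* A <- P[k:2] */
			if (off > len || len - off < 2)
				return 0;
			A = (uint32_t)p[off] << 8 | p[off + 1];
			break;
		case SK_LDB_ABS:		/* A <- P[k:1] */
			if (off >= len)
				return 0;
			A = p[off];
			break;
		case SK_LDX_MSH:		/* X <- 4*(P[k:1]&0xf) */
			if (off >= len)
				return 0;
			X = 4u * (p[off] & 0xf);
			break;
		case SK_JEQ_K:		/* offsets count from the next insn */
			pc += (A == in->k) ? in->jt : in->jf;
			break;
		case SK_JSET_K:
			pc += (A & in->k) ? in->jt : in->jf;
			break;
		case SK_RET_K:
			return in->k;
		}
	}
	return 0;	/* ran off the end of the program */
}
```
.Ed
.Pp
Because every branch adds a non-negative offset to a program counter that
has already advanced, control can only move forward, so the loop finishes
after at most n instructions.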
.Pp
The BPF interface provides the following macros to facilitate
array initializers:
.Bd -unfilled -offset indent
.Fn BPF_STMT opcode operand
.Fn BPF_JUMP opcode operand true_offset false_offset
.Ed
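.Pp
As a rough illustration of what such initializer macros do, the sketch
below defines look-alikes that expand to brace-enclosed initializers in
the field order of the instruction structure.
.Dv SK_STMT ,
.Dv SK_JUMP
and
.Vt struct insn_sketch
are stand-ins invented for this example, and the opcode numbers are
placeholders; the authoritative definitions are those in
.In net/bpf.h .
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Stand-in for struct bpf_insn; the field order matters below. */
struct insn_sketch {
	uint16_t code;
	unsigned char jt;
	unsigned char jf;
	uint32_t k;
};

/* Hypothetical equivalents of BPF_STMT and BPF_JUMP. */
#define SK_STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define SK_JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/* 0x28, 0x15 and 0x06 are placeholder opcode values. */
const struct insn_sketch sk_demo[] = {
	SK_STMT(0x28, 12),
	SK_JUMP(0x15, 0x0800, 0, 1),
	SK_STMT(0x06, 64),
	SK_STMT(0x06, 0),
};
```
.Ed
.Pp
A statement leaves both branch offsets at zero, whereas a jump fills in
all four fields.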
.Sh SYSCTLS
The following sysctls are available when
.Nm
is enabled:
.Bl -tag -width "XnetXbpfXmaxbufsizeXX"
.It Li net.bpf.maxbufsize
Sets the maximum buffer size available for
.Nm
peers.
.It Li net.bpf.stats
Shows
.Nm
statistics.
They can be retrieved with the
.Xr netstat 1
utility.
.It Li net.bpf.peers
Shows the current
.Nm
peers.
This is only available to the super user and can also be retrieved with the
.Xr netstat 1
utility.
.El
.Pp
On architectures with
.Xr bpfjit 4
support, an additional sysctl is available:
.Bl -tag -width "XnetXbpfXjitXX"
.It Li net.bpf.jit
Toggle
.Sy Just-In-Time
compilation of new filter programs.
In order to enable Just-In-Time compilation,
the bpfjit kernel module must be loaded.
Changing the value of this sysctl doesn't affect
existing filter programs.
.El
.Sh FILES
.Pa /dev/bpf
.Sh EXAMPLES
The following filter is taken from the Reverse ARP Daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	/* A <- Ethernet type field */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	/* A <- ARP operation code */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	/* accept: keep the Ethernet and ARP headers */
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	/* reject */
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	/* A <- Ethernet type field */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	/* A <- IP source address */
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	/* source is .15: the destination must be .35 */
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	/* otherwise the source must be .35 and the destination .15 */
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	/* accept the whole packet */
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	/* reject */
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
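.Pp
The hexadecimal constants in the filter above are the two host addresses
written as 32-bit values, most significant octet first.
A small helper, hypothetically named
.Fn ipv4_const ,
shows the arithmetic:
.Bd -literal -offset indent
```c
#include <stdint.h>

/*
 * Build the 32-bit constant to compare against a BPF_W load of the
 * dotted-quad address a.b.c.d.
 */
uint32_t
ipv4_const(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return (uint32_t)a << 24 | (uint32_t)b << 16 |
	    (uint32_t)c << 8 | (uint32_t)d;
}
```
.Ed
.Pp
For 128.3.112.15 the octets are 0x80, 0x03, 0x70 and 0x0f, giving
0x8003700f; 128.3.112.35 gives 0x80037023.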
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Sy BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure
that we have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	/* A <- Ethernet type field */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	/* A <- IP protocol */
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	/* A <- IP flags and fragment offset; reject later fragments */
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	/* X <- IP header length in bytes */
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	/* A <- TCP source port */
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	/* A <- TCP destination port */
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	/* accept the whole packet */
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	/* reject */
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
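.Pp
The offsets used by the indexed loads in the finger filter can be checked
by hand.
The sketch below repeats the
.Sy BPF_MSH
computation, X <- 4*(P[14:1]&0xf), in plain C and derives the TCP port
offsets from it; the function names are hypothetical and a 14-byte
Ethernet header is assumed, as in the filter itself.
.Bd -literal -offset indent
```c
#include <stdint.h>

#define SK_ETHER_HDR_LEN	14	/* Ethernet header, as in the filter */

/* IP header length in bytes from the first byte of the IP header. */
uint32_t
ip_header_len(uint8_t first_ip_byte)
{
	return 4u * (uint32_t)(first_ip_byte & 0xf);
}

/* Frame offset of the TCP source port: P[X+14:2] in the filter. */
uint32_t
tcp_sport_off(uint8_t first_ip_byte)
{
	return ip_header_len(first_ip_byte) + SK_ETHER_HDR_LEN;
}

/* Frame offset of the TCP destination port: P[X+16:2] in the filter. */
uint32_t
tcp_dport_off(uint8_t first_ip_byte)
{
	return ip_header_len(first_ip_byte) + SK_ETHER_HDR_LEN + 2;
}
```
.Ed
.Pp
For a header without options the first IP byte is 0x45, so X is 20 and the
two ports are read from frame offsets 34 and 36.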
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr bpfjit 4 ,
.Xr tcpdump 8
.Rs
.%T "The BSD Packet Filter: A New Architecture for User-level Packet Capture"
.%A S. McCanne
.%A V. Jacobson
.%J Proceedings of the 1993 Winter USENIX
.%C Technical Conference, San Diego, CA
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and
Rick Rashid at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to BSD and continued
its development from 1983 on.
Since then, it has evolved into the ULTRIX Packet Filter
at DEC, a STREAMS NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
.An -nosplit
.An Steven McCanne ,
of Lawrence Berkeley Laboratory, implemented BPF in Summer 1990.
The design was in collaboration with
.An Van Jacobson ,
also of Lawrence Berkeley Laboratory.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this
mode on the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where
all files must assume that the interface is promiscuous, and if
so desired, must use a filter to reject foreign packets.
.Pp
Under SunOS, if a BPF application reads more than 2^31 bytes of
data, read will fail with
.Er EINVAL .
You can either fix the bug in SunOS,
or lseek to 0 when read fails for this reason.
.Pp
.Dq Immediate mode
and the
.Dq read timeout
are misguided features.
This functionality can be emulated with non-blocking mode and
.Xr select 2 .
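.Pp
The
.Xr select 2
half of that emulation might look like the sketch below, which waits for a
descriptor to become readable or for a timeout to expire.
The function
.Fn wait_readable
is a hypothetical helper; it works on any descriptor and does not itself
open or configure
.Pa /dev/bpf .
.Bd -literal -offset indent
```c
#include <sys/select.h>
#include <sys/time.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Wait until fd is readable or the timeout expires.
 * Returns 1 if fd is readable, 0 on timeout and -1 on error,
 * in which case errno is set.
 */
int
wait_readable(int fd, long sec, long usec)
{
	fd_set rfds;
	struct timeval tv;
	int r;

	if (fd < 0 || fd >= FD_SETSIZE) {
		errno = EBADF;
		return -1;
	}
	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = sec;
	tv.tv_usec = usec;
	r = select(fd + 1, &rfds, NULL, NULL, &tv);
	return r > 0 ? 1 : r;
}
```
.Ed
.Pp
A caller would typically put the descriptor into non-blocking mode first
and call
.Xr read 2
only after
.Fn wait_readable
has returned 1.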
832