.\"	$OpenBSD: bpf.4,v 1.31 2010/04/09 16:25:21 jmc Exp $
.\"     $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd $Mdocdate: April 9 2010 $
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
The number of open files can be increased by creating additional
device nodes with the
.Xr MAKEDEV 8
script.
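.Pp
As a rough illustration of the
.Er EBUSY
behaviour described above, a program can probe successive device nodes
until one opens.
The helper name
.Li open_first_free
and its parameters are invented for this sketch and are not part of
.Nm bpf :
.Bd -literal -offset indent
```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: try prefix0, prefix1, ... and return the first
 * descriptor that opens.  A node that is already in use fails with
 * EBUSY, so move on to the next one; any other error (for example
 * ENOENT when the node does not exist) ends the search.
 */
int
open_first_free(const char *prefix, int max_units, int flags)
{
	char path[64];
	int fd, unit;

	for (unit = 0; unit < max_units; unit++) {
		snprintf(path, sizeof(path), "%s%d", prefix, unit);
		fd = open(path, flags);
		if (fd >= 0)
			return fd;
		if (errno != EBUSY)
			break;
	}
	return -1;
}
```
.Ed
.Pp
A caller might write
.Li open_first_free("/dev/bpf", 10, O_RDONLY)
and then bind the returned descriptor with
.Dv BIOCSETIF .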
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.Aq Pa net/bpf.h .
All commands require these includes:
.Bd -unfilled -offset indent
.Cd #include <sys/types.h>
.Cd #include <sys/time.h>
.Cd #include <sys/ioctl.h>
.Cd #include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Aq Pa sys/socket.h
and
.Aq Pa net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.Aq Pa net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field, while the capacity of that array, counted in
.Vt u_int
elements, is supplied in the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in
.Vt u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array in
.Vt u_int .
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCFLUSH ,
.Dv BIOCGBLEN ,
.Dv BIOCGDIRFILT ,
.Dv BIOCGDLT ,
.Dv BIOCGDLTLIST ,
.Dv BIOCGETIF ,
.Dv BIOCGHDRCMPLT ,
.Dv BIOCGRSIG ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCGSTATS ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCLOCK ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCVERSION ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Sets or gets the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.Aq Pa net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCSFILDROP Fa "u_int *"
.It Dv BIOCGFILDROP Fa "u_int *"
Sets or gets the status of the
.Dq filter drop
flag.
If non-zero, packets matching any filters will be reported to the
associated interface so that they can be dropped.
.Pp
.It Dv BIOCSDIRFILT Fa "u_int *"
.It Dv BIOCGDIRFILT Fa "u_int *"
Sets or gets the status of the
.Dq direction filter
flag.
If non-zero, packets matching the specified direction (either
.Dv BPF_DIRECTION_IN
or
.Dv BPF_DIRECTION_OUT )
will be ignored.
.El
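.Pp
The compatibility rule given under
.Dv BIOCVERSION
can be written down as a small predicate.
This is only a sketch: the structure below is a local stand-in for
.Li struct bpf_version ,
and the name
.Li demo_version_ok
is invented here.
A real program would fill the kernel side with the
.Dv BIOCVERSION
ioctl and the application side from
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION :
.Bd -literal -offset indent
```c
/* Local stand-in with the same two fields as struct bpf_version. */
struct demo_bpf_version {
	unsigned short bv_major;
	unsigned short bv_minor;
};

/*
 * Hypothetical helper: return non-zero when a filter built against
 * version "app" may be installed on a kernel reporting "kern".
 * The major numbers must match and the application minor must not
 * be newer than the kernel minor.
 */
int
demo_version_ok(const struct demo_bpf_version *app,
    const struct demo_bpf_version *kern)
{
	return app->bv_major == kern->bv_major &&
	    app->bv_minor <= kern->bv_minor;
}
```
.Ed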
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv FIONBIO Fa "int *"
Sets or clears non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enables or disables asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Sets or gets the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t	bh_caplen;
	u_int32_t	bh_datalen;
	u_int16_t	bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment-restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion.)
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Aq Pa net/bpf.h
to facilitate this process.
It rounds its argument up to a multiple of the word size (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
points to the
.Li struct bpf_hdr
at the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
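.Pp
The packet-to-packet stepping described above might look like the following
sketch.
So that it can be compiled anywhere, it uses local stand-ins
.Pf ( Li struct demo_bpf_hdr ,
.Dv DEMO_WORDALIGN )
that are assumed to mirror
.Li struct bpf_hdr
and
.Dv BPF_WORDALIGN ;
real code would take both from
.Aq Pa net/bpf.h .
The function name
.Li demo_walk
is likewise invented:
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct bpf_hdr; the two tv_ fields replace bh_tstamp. */
struct demo_bpf_hdr {
	uint32_t	tv_sec;
	uint32_t	tv_usec;
	uint32_t	bh_caplen;
	uint32_t	bh_datalen;
	uint16_t	bh_hdrlen;
};

/* Assumed to match BPF_ALIGNMENT and BPF_WORDALIGN. */
#define DEMO_ALIGNMENT	sizeof(uint32_t)
#define DEMO_WORDALIGN(x) \
	(((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

/* Bytes up to and including bh_hdrlen, without trailing struct padding. */
#define DEMO_HDR_MIN \
	(offsetof(struct demo_bpf_hdr, bh_hdrlen) + sizeof(uint16_t))

/*
 * Count the records in buf[0..len) and add up their captured bytes.
 * Each header is copied out with memcpy() so the sketch does not
 * depend on the alignment of buf.  bh_hdrlen, not sizeof(), gives the
 * distance from the header to the packet data.
 */
size_t
demo_walk(const unsigned char *buf, size_t len, size_t *captured)
{
	struct demo_bpf_hdr h;
	size_t off = 0, n = 0;

	*captured = 0;
	while (off + DEMO_HDR_MIN <= len) {
		memcpy(&h, buf + off, DEMO_HDR_MIN);
		if (h.bh_hdrlen < DEMO_HDR_MIN ||
		    off + h.bh_hdrlen + h.bh_caplen > len)
			break;		/* malformed or truncated record */
		*captured += h.bh_caplen;
		n++;
		off += DEMO_WORDALIGN(h.bh_hdrlen + h.bh_caplen);
	}
	return n;
}
```
.Ed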
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t	code;
	u_char		jt;
	u_char		jf;
	u_int32_t	k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.Aq Pa net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + Dv BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + Dv BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
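.Pp
The point to notice about these macros is the argument order: the operand
comes second, while in
.Li struct bpf_insn
the
.Fa k
field comes last.
The sketch below makes that visible with local stand-ins
.Pf ( Dv DEMO_STMT ,
.Dv DEMO_JUMP ,
.Li struct demo_insn ) .
Both the brace-initializer expansion and the numeric opcode values are
assumptions about
.Aq Pa net/bpf.h
made for illustration:
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Stand-in for struct bpf_insn. */
struct demo_insn {
	uint16_t	code;
	uint8_t		jt;
	uint8_t		jf;
	uint32_t	k;
};

/* Stand-ins for BPF_STMT and BPF_JUMP: (opcode, operand[, jt, jf]). */
#define DEMO_STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define DEMO_JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/* Opcode values assumed to match <net/bpf.h>. */
#define DEMO_LD_H_ABS	0x28	/* BPF_LD+BPF_H+BPF_ABS */
#define DEMO_JEQ_K	0x15	/* BPF_JMP+BPF_JEQ+BPF_K */
#define DEMO_RET_K	0x06	/* BPF_RET+BPF_K */

/* Accept whole IPv4 frames on an Ethernet, drop everything else. */
static const struct demo_insn demo_prog[] = {
	DEMO_STMT(DEMO_LD_H_ABS, 12),		/* A <- ethertype */
	DEMO_JUMP(DEMO_JEQ_K, 0x0800, 0, 1),	/* IPv4? fall through : skip 1 */
	DEMO_STMT(DEMO_RET_K, 0xffffffff),	/* accept the whole packet */
	DEMO_STMT(DEMO_RET_K, 0),		/* drop */
};
```
.Ed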
.Sh FILES
.Bl -tag -width /dev/bpf[0-9] -compact
.It Pa /dev/bpf[0-9]
.Nm
devices
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
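.Pp
To make the jump offsets in the finger filter easier to follow, the sketch
below evaluates it in user space with a tiny interpreter.
This is not the kernel's implementation: it handles only the seven opcodes
the filter uses, the
.Li vm_
names are invented, and the numeric opcode values are assumed to match
.Aq Pa net/bpf.h .
Packet loads are big-endian, and a load past the end of the packet rejects
the packet:
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

struct vm_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

enum {				/* values assumed to match <net/bpf.h> */
	VM_LD_H_ABS = 0x28,	/* BPF_LD+BPF_H+BPF_ABS */
	VM_LD_B_ABS = 0x30,	/* BPF_LD+BPF_B+BPF_ABS */
	VM_LD_H_IND = 0x48,	/* BPF_LD+BPF_H+BPF_IND */
	VM_LDX_MSH  = 0xb1,	/* BPF_LDX+BPF_B+BPF_MSH */
	VM_JEQ_K    = 0x15,	/* BPF_JMP+BPF_JEQ+BPF_K */
	VM_JSET_K   = 0x45,	/* BPF_JMP+BPF_JSET+BPF_K */
	VM_RET_K    = 0x06	/* BPF_RET+BPF_K */
};

/* P[off:n]: load n bytes, big-endian; return -1 if out of range. */
static int
vm_load(const uint8_t *p, size_t len, uint32_t off, size_t n, uint32_t *out)
{
	size_t i;

	if (off > len || n > len - off)
		return -1;
	for (*out = 0, i = 0; i < n; i++)
		*out = (*out << 8) | p[off + i];
	return 0;
}

/* Run prog over the packet p[0..len) and return the filter's result. */
uint32_t
vm_run(const struct vm_insn *pc, const uint8_t *p, size_t len)
{
	uint32_t A = 0, X = 0;

	for (;; pc++) {		/* jumps are relative to the next insn */
		switch (pc->code) {
		case VM_LD_H_ABS:
			if (vm_load(p, len, pc->k, 2, &A) != 0)
				return 0;
			break;
		case VM_LD_B_ABS:
			if (vm_load(p, len, pc->k, 1, &A) != 0)
				return 0;
			break;
		case VM_LD_H_IND:
			if (vm_load(p, len, X + pc->k, 2, &A) != 0)
				return 0;
			break;
		case VM_LDX_MSH:
			if (vm_load(p, len, pc->k, 1, &X) != 0)
				return 0;
			X = 4 * (X & 0xf);
			break;
		case VM_JEQ_K:
			pc += (A == pc->k) ? pc->jt : pc->jf;
			break;
		case VM_JSET_K:
			pc += (A & pc->k) ? pc->jt : pc->jf;
			break;
		case VM_RET_K:
			return pc->k;
		default:
			return 0;	/* opcode outside this sketch */
		}
	}
}

#define VM_STMT(c, k)		{ (c), 0, 0, (k) }
#define VM_JUMP(c, k, t, f)	{ (c), (t), (f), (k) }

/* The finger filter, with ETHERTYPE_IP and IPPROTO_TCP spelled out. */
static const struct vm_insn vm_finger[] = {
	VM_STMT(VM_LD_H_ABS, 12),
	VM_JUMP(VM_JEQ_K, 0x0800, 0, 10),
	VM_STMT(VM_LD_B_ABS, 23),
	VM_JUMP(VM_JEQ_K, 6, 0, 8),
	VM_STMT(VM_LD_H_ABS, 20),
	VM_JUMP(VM_JSET_K, 0x1fff, 6, 0),
	VM_STMT(VM_LDX_MSH, 14),
	VM_STMT(VM_LD_H_IND, 14),
	VM_JUMP(VM_JEQ_K, 79, 2, 0),
	VM_STMT(VM_LD_H_IND, 16),
	VM_JUMP(VM_JEQ_K, 79, 0, 1),
	VM_STMT(VM_RET_K, 0xffffffff),
	VM_STMT(VM_RET_K, 0),
};
```
.Ed
.Pp
Feeding it an unfragmented Ethernet/IPv4/TCP frame whose destination port
is 79 takes the path through the second port test to the accepting
return; setting any fragment-offset bit makes the
.Dv BPF_JSET
jump straight to the final rejecting return.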
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%T "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to BSD and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
Steve McCanne of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to Van Jacobson.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
1054packets.
1055