.\"	$OpenBSD: bpf.4,v 1.43 2020/09/30 19:25:40 tb Exp $
.\"     $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd $Mdocdate: September 30 2020 $
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf .
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require these includes:
.Pp
.nr nS 1
.In sys/types.h
.In sys/time.h
.In sys/ioctl.h
.In net/bpf.h
.nr nS 0
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.In sys/socket.h
and
.In net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.In net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field.
On entry, the
.Va bfl_len
field supplies the capacity of that array, counted in
.Vt u_int
elements.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
On return, the
.Va bfl_len
field is set to the number of
.Vt u_int
elements actually stored in the array.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is instead set to the number of
.Vt u_int
elements required to hold the full list.
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write, and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCFLUSH ,
.Dv BIOCGBLEN ,
.Dv BIOCGDIRFILT ,
.Dv BIOCGDLT ,
.Dv BIOCGDLTLIST ,
.Dv BIOCGETIF ,
.Dv BIOCGHDRCMPLT ,
.Dv BIOCGRSIG ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCGSTATS ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCLOCK ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCVERSION ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Sets or gets the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCSFILDROP Fa "u_int *"
.It Dv BIOCGFILDROP Fa "u_int *"
Sets or gets the
.Dq filter drop
action.
The supported actions for packets matching the filter are:
.Pp
.Bl -tag -width "BPF_FILDROP_CAPTURE" -compact
.It Dv BPF_FILDROP_PASS
Accept and capture
.It Dv BPF_FILDROP_CAPTURE
Drop and capture
.It Dv BPF_FILDROP_DROP
Drop and do not capture
.El
.Pp
Packets matching any filter configured to drop packets will be
reported to the associated interface so that they can be dropped.
The default action is
.Dv BPF_FILDROP_PASS .
.Pp
.It Dv BIOCSDIRFILT Fa "u_int *"
.It Dv BIOCGDIRFILT Fa "u_int *"
Sets or gets the status of the
.Dq direction filter
flag.
If non-zero, packets matching the specified direction (either
.Dv BPF_DIRECTION_IN
or
.Dv BPF_DIRECTION_OUT )
will be ignored.
.El
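.Pp
The compatibility rule given under
.Dv BIOCVERSION
can be sketched as a small predicate.
This is a minimal illustration only: the structure and function names are
hypothetical stand-ins so that the fragment compiles without
.In net/bpf.h ;
a real program would fill the kernel side from the ioctl and the
application side from
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION .

```c
#include <stdbool.h>

/* Hypothetical mirror of struct bpf_version, used only in this sketch. */
struct demo_bpf_version {
	unsigned short bv_major;
	unsigned short bv_minor;
};

/*
 * Compatible when the major numbers match and the application minor
 * is less than or equal to the kernel minor.
 */
bool
demo_version_compatible(struct demo_bpf_version app,
    struct demo_bpf_version kern)
{
	return app.bv_major == kern.bv_major &&
	    app.bv_minor <= kern.bv_minor;
}
```

An application would typically refuse to install its filter when the
predicate is false.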
.Ss Standard ioctls
.Nm
supports several standard ioctls that allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv FIONBIO Fa "int *"
Sets or clears non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enables or disables asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Sets or gets the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t	bh_caplen;
	u_int32_t	bh_datalen;
	u_int16_t	bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
is a
.Vt "struct bpf_hdr *"
pointing to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
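.Pp
The packet-to-packet walk described above can be sketched as follows.
This is an illustration under stated assumptions, not the kernel's layout:
.Li struct demo_bpf_hdr ,
.Li DEMO_ALIGNMENT
and
.Li DEMO_WORDALIGN
are local stand-ins for the definitions in
.In net/bpf.h ,
and
.Li demo_append
is a hypothetical helper that only exists to build a buffer to walk.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for <net/bpf.h>; the word size chosen here is an assumption. */
#define DEMO_ALIGNMENT	sizeof(uint32_t)
#define DEMO_WORDALIGN(x) \
	(((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

struct demo_bpf_hdr {
	uint32_t	tv_sec;		/* stand-in for bh_tstamp */
	uint32_t	tv_usec;
	uint32_t	bh_caplen;
	uint32_t	bh_datalen;
	uint16_t	bh_hdrlen;
};

/* Append one record at offset off; returns the offset of the next one. */
size_t
demo_append(unsigned char *buf, size_t off, const void *pkt, uint32_t caplen)
{
	struct demo_bpf_hdr h;

	memset(&h, 0, sizeof(h));
	h.bh_caplen = caplen;
	h.bh_datalen = caplen;
	h.bh_hdrlen = (uint16_t)DEMO_WORDALIGN(sizeof(h));
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + h.bh_hdrlen, pkt, caplen);
	return off + DEMO_WORDALIGN((size_t)h.bh_hdrlen + caplen);
}

/* Count the records in buf[0..len) and sum their captured bytes. */
size_t
demo_walk(const unsigned char *buf, size_t len, size_t *captured)
{
	struct demo_bpf_hdr h;
	size_t off = 0, n = 0, rec;

	*captured = 0;
	while (off < len && len - off >= sizeof(h)) {
		memcpy(&h, buf + off, sizeof(h));
		rec = (size_t)h.bh_hdrlen + h.bh_caplen;
		if (h.bh_hdrlen == 0 || rec > len - off)
			break;		/* malformed or truncated record */
		/* Packet data starts at buf + off + h.bh_hdrlen. */
		n++;
		*captured += h.bh_caplen;
		off += DEMO_WORDALIGN(rec);
	}
	return n;
}
```

The header is copied out with
.Xr memcpy 3
so the sketch does not depend on the alignment of the demo buffer, and the
last record is accepted even when its trailing padding is absent.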
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t	code;
	u_char		jt;
	u_char		jf;
	u_int32_t	k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
a random number
.Pf ( Dv BPF_RND ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_RND
.Xc
.Sm on
A <- arc4random()
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
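.Pp
To make the instruction semantics above concrete, the following is a rough
sketch of an interpreter for a handful of them.
It is not the kernel's implementation: the
.Li DEMO_
opcodes and
.Li struct demo_insn
are local to this example and do not use the real
.Dv BPF_
encodings.
The sketch also assumes that halfwords are read from the packet in network
byte order and that a load beyond the end of the packet rejects the packet.
The program array transcribes the TCP finger filter from
.Sx EXAMPLES ,
with
.Dv ETHERTYPE_IP
written as 0x0800 and
.Dv IPPROTO_TCP
as 6.

```c
#include <stddef.h>
#include <stdint.h>

/* Demo opcodes local to this sketch, not the real BPF_* encodings. */
enum demo_op {
	DEMO_LDH_ABS,	/* A <- P[k:2] */
	DEMO_LDB_ABS,	/* A <- P[k:1] */
	DEMO_LDH_IND,	/* A <- P[X+k:2] */
	DEMO_LDX_MSH,	/* X <- 4*(P[k:1]&0xf) */
	DEMO_JEQ_K,	/* pc += (A == k) ? jt : jf */
	DEMO_JSET_K,	/* pc += (A & k) ? jt : jf */
	DEMO_RET_K	/* accept k bytes */
};

struct demo_insn {
	uint16_t	code;
	uint8_t		jt;
	uint8_t		jf;
	uint32_t	k;
};

/* Run prog over pkt; returns the number of bytes to accept, 0 to reject. */
uint32_t
demo_filter(const struct demo_insn *prog, size_t ninsns,
    const uint8_t *pkt, size_t len)
{
	uint32_t A = 0, X = 0;
	size_t pc = 0, off;

	while (pc < ninsns) {
		const struct demo_insn *i = &prog[pc++];

		switch (i->code) {
		case DEMO_LDH_ABS:
		case DEMO_LDH_IND:
			off = (i->code == DEMO_LDH_IND ? X : 0) + (size_t)i->k;
			if (off + 2 > len)
				return 0;	/* assumed: out of range rejects */
			A = (uint32_t)pkt[off] << 8 | pkt[off + 1];
			break;
		case DEMO_LDB_ABS:
			if (i->k >= len)
				return 0;
			A = pkt[i->k];
			break;
		case DEMO_LDX_MSH:
			if (i->k >= len)
				return 0;
			X = 4 * (pkt[i->k] & 0xf);
			break;
		case DEMO_JEQ_K:
			pc += (A == i->k) ? i->jt : i->jf;
			break;
		case DEMO_JSET_K:
			pc += (A & i->k) ? i->jt : i->jf;
			break;
		case DEMO_RET_K:
			return i->k;
		default:
			return 0;
		}
	}
	return 0;	/* no return instruction reached: reject */
}

/* The TCP finger filter from EXAMPLES, transcribed to the demo opcodes. */
const struct demo_insn demo_finger[] = {
	{ DEMO_LDH_ABS, 0, 0, 12 },
	{ DEMO_JEQ_K, 0, 10, 0x0800 },
	{ DEMO_LDB_ABS, 0, 0, 23 },
	{ DEMO_JEQ_K, 0, 8, 6 },
	{ DEMO_LDH_ABS, 0, 0, 20 },
	{ DEMO_JSET_K, 6, 0, 0x1fff },
	{ DEMO_LDX_MSH, 0, 0, 14 },
	{ DEMO_LDH_IND, 0, 0, 14 },
	{ DEMO_JEQ_K, 2, 0, 79 },
	{ DEMO_LDH_IND, 0, 0, 16 },
	{ DEMO_JEQ_K, 0, 1, 79 },
	{ DEMO_RET_K, 0, 0, 0xffffffff },
	{ DEMO_RET_K, 0, 0, 0 },
};
```

Because jump offsets are added after the program counter has already moved
past the jump, an offset of 0 simply falls through to the next instruction.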
.Sh FILES
.Bl -tag -width /dev/bpf -compact
.It Pa /dev/bpf
.Nm
device
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
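.Pp
The constants 0x8003700f and 0x80037023 in the host filter above are the two
IP addresses written as 32-bit values, with the first octet in the most
significant byte.
As a small worked example, the arithmetic can be sketched with a
hypothetical helper,
.Li demo_ipv4_word ,
which is not part of
.In net/bpf.h :

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack the dotted quad a.b.c.d into the 32-bit
 * value that a word load of the address field is compared against.
 */
uint32_t
demo_ipv4_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d;
}
```

For 128.3.112.15 the octets are 0x80, 0x03, 0x70 and 0x0f, which gives
0x8003700f; 128.3.112.35 ends in 0x23 and gives 0x80037023.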
.Sh ERRORS
If the
.Xr ioctl 2
call fails,
.Xr errno 2
is set to one of the following values:
.Bl -tag -width Er
.It Bq Er EINVAL
The timeout used in a
.Dv BIOCSRTIMEOUT
request is negative.
.It Bq Er EINVAL
The timeout used in a
.Dv BIOCSRTIMEOUT
request specified a microsecond value less than zero or
greater than or equal to 1 million.
.It Bq Er EOVERFLOW
The timeout used in a
.Dv BIOCSRTIMEOUT
request is too large to be represented by an
.Vt int .
.El
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8 ,
.Xr arc4random 9
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%D January 1993
.%J 1993 Winter USENIX Conference
.%T The BSD Packet Filter: A New Architecture for User-level Packet Capture
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to
.Bx
and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
.An -nosplit
.An Steve McCanne
of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to
.An Van Jacobson .
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
1095packets.
1096