.\"	$OpenBSD: bpf.4,v 1.45 2023/03/09 06:01:40 dlg Exp $
.\"     $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd $Mdocdate: March 9 2023 $
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf .
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
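.Pp
The opening sequence described above (open the device, bind it with
.Dv BIOCSETIF ,
then query the read buffer size with
.Dv BIOCGBLEN )
might be sketched as follows.
This is an illustration under stated assumptions, not a reference implementation:
the helper name bpf_open_on is hypothetical, and the BPF calls are compiled
only on OpenBSD, with a stub that fails with ENOSYS standing in elsewhere so
that the fragment still builds.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#ifdef __OpenBSD__
#include <net/bpf.h>
#endif

/*
 * Hypothetical helper: open /dev/bpf read-only, bind it to ifname and
 * store the required read(2) buffer size in *blenp.
 * Returns the descriptor, or -1 with errno set.
 */
int
bpf_open_on(const char *ifname, unsigned int *blenp)
{
#ifdef __OpenBSD__
	struct ifreq ifr;
	int fd, saved;

	if ((fd = open("/dev/bpf", O_RDONLY)) == -1)
		return (-1);
	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1 ||
	    ioctl(fd, BIOCGBLEN, blenp) == -1) {
		saved = errno;		/* keep the ioctl error across close */
		close(fd);
		errno = saved;
		return (-1);
	}
	return (fd);
#else
	/* Stub for systems without this interface. */
	(void)ifname;
	(void)blenp;
	errno = ENOSYS;
	return (-1);
#endif
}
```

A caller would then allocate a buffer of exactly the returned size and pass
it to every read on the descriptor.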
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require these includes:
.Pp
.nr nS 1
.In sys/types.h
.In sys/time.h
.In sys/ioctl.h
.In net/bpf.h
.nr nS 0
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.In sys/socket.h
and
.In net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.In net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field while their length in
.Vt u_int
is supplied to the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in
.Vt u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array in
.Vt u_int .
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCFLUSH ,
.Dv BIOCGBLEN ,
.Dv BIOCGDIRFILT ,
.Dv BIOCGDLT ,
.Dv BIOCGDLTLIST ,
.Dv BIOCGETIF ,
.Dv BIOCGHDRCMPLT ,
.Dv BIOCGRSIG ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCGSTATS ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCLOCK ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCSWTIMEOUT ,
.Dv BIOCDWTIMEOUT ,
.Dv BIOCVERSION ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Vt struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Vt struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Sets or gets the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSWTIMEOUT Fa "struct timeval *"
.It Dv BIOCGWTIMEOUT Fa "struct timeval *"
.It Dv BIOCDWTIMEOUT
Sets, gets, or deletes (resets) the wait timeout parameter.
The
.Ar timeval
specifies the length of time to wait between receiving a packet and
the kernel buffer becoming readable.
By default, or when reset, the wait timeout is infinite, meaning
the age of packets in the kernel buffer does not make the buffer
readable.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Vt struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCSFILDROP Fa "u_int *"
.It Dv BIOCGFILDROP Fa "u_int *"
Sets or gets the
.Dq filter drop
action.
The supported actions for packets matching the filter are:
.Pp
.Bl -tag -width "BPF_FILDROP_CAPTURE" -compact
.It Dv BPF_FILDROP_PASS
Accept and capture
.It Dv BPF_FILDROP_CAPTURE
Drop and capture
.It Dv BPF_FILDROP_DROP
Drop and do not capture
.El
.Pp
Packets matching any filter configured to drop packets will be
reported to the associated interface so that they can be dropped.
The default action is
.Dv BPF_FILDROP_PASS .
.Pp
.It Dv BIOCSDIRFILT Fa "u_int *"
.It Dv BIOCGDIRFILT Fa "u_int *"
Sets or gets the status of the
.Dq direction filter
flag.
If non-zero, packets matching the specified direction (either
.Dv BPF_DIRECTION_IN
or
.Dv BPF_DIRECTION_OUT )
will be ignored.
.El
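.Pp
The compatibility rule stated for
.Dv BIOCVERSION
(major numbers equal, application minor not greater than kernel minor)
can be written as a small predicate.
The sketch below is illustrative only: the structure and function names are
hypothetical stand-ins, declared locally so that the fragment does not depend on
.In net/bpf.h .

```c
/* Local stand-in for struct bpf_version, so this builds anywhere. */
struct sk_bpf_version {
	unsigned short bv_major;
	unsigned short bv_minor;
};

/*
 * Return 1 if a filter written for app_major.app_minor may be installed
 * on a kernel reporting *kern, 0 otherwise.
 */
int
sk_version_ok(const struct sk_bpf_version *kern, unsigned short app_major,
    unsigned short app_minor)
{
	return (kern->bv_major == app_major && app_minor <= kern->bv_minor);
}
```

In a real program the kernel values would come from the
.Dv BIOCVERSION
ioctl and the application values from
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION .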
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv FIONBIO Fa "int *"
Sets or clears non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enables or disables asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Sets or gets the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t	bh_caplen;
	u_int32_t	bh_datalen;
	u_int16_t	bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Vt short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion.)
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
is a
.Vt struct bpf_hdr *
pointing to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
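.Pp
Stepping through everything returned by one
.Xr read 2
might look like the following sketch.
It is only an illustration: every sk_ name is a hypothetical local stand-in
(mirroring the header shown above and assuming a 4-byte
.Dv BPF_ALIGNMENT )
so that the fragment builds without
.In net/bpf.h ;
real code would use the system definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins so the sketch builds anywhere. */
#define SK_ALIGNMENT	sizeof(uint32_t)	/* assumed BPF_ALIGNMENT */
#define SK_WORDALIGN(x)	(((x) + (SK_ALIGNMENT - 1)) & ~(SK_ALIGNMENT - 1))

struct sk_timeval {
	uint32_t	tv_sec;
	uint32_t	tv_usec;
};

struct sk_hdr {				/* mirrors struct bpf_hdr above */
	struct sk_timeval bh_tstamp;
	uint32_t	bh_caplen;
	uint32_t	bh_datalen;
	uint16_t	bh_hdrlen;
};

/* Bytes up to and including bh_hdrlen; may be less than sizeof(struct sk_hdr). */
#define SK_HDR_MIN	(offsetof(struct sk_hdr, bh_hdrlen) + sizeof(uint16_t))

typedef void (*sk_packet_cb)(const unsigned char *, uint32_t, void *);

/*
 * Visit every packet in buf[0..n), where n is the byte count returned by
 * read(2).  Calls cb (if not NULL) with the captured bytes of each packet
 * and returns the number of packets seen.
 */
size_t
sk_walk(const unsigned char *buf, size_t n, sk_packet_cb cb, void *arg)
{
	size_t off = 0, count = 0, step;
	struct sk_hdr h;

	while (n - off >= SK_HDR_MIN) {
		/* Copy the header out rather than casting into the buffer. */
		memset(&h, 0, sizeof(h));
		memcpy(&h, buf + off, SK_HDR_MIN);
		if (h.bh_hdrlen < SK_HDR_MIN || h.bh_hdrlen > n - off ||
		    h.bh_caplen > n - off - h.bh_hdrlen)
			break;		/* malformed or truncated record */
		if (cb != NULL)
			cb(buf + off + h.bh_hdrlen, h.bh_caplen, arg);
		count++;
		/* Advance by header plus captured data, rounded up to a word. */
		step = SK_WORDALIGN((size_t)h.bh_hdrlen + h.bh_caplen);
		if (step > n - off)
			break;		/* last record, no trailing padding */
		off += step;
	}
	return (count);
}
```

The packet data is found bh_hdrlen bytes after the start of each record, not
sizeof(struct bpf_hdr) bytes, which is why the sketch never uses the
structure size as an offset.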
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t	code;
	u_char		jt;
	u_char		jf;
	u_int32_t	k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
a random number
.Pf ( Dv BPF_RND ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_RND
.Xc
.Sm on
A <- arc4random()
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
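.Pp
To make the program-counter arithmetic concrete, here is a small userland
sketch that interprets a subset of the machine (absolute loads, the
.Dv BPF_JA
and
.Dv BPF_JEQ
jumps with a constant operand, and both returns).
It is not the kernel implementation.
All sk_ names are hypothetical local stand-ins; the numeric opcode values
follow the classic BPF encoding and are assumptions insofar as this page
does not list them.
Multi-byte packet loads are read in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring struct bpf_insn and the opcode bits. */
struct sk_insn {
	uint16_t	code;
	uint8_t		jt;
	uint8_t		jf;
	uint32_t	k;
};

#define SK_LD	0x00	/* classes */
#define SK_JMP	0x05
#define SK_RET	0x06
#define SK_W	0x00	/* sizes */
#define SK_H	0x08
#define SK_B	0x10
#define SK_ABS	0x20	/* addressing mode */
#define SK_JA	0x00	/* jump operators */
#define SK_JEQ	0x10
#define SK_K	0x00	/* operand sources */
#define SK_A	0x10

/* Same argument order as BPF_STMT and BPF_JUMP. */
#define SK_STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define SK_JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/*
 * Run prog over pkt.  Returns the number of bytes to accept, or 0 to
 * reject.  Out-of-range loads and unmodelled opcodes also reject.
 */
uint32_t
sk_filter(const struct sk_insn *prog, size_t ninsn, const uint8_t *pkt,
    size_t len)
{
	uint32_t A = 0;
	size_t pc = 0;

	while (pc < ninsn) {
		/* pc already points past the current instruction. */
		const struct sk_insn *in = &prog[pc++];
		uint32_t k = in->k;

		switch (in->code) {
		case SK_LD + SK_W + SK_ABS:
			if (len < 4 || k > len - 4)
				return (0);
			A = (uint32_t)pkt[k] << 24 | (uint32_t)pkt[k + 1] << 16 |
			    (uint32_t)pkt[k + 2] << 8 | pkt[k + 3];
			break;
		case SK_LD + SK_H + SK_ABS:
			if (len < 2 || k > len - 2)
				return (0);
			A = (uint32_t)pkt[k] << 8 | pkt[k + 1];
			break;
		case SK_LD + SK_B + SK_ABS:
			if (k >= len)
				return (0);
			A = pkt[k];
			break;
		case SK_JMP + SK_JA:
			if (k > ninsn - pc)
				return (0);
			pc += k;
			break;
		case SK_JMP + SK_JEQ + SK_K:
			pc += (A == k) ? in->jt : in->jf;
			break;
		case SK_RET + SK_K:
			return (k);
		case SK_RET + SK_A:
			return (A);
		default:
			return (0);	/* not modelled in this sketch */
		}
	}
	return (0);			/* fell off the end of the program */
}
```

Because the program counter is advanced before the offset is added, a
.Fa jt
or
.Fa jf
of 0 means
.Dq continue with the next instruction .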
.Sh FILES
.Bl -tag -width /dev/bpf -compact
.It Pa /dev/bpf
.Nm
device
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between hosts 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
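.Pp
The numeric operands in the IP example can be derived rather than memorized.
The sketch below assumes an Ethernet link layer (14-byte header) and IPv4;
the sk_ and SK_ names are hypothetical helpers introduced only for this
illustration.

```c
#include <stdint.h>

/* Assumed layout: Ethernet header followed directly by an IPv4 header. */
#define SK_ETHER_HDR_LEN	14
#define SK_IP_SRC_OFF		12	/* source address within the IPv4 header */
#define SK_IP_DST_OFF		16	/* destination address within the IPv4 header */

/*
 * Constant to compare against the accumulator after a word load of the
 * IPv4 address a.b.c.d from the packet.
 */
uint32_t
sk_ipv4_k(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d);
}
```

With these definitions, 128.3.112.15 yields 0x8003700f and 128.3.112.35
yields 0x80037023, while the load offsets 26 and 30 are the Ethernet header
length plus the source and destination address offsets.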
.Sh ERRORS
If the
.Xr ioctl 2
call fails,
.Xr errno 2
is set to one of the following values:
.Bl -tag -width Er
.It Bq Er EINVAL
The timeout used in a
.Dv BIOCSRTIMEOUT
request is negative.
.It Bq Er EINVAL
The timeout used in a
.Dv BIOCSRTIMEOUT
request specified a microsecond value less than zero or
greater than or equal to 1 million.
.It Bq Er EOVERFLOW
The timeout used in a
.Dv BIOCSRTIMEOUT
request is too large to be represented by an
.Vt int .
.El
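.Pp
A caller may want to check a timeout before issuing
.Dv BIOCSRTIMEOUT .
The following sketch covers only the two
.Er EINVAL
cases listed above, reading
.Dq negative
as a negative seconds field, which is an assumption.
The
.Er EOVERFLOW
limit is not modelled because its exact bound is not given here, and the
function name is hypothetical.

```c
#include <errno.h>
#include <sys/time.h>

/*
 * Return 0 if tv passes the documented range checks, or EINVAL if it
 * would be rejected.  Does not predict EOVERFLOW.
 */
int
sk_check_rtimeout(const struct timeval *tv)
{
	if (tv->tv_sec < 0)
		return (EINVAL);
	if (tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return (EINVAL);
	return (0);
}
```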
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8 ,
.Xr arc4random 9
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%D January 1993
.%J 1993 Winter USENIX Conference
.%T The BSD Packet Filter: A New Architecture for User-level Packet Capture
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to
.Bx
and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
.An -nosplit
.An Steve McCanne
of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to
.An Van Jacobson .
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
1110