.\" -*- nroff -*-
.\"
.\"	$NetBSD: bpf.4,v 1.34 2005/09/10 22:40:37 wiz Exp $
.\"
.\" Copyright (c) 1990, 1991, 1992, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd September 6, 2005
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter raw network interface
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter 16"
.Sh DESCRIPTION
The Berkeley Packet Filter
provides a raw interface to data link layers in a protocol
independent fashion.
All packets on the network, even those destined for other hosts,
are accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf .
After opening the device, the file descriptor must be bound to a
specific network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared by multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
The total number of open
files is limited to the value given in the kernel configuration; the
example given in the SYNOPSIS above sets the limit to 16.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
.Pp
Associated with each open instance of a
.Nm
file is a user-settable packet filter.
Whenever a packet is received by an interface,
all file descriptors listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets
that have matched the filter.
To improve performance, the buffer passed to read must be
the same size as the buffers used internally by
.Nm .
This size is returned by the
.Dv BIOCGBLEN
ioctl (see below), and under
BSD, can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily
truncated.
.Pp
The packet filter will support any link level protocol that has fixed length
headers.  Currently, only Ethernet, SLIP and PPP drivers have been
modified to interact with
.Nm .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.  The writes are unbuffered, meaning only one
packet can be processed per write.
Currently, only writes to Ethernets and SLIP links are supported.
.Sh IOCTLS
The
.Xr ioctl 2
command codes below are defined in
.Aq Pa net/bpf.h .
All commands require these includes:
.Bd -literal -offset indent
#include \*[Lt]sys/types.h\*[Gt]
#include \*[Lt]sys/time.h\*[Gt]
#include \*[Lt]sys/ioctl.h\*[Gt]
#include \*[Lt]net/bpf.h\*[Gt]
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Pa \*[Lt]net/if.h\*[Gt] .
.Pp
The (third) argument to the
.Xr ioctl 2
should be a pointer to the type indicated.
.Bl -tag -width indent -offset indent
.It Dv "BIOCGBLEN (u_int)"
Returns the required buffer length for reads on
.Nm
files.
.It Dv "BIOCSBLEN (u_int)"
Sets the buffer length for reads on
.Nm
files.  The buffer must be set before the file is attached to an interface
with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument.
A read call will result in EIO if it is passed a buffer that is not this size.
.It Dv BIOCGDLT (u_int)
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.Aq Pa net/bpf.h .
.It Dv BIOCGDLTLIST (struct bpf_dltlist)
Returns an array of the data link layer types available on
the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field, while the capacity of that array, in units of u_int, is supplied in the
.Va bfl_len
field.
.Er ENOMEM
is returned if the array is too small.  The
.Va bfl_len
field is modified on return to indicate the actual length in u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is returned to indicate the required length of an array in u_int.
.It Dv BIOCSDLT (u_int)
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface,
a listener that opened its interface non-promiscuously may receive
packets promiscuously.  This problem can be remedied with an
appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
.It Dv BIOCGETIF (struct ifreq)
Returns the name of the hardware interface that the file is listening on.
The name is returned in the ifr_name field of
.Fa ifr .
All other fields are undefined.
.It Dv BIOCSETIF (struct ifreq)
Sets the hardware interface associated with the file.  This
command must be performed before any packets can be read.
The device is indicated by name using the
.Dv ifr_name
field of the
.Fa ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT, BIOCGRTIMEOUT (struct timeval)
Set or get the read timeout parameter.
The
.Fa timeval
specifies the length of time to wait before timing
out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS (struct bpf_stat)
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int64_t bs_recv;
	u_int64_t bs_drop;
	u_int64_t bs_capt;
	u_int64_t bs_padding[13];
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv -offset indent
.It Va bs_recv
the number of packets received by the descriptor since opened or reset
(including any buffered since the last read call);
.It Va bs_drop
the number of packets which were accepted by the filter but dropped by the
kernel because of buffer overflows
(i.e., the application's reads aren't keeping up with the packet
traffic); and
.It Va bs_capt
the number of packets accepted by the filter.
.El
.It Dv BIOCIMMEDIATE (u_int)
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet
reception.  Otherwise, a read will block until either the kernel buffer
becomes full or a timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCSETF (struct bpf_program)
Sets the filter program used by the kernel to discard uninteresting
packets.  An array of instructions and its length is passed in using
the following structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Va bf_insns
field while its length in units of
.Sq struct bpf_insn
is given by the
.Va bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sy FILTER MACHINE
for an explanation of the filter language.
.It Dv BIOCVERSION (struct bpf_version)
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.  Before installing a filter, applications must check
that the current version is compatible with the running kernel.  Version
numbers are compatible if the major numbers match and the application minor
is less than or equal to the kernel minor.  The kernel version number is
returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.Aq Pa net/bpf.h .
An incompatible filter
may result in undefined behavior (most likely, an error returned by
.Xr ioctl 2
or haphazard packet matching).
.It Dv BIOCSRSIG BIOCGRSIG (u_int signal)
Set or get the receive signal.
This signal will be sent to the process or process group
specified by FIOSETOWN.  It defaults to SIGIO.
.It Dv BIOCGHDRCMPLT BIOCSHDRCMPLT (u_int)
Enable/disable or get the
.Dq header complete
flag status.
If enabled, packets written to the bpf file descriptor will not have
network layer headers rewritten in the interface output routine.
By default, the flag is disabled (value is 0).
.It Dv BIOCGSEESENT BIOCSSEESENT (u_int)
Enable/disable or get the
.Dq see sent
flag status.
If enabled, packets sent will be passed to the filter.
By default, the flag is enabled (value is 1).
.El
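.Pp
The compatibility rule given for
.Dv BIOCVERSION
can be sketched as a small predicate.
This is an illustrative, stand-alone sketch rather than code taken from the
kernel or from the BPF headers: the structure demo_bpf_version and the
function demo_version_compatible are hypothetical names, chosen to mirror
struct bpf_version so that the example compiles without
.Aq Pa net/bpf.h .

```c
#include <stdbool.h>

/* Hypothetical stand-in mirroring struct bpf_version from <net/bpf.h>. */
struct demo_bpf_version {
	unsigned short bv_major;
	unsigned short bv_minor;
};

/*
 * Return true when a filter built against version "app" may be installed
 * on a kernel reporting version "kern": the major numbers must match and
 * the application minor must not exceed the kernel minor.
 */
static bool
demo_version_compatible(struct demo_bpf_version app,
    struct demo_bpf_version kern)
{
	return app.bv_major == kern.bv_major &&
	    app.bv_minor <= kern.bv_minor;
}
```

In a real program the kernel side would be filled in by the
.Dv BIOCVERSION
ioctl and the application side by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION .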
.Sh STANDARD IOCTLS
.Nm
now supports several standard
.Xr ioctl 2 Ns 's
which allow the user to do async and/or non-blocking I/O to an open
.Nm
file descriptor.
.Bl -tag -width indent -offset indent
.It Dv FIONREAD (int)
Returns the number of bytes that are immediately available for reading.
.It Dv SIOCGIFADDR (struct ifreq)
Returns the address associated with the interface.
.It Dv FIONBIO (int)
Set or clear non-blocking I/O.  If arg is non-zero, then doing a
.Xr read 2
when no data is available will return -1 and
.Va errno
will be set to
.Er EAGAIN .
If arg is zero, non-blocking I/O is disabled.  Note:  setting this
overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.It Dv FIOASYNC (int)
Enable or disable async I/O.  When enabled (arg is non-zero), the process or
process group specified by FIOSETOWN will start receiving SIGIO's when packets
arrive.
Note that you must do an FIOSETOWN in order for this to take effect, as
the system will not default this for you.
The signal may be changed via
.Dv BIOCSRSIG .
.It Dv FIOSETOWN FIOGETOWN (int)
Set or get the process or process group (if negative) that should receive SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Sh BPF HEADER
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct timeval bh_tstamp;
	u_int32_t bh_caplen;
	u_int32_t bh_datalen;
	u_int16_t bh_hdrlen;
};
.Ed
.Pp
The fields, whose values are stored in host order, are:
.Bl -tag -width bh_datalen -offset indent
.It Va bh_tstamp
The time at which the packet was processed by the packet filter.
.It Va bh_caplen
The length of the captured portion of the packet.  This is the minimum of
the truncation amount specified by the filter and the length of the packet.
.It Va bh_datalen
The length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Va bh_hdrlen
The length of the BPF header, which may not be equal to
.Em sizeof(struct bpf_hdr) .
.El
.Pp
The
.Va bh_hdrlen
field exists to account for
padding between the header and the link level protocol.
The purpose here is to guarantee proper alignment of the packet
data structures, which is required on alignment sensitive
architectures and improves performance on many other architectures.
The packet filter ensures that the
.Va bpf_hdr
and the
.Em network layer
header will be word aligned.  Suitable precautions
must be taken when accessing the link layer protocol fields on alignment
restricted machines.  (This isn't a problem on an Ethernet, since
the type field is a short falling on an even offset,
and the addresses are probably accessed in a bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts
on a word boundary.  This requires that an application
has some knowledge of how to get from packet to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Aq Pa net/bpf.h
to facilitate this process.
It rounds up its argument
to the nearest word aligned value (where a word is BPF_ALIGNMENT bytes wide).
.Pp
For example, if
.Sq Va p
points to the start of a packet, this expression
will advance it to the next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p-\*[Gt]bh_hdrlen + p-\*[Gt]bh_caplen)
.Pp
For the alignment mechanisms to work properly, the
buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
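.Pp
The packet-to-packet stepping described above can be sketched as a loop over
a buffer that holds several records.
The sketch is self-contained so that it compiles without the BPF headers, and
every name in it is a hypothetical stand-in: demo_hdr is a reduced version of
struct bpf_hdr without the timestamp, and DEMO_WORDALIGN assumes that the
alignment unit is the size of a long.
A real program would use struct bpf_hdr and
.Dv BPF_WORDALIGN
from
.Aq Pa net/bpf.h
instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed alignment unit; the real header calls this BPF_ALIGNMENT. */
#define DEMO_ALIGNMENT	sizeof(long)
/* Round x up to a multiple of DEMO_ALIGNMENT, as BPF_WORDALIGN does. */
#define DEMO_WORDALIGN(x) \
	(((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

/* Hypothetical stand-in for struct bpf_hdr, without the timestamp. */
struct demo_hdr {
	uint32_t dh_caplen;	/* captured bytes that follow the header */
	uint32_t dh_datalen;	/* original length on the wire */
	uint16_t dh_hdrlen;	/* header length including padding */
};

/*
 * Count the packets in buf[0..len) by stepping from header to header.
 * Each header is copied out with memcpy so the walk does not depend on
 * the alignment of buf.
 */
static size_t
demo_count_packets(const unsigned char *buf, size_t len)
{
	size_t off = 0, n = 0;

	while (off + sizeof(struct demo_hdr) <= len) {
		struct demo_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.dh_hdrlen < sizeof(h) ||
		    (size_t)h.dh_hdrlen + h.dh_caplen > len - off)
			break;		/* malformed or truncated record */
		n++;
		off += DEMO_WORDALIGN((size_t)h.dh_hdrlen + h.dh_caplen);
	}
	return n;
}
```

The length passed to such a loop would be the value returned by
.Xr read 2 ,
not the size of the buffer, since a read may return fewer bytes than the
buffer holds.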
.Sh FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a
.Sy return
instruction.
Each instruction performs some action on the pseudo-machine state,
which consists of an accumulator, index register, scratch memory store,
and implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t code;
	u_char 	jt;
	u_char 	jf;
	int32_t k;
};
.Ed
.Pp
The
.Va k
field is used in different ways by different instructions,
and the
.Va jt
and
.Va jf
fields are used as offsets
by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC.  Various other mode and
operator bits are or'd into the class to give the actual instructions.
The classes and modes are defined in
.Aq Pa net/bpf.h .
.Pp
Below are the semantics for each defined BPF instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet,
interpreted as a word (n=4),
unsigned halfword (n=2), or unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only
addressed in word units.  The memory store is indexed from 0 to BPF_MEMWORDS-1.
.Va k ,
.Va jt ,
and
.Va jf
are the corresponding fields in the
instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width indent -offset indent
.It Sy BPF_LD
These instructions copy a value into the accumulator.  The type of the
source operand is specified by an
.Dq addressing mode
and can be a constant
.Sy ( BPF_IMM ) ,
packet data at a fixed offset
.Sy ( BPF_ABS ) ,
packet data at a variable offset
.Sy ( BPF_IND ) ,
the packet length
.Sy ( BPF_LEN ) ,
or a word in the scratch memory store
.Sy ( BPF_MEM ) .
For
.Sy BPF_IND
and
.Sy BPF_ABS ,
the data size must be specified as a word
.Sy ( BPF_W ) ,
halfword
.Sy ( BPF_H ) ,
or byte
.Sy ( BPF_B ) .
The semantics of all the recognized BPF_LD instructions follow.
.Bl -column "BPF_LD_BPF_W_BPF_ABS" "A \*[Lt]- P[k:4]" -offset indent
.It Sy BPF_LD+BPF_W+BPF_ABS Ta A \*[Lt]- P[k:4]
.It Sy BPF_LD+BPF_H+BPF_ABS Ta A \*[Lt]- P[k:2]
.It Sy BPF_LD+BPF_B+BPF_ABS Ta A \*[Lt]- P[k:1]
.It Sy BPF_LD+BPF_W+BPF_IND Ta A \*[Lt]- P[X+k:4]
.It Sy BPF_LD+BPF_H+BPF_IND Ta A \*[Lt]- P[X+k:2]
.It Sy BPF_LD+BPF_B+BPF_IND Ta A \*[Lt]- P[X+k:1]
.It Sy BPF_LD+BPF_W+BPF_LEN Ta A \*[Lt]- len
.It Sy BPF_LD+BPF_IMM Ta A \*[Lt]- k
.It Sy BPF_LD+BPF_MEM Ta A \*[Lt]- M[k]
.El
.It Sy BPF_LDX
These instructions load a value into the index register.  Note that
the addressing modes are more restricted than those of the accumulator loads,
but they include
.Sy BPF_MSH ,
a hack for efficiently loading the IP header length.
.Bl -column "BPF_LDX_BPF_W_BPF_IMM" "X \*[Lt]- k" -offset indent
.It Sy BPF_LDX+BPF_W+BPF_IMM Ta X \*[Lt]- k
.It Sy BPF_LDX+BPF_W+BPF_MEM Ta X \*[Lt]- M[k]
.It Sy BPF_LDX+BPF_W+BPF_LEN Ta X \*[Lt]- len
.It Sy BPF_LDX+BPF_B+BPF_MSH Ta X \*[Lt]- 4*(P[k:1]\*[Am]0xf)
.El
.It Sy BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility
for the destination.
.Bl -column "BPF_ST" "M[k] \*[Lt]- A" -offset indent
.It Sy BPF_ST Ta M[k] \*[Lt]- A
.El
.It Sy BPF_STX
This instruction stores the index register in the scratch memory store.
.Bl -column "BPF_STX" "M[k] \*[Lt]- X" -offset indent
.It Sy BPF_STX Ta M[k] \*[Lt]- X
.El
.It Sy BPF_ALU
The alu instructions perform operations between the accumulator and
index register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Sy ( BPF_K
or
.Sy BPF_X ) .
.Bl -column "BPF_ALU_BPF_ADD_BPF_K" "A \*[Lt]- A + k" -offset indent
.It Sy BPF_ALU+BPF_ADD+BPF_K Ta A \*[Lt]- A + k
.It Sy BPF_ALU+BPF_SUB+BPF_K Ta A \*[Lt]- A - k
.It Sy BPF_ALU+BPF_MUL+BPF_K Ta A \*[Lt]- A * k
.It Sy BPF_ALU+BPF_DIV+BPF_K Ta A \*[Lt]- A / k
.It Sy BPF_ALU+BPF_AND+BPF_K Ta A \*[Lt]- A \*[Am] k
.It Sy BPF_ALU+BPF_OR+BPF_K Ta A \*[Lt]- A | k
.It Sy BPF_ALU+BPF_LSH+BPF_K Ta A \*[Lt]- A \*[Lt]\*[Lt] k
.It Sy BPF_ALU+BPF_RSH+BPF_K Ta A \*[Lt]- A \*[Gt]\*[Gt] k
.It Sy BPF_ALU+BPF_ADD+BPF_X Ta A \*[Lt]- A + X
.It Sy BPF_ALU+BPF_SUB+BPF_X Ta A \*[Lt]- A - X
.It Sy BPF_ALU+BPF_MUL+BPF_X Ta A \*[Lt]- A * X
.It Sy BPF_ALU+BPF_DIV+BPF_X Ta A \*[Lt]- A / X
.It Sy BPF_ALU+BPF_AND+BPF_X Ta A \*[Lt]- A \*[Am] X
.It Sy BPF_ALU+BPF_OR+BPF_X Ta A \*[Lt]- A | X
.It Sy BPF_ALU+BPF_LSH+BPF_X Ta A \*[Lt]- A \*[Lt]\*[Lt] X
.It Sy BPF_ALU+BPF_RSH+BPF_X Ta A \*[Lt]- A \*[Gt]\*[Gt] X
.It Sy BPF_ALU+BPF_NEG Ta A \*[Lt]- -A
.El
.It Sy BPF_JMP
The jump instructions alter flow of control.  Conditional jumps
compare the accumulator against a constant
.Sy ( BPF_K )
or the index register
.Sy ( BPF_X ) .
If the result is true (or non-zero),
the true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Sy ( BPF_JA )
opcode uses the 32 bit
.Va k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Bl -column "BPF_JMP+BPF_JGE+BPF_K" "pc += (A \*[Ge] k) ? jt : jf" -offset indent
.It Sy BPF_JMP+BPF_JA Ta pc += k
.It Sy BPF_JMP+BPF_JGT+BPF_K Ta "pc += (A \*[Gt] k) ? jt : jf"
.It Sy BPF_JMP+BPF_JGE+BPF_K Ta "pc += (A \*[Ge] k) ? jt : jf"
.It Sy BPF_JMP+BPF_JEQ+BPF_K Ta "pc += (A == k) ? jt : jf"
.It Sy BPF_JMP+BPF_JSET+BPF_K Ta "pc += (A \*[Am] k) ? jt : jf"
.It Sy BPF_JMP+BPF_JGT+BPF_X Ta "pc += (A \*[Gt] X) ? jt : jf"
.It Sy BPF_JMP+BPF_JGE+BPF_X Ta "pc += (A \*[Ge] X) ? jt : jf"
.It Sy BPF_JMP+BPF_JEQ+BPF_X Ta "pc += (A == X) ? jt : jf"
.It Sy BPF_JMP+BPF_JSET+BPF_X Ta "pc += (A \*[Am] X) ? jt : jf"
.El
.It Sy BPF_RET
The return instructions terminate the filter program and specify the amount
of packet to accept (i.e., they return the truncation amount).  A return
value of zero indicates that the packet should be ignored.
The return value is either a constant
.Sy ( BPF_K )
or the accumulator
.Sy ( BPF_A ) .
.Bl -column "BPF_RET+BPF_A" "accept A bytes" -offset indent
.It Sy BPF_RET+BPF_A Ta accept A bytes
.It Sy BPF_RET+BPF_K Ta accept k bytes
.El
.It Sy BPF_MISC
The miscellaneous category was created for anything that doesn't
fit into the above classes, and for any new instructions that might need to
be added.  Currently, these are the register transfer instructions
that copy the index register to the accumulator or vice versa.
.Bl -column "BPF_MISC+BPF_TAX" "X \*[Lt]- A" -offset indent
.It Sy BPF_MISC+BPF_TAX Ta X \*[Lt]- A
.It Sy BPF_MISC+BPF_TXA Ta A \*[Lt]- X
.El
.El
.Pp
The BPF interface provides the following macros to facilitate
array initializers:
.Bd -unfilled -offset indent
.Sy BPF_STMT No (opcode, operand)
.Sy BPF_JUMP No (opcode, operand, true_offset, false_offset)
.Ed
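.Pp
To make the instruction semantics concrete, the following is a minimal
user-space sketch of an interpreter for a small subset of the filter machine:
absolute loads, equality jumps against a constant, and constant returns.
It is not the kernel's implementation.
All names beginning with D_ or demo_ are hypothetical stand-ins defined
locally so that the example compiles without
.Aq Pa net/bpf.h ;
the numeric opcode bits are assumed to mirror the encoding in that header, and
the k field is declared unsigned here to keep the bounds checks simple.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the opcode bits; real programs use <net/bpf.h>. */
enum {
	D_LD = 0x00, D_JMP = 0x05, D_RET = 0x06,	/* classes */
	D_W = 0x00, D_H = 0x08, D_B = 0x10,		/* sizes */
	D_ABS = 0x20,					/* mode */
	D_JEQ = 0x10, D_K = 0x00			/* jump op, source */
};

struct demo_insn {
	uint16_t code;
	uint8_t jt, jf;
	uint32_t k;
};

/* Initializer helpers in the spirit of BPF_STMT and BPF_JUMP. */
#define D_STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define D_JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/*
 * Run prog over the packet p of length len and return the number of
 * bytes to accept (0 rejects).  Only absolute loads, JEQ against a
 * constant, and RET with a constant are handled; anything else, or a
 * load past the end of the packet, rejects the packet.
 */
static uint32_t
demo_filter(const struct demo_insn *prog, size_t n,
    const uint8_t *p, size_t len)
{
	uint32_t A = 0;
	size_t pc, i, w;

	for (pc = 0; pc < n; pc++) {
		const struct demo_insn *in = &prog[pc];

		switch (in->code) {
		case D_LD + D_W + D_ABS:
		case D_LD + D_H + D_ABS:
		case D_LD + D_B + D_ABS:
			w = (in->code & 0x18) == D_W ? 4 :
			    (in->code & 0x18) == D_H ? 2 : 1;
			if (in->k > len || len - in->k < w)
				return 0;
			/* Packet data is big-endian (network order). */
			for (A = 0, i = 0; i < w; i++)
				A = (A << 8) | p[in->k + i];
			break;
		case D_JMP + D_JEQ + D_K:
			/* Offsets are relative to the next instruction. */
			pc += (A == in->k) ? in->jt : in->jf;
			break;
		case D_RET + D_K:
			return in->k;
		default:
			return 0;
		}
	}
	return 0;
}
```

Because the loop increments the program counter after every instruction, a
jump offset of zero falls through to the following instruction, which is how
the true branches with offset 0 in the filters under EXAMPLES behave.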
.Sh SYSCTLS
The following sysctls are available when
.Nm
is enabled:
.Pp
.Bl -tag -width "XnetXbpfXmaxbufsizeXX"
.It Li net.bpf.maxbufsize
Sets the maximum buffer size available for
.Nm
peers.
.It Li net.bpf.stats
Shows
.Nm
statistics.
They can be retrieved with the
.Xr netstat 1
utility.
.It Li net.bpf.peers
Shows the current
.Nm
peers.
This is only available to the super user and can also be retrieved with the
.Xr netstat 1
utility.
.El
.Sh FILES
.Pa /dev/bpf
.Sh EXAMPLES
The following filter is taken from the Reverse ARP Daemon.  It accepts
only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
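.Pp
The constants 0x8003700f and 0x80037023 in the filter above are the two host
addresses written as 32-bit words, with the first octet in the most
significant byte.
Offsets 26 and 30 are the IP source and destination address fields that
follow a 14-byte Ethernet header.
The helper below is a hypothetical illustration of that packing, not a
function provided by the BPF interface.

```c
#include <stdint.h>

/*
 * Pack the four octets of a dotted-quad IPv4 address a.b.c.d into the
 * 32-bit value that a BPF_W load yields for that address, with the
 * first octet in the most significant byte.
 */
static uint32_t
demo_ipv4_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d;
}
```

Calling demo_ipv4_word(128, 3, 112, 15) gives 0x8003700f, the first constant
used in the filter.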
.Pp
Finally, this filter returns only TCP finger packets.  We must parse
the IP header to reach the TCP header.  The
.Sy BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure
that we have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
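.Pp
The port checks in the finger filter rely on
.Sy BPF_MSH
to place the IP header length in X, and on
.Sy BPF_IND
loads to read the TCP ports relative to it.
The same arithmetic can be written out in ordinary C as a sketch.
The function names are hypothetical, the 14-byte link header is an assumption
that holds for untagged Ethernet, and the caller is assumed to have checked
that the packet is long enough.

```c
#include <stdint.h>

#define DEMO_ETHER_HDRLEN 14	/* assumed: untagged Ethernet framing */

/* X <- 4*(P[14:1]&0xf): IP header length in bytes, as BPF_MSH computes. */
static uint32_t
demo_ip_hdrlen(const uint8_t *pkt)
{
	return 4u * (pkt[DEMO_ETHER_HDRLEN] & 0x0f);
}

/* A <- P[X+k:2]: big-endian halfword at offset X + k, as BPF_IND loads. */
static uint16_t
demo_load_half_ind(const uint8_t *pkt, uint32_t x, uint32_t k)
{
	return (uint16_t)((pkt[x + k] << 8) | pkt[x + k + 1]);
}

/* Return nonzero if either TCP port is 79; caller guarantees bounds. */
static int
demo_is_finger(const uint8_t *pkt)
{
	uint32_t x = demo_ip_hdrlen(pkt);

	return demo_load_half_ind(pkt, x, 14) == 79 ||
	    demo_load_half_ind(pkt, x, 16) == 79;
}
```

With a 20-byte IP header the source port is read at offset 34 and the
destination port at offset 36; IP options move both by the same amount, which
is why the filter computes the offset instead of hard-coding it.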
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr tcpdump 8
.Rs
.%T "The BSD Packet Filter: A New Architecture for User-level Packet Capture"
.%A S. McCanne
.%A V. Jacobson
.%J Proceedings of the 1993 Winter USENIX
.%C Technical Conference, San Diego, CA
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and
Rick Rashid at Carnegie-Mellon University.  Jeffrey Mogul, at
Stanford, ported the code to BSD and continued its development from
1983 on.  Since then, it has evolved into the ULTRIX Packet Filter
at DEC, a STREAMS NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
Steven McCanne, of Lawrence Berkeley Laboratory, implemented BPF in
Summer 1990.  The design was in collaboration with Van Jacobson,
also of Lawrence Berkeley Laboratory.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this
mode on the same hardware interface.  This could be fixed in the kernel
with additional processing overhead.  However, we favor the model where
all files must assume that the interface is promiscuous, and if
so desired, must use a filter to reject foreign packets.
.Pp
Data link protocols with variable length headers are not currently supported.
.Pp
Under SunOS, if a BPF application reads more than 2^31 bytes of
data, read will fail in
.Er EINVAL .
You can either fix the bug in SunOS,
or lseek to 0 when read fails for this reason.
.Pp
.Dq Immediate mode
and the
.Dq read timeout
are misguided features.
This functionality can be emulated with non-blocking mode and
.Xr select 2 .
740