.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.\"
.Dd November 16, 2012
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of a CPU among various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables device interrupts and instead polls devices on clock
interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can control accurately how much time to spend
in handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so that livelock cannot arise from
packets not being processed to completion.
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
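.Pp
As a minimal sketch, assuming an interface named
.Li em0
(a placeholder) and assuming that
.Xr ifconfig 8
accepts the
.Cm polling
keyword on this system,
.Nm
could be turned on and off again as follows:
.Bd -literal -offset indent
# em0 is a hypothetical interface name; substitute your own.
# The "polling" keyword is an assumption; check ifconfig(8).
ifconfig em0 polling     # enable polling on the interface
ifconfig em0 -polling    # return to interrupt-driven operation
.Ed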
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
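.Pp
For illustration, a
.Xr loader.conf 5
fragment setting some of these tunables might look like the following.
The values are placeholders chosen for the example, not recommendations:
.Bd -literal -offset indent
# Illustrative values only; tune to the expected load.
net.ifpoll.pollhz="2000"
net.ifpoll.burst_max="250"
net.ifpoll.each_burst="15"
.Ed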
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 4000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 15.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 375, which is adequate for a 1000Mbit network and pollhz=4000.
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces bus load,
but it also increases the chance
of the transmission queue becoming saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays the error detection.
Default is 80.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
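.Pp
As a worked example of the
.Va burst_max
bound, with the default values of
.Va pollhz
and
.Va burst_max
listed above, each interface can receive at most
.Bd -literal -offset indent
pollhz * burst_max = 4000 * 375 = 1500000 packets per second
.Ed
.Pp
For comparison (a standard Ethernet figure, not taken from the
.Nm
code), a 1000Mbit link carrying minimum-size frames delivers at most
.Bd -literal -offset indent
10^9 / ((64 + 8 + 12) * 8) = 10^9 / 672 = about 1488095 frames per second
.Ed
.Pp
where 64 bytes is the minimum frame size, 8 bytes the preamble,
and 12 bytes the interframe gap.
The default bound therefore sits just above line rate for such a link.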
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr jme 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr jme 4
drivers support multiple reception queue based
.Nm .
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the drivers mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
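.Pp
For example, assuming CPU 0 is the CPU of interest, the frequency could be
inspected and raised at runtime with
.Xr sysctl 8 :
.Bd -literal -offset indent
sysctl net.ifpoll.0.pollhz          # show the current value
sysctl net.ifpoll.0.pollhz=1000     # poll 1000 times per second
.Ed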
.Sh HISTORY
Device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling,
and added multiple reception queue polling support.