<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Stress-Dependent Configuration</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel='stylesheet' type='text/css' href='postfix-doc.css'>

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Stress-Dependent Configuration</h1>

<hr>

<h2>Overview </h2>

<p> This document describes the symptoms of Postfix SMTP server
overload. It presents permanent main.cf changes to avoid overload
during normal operation, and temporary main.cf changes to cope with
an unexpected burst of mail. This document makes specific suggestions
for Postfix 2.5 and later which support stress-adaptive behavior,
and for earlier Postfix versions that don't.  </p>

<p> Topics covered in this document: </p>

<ul>

<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a>

<li><a href="#adapt"> Automatic stress-adaptive behavior </a>

<li><a href="#concurrency"> Service more SMTP clients at the same time </a>

<li><a href="#time"> Spend less time per SMTP client </a>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#legacy"> Temporary measures for older Postfix releases </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#other"> Other measures to off-load zombies </a>

<li><a href="#credits"> Credits </a>

</ul>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, the Postfix SMTP server responds
immediately when an SMTP client connects to it; the time to deliver
mail is noticeable only with large messages.  Performance degrades
dramatically when the number of SMTP clients exceeds the number of
Postfix SMTP server processes.  When an SMTP client connects while
all Postfix SMTP server processes are busy, the client must wait
until a server process becomes available. </p>

<p> SMTP server overload may be caused by a surge of legitimate
mail (example: a DNS registrar opens a new zone for registrations),
by mistake (mail explosion caused by a forwarding loop) or by malice
(worm outbreak, botnet, or other illegitimate activity).  </p>

<p> Symptoms of Postfix SMTP server overload are: </p>

<ul>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>

<ul>

<li> <p> NOTE: Broken DNS configurations can also cause lengthy
delays before Postfix sends "220 hostname.example.com ...". These
delays also exist when Postfix is NOT overloaded.  </p>

<li> <p> NOTE:  To avoid "overload" delays for end-user mail
clients, enable the "submission" service entry in master.cf (present
since Postfix 2.1), and tell users to connect to this instead of
the public SMTP service. </p>

</ul>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<ul>

<li> <p> NOTE: A portscan for open SMTP ports can also result in
"lost connection ..." logfile messages. </p>

</ul>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy: </p>

<pre>
Oct  3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
 (25) has reached its process limit "30": new clients may experience
 noticeable delays
Oct  3 20:39:27 spike postfix/master[28905]: warning: to avoid this
 condition, increase the process count in master.cf or reduce the
 service time per client
Oct  3 20:39:27 spike postfix/master[28905]: warning: see
  <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of
  stress-adapting configuration settings
</pre>

</ul>
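
<p> As a sketch of the "submission" note above: the entry below
resembles the commented-out example in the master.cf file that
ships with recent Postfix releases. The "-o" overrides are
assumptions about a typical site (they require Postfix 2.3 or
later); adjust them to your own TLS and SASL setup. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    # Separate entry point (port 587) for end-user mail clients.
    submission inet n       -       n       -       -       smtpd
        -o syslog_name=postfix/submission
        -o smtpd_tls_security_level=encrypt
        -o smtpd_sasl_auth_enable=yes
</pre>
</blockquote>

<p> Because the "submission" service has its own process limit,
end-user mail clients do not have to compete with the clients that
keep the public SMTP service busy. </p>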

<p> Legitimate mail that doesn't get through during an episode of
Postfix SMTP server overload is not necessarily lost. It should
still arrive once the situation returns to normal, as long as the
overload condition is temporary.  </p>

<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
It works as follows. When a "public" network service such as the
SMTP server runs into an "all server ports are busy" condition, the
Postfix master(8) daemon logs a warning, restarts the service
(without interrupting existing network sessions), and runs the
service with "-o stress=yes" on the server process command line:
</p>

<blockquote>
<pre>
80821  ??  S      0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>

<p> Normally, the Postfix master(8) daemon runs such a service with
"-o stress=" on the command line (i.e.  with an empty parameter
value):  </p>

<blockquote>
<pre>
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>

<p> You won't see "-o stress" command-line parameters with services
that have local clients only. These include services internal to
Postfix such as the queue manager, and services that listen on a
loopback interface only, such as after-filter SMTP services.  </p>

<p> The "stress" parameter value is the key to making main.cf
parameter settings stress adaptive. The following settings are the
default with Postfix 2.6 and later. </p>

<blockquote>
<pre>
1 smtpd_timeout = ${stress?{10}:{300}}s
2 smtpd_hard_error_limit = ${stress?{1}:{20}}
3 smtpd_junk_command_limit = ${stress?{1}:{100}}
4 # Parameters added after Postfix 2.6:
5 smtpd_per_record_deadline = ${stress?{yes}:{no}}
6 smtpd_starttls_timeout = ${stress?{10}:{300}}s
7 address_verify_poll_count = ${stress?{1}:{3}}
</pre>
</blockquote>

<p> Postfix versions before 3.0 use the older form ${stress?x}${stress:y}
instead of the newer form ${stress?{x}:{y}}. </p>
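
<p> For illustration, with a Postfix release before 3.0 the first
three main.cf settings above would be written in the older form
roughly as follows. This sketch assumes the same default values as
above: </p>

<blockquote>
<pre>
smtpd_timeout = ${stress?10}${stress:300}s
smtpd_hard_error_limit = ${stress?1}${stress:20}
smtpd_junk_command_limit = ${stress?1}${stress:100}
</pre>
</blockquote>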

<p> The syntax of ${name?{value}:{value}}, ${name?value} and
${name:value} is explained at the beginning of the postconf(5)
manual page. </p>

<p> Translation: </p>

<ul>

<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" smtpd_timeout to 60s is unlikely to affect
legitimate clients. However, 60s is unlikely to become the Postfix
default because it is not RFC compliant. Setting smtpd_timeout to
10s or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
of 1 instead of the default 20. This disconnects clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as mail from a mailing list that still contains the addresses
of a few no-longer-active users who didn't bother to unsubscribe.
No mail should be lost, as long as this measure is used only
temporarily. </p>

<li> <p> Line 3: under conditions of stress, use an
smtpd_junk_command_limit of 1 instead of the default 100. This
prevents clients from keeping connections open by repeatedly
sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p>

<li> <p> Line 5: under conditions of stress, change the behavior
of smtpd_timeout and smtpd_starttls_timeout, from a time limit per
read or write system call, to a time limit to send or receive a
complete record (an SMTP command line, SMTP response line, SMTP
message content line, or TLS protocol message). </p>

<li> <p> Line 6: under conditions of stress, reduce the time limit
for TLS protocol handshake messages to 10 seconds, from the default
value of 300 seconds. See also the smtpd_timeout discussion above.
</p>

<li> <p> Line 7: under conditions of stress, do not wait up to 6
seconds for the completion of an address verification probe. If the
result is not already in the address verification cache, reply
immediately with $unverified_recipient_tempfail_action or
$unverified_sender_tempfail_action. No mail should be lost, as long
as this measure is used only temporarily.  </p>

</ul>
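
<p> A site that prefers different values can override these defaults
in main.cf. The following sketch uses the 60s "normal" timeout and
the 5s stress timeout mentioned in the discussion of line 1 above.
The values are illustrative assumptions rather than recommendations,
and the syntax shown requires Postfix 3.0 or later: </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    # 60s in normal operation, 5s while master(8) runs the service
    # with "-o stress=yes".
    smtpd_timeout = ${stress?{5}:{60}}s
</pre>
</blockquote>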

<p> NOTE: Please keep in mind that the stress-adaptive feature is
a fairly desperate measure to keep <b>some</b> legitimate mail
flowing under overload conditions.  If a site is reaching the SMTP
server process limit when there isn't an attack or bot flood
occurring, then either the process limit needs to be raised or more
hardware needs to be added.  </p>

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> This section and the ones that follow discuss permanent measures
against mail server overload.  </p>

<p> One measure to avoid the "all server processes busy" condition
is to service more SMTP clients simultaneously. For this you need
to increase the number of Postfix SMTP server processes. This will
improve the responsiveness for remote SMTP clients, as long as the
server machine has enough hardware and software resources to run
the additional processes, and as long as the file system can keep
up with the additional load. </p>

<ul>

<li> <p> You increase the number of SMTP server processes either
by increasing the default_process_limit in main.cf (line 3 below),
or by increasing the SMTP server's "maxproc" field in master.cf
(line 10 below).  Either way, you need to issue a "postfix reload"
command to make the change effective.  </p>

<li> <p> Process limits above 1000 require Postfix version 2.4 or
later, and an operating system that supports kernel-based event
filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
</p>

<li> <p> More processes use more memory. You can reduce the Postfix
memory footprint by using cdb:
lookup tables instead of Berkeley DB's hash: or btree: tables. </p>

<pre>
 1 /etc/postfix/main.cf:
 2     # Raise the global process limit, 100 since Postfix 2.0.
 3     default_process_limit = 200
 4
 5 /etc/postfix/master.cf:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>

<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
contain a mistake: they configure a fixed number of policy daemon
processes.  When you raise the SMTP server's "maxproc" field in
master.cf, SMTP server processes will report problems when connecting
to policy server processes, because there aren't enough of them.
Examples of errors are "connection refused" or "operation timed
out".  </p>

<p> To fix, edit master.cf and specify a zero "maxproc" field
in all policy server entries; see line 6 in the example below.
Issue a "postfix reload" command to make the change effective.  </p>

<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7         user=nobody argv=/some/where/policy-server
</pre>

</ul>
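
<p> As an illustration of the cdb: suggestion above, the following
sketch switches one lookup table from hash: to cdb:. The choice of
virtual_alias_maps and its pathname are assumptions made for this
example. Support for cdb: is optional at build time, so check the
output of "postconf -m" before relying on it. </p>

<pre>
/etc/postfix/main.cf:
    # Before: virtual_alias_maps = hash:/etc/postfix/virtual
    virtual_alias_maps = cdb:/etc/postfix/virtual
</pre>

<p> After such a change, rebuild the indexed file with "postmap
cdb:/etc/postfix/virtual" and issue a "postfix reload" command. </p>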

<h2><a name="time"> Spend less time per SMTP client </a></h2>

<p> When increasing the number of SMTP server processes is not
practical, you can improve Postfix server responsiveness by eliminating
delays.  When Postfix spends less time per SMTP session, the same
number of SMTP server processes can service more clients in a given
amount of time. </p>

<ul>

<li> <p> Eliminate non-functional RBL lookups (blocklists that are
no longer in operation). These lookups can degrade performance.
Postfix logs a warning when an RBL server does not respond. </p>

<li> <p> Eliminate redundant RBL lookups (people often use multiple
Spamhaus RBLs that include each other).  To find out whether RBLs
include other RBLs, look up the websites that document the RBL's
policies. </p>

<li> <p> Eliminate header_checks and body_checks, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail.  See BACKSCATTER_README for examples of the latter. </p>

<li> <p> Group your header_checks and body_checks patterns to avoid
unnecessary pattern matching operations: </p>

<pre>
 1  /etc/postfix/header_checks:
 2      if /^Subject:/
 3      /^Subject: virus found in mail from you/ reject
 4      /^Subject: ..other../ reject
 5      endif
 6
 7      if /^Received:/
 8      /^Received: from (postfix\.org) / reject forged client name in received header: $1
 9      /^Received: from ..other../ reject ....
10      endif
</pre>

</ul>
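
<p> The main.cf side of the RBL and header_checks suggestions above
might look like the following sketch. It assumes a site that used
to query the Spamhaus sbl, xbl and pbl lists separately; at the
time of writing the combined zen.spamhaus.org list includes all
three, but check the Spamhaus documentation for current details.
The header_checks pathname is the one used in the example above.
</p>

<pre>
/etc/postfix/main.cf:
    # One combined lookup instead of separate sbl, xbl and pbl lookups.
    smtpd_client_restrictions =
        permit_mynetworks
        reject_rbl_client zen.spamhaus.org

    # The "if ... endif" grouping requires a regexp: or pcre: table.
    header_checks = regexp:/etc/postfix/header_checks
</pre>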

<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>

<p> Under conditions of overload you can improve Postfix SMTP server
responsiveness by hanging up on suspicious clients, so that other
clients get a chance to talk to Postfix.  </p>

<ul>

<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps.  The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command.  </p>

<li> <p> To hang up connections from denylisted zombies, you can
set specific Postfix SMTP server reject codes for specific RBLs,
and for individual responses from specific RBLs. We'll use
zen.spamhaus.org as an example; by the time you read this document,
details may have changed.  Right now, their documents say that a
response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
address, which means that the machine is probably running a bot of
some kind.  To give a 521 response instead of the default 554
response, use something like: </p>

<pre>
 1  /etc/postfix/main.cf:
 2      smtpd_client_restrictions =
 3         permit_mynetworks
 4         reject_rbl_client zen.spamhaus.org=127.0.0.10
 5         reject_rbl_client zen.spamhaus.org=127.0.0.11
 6         reject_rbl_client zen.spamhaus.org
 7
 8      rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
 9
10  /etc/postfix/rbl_reply_maps:
11      # With Postfix 2.3-2.5 use "421" to hang up connections.
12      zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13       $rbl_class [$rbl_what] blocked using
14       $rbl_domain${rbl_reason?; $rbl_reason}
15
16      zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17       $rbl_class [$rbl_what] blocked using
18       $rbl_domain${rbl_reason?; $rbl_reason}
</pre>

<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so it does not affect the
performance. </p>

<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
cause Postfix to disconnect). The down-side of replying with 421
is that it works only for zombies and other malware. If the client
is running a real MTA, then it may connect again several times until
the mail expires in its queue. When this is a problem, stick with
the default 554 reply, and use "smtpd_hard_error_limit = 1" as
described below.  </p>

<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at http://www.postfix.org/download.html. Simply replace
line 8 above with: </p>

<pre>
 8      rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>

</ul>
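
<p> The same reply codes can be used with non-RBL restrictions, as
mentioned in the first bullet above. The following sketch hangs up
on clients that are listed in an SMTP access map. The table name
and the client address (taken from the 192.0.2.0/24 documentation
range) are hypothetical: </p>

<pre>
/etc/postfix/main.cf:
    smtpd_client_restrictions =
        permit_mynetworks
        check_client_access hash:/etc/postfix/client_access

/etc/postfix/client_access:
    # With Postfix 2.3-2.5 use "421" to hang up connections.
    192.0.2.10    521 5.7.1 Service unavailable
</pre>

<p> Run "postmap /etc/postfix/client_access" after editing the
table, so that Postfix sees the updated contents. </p>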

<p> More information about automatic stress-adaptive behavior is
in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
</p>

<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>

<p> See the section "<a href="#adapt">Automatic stress-adaptive
behavior</a>" if you are running Postfix version 2.5 or later, or
if you have applied the source code patch for stress-adaptive
behavior from the mirrors listed at http://www.postfix.org/download.html.
</p>

<p> The following measures can be applied temporarily during overload.
They still allow <b>most</b> legitimate clients to connect and send
mail, but may affect some legitimate clients. </p>

<ul>

<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
postfix-users list from a variety of sysadmins shows that reducing
the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
clients. However, 60s is unlikely to become the Postfix default
because it is not RFC compliant. Setting smtpd_timeout to 10s (line
2 below) or even 5s under stress will still allow <b>most</b>
legitimate clients to connect and send mail, but may delay mail
from some clients.  No mail should be lost, as long as this measure
is used only temporarily.  </p>

<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as mail from a mailing list that still contains the addresses
of a few no-longer-active users who didn't bother to unsubscribe.
No mail should be lost, as long as this measure is used only
temporarily. </p>

<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
100 (line 4 below). This prevents clients from keeping idle
connections open by repeatedly sending NOOP or RSET commands. </p>

</ul>

<blockquote>
<pre>
1  /etc/postfix/main.cf:
2      smtpd_timeout = 10
3      smtpd_hard_error_limit = 1
4      smtpd_junk_command_limit = 1
</pre>
</blockquote>

<p> With these measures, no mail should be lost, as long
as these measures are used only temporarily. The section
"<a href="#adapt">Automatic stress-adaptive behavior</a>" describes
a way to automate this process. </p>

<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>

<p> To find out if your Postfix installation supports stress-adaptive
behavior, use the "ps" command, and look for the smtpd processes.
Postfix has stress-adaptive support when you see "-o stress=" or
"-o stress=yes" command-line options. Remember that Postfix never
enables stress-adaptive behavior on servers that listen on local
addresses only. </p>

<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
and other System-V flavors, use "ps -ef" instead of "ps ax". </p>

<blockquote>
<pre>
$ ps ax|grep smtpd
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>

<p> You can't use postconf(1) to detect stress-adaptive support.
The postconf(1) command ignores the existence of the stress parameter
in main.cf, because the parameter has no effect there.  Command-line
"-o parameter" settings always take precedence over main.cf parameter
settings.  </p>

<p> If you configure stress-adaptive behavior in main.cf when it
isn't supported, nothing bad will happen.  The processes will run
as if the stress parameter always has an empty value. </p>

<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>

<p> You can manually force stress-adaptive behavior on, by adding
a "-o stress=yes" command-line option in master.cf. This can be
useful for testing overrides on the SMTP service. Issue "postfix
reload" to make the change effective.  </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     #
6     smtp      inet  n       -       n       -       -       smtpd
7         -o stress=yes
8         -o . . .
</pre>
</blockquote>

<p> To permanently force stress-adaptive behavior off with a specific
service, specify "-o stress=" on its master.cf command line.  This
may be desirable for the "submission" service. Issue "postfix reload"
to make the change effective.  </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     #
6     submission inet n       -       n       -       -       smtpd
7         -o stress=
8         -o . . .
</pre>
</blockquote>

<h2><a name="other"> Other measures to off-load zombies </a> </h2>

<p> The postscreen(8) daemon, introduced with Postfix 2.8, provides
additional protection against mail server overload. One postscreen(8)
process handles multiple inbound SMTP connections, and decides which
clients may talk to a Postfix SMTP server process.  By keeping
spambots away, postscreen(8) leaves more SMTP server processes
available for legitimate clients, and delays the onset of server
overload conditions. </p>
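
<p> A minimal sketch of the master.cf changes that put postscreen(8)
in front of the SMTP server is shown below. It follows the
commented-out entries in the master.cf file that ships with Postfix
2.8 and later. Treat it as a starting point only, and see
POSTSCREEN_README for the complete procedure. </p>

<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    # One postscreen(8) process takes over the public SMTP port.
    smtp      inet  n       -       n       -       1       postscreen
    # The real SMTP server receives connections from postscreen(8).
    smtpd     pass  -       -       n       -       -       smtpd
    # Helper services for DNSBL lookups and for STARTTLS support.
    dnsblog   unix  -       -       n       -       0       dnsblog
    tlsproxy  unix  -       -       n       -       0       tlsproxy
</pre>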

<h2><a name="credits"> Credits </a></h2>

<ul>

<li>  Thanks to the postfix-users mailing list members for sharing
early experiences with the stress-adaptive feature.

<li>  The RBL example and several other paragraphs of text were
adapted from postfix-users postings by Noel Jones.

<li>  Wietse implemented stress-adaptive behavior as the smallest
possible patch while he should be working on other things.

</ul>

</body> </html>
