<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Stress-Dependent Configuration</title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Stress-Dependent Configuration</h1>

<hr>

<h2>Overview </h2>

<p> This document describes the symptoms of Postfix SMTP server
overload. It presents permanent main.cf changes to avoid overload
during normal operation, and temporary main.cf changes to cope with
an unexpected burst of mail. This document makes specific suggestions
for Postfix 2.5 and later, which support stress-adaptive behavior,
and for earlier Postfix versions, which don't.  </p>

<p> Topics covered in this document: </p>

<ul>

<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a>

<li><a href="#adapt"> Automatic stress-adaptive behavior </a>

<li><a href="#concurrency"> Service more SMTP clients at the same time </a>

<li><a href="#time"> Spend less time per SMTP client </a>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#legacy"> Temporary measures for older Postfix releases </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#other"> Other measures to off-load zombies </a>

<li><a href="#credits"> Credits </a>

</ul>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, the Postfix SMTP server responds
immediately when an SMTP client connects to it; the time to deliver
mail is noticeable only with large messages.  Performance degrades
dramatically when the number of SMTP clients exceeds the number of
Postfix SMTP server processes.  When an SMTP client connects while
all Postfix SMTP server processes are busy, the client must wait
until a server process becomes available. </p>

<p> SMTP server overload may be caused by a surge of legitimate
mail (example: a DNS registrar opens a new zone for registrations),
by mistake (mail explosion caused by a forwarding loop) or by malice
(worm outbreak, botnet, or other illegitimate activity).  </p>

<p> Symptoms of Postfix SMTP server overload are: </p>

<ul>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>

<ul>

<li> <p> NOTE: Broken DNS configurations can also cause lengthy
delays before Postfix sends "220 hostname.example.com ...". These
delays also exist when Postfix is NOT overloaded.  </p>

<li> <p> NOTE: To avoid "overload" delays for end-user mail
clients, enable the "submission" service entry in master.cf (present
since Postfix 2.1), and tell users to connect to that service instead
of the public SMTP service. </p>

</ul>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<ul>

<li> <p> NOTE: A port scan for open SMTP ports can also result in
"lost connection ..." logfile messages. </p>

</ul>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy: </p>

<pre>
Oct  3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
 (25) has reached its process limit "30": new clients may experience
 noticeable delays
Oct  3 20:39:27 spike postfix/master[28905]: warning: to avoid this
 condition, increase the process count in master.cf or reduce the
 service time per client
Oct  3 20:39:27 spike postfix/master[28905]: warning: see
  <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of
  stress-adapting configuration settings
</pre>

</ul>
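
<p> To tell these symptoms apart from background noise, it can help
to count them in the maillog. The following is a minimal Python
sketch, not a Postfix tool: the function name overload_symptoms and
the regular expressions are assumptions based on the log lines shown
above, and real syslog formats vary. Because port scans also produce
"lost connection after CONNECT" entries, a rising count is a hint;
the master(8) "process limit" warning is the more direct signal. </p>

```python
import re
from collections import Counter

# Hypothetical patterns for the two symptoms above. Real syslog formats
# vary (timestamps, host names, multi-instance "postfix-xxx" names), so
# treat these regular expressions as assumptions to adapt.
LOST_AFTER_CONNECT = re.compile(r"/smtpd\[\d+\]: lost connection after CONNECT from ")
PROCESS_LIMIT = re.compile(
    r'/master\[\d+\]: warning: service "([^"]+)" \(([^)]*)\) '
    r'has reached its process limit "(\d+)"'
)


def overload_symptoms(lines):
    """Count overload symptoms and record each service's reported limit."""
    counts = Counter()
    limits = {}
    for line in lines:
        if LOST_AFTER_CONNECT.search(line):
            counts["lost_after_connect"] += 1
        match = PROCESS_LIMIT.search(line)
        if match:
            counts["process_limit_warnings"] += 1
            limits[match.group(1)] = int(match.group(3))
    return counts, limits


sample = [
    "Oct  3 20:39:26 spike postfix/smtpd[1201]: lost connection after CONNECT from unknown[192.0.2.7]",
    'Oct  3 20:39:27 spike postfix/master[28905]: warning: service "smtp" (25) '
    'has reached its process limit "30": new clients may experience noticeable delays',
    "Oct  3 20:39:28 spike postfix/smtpd[1202]: connect from mail.example.com[198.51.100.4]",
]
counts, limits = overload_symptoms(sample)
print(counts["lost_after_connect"], counts["process_limit_warnings"], limits)
# 1 1 {'smtp': 30}
```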

<p> Legitimate mail that doesn't get through during an episode of
Postfix SMTP server overload is not necessarily lost. It should
still arrive once the situation returns to normal, as long as the
overload condition is temporary.  </p>

<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
It works as follows. When a "public" network service such as the
SMTP server runs into an "all server ports are busy" condition, the
Postfix master(8) daemon logs a warning, restarts the service
(without interrupting existing network sessions), and runs the
service with "-o stress=yes" on the server process command line:
</p>

<blockquote>
<pre>
80821  ??  S      0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>

<p> Normally, the Postfix master(8) daemon runs such a service with
"-o stress=" on the command line (i.e., with an empty parameter
value):  </p>

<blockquote>
<pre>
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>

<p> Services that have local access only never have "-o stress"
parameters on the command line. This includes services internal to
Postfix such as the queue manager, and services that listen on a
loopback interface only, such as after-filter SMTP services.  </p>

<p> The "stress" parameter value is the key to making main.cf
parameter settings stress-adaptive. The following settings are the
default with Postfix 2.6 and later; the parameters on lines 5-7 were
added after Postfix 2.6. </p>

<blockquote>
<pre>
1 smtpd_timeout = ${stress?10}${stress:300}s
2 smtpd_hard_error_limit = ${stress?1}${stress:20}
3 smtpd_junk_command_limit = ${stress?1}${stress:100}
4 # Parameters added after Postfix 2.6:
5 smtpd_per_record_deadline = ${stress?yes}${stress:no}
6 smtpd_starttls_timeout = ${stress?10}${stress:300}s
7 address_verify_poll_count = ${stress?1}${stress:3}
</pre>
</blockquote>

<p> Translation: </p>

<ul>

<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" smtpd_timeout to 60s is unlikely to affect
legitimate clients. However, 60s is unlikely to become the Postfix
default because it's not RFC compliant. Setting smtpd_timeout to
10s or even 5s under stress will still allow most legitimate clients
to connect and send mail, but may delay mail from some clients. No
mail should be lost, as long as this measure is used only temporarily. </p>

<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
of 1 instead of the default 20. This helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few addresses of no-longer-active
users who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Line 3: under conditions of stress, use an
smtpd_junk_command_limit of 1 instead of the default 100. This
prevents clients from keeping connections open by repeatedly
sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p>

<li> <p> Line 5: under conditions of stress, change the behavior
of smtpd_timeout and smtpd_starttls_timeout from a time limit per
read or write system call to a time limit to send or receive a
complete record (an SMTP command line, SMTP response line, SMTP
message content line, or TLS protocol message). </p>

<li> <p> Line 6: under conditions of stress, reduce the time limit
for TLS protocol handshake messages to 10 seconds, from the default
value of 300 seconds. See also the smtpd_timeout discussion above.
</p>

<li> <p> Line 7: under conditions of stress, do not wait up to 6
seconds for the completion of an address verification probe. If the
result is not already in the address verification cache, reply
immediately with $unverified_recipient_tempfail_action or
$unverified_sender_tempfail_action. No mail should be lost, as long
as this measure is used only temporarily.  </p>

</ul>

<p> The syntax of ${name?value} and ${name:value} is explained at
the beginning of the postconf(5) manual page. In short, ${name?value}
expands to value when $name is non-empty, and ${name:value} expands
to value when $name is empty. With "-o stress=yes", each setting
above therefore takes its first alternative; with "-o stress=", its
second. </p>
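
<p> To see how lines 1-7 resolve in each state, here is a minimal
Python sketch that imitates only the two conditional forms described
above. It is not Postfix's expansion code: it ignores plain $name
references, nesting, and other expansion forms, and the names expand,
STRESS_DEFAULTS and effective_settings are hypothetical. </p>

```python
import re

# Matches ${name?text} and ${name:text}. The text may not contain "}",
# which is enough for the stress settings shown above.
CONDITIONAL = re.compile(r"\$\{(\w+)([?:])([^}]*)\}")


def expand(value, params):
    """Expand the ${name?text} and ${name:text} forms in one parameter value."""

    def replace(match):
        name, operator, text = match.groups()
        is_set = params.get(name, "") != ""
        if operator == "?":            # ${name?text}: text when $name is non-empty
            return text if is_set else ""
        return "" if is_set else text  # ${name:text}: text when $name is empty

    return CONDITIONAL.sub(replace, value)


STRESS_DEFAULTS = {
    "smtpd_timeout": "${stress?10}${stress:300}s",
    "smtpd_hard_error_limit": "${stress?1}${stress:20}",
    "smtpd_junk_command_limit": "${stress?1}${stress:100}",
    "smtpd_per_record_deadline": "${stress?yes}${stress:no}",
    "smtpd_starttls_timeout": "${stress?10}${stress:300}s",
    "address_verify_poll_count": "${stress?1}${stress:3}",
}


def effective_settings(stress):
    """Return the settings as an smtpd started with "-o stress=<stress>" sees them."""
    params = {"stress": stress}
    return {name: expand(raw, params) for name, raw in STRESS_DEFAULTS.items()}


for state in ("", "yes"):
    print(repr(state), effective_settings(state))
```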

<p> NOTE: Please keep in mind that the stress-adaptive feature is
a fairly desperate measure to keep <b>some</b> legitimate mail
flowing under overload conditions.  If a site is reaching the SMTP
server process limit when there isn't an attack or bot flood
occurring, then either the process limit needs to be raised or more
hardware needs to be added.  </p>

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> This section and the ones that follow discuss permanent measures
against mail server overload.  </p>

<p> One measure to avoid the "all server processes busy" condition
is to service more SMTP clients simultaneously. For this you need
to increase the number of Postfix SMTP server processes. This will
improve the responsiveness for remote SMTP clients, as long as the
server machine has enough hardware and software resources to run
the additional processes, and as long as the file system can keep
up with the additional load. </p>

<ul>

<li> <p> You increase the number of SMTP server processes either
by increasing the default_process_limit in main.cf (line 3 below),
or by increasing the SMTP server's "maxproc" field in master.cf
(line 10 below).  Either way, you need to issue a "postfix reload"
command to make the change effective.  </p>

<li> <p> Process limits above 1000 require Postfix version 2.4 or
later, and an operating system that supports kernel-based event
filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
</p>

<li> <p> More processes use more memory. You can reduce the Postfix
memory footprint by using cdb: lookup tables instead of Berkeley
DB's hash: or btree: tables. </p>

<pre>
 1 /etc/postfix/main.cf:
 2     # Raise the global process limit (default: 100 since Postfix 2.0).
 3     default_process_limit = 200
 4
 5 /etc/postfix/master.cf:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>

<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
contain a mistake: they configure a fixed number of policy daemon
processes.  When you raise the SMTP server's "maxproc" field in
master.cf, SMTP server processes will report problems when connecting
to policy server processes, because there aren't enough of them.
Examples of errors are "connection refused" or "operation timed
out".  </p>

<p> To fix this, edit master.cf and specify a zero "maxproc" field
in all policy server entries; see line 6 in the example below.
Issue a "postfix reload" command to make the change effective.  </p>

<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7         user=nobody argv=/some/where/policy-server
</pre>

</ul>
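
<p> The interplay between default_process_limit, the smtp "maxproc"
field, and policy server limits can be checked mechanically. The
following minimal Python sketch is a hypothetical helper, not part
of Postfix. It assumes the traditional master.cf layout (an entry
starts in column 1, continuation lines start with whitespace, and
"-" in the maxproc column means default_process_limit, whose default
is 100) and flags spawn(8) entries whose non-zero limit is lower than
the smtp service's limit. </p>

```python
# Hypothetical checker for the maxproc pitfalls described above. It
# assumes the traditional master.cf layout: an entry starts in column 1
# with at least eight whitespace-separated fields, continuation lines
# start with whitespace, and "#" starts a comment line.


def effective_limits(master_cf_text, default_process_limit=100):
    """Map "service/type" to (effective process limit, command name)."""
    entries = {}
    for line in master_cf_text.splitlines():
        if not line.strip() or line.startswith("#") or line[0].isspace():
            continue                   # blank, comment, or continuation line
        fields = line.split()
        if len(fields) > 7:            # service type private unpriv chroot wakeup maxproc command
            maxproc = fields[6]
            # "-" means default_process_limit; 0 means no limit.
            limit = default_process_limit if maxproc == "-" else int(maxproc)
            entries[f"{fields[0]}/{fields[1]}"] = (limit, fields[7])
    return entries


def policy_bottlenecks(master_cf_text, default_process_limit=100):
    """List spawn(8) services whose limit is lower than the smtp limit."""
    entries = effective_limits(master_cf_text, default_process_limit)
    smtp_limit, _ = entries.get("smtp/inet", (default_process_limit, "smtpd"))
    return sorted(
        name for name, (limit, command) in entries.items()
        if command == "spawn" and limit != 0
        and (smtp_limit == 0 or smtp_limit > limit)
    )


sample = """\
smtp      inet  n       -       n       -       200     smtpd
policy    unix  -       n       n       -       -       spawn
  user=nobody argv=/some/where/policy-server
"""
fixed = """\
smtp      inet  n       -       n       -       200     smtpd
policy    unix  -       n       n       -       0       spawn
  user=nobody argv=/some/where/policy-server
"""
print(policy_bottlenecks(sample))   # ['policy/unix']: "-" means 100 here
print(policy_bottlenecks(fixed))    # []
```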

<h2><a name="time"> Spend less time per SMTP client </a></h2>

<p> When increasing the number of SMTP server processes is not
practical, you can improve Postfix server responsiveness by eliminating
delays.  When Postfix spends less time per SMTP session, the same
number of SMTP server processes can service more clients in a given
amount of time. </p>
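
<p> The arithmetic behind this is simple: if each session occupies
a server process for an average of T seconds, N processes can finish
at most roughly N / T sessions per second. The following Python
sketch applies that back-of-the-envelope bound (an approximation that
ignores connection bursts and uneven session lengths; the function
name and numbers are illustrative, not measurements) to the 30-process
limit from the log example above. </p>

```python
def max_sessions_per_second(processes, mean_session_seconds):
    """Upper bound on completed SMTP sessions per second.

    Each server process handles one session at a time, so N processes
    that are each busy for T seconds per session finish at most N / T
    sessions per second. Bursts and uneven session lengths lower the
    real figure.
    """
    if not (processes > 0 and mean_session_seconds > 0):
        raise ValueError("processes and session time must be positive")
    return processes / mean_session_seconds


# Illustrative numbers based on the 30-process limit in the log example.
print(max_sessions_per_second(30, 4.0))   # 7.5
print(max_sessions_per_second(30, 2.0))   # 15.0: halve the session time
print(max_sessions_per_second(60, 4.0))   # 15.0: or double the processes
```

<p> Halving the time per session buys the same capacity as doubling
the process count, without the extra memory. </p>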

<ul>

<li> <p> Eliminate non-functional RBL lookups (blocklists that are
no longer in operation). These lookups can add DNS delays to SMTP
sessions. Postfix logs a warning when an RBL server does not respond. </p>

<li> <p> Eliminate redundant RBL lookups (people often use multiple
Spamhaus RBLs that include each other).  To find out whether RBLs
include other RBLs, consult the websites that document each RBL's
policies. </p>

<li> <p> Eliminate header_checks and body_checks, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail.  See BACKSCATTER_README for examples of the latter. </p>

<li> <p> Group your header_checks and body_checks patterns to avoid
unnecessary pattern matching operations: </p>

<pre>
/etc/postfix/header_checks:
    if /^Subject:/
    /^Subject: virus found in mail from you/ reject
    /^Subject: ..other../ reject
    endif

    if /^Received:/
    /^Received: from (postfix\.org) / reject forged client name in received header: $1
    /^Received: from ..other../ reject ....
    endif
</pre>

</ul>

<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>

<p> Under conditions of overload you can improve Postfix SMTP server
responsiveness by hanging up on suspicious clients, so that other
clients get a chance to talk to Postfix.  </p>

<ul>

<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps.  The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command.  </p>

<li> <p> To hang up connections from blacklisted zombies, you can
set specific Postfix SMTP server reject codes for specific RBLs,
and for individual responses from specific RBLs. We'll use
zen.spamhaus.org as an example; by the time you read this document,
details may have changed.  Right now, their documents say that a
response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
address, which means that the machine is probably running a bot of
some kind.  To give a 521 response instead of the default 554
response, use something like: </p>

<pre>
 1  /etc/postfix/main.cf:
 2      smtpd_client_restrictions =
 3         permit_mynetworks
 4         reject_rbl_client zen.spamhaus.org=127.0.0.10
 5         reject_rbl_client zen.spamhaus.org=127.0.0.11
 6         reject_rbl_client zen.spamhaus.org
 7
 8      rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
 9
10  /etc/postfix/rbl_reply_maps:
11      # With Postfix 2.3-2.5 use "421" to hang up connections.
12      zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13       $rbl_class [$rbl_what] blocked using
14       $rbl_domain${rbl_reason?; $rbl_reason}
15
16      zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17       $rbl_class [$rbl_what] blocked using
18       $rbl_domain${rbl_reason?; $rbl_reason}
</pre>

<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so the extra restrictions
do not affect performance. </p>

<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
cause Postfix to disconnect). The downside of replying with 421
is that it works only for zombies and other malware. If the client
is running a real MTA, then it may connect again several times until
the mail expires in its queue. When this is a problem, stick with
the default 554 reply, and use "smtpd_hard_error_limit = 1" as
described below.  </p>

<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at http://www.postfix.org/download.html. Simply replace line
8 above with: </p>

<pre>
 8      rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>

</ul>
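
<p> The decision logic of lines 4-6 and the stress-dependent line 8
can be modeled in a few lines. The Python sketch below is a
simplification, not Postfix code. It assumes one DNS query whose
answers are reused for all three reject_rbl_client tests, that the
first matching restriction wins, that the rbl_reply_maps lookup key
is the RBL name as written in the restriction (including any
"=127.0.0.x" suffix, as in lines 12 and 16), and that the reply falls
back to 554 when the map has no entry or is empty because the server
is not under stress. The names rbl_reply_code and RESTRICTIONS are
hypothetical. </p>

```python
# Lines 4-6 of the example, in order; None stands for "any listed result".
RESTRICTIONS = [
    ("zen.spamhaus.org", "127.0.0.10"),
    ("zen.spamhaus.org", "127.0.0.11"),
    ("zen.spamhaus.org", None),
]

# Reply codes from the rbl_reply_maps entries on lines 12 and 16.
RBL_REPLY_CODES = {
    "zen.spamhaus.org=127.0.0.10": 521,
    "zen.spamhaus.org=127.0.0.11": 521,
}

DEFAULT_REJECT_CODE = 554   # the default reply mentioned in the text


def rbl_reply_code(answers, stress):
    """Return the reply code for one client, or None if it is not rejected.

    answers: set of A records from the single zen.spamhaus.org query,
             reused for every reject_rbl_client test.
    stress:  True when smtpd runs with "-o stress=yes"; with the
             ${stress?hash:...} form of line 8 the map is empty otherwise.
    """
    if not answers:
        return None                     # client is not listed at all
    reply_maps = RBL_REPLY_CODES if stress else {}
    for domain, wanted in RESTRICTIONS:
        if wanted is None or wanted in answers:   # first match wins
            key = domain if wanted is None else f"{domain}={wanted}"
            return reply_maps.get(key, DEFAULT_REJECT_CODE)
    return None


print(rbl_reply_code({"127.0.0.11"}, stress=True))    # 521
print(rbl_reply_code({"127.0.0.11"}, stress=False))   # 554
print(rbl_reply_code({"127.0.0.2"}, stress=True))     # 554
print(rbl_reply_code(set(), stress=True))             # None
```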

<p> More information about automatic stress-adaptive behavior is
in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
</p>

<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>

<p> See the section "<a href="#adapt">Automatic stress-adaptive
behavior</a>" if you are running Postfix version 2.5 or later, or
if you have applied the source code patch for stress-adaptive
behavior from the mirrors listed at http://www.postfix.org/download.html.
</p>

<p> The following measures can be applied temporarily during overload.
They still allow <b>most</b> legitimate clients to connect and send
mail, but may affect some legitimate clients. </p>

<ul>

<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
postfix-users list from a variety of sysadmins shows that reducing
the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
clients. However, 60s is unlikely to become the Postfix default
because it's not RFC compliant. Setting smtpd_timeout to 10s (line
2 below) or even 5s under stress will still allow <b>most</b>
legitimate clients to connect and send mail, but may delay mail
from some clients.  No mail should be lost, as long as this measure
is used only temporarily.  </p>

<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few addresses of no-longer-active
users who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
100 (line 4 below). This prevents clients from keeping idle connections
open by repeatedly sending NOOP or RSET commands. </p>

</ul>

<blockquote>
<pre>
1  /etc/postfix/main.cf:
2      smtpd_timeout = 10
3      smtpd_hard_error_limit = 1
4      smtpd_junk_command_limit = 1
</pre>
</blockquote>
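
<p> Because these settings are meant to be temporary, it helps to
record the previous values before changing them. The following Python
sketch is hypothetical tooling, not part of Postfix: it only builds
command lines for postconf(1)'s "-e" option (which edits main.cf) and
for "postfix reload", from current values that you supply (for
example, from "postconf -h smtpd_timeout" output). It does not run
anything itself. </p>

```python
import shlex

# The temporary settings from lines 2-4 above.
TEMPORARY_OVERRIDES = {
    "smtpd_timeout": "10",
    "smtpd_hard_error_limit": "1",
    "smtpd_junk_command_limit": "1",
}


def plan_commands(current_values, overrides=TEMPORARY_OVERRIDES):
    """Return (apply, revert) command lists; nothing is executed here.

    current_values must hold the value each overridden parameter had
    before the emergency, for example as printed by "postconf -h NAME".
    """
    reload_cmd = ["postfix", "reload"]  # needed for either change to take effect
    apply_cmds = [["postconf", "-e", f"{name} = {value}"]
                  for name, value in overrides.items()]
    revert_cmds = [["postconf", "-e", f"{name} = {current_values[name]}"]
                   for name in overrides]
    return apply_cmds + [reload_cmd], revert_cmds + [reload_cmd]


current = {"smtpd_timeout": "300s", "smtpd_hard_error_limit": "20",
           "smtpd_junk_command_limit": "100"}
apply_cmds, revert_cmds = plan_commands(current)
for argv in apply_cmds:
    print(shlex.join(argv))   # e.g. postconf -e 'smtpd_timeout = 10'
```

<p> Reverting writes the previous value back explicitly rather than
deleting the line from main.cf; if a parameter was previously at its
built-in default, you may prefer to remove the line by hand instead. </p>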

<p> No mail should be lost, as long as these measures are used only
temporarily. The section "<a href="#adapt">Automatic stress-adaptive
behavior</a>" describes how Postfix 2.5 and later automate this
process. </p>

<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>

<p> To find out if your Postfix installation supports stress-adaptive
behavior, use the "ps" command, and look for the smtpd processes.
Postfix has stress-adaptive support when you see "-o stress=" or
"-o stress=yes" command-line options. Remember that Postfix never
enables stress-adaptive behavior on servers that listen on local
addresses only. </p>

<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
and other System-V flavors, use "ps -ef" instead of "ps ax". </p>

<blockquote>
<pre>
$ ps ax|grep smtpd
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>
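
<p> When checking many hosts, the same test can be scripted. The
Python sketch below is a hypothetical helper that assumes BSD/Linux
"ps ax" output, where the command starts in the fifth column as in
the example above; System-V "ps -ef" output uses a different column
layout and would need a different index. It classifies each smtpd
line by its "-o stress" option and ignores other processes, including
the grep command itself. The last sample line, an after-filter smtpd
on 127.0.0.1:10025, is invented for illustration. </p>

```python
def stress_state(ps_line):
    """Classify one line of BSD/Linux "ps ax" output.

    Returns "stressed" for "-o stress=yes" (any non-empty value),
    "supported" for "-o stress=", "no stress option" for an smtpd
    without the option (older Postfix, or a local-only service), and
    None for any other process.
    """
    fields = ps_line.split()            # PID TTY STAT TIME COMMAND ARGS...
    command = fields[4] if len(fields) > 4 else ""
    if command != "smtpd" and not command.endswith("/smtpd"):
        return None                     # e.g. perl policy server, grep itself
    args = fields[5:]
    for option, value in zip(args, args[1:]):
        if option == "-o" and value.startswith("stress="):
            return "supported" if value == "stress=" else "stressed"
    return "no stress option"


ps_output = """\
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
80821  ??  S      0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
84400  ??  S      0:00.01 smtpd -n 127.0.0.1:10025 -t inet -u -c
"""
for line in ps_output.splitlines():
    state = stress_state(line)
    if state is not None:
        print(f"{state:18} {line}")
```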

<p> You can't use postconf(1) to detect stress-adaptive support.
The postconf(1) command ignores the existence of the stress parameter
in main.cf, because the parameter has no effect there.  Command-line
"-o parameter" settings always take precedence over main.cf parameter
settings.  </p>

<p> If you configure stress-adaptive behavior in main.cf when it
isn't supported, nothing bad will happen.  The processes will run
as if the stress parameter always has an empty value. </p>

<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>

<p> You can manually force stress-adaptive behavior on by adding
a "-o stress=yes" command-line option in master.cf. This can be
useful for testing overrides on the SMTP service. Issue "postfix
reload" to make the change effective.  </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    #
    smtp      inet  n       -       n       -       -       smtpd
        -o stress=yes
        -o . . .
</pre>
</blockquote>

<p> To permanently force stress-adaptive behavior off for a specific
service, specify "-o stress=" on its master.cf command line.  This
may be desirable for the "submission" service. Issue "postfix reload"
to make the change effective.  </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    #
    submission inet n       -       n       -       -       smtpd
        -o stress=
        -o . . .
</pre>
</blockquote>

<h2><a name="other"> Other measures to off-load zombies </a> </h2>

<p> The postscreen(8) daemon, introduced with Postfix 2.8, provides
additional protection against mail server overload. One postscreen(8)
process handles multiple inbound SMTP connections, and decides which
clients may talk to a Postfix SMTP server process.  By keeping
spambots away, postscreen(8) leaves more SMTP server processes
available for legitimate clients, and delays the onset of server
overload conditions. </p>

<h2><a name="credits"> Credits </a></h2>

<ul>

<li>  Thanks to the postfix-users mailing list members for sharing
early experiences with the stress-adaptive feature.

<li>  The RBL example and several other paragraphs of text were
adapted from postfix-users postings by Noel Jones.

<li>  Wietse implemented stress-adaptive behavior as the smallest
possible patch while he should have been working on other things.

</ul>

</body> </html>
