<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Stress-Dependent Configuration</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Stress-Dependent Configuration</h1>

<hr>

<h2>Overview </h2>

<p> This document describes the symptoms of Postfix SMTP server
overload. It presents permanent main.cf changes to avoid overload
during normal operation, and temporary main.cf changes to cope with
an unexpected burst of mail. This document makes specific suggestions
for Postfix 2.5 and later which support stress-adaptive behavior,
and for earlier Postfix versions that don't.  </p>

<p> Topics covered in this document: </p>

<ul>

<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a>

<li><a href="#adapt"> Automatic stress-adaptive behavior </a>

<li><a href="#concurrency"> Service more SMTP clients at the same time </a>

<li><a href="#time"> Spend less time per SMTP client </a>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#legacy"> Temporary measures for older Postfix releases </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#other"> Other measures to off-load zombies </a>

<li><a href="#credits"> Credits </a>

</ul>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, the Postfix SMTP server responds
immediately when an SMTP client connects to it; the time to deliver
mail is noticeable only with large messages.  Performance degrades
dramatically when the number of SMTP clients exceeds the number of
Postfix SMTP server processes.  When an SMTP client connects while
all Postfix SMTP server processes are busy, the client must wait
until a server process becomes available. </p>

<p> SMTP server overload may be caused by a surge of legitimate
mail (example: a DNS registrar opens a new zone for registrations),
by mistake (mail explosion caused by a forwarding loop) or by malice
(worm outbreak, botnet, or other illegitimate activity).  </p>

<p> Symptoms of Postfix SMTP server overload are: </p>

<ul>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>

<ul>

<li> <p> NOTE: Broken DNS configurations can also cause lengthy
delays before Postfix sends "220 hostname.example.com ...". These
delays also exist when Postfix is NOT overloaded.  </p>

<li> <p> NOTE:  To avoid "overload" delays for end-user mail
clients, enable the "submission" service entry in master.cf (present
since Postfix 2.1), and tell users to connect to this instead of
the public SMTP service. </p>

</ul>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<ul>

<li> <p> NOTE: A portscan for open SMTP ports can also result in
"lost connection ..." logfile messages. </p>

</ul>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy: </p>

<pre>
Oct  3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
 (25) has reached its process limit "30": new clients may experience
 noticeable delays
Oct  3 20:39:27 spike postfix/master[28905]: warning: to avoid this
 condition, increase the process count in master.cf or reduce the
 service time per client
Oct  3 20:39:27 spike postfix/master[28905]: warning: see
  <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of
  stress-adapting configuration settings
</pre>

</ul>

<p> Legitimate mail that doesn't get through during an episode of
Postfix SMTP server overload is not necessarily lost. It should
still arrive once the situation returns to normal, as long as the
overload condition is temporary.  </p>

<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
It works as follows. When a "public" network service such as the
SMTP server runs into an "all server ports are busy" condition, the
Postfix master(8) daemon logs a warning, restarts the service
(without interrupting existing network sessions), and runs the
service with "-o stress=yes" on the server process command line:
</p>

<blockquote>
<pre>
80821  ??  S      0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>

<p> Normally, the Postfix master(8) daemon runs such a service with
"-o stress=" on the command line (i.e.  with an empty parameter
value):  </p>

<blockquote>
<pre>
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>

<p> You won't see "-o stress" command-line parameters with services
that have local clients only. These include services internal to
Postfix such as the queue manager, and services that listen on a
loopback interface only, such as after-filter SMTP services.  </p>

<p> The "stress" parameter value is the key to making main.cf
parameter settings stress adaptive. The following settings are the
default with Postfix 2.6 and later. </p>

<blockquote>
<pre>
1 smtpd_timeout = ${stress?{10}:{300}}s
2 smtpd_hard_error_limit = ${stress?{1}:{20}}
3 smtpd_junk_command_limit = ${stress?{1}:{100}}
4 # Parameters added after Postfix 2.6:
5 smtpd_per_record_deadline = ${stress?{yes}:{no}}
6 smtpd_starttls_timeout = ${stress?{10}:{300}}s
7 address_verify_poll_count = ${stress?{1}:{3}}
</pre>
</blockquote>

<p> Postfix versions before 3.0 use the older form ${stress?x}${stress:y}
instead of the newer form ${stress?{x}:{y}}. </p>

<p> The syntax of ${name?{value}:{value}}, ${name?value} and
${name:value} is explained at the beginning of the postconf(5)
manual page. </p>
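
<p> As an illustration of the older form, the first three settings
above could be written as follows on Postfix versions before 3.0.
This is a sketch of the equivalent syntax only; use "postconf -d"
to see the built-in defaults of your own Postfix version. </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    # Older syntax, for Postfix versions before 3.0.
    smtpd_timeout = ${stress?10}${stress:300}s
    smtpd_hard_error_limit = ${stress?1}${stress:20}
    smtpd_junk_command_limit = ${stress?1}${stress:100}
</pre>
</blockquote>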

<p> Translation: </p>

<ul>

<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" smtpd_timeout to 60s is unlikely to affect
legitimate clients. However, it is unlikely to become the Postfix
default because it's not RFC compliant. Setting smtpd_timeout to
10s or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
of 1 instead of the default 20. This disconnects clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active user
names that didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Line 3: under conditions of stress, use an
smtpd_junk_command_limit of 1 instead of the default 100. This
prevents clients from keeping connections open by repeatedly
sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p>

<li> <p> Line 5: under conditions of stress, change the behavior
of smtpd_timeout and smtpd_starttls_timeout, from a time limit per
read or write system call, to a time limit to send or receive a
complete record (an SMTP command line, SMTP response line, SMTP
message content line, or TLS protocol message). </p>

<li> <p> Line 6: under conditions of stress, reduce the time limit
for TLS protocol handshake messages to 10 seconds, from the default
value of 300 seconds. See also the smtpd_timeout discussion above.
</p>

<li> <p> Line 7: under conditions of stress, do not wait up to 6
seconds for the completion of an address verification probe. If the
result is not already in the address verification cache, reply
immediately with $unverified_recipient_tempfail_action or
$unverified_sender_tempfail_action. No mail should be lost, as long
as this measure is used only temporarily.  </p>

</ul>
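
<p> The same syntax can carry site-specific values. As a sketch, a
site that adopts the 60s "normal" timeout and the 5s stress timeout
mentioned in the discussion of line 1 might use the setting below.
The values are illustrative; they are not Postfix defaults. </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    # Illustrative values, not Postfix defaults.
    # Postfix 3.0 and later syntax; older versions would use
    # ${stress?5}${stress:60}s instead.
    smtpd_timeout = ${stress?{5}:{60}}s
</pre>
</blockquote>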

<p> NOTE: Please keep in mind that the stress-adaptive feature is
a fairly desperate measure to keep <b>some</b> legitimate mail
flowing under overload conditions.  If a site is reaching the SMTP
server process limit when there isn't an attack or bot flood
occurring, then either the process limit needs to be raised or more
hardware needs to be added.  </p>

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> This section and the ones that follow discuss permanent measures
against mail server overload.  </p>

<p> One measure to avoid the "all server processes busy" condition
is to service more SMTP clients simultaneously. For this you need
to increase the number of Postfix SMTP server processes. This will
improve the responsiveness for remote SMTP clients, as long as the
server machine has enough hardware and software resources to run
the additional processes, and as long as the file system can keep
up with the additional load. </p>

<ul>

<li> <p> You increase the number of SMTP server processes either
by increasing the default_process_limit in main.cf (line 3 below),
or by increasing the SMTP server's "maxproc" field in master.cf
(line 10 below).  Either way, you need to issue a "postfix reload"
command to make the change effective.  </p>

<li> <p> Process limits above 1000 require Postfix version 2.4 or
later, and an operating system that supports kernel-based event
filters (BSD kqueue(2), Linux epoll(7), or Solaris /dev/poll).
</p>

<li> <p> More processes use more memory. You can reduce the Postfix
memory footprint by using cdb: lookup tables instead of Berkeley
DB's hash: or btree: tables. </p>

<pre>
 1 /etc/postfix/main.cf:
 2     # Raise the global process limit, 100 since Postfix 2.0.
 3     default_process_limit = 200
 4
 5 /etc/postfix/master.cf:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>

<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
contain a mistake: they configure a fixed number of policy daemon
processes.  When you raise the SMTP server's "maxproc" field in
master.cf, SMTP server processes will report problems when connecting
to policy server processes, because there aren't enough of them.
Examples of errors are "connection refused" or "operation timed
out".  </p>

<p> To fix, edit master.cf and specify a zero "maxproc" field
in all policy server entries; see line 6 in the example below.
Issue a "postfix reload" command to make the change effective.  </p>

<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7         user=nobody argv=/some/where/policy-server
</pre>

</ul>
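
<p> As a sketch of the cdb: suggestion above, the fragment below
switches one lookup table from hash: to cdb:. It assumes that Postfix
was built with CDB support (the output of "postconf -m" then includes
"cdb"), and the virtual alias table is only a hypothetical example.
</p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    # Hypothetical example table.
    # Was: virtual_alias_maps = hash:/etc/postfix/virtual
    virtual_alias_maps = cdb:/etc/postfix/virtual
</pre>
</blockquote>

<p> Rebuild the table with "postmap cdb:/etc/postfix/virtual" before
issuing "postfix reload". </p>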

<h2><a name="time"> Spend less time per SMTP client </a></h2>

<p> When increasing the number of SMTP server processes is not
practical, you can improve Postfix server responsiveness by eliminating
delays.  When Postfix spends less time per SMTP session, the same
number of SMTP server processes can service more clients in a given
amount of time. </p>

<ul>

<li> <p> Eliminate non-functional RBL lookups (blocklists that are
no longer in operation). These lookups can degrade performance.
Postfix logs a warning when an RBL server does not respond. </p>

<li> <p> Eliminate redundant RBL lookups (people often use multiple
Spamhaus RBLs that include each other).  To find out whether RBLs
include other RBLs, look up the websites that document the RBL's
policies. </p>

<li> <p> Eliminate header_checks and body_checks, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail.  See BACKSCATTER_README for examples of the latter. </p>

<li> <p> Group your header_checks and body_checks patterns to avoid
unnecessary pattern matching operations: </p>

<pre>
/etc/postfix/header_checks:
    if /^Subject:/
    /^Subject: virus found in mail from you/ reject
    /^Subject: ..other../ reject
    endif

    if /^Received:/
    /^Received: from (postfix\.org) / reject forged client name in received header: $1
    /^Received: from ..other../ reject ....
    endif
</pre>

</ul>
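
<p> The if/endif grouping shown above is a feature of regexp: and
pcre: tables. A minimal main.cf sketch that activates such a file,
assuming that it is stored as /etc/postfix/header_checks, could look
like this: </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    # Assumed file location; use pcre: instead of regexp: if preferred
    # and if your Postfix build supports it.
    header_checks = regexp:/etc/postfix/header_checks
</pre>
</blockquote>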

<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>

<p> Under conditions of overload you can improve Postfix SMTP server
responsiveness by hanging up on suspicious clients, so that other
clients get a chance to talk to Postfix.  </p>

<ul>

<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps.  The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command.  </p>

<li> <p> To hang up connections from denylisted zombies, you can
set specific Postfix SMTP server reject codes for specific RBLs,
and for individual responses from specific RBLs. We'll use
zen.spamhaus.org as an example; by the time you read this document,
details may have changed.  Right now, their documents say that a
response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
address, which means that the machine is probably running a bot of
some kind.  To give a 521 response instead of the default 554
response, use something like: </p>

<pre>
 1  /etc/postfix/main.cf:
 2      smtpd_client_restrictions =
 3         permit_mynetworks
 4         reject_rbl_client zen.spamhaus.org=127.0.0.10
 5         reject_rbl_client zen.spamhaus.org=127.0.0.11
 6         reject_rbl_client zen.spamhaus.org
 7
 8      rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
 9
10  /etc/postfix/rbl_reply_maps:
11      # With Postfix 2.3-2.5 use "421" to hang up connections.
12      zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13       $rbl_class [$rbl_what] blocked using
14       $rbl_domain${rbl_reason?; $rbl_reason}
15
16      zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17       $rbl_class [$rbl_what] blocked using
18       $rbl_domain${rbl_reason?; $rbl_reason}
</pre>

<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so it does not affect the
performance. </p>

<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
cause Postfix to disconnect). The down-side of replying with 421
is that it works only for zombies and other malware. If the client
is running a real MTA, then it may connect again several times until
the mail expires in its queue. When this is a problem, stick with
the default 554 reply, and use "smtpd_hard_error_limit = 1" as
described below.  </p>

<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at http://www.postfix.org/download.html. Simply replace line
8 above with: </p>

<pre>
 8      rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>

</ul>

<p> More information about automatic stress-adaptive behavior is
in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
</p>
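
<p> The first bullet in the list above also mentions non-RBL
restrictions such as SMTP access maps. As a sketch, an access map
entry that replies and then hangs up could look like the fragment
below. The table name is hypothetical, and the client address is
taken from the 192.0.2.0/24 documentation range. </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    smtpd_client_restrictions =
        permit_mynetworks
        check_client_access hash:/etc/postfix/client_access

/etc/postfix/client_access:
    # Hypothetical table with an example address.
    # With Postfix 2.3-2.5 use "421" to hang up connections.
    192.0.2.15      521 Service unavailable
</pre>
</blockquote>

<p> Run "postmap hash:/etc/postfix/client_access" after editing the
table. </p>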

<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>

<p> See the section "<a href="#adapt">Automatic stress-adaptive
behavior</a>" if you are running Postfix version 2.5 or later, or
if you have applied the source code patch for stress-adaptive
behavior from the mirrors listed at http://www.postfix.org/download.html.
</p>

<p> The following measures can be applied temporarily during overload.
They still allow <b>most</b> legitimate clients to connect and send
mail, but may affect some legitimate clients. </p>

<ul>

<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
postfix-users list from a variety of sysadmins shows that reducing
the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
clients. However, it is unlikely to become the Postfix default
because it's not RFC compliant. Setting smtpd_timeout to 10s (line
2 below) or even 5s under stress will still allow <b>most</b>
legitimate clients to connect and send mail, but may delay mail
from some clients.  No mail should be lost, as long as this measure
is used only temporarily.  </p>

<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active user
names that didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
100 (line 4 below). This prevents clients from keeping idle
connections open by repeatedly sending NOOP or RSET commands. </p>

</ul>

<blockquote>
<pre>
1  /etc/postfix/main.cf:
2      smtpd_timeout = 10
3      smtpd_hard_error_limit = 1
4      smtpd_junk_command_limit = 1
</pre>
</blockquote>

<p> With these measures, no mail should be lost, as long
as these measures are used only temporarily. The section
"<a href="#adapt">Automatic stress-adaptive behavior</a>" describes
a way to automate this process. </p>

<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>

<p> To find out if your Postfix installation supports stress-adaptive
behavior, use the "ps" command, and look for the smtpd processes.
Postfix has stress-adaptive support when you see "-o stress=" or
"-o stress=yes" command-line options. Remember that Postfix never
enables stress-adaptive behavior on servers that listen on local
addresses only. </p>

<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
and other System-V flavors, use "ps -ef" instead of "ps ax". </p>

<blockquote>
<pre>
$ ps ax|grep smtpd
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>

<p> You can't use postconf(1) to detect stress-adaptive support.
The postconf(1) command ignores the existence of the stress parameter
in main.cf, because the parameter has no effect there.  Command-line
"-o parameter" settings always take precedence over main.cf parameter
settings.  </p>

<p> If you configure stress-adaptive behavior in main.cf when it
isn't supported, nothing bad will happen.  The processes will run
as if the stress parameter always has an empty value. </p>

<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>

<p> You can manually force stress-adaptive behavior on, by adding
a "-o stress=yes" command-line option in master.cf. This can be
useful for testing overrides on the SMTP service. Issue "postfix
reload" to make the change effective.  </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    #
    smtp      inet  n       -       n       -       -       smtpd
        -o stress=yes
        -o . . .
</pre>
</blockquote>

<p> To permanently force stress-adaptive behavior off with a specific
service, specify "-o stress=" on its master.cf command line.  This
may be desirable for the "submission" service. Issue "postfix reload"
to make the change effective.  </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    #
    submission inet n       -       n       -       -       smtpd
        -o stress=
        -o . . .
</pre>
</blockquote>

<h2><a name="other"> Other measures to off-load zombies </a> </h2>

<p> The postscreen(8) daemon, introduced with Postfix 2.8, provides
additional protection against mail server overload. One postscreen(8)
process handles multiple inbound SMTP connections, and decides which
clients may talk to a Postfix SMTP server process.  By keeping
spambots away, postscreen(8) leaves more SMTP server processes
available for legitimate clients, and delays the onset of server
overload conditions. </p>
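
<p> As a rough sketch of what enabling postscreen(8) involves, the
master.cf entries below follow the layout described in
POSTSCREEN_README. Consult that document for the accompanying main.cf
settings, and for the exact entries that match your Postfix version.
</p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    # postscreen listens on the SMTP port instead of smtpd, and
    # hands accepted clients over to the "smtpd pass" service.
    smtp      inet  n       -       n       -       1       postscreen
    smtpd     pass  -       -       n       -       -       smtpd
    dnsblog   unix  -       -       n       -       0       dnsblog
    tlsproxy  unix  -       -       n       -       0       tlsproxy
</pre>
</blockquote>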

<h2><a name="credits"> Credits </a></h2>

<ul>

<li>  Thanks to the postfix-users mailing list members for sharing
early experiences with the stress-adaptive feature.

<li>  The RBL example and several other paragraphs of text were
adapted from postfix-users postings by Noel Jones.

<li>  Wietse implemented stress-adaptive behavior as the smallest
possible patch while he should be working on other things.

</ul>

</body> </html>