<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Stress-Dependent Configuration</title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Stress-Dependent Configuration</h1>

<hr>

<h2>Overview </h2>

<p> This document describes the symptoms of Postfix SMTP server
overload. It presents permanent main.cf changes to avoid overload
during normal operation, and temporary main.cf changes to cope with
an unexpected burst of mail. This document makes specific suggestions
for Postfix 2.5 and later which support stress-adaptive behavior,
and for earlier Postfix versions that don't. </p>

<p> Topics covered in this document: </p>

<ul>

<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a>

<li><a href="#adapt"> Automatic stress-adaptive behavior </a>

<li><a href="#concurrency"> Service more SMTP clients at the same time </a>

<li><a href="#time"> Spend less time per SMTP client </a>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#legacy"> Temporary measures for older Postfix releases </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#other"> Other measures to off-load zombies </a>

<li><a href="#credits"> Credits </a>

</ul>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, the Postfix SMTP server responds
immediately when an SMTP client connects to it; the time to deliver
mail is noticeable only with large messages. Performance degrades
dramatically when the number of SMTP clients exceeds the number of
Postfix SMTP server processes.
When an SMTP client connects while
all Postfix SMTP server processes are busy, the client must wait
until a server process becomes available. </p>

<p> SMTP server overload may be caused by a surge of legitimate
mail (example: a DNS registrar opens a new zone for registrations),
by mistake (mail explosion caused by a forwarding loop) or by malice
(worm outbreak, botnet, or other illegitimate activity). </p>

<p> Symptoms of Postfix SMTP server overload are: </p>

<ul>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>

<ul>

<li> <p> NOTE: Broken DNS configurations can also cause lengthy
delays before Postfix sends "220 hostname.example.com ...". These
delays also exist when Postfix is NOT overloaded. </p>

<li> <p> NOTE: To avoid "overload" delays for end-user mail
clients, enable the "submission" service entry in master.cf (present
since Postfix 2.1), and tell users to connect to this instead of
the public SMTP service. </p>

</ul>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<ul>

<li> <p> NOTE: A portscan for open SMTP ports can also result in
"lost connection ..." logfile messages.
</p>

</ul>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy: </p>

<pre>
Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
    (25) has reached its process limit "30": new clients may experience
    noticeable delays
Oct 3 20:39:27 spike postfix/master[28905]: warning: to avoid this
    condition, increase the process count in master.cf or reduce the
    service time per client
Oct 3 20:39:27 spike postfix/master[28905]: warning: see
    <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of
    stress-adapting configuration settings
</pre>

</ul>

<p> Legitimate mail that doesn't get through during an episode of
Postfix SMTP server overload is not necessarily lost. It should
still arrive once the situation returns to normal, as long as the
overload condition is temporary. </p>

<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
It works as follows. When a "public" network service such as the
SMTP server runs into an "all server ports are busy" condition, the
Postfix master(8) daemon logs a warning, restarts the service
(without interrupting existing network sessions), and runs the
service with "-o stress=yes" on the server process command line:
</p>

<blockquote>
<pre>
80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>

<p> Normally, the Postfix master(8) daemon runs such a service with
"-o stress=" on the command line (i.e. with an empty parameter
value): </p>

<blockquote>
<pre>
83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>

<p> Services that have local access only never have "-o stress"
parameters on the command line.
This includes services internal to
Postfix such as the queue manager, and services that listen on a
loopback interface only, such as after-filter SMTP services. </p>

<p> The "stress" parameter value is the key to making main.cf
parameter settings stress adaptive. The following settings are the
default with Postfix 2.6 and later. </p>

<blockquote>
<pre>
1 smtpd_timeout = ${stress?10}${stress:300}s
2 smtpd_hard_error_limit = ${stress?1}${stress:20}
3 smtpd_junk_command_limit = ${stress?1}${stress:100}
4 # Parameters added after Postfix 2.6:
5 smtpd_per_record_deadline = ${stress?yes}${stress:no}
6 smtpd_starttls_timeout = ${stress?10}${stress:300}s
7 address_verify_poll_count = ${stress?1}${stress:3}
</pre>
</blockquote>

<p> Translation: </p>

<ul>

<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" smtpd_timeout to 60s is unlikely to affect
legitimate clients. However, it is unlikely to become the Postfix
default because it's not RFC compliant. Setting smtpd_timeout to
10s or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
of 1 instead of the default 20. This helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few addresses of
no-longer-active users who didn't bother to unsubscribe. No mail
should be lost, as long as this measure is used only temporarily.
</p>

<li> <p> Line 3: under conditions of stress, use an
smtpd_junk_command_limit of 1 instead of the default 100. This
prevents clients from keeping connections open by repeatedly
sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p>

<li> <p> Line 5: under conditions of stress, change the behavior
of smtpd_timeout and smtpd_starttls_timeout, from a time limit per
read or write system call, to a time limit to send or receive a
complete record (an SMTP command line, SMTP response line, SMTP
message content line, or TLS protocol message). </p>

<li> <p> Line 6: under conditions of stress, reduce the time limit
for TLS protocol handshake messages to 10 seconds, from the default
value of 300 seconds. See also the smtpd_timeout discussion above.
</p>

<li> <p> Line 7: under conditions of stress, do not wait up to 6
seconds for the completion of an address verification probe. If the
result is not already in the address verification cache, reply
immediately with $unverified_recipient_tempfail_action or
$unverified_sender_tempfail_action. No mail should be lost, as long
as this measure is used only temporarily. </p>

</ul>

<p> The syntax of ${name?value} and ${name:value} is explained at
the beginning of the postconf(5) manual page. </p>

<p> NOTE: Please keep in mind that the stress-adaptive feature is
a fairly desperate measure to keep <b>some</b> legitimate mail
flowing under overload conditions. If a site is reaching the SMTP
server process limit when there isn't an attack or bot flood
occurring, then either the process limit needs to be raised or more
hardware needs to be added. </p>

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> This section and the ones that follow discuss permanent measures
against mail server overload.
</p>

<p> One measure to avoid the "all server processes busy" condition
is to service more SMTP clients simultaneously. For this you need
to increase the number of Postfix SMTP server processes. This will
improve the responsiveness for remote SMTP clients, as long as the
server machine has enough hardware and software resources to run the
additional processes, and as long as the file system can keep up
with the additional load. </p>

<ul>

<li> <p> You increase the number of SMTP server processes either
by increasing the default_process_limit in main.cf (line 3 below),
or by increasing the SMTP server's "maxproc" field in master.cf
(line 10 below). Either way, you need to issue a "postfix reload"
command to make the change effective. </p>

<li> <p> Process limits above 1000 require Postfix version 2.4 or
later, and an operating system that supports kernel-based event
filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
</p>

<li> <p> More processes use more memory. You can reduce the Postfix
memory footprint by using cdb:
lookup tables instead of Berkeley DB's hash: or btree: tables. </p>

<pre>
 1 /etc/postfix/main.cf:
 2     # Raise the global process limit, 100 since Postfix 2.0.
 3     default_process_limit = 200
 4
 5 /etc/postfix/master.cf:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>

<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
contain a mistake: they configure a fixed number of policy daemon
processes.
When you raise the SMTP server's "maxproc" field in
master.cf, SMTP server processes will report problems when connecting
to policy server processes, because there aren't enough of them.
Examples of errors are "connection refused" or "operation timed
out". </p>

<p> To fix, edit master.cf and specify a zero "maxproc" field
in all policy server entries; see line 6 in the example below.
Issue a "postfix reload" command to make the change effective. </p>

<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7         user=nobody argv=/some/where/policy-server
</pre>

</ul>

<h2><a name="time"> Spend less time per SMTP client </a></h2>

<p> When increasing the number of SMTP server processes is not
practical, you can improve Postfix server responsiveness by eliminating
delays. When Postfix spends less time per SMTP session, the same
number of SMTP server processes can service more clients in a given
amount of time. </p>

<ul>

<li> <p> Eliminate non-functional RBL lookups (blocklists that are
no longer in operation). These lookups can degrade performance.
Postfix logs a warning when an RBL server does not respond. </p>

<li> <p> Eliminate redundant RBL lookups (people often use multiple
Spamhaus RBLs that include each other). To find out whether RBLs
include other RBLs, look up the websites that document the RBL's
policies. </p>

<li> <p> Eliminate header_checks and body_checks, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail. See BACKSCATTER_README for examples of the latter. </p>

<li> <p> Group your header_checks and body_checks patterns to avoid
unnecessary pattern matching operations: </p>

<pre>
 1 /etc/postfix/header_checks:
 2     if /^Subject:/
 3     /^Subject: virus found in mail from you/ reject
 4     /^Subject: ..other../ reject
 5     endif
 6
 7     if /^Received:/
 8     /^Received: from (postfix\.org) / reject forged client name in received header: $1
 9     /^Received: from ..other../ reject ....
10     endif
</pre>

</ul>

<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>

<p> Under conditions of overload you can improve Postfix SMTP server
responsiveness by hanging up on suspicious clients, so that other
clients get a chance to talk to Postfix. </p>

<ul>

<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps. The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command. </p>

<li> <p> To hang up connections from blacklisted zombies, you can
set specific Postfix SMTP server reject codes for specific RBLs,
and for individual responses from specific RBLs. We'll use
zen.spamhaus.org as an example; by the time you read this document,
details may have changed. Right now, their documents say that a
response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
address, which means that the machine is probably running a bot of
some kind.
To give a 521 response instead of the default 554
response, use something like: </p>

<pre>
 1 /etc/postfix/main.cf:
 2     smtpd_client_restrictions =
 3         permit_mynetworks
 4         reject_rbl_client zen.spamhaus.org=127.0.0.10
 5         reject_rbl_client zen.spamhaus.org=127.0.0.11
 6         reject_rbl_client zen.spamhaus.org
 7
 8     rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
 9
10 /etc/postfix/rbl_reply_maps:
11     # With Postfix 2.3-2.5 use "421" to hang up connections.
12     zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13      $rbl_class [$rbl_what] blocked using
14      $rbl_domain${rbl_reason?; $rbl_reason}
15
16     zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17      $rbl_class [$rbl_what] blocked using
18      $rbl_domain${rbl_reason?; $rbl_reason}
</pre>

<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so the extra entries do not
affect performance. </p>

<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
cause Postfix to disconnect). The downside of replying with 421
is that it works only for zombies and other malware. If the client
is running a real MTA, then it may connect again several times until
the mail expires in its queue. When this is a problem, stick with
the default 554 reply, and use "smtpd_hard_error_limit = 1" as
described below. </p>

<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at http://www.postfix.org/download.html. Simply replace line
8 above with: </p>

<pre>
 8     rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>

</ul>

<p> More information about automatic stress-adaptive behavior is
in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
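</p>

<p> As a rough illustration of how the conditional rbl_reply_maps
setting above behaves, the following Python sketch mimics the
${stress?value} and ${stress:value} expansions. It is not Postfix
code: the function name expand_stress is hypothetical, and the sketch
handles only these two forms for the "stress" parameter, without
nesting. </p>

```python
import re

# Hypothetical helper, not part of Postfix: mimics ${stress?value} and
# ${stress:value} for illustration only (no nesting, no other parameters).
_PATTERN = re.compile(r"\$\{stress([?:])([^{}]*)\}")


def expand_stress(setting, stress):
    """Expand ${stress?value} and ${stress:value} in a main.cf value.

    stress is "yes" under overload, or "" during normal operation.
    """
    def substitute(match):
        operator, value = match.group(1), match.group(2)
        if operator == "?":
            # ${stress?value}: use value only when stress is non-empty.
            return value if stress else ""
        # ${stress:value}: use value only when stress is empty.
        return "" if stress else value

    return _PATTERN.sub(substitute, setting)


if __name__ == "__main__":
    setting = "${stress?hash:/etc/postfix/rbl_reply_maps}"
    print(repr(expand_stress(setting, "yes")))
    print(repr(expand_stress(setting, "")))
    print(expand_stress("${stress?10}${stress:300}s", "yes"))
    print(expand_stress("${stress?10}${stress:300}s", ""))
```

<p> In this sketch, an empty stress value leaves rbl_reply_maps
empty, so the default 554 reply applies; with "stress=yes" the map
is consulted and the 521 replies take effect.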
</p>

<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>

<p> See the section "<a href="#adapt">Automatic stress-adaptive
behavior</a>" if you are running Postfix version 2.5 or later, or
if you have applied the source code patch for stress-adaptive
behavior from the mirrors listed at http://www.postfix.org/download.html.
</p>

<p> The following measures can be applied temporarily during overload.
They still allow <b>most</b> legitimate clients to connect and send
mail, but may affect some legitimate clients. </p>

<ul>

<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
postfix-users list from a variety of sysadmins shows that reducing
the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
clients. However, it is unlikely to become the Postfix default
because it's not RFC compliant. Setting smtpd_timeout to 10s (line
2 below) or even 5s under stress will still allow <b>most</b>
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few addresses of
no-longer-active users who didn't bother to unsubscribe. No mail
should be lost, as long as this measure is used only temporarily. </p>

<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
100 (line 4 below). This prevents clients from keeping idle
connections open by repeatedly sending NOOP or RSET commands.
</p>

</ul>

<blockquote>
<pre>
1 /etc/postfix/main.cf:
2     smtpd_timeout = 10
3     smtpd_hard_error_limit = 1
4     smtpd_junk_command_limit = 1
</pre>
</blockquote>

<p> With these measures, no mail should be lost, as long
as they are used only temporarily. The section
"<a href="#adapt">Automatic stress-adaptive behavior</a>" describes
a way to automate this process. </p>

<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>

<p> To find out if your Postfix installation supports stress-adaptive
behavior, use the "ps" command, and look for the smtpd processes.
Postfix has stress-adaptive support when you see "-o stress=" or
"-o stress=yes" command-line options. Remember that Postfix never
enables stress-adaptive behavior on servers that listen on local
addresses only. </p>

<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
and other System-V flavors, use "ps -ef" instead of "ps ax". </p>

<blockquote>
<pre>
$ ps ax|grep smtpd
83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345 ?? Ss 0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>

<p> You can't use postconf(1) to detect stress-adaptive support.
The postconf(1) command ignores the existence of the stress parameter
in main.cf, because the parameter has no effect there. Command-line
"-o parameter" settings always take precedence over main.cf parameter
settings. </p>

<p> If you configure stress-adaptive behavior in main.cf when it
isn't supported, nothing bad will happen. The processes will run
as if the stress parameter always has an empty value. </p>

<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>

<p> You can manually force stress-adaptive behavior on, by adding
a "-o stress=yes" command-line option in master.cf. This can be
useful for testing overrides on the SMTP service.
Issue "postfix
reload" to make the change effective. </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     #
6     smtp      inet  n       -       n       -       -       smtpd
7         -o stress=yes
8         -o . . .
</pre>
</blockquote>

<p> To permanently force stress-adaptive behavior off with a specific
service, specify "-o stress=" on its master.cf command line. This
may be desirable for the "submission" service. Issue "postfix reload"
to make the change effective. </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     #
6     submission inet n       -       n       -       -       smtpd
7         -o stress=
8         -o . . .
</pre>
</blockquote>

<h2><a name="other"> Other measures to off-load zombies </a> </h2>

<p> The postscreen(8) daemon, introduced with Postfix 2.8, provides
additional protection against mail server overload. One postscreen(8)
process handles multiple inbound SMTP connections, and decides which
clients may talk to a Postfix SMTP server process. By keeping
spambots away, postscreen(8) leaves more SMTP server processes
available for legitimate clients, and delays the onset of server
overload conditions. </p>

<h2><a name="credits"> Credits </a></h2>

<ul>

<li> Thanks to the postfix-users mailing list members for sharing
early experiences with the stress-adaptive feature.

<li> The RBL example and several other paragraphs of text were
adapted from postfix-users postings by Noel Jones.

<li> Wietse implemented stress-adaptive behavior as the smallest
possible patch while he should be working on other things.

</ul>

</body> </html>