<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Stress-Dependent Configuration</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel='stylesheet' type='text/css' href='postfix-doc.css'>

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Stress-Dependent Configuration</h1>

<hr>

<h2>Overview </h2>

<p> This document describes the symptoms of Postfix SMTP server
overload. It presents permanent main.cf changes to avoid overload
during normal operation, and temporary main.cf changes to cope with
an unexpected burst of mail. This document makes specific suggestions
for Postfix 2.5 and later which support stress-adaptive behavior,
and for earlier Postfix versions that don't. </p>

<p> Topics covered in this document: </p>

<ul>

<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a>

<li><a href="#adapt"> Automatic stress-adaptive behavior </a>

<li><a href="#concurrency"> Service more SMTP clients at the same time </a>

<li><a href="#time"> Spend less time per SMTP client </a>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#legacy"> Temporary measures for older Postfix releases </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#other"> Other measures to off-load zombies </a>

<li><a href="#credits"> Credits </a>

</ul>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, the Postfix SMTP server responds
immediately when an SMTP client connects to it; the time to deliver
mail is noticeable only with large messages.
Performance degrades
dramatically when the number of SMTP clients exceeds the number of
Postfix SMTP server processes. When an SMTP client connects while
all Postfix SMTP server processes are busy, the client must wait
until a server process becomes available. </p>

<p> SMTP server overload may be caused by a surge of legitimate
mail (example: a DNS registrar opens a new zone for registrations),
by mistake (mail explosion caused by a forwarding loop) or by malice
(worm outbreak, botnet, or other illegitimate activity). </p>

<p> Symptoms of Postfix SMTP server overload are: </p>

<ul>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>

<ul>

<li> <p> NOTE: Broken DNS configurations can also cause lengthy
delays before Postfix sends "220 hostname.example.com ...". These
delays also exist when Postfix is NOT overloaded. </p>

<li> <p> NOTE: To avoid "overload" delays for end-user mail
clients, enable the "submission" service entry in master.cf (present
since Postfix 2.1), and tell users to connect to this instead of
the public SMTP service. </p>

</ul>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<ul>

<li> <p> NOTE: A portscan for open SMTP ports can also result in
"lost connection ..." logfile messages.
</p>

</ul>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy: </p>

<pre>
Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
    (25) has reached its process limit "30": new clients may experience
    noticeable delays
Oct 3 20:39:27 spike postfix/master[28905]: warning: to avoid this
    condition, increase the process count in master.cf or reduce the
    service time per client
Oct 3 20:39:27 spike postfix/master[28905]: warning: see
    <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of
    stress-adapting configuration settings
</pre>

</ul>

<p> Legitimate mail that doesn't get through during an episode of
Postfix SMTP server overload is not necessarily lost. It should
still arrive once the situation returns to normal, as long as the
overload condition is temporary. </p>

<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
It works as follows. When a "public" network service such as the
SMTP server runs into an "all server ports are busy" condition, the
Postfix master(8) daemon logs a warning, restarts the service
(without interrupting existing network sessions), and runs the
service with "-o stress=yes" on the server process command line:
</p>

<blockquote>
<pre>
80821  ??  S      0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>

<p> Normally, the Postfix master(8) daemon runs such a service with
"-o stress=" on the command line (i.e. with an empty parameter
value): </p>

<blockquote>
<pre>
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>

<p> You won't see "-o stress" command-line parameters with services
that have local clients only.
These include services internal to
Postfix such as the queue manager, and services that listen on a
loopback interface only, such as after-filter SMTP services. </p>

<p> The "stress" parameter value is the key to making main.cf
parameter settings stress adaptive. The following settings are the
default with Postfix 2.6 and later. </p>

<blockquote>
<pre>
1 smtpd_timeout = ${stress?{10}:{300}}s
2 smtpd_hard_error_limit = ${stress?{1}:{20}}
3 smtpd_junk_command_limit = ${stress?{1}:{100}}
4 # Parameters added after Postfix 2.6:
5 smtpd_per_record_deadline = ${stress?{yes}:{no}}
6 smtpd_starttls_timeout = ${stress?{10}:{300}}s
7 address_verify_poll_count = ${stress?{1}:{3}}
</pre>
</blockquote>

<p> Postfix versions before 3.0 use the older form ${stress?x}${stress:y}
instead of the newer form ${stress?{x}:{y}}. </p>

<p> The syntax of ${name?{value}:{value}}, ${name?value} and
${name:value} is explained at the beginning of the postconf(5)
manual page. </p>

<p> Translation: </p>

<ul>

<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" smtpd_timeout to 60s is unlikely to affect
legitimate clients. However, it is unlikely to become the Postfix
default because it's not RFC compliant. Setting smtpd_timeout to
10s or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
of 1 instead of the default 20. This disconnects clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active
users who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Line 3: under conditions of stress, use an
smtpd_junk_command_limit of 1 instead of the default 100. This
prevents clients from keeping connections open by repeatedly
sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p>

<li> <p> Line 5: under conditions of stress, change the behavior
of smtpd_timeout and smtpd_starttls_timeout, from a time limit per
read or write system call, to a time limit to send or receive a
complete record (an SMTP command line, SMTP response line, SMTP
message content line, or TLS protocol message). </p>

<li> <p> Line 6: under conditions of stress, reduce the time limit
for TLS protocol handshake messages to 10 seconds, from the default
value of 300 seconds. See also the smtpd_timeout discussion above.
</p>

<li> <p> Line 7: under conditions of stress, do not wait up to 6
seconds for the completion of an address verification probe. If the
result is not already in the address verification cache, reply
immediately with $unverified_recipient_tempfail_action or
$unverified_sender_tempfail_action. No mail should be lost, as long
as this measure is used only temporarily. </p>

</ul>

<p> NOTE: Please keep in mind that the stress-adaptive feature is
a fairly desperate measure to keep <b>some</b> legitimate mail
flowing under overload conditions. If a site is reaching the SMTP
server process limit when there isn't an attack or bot flood
occurring, then either the process limit needs to be raised or more
hardware needs to be added.
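</p>

<p> For Postfix versions before 3.0, the settings listed earlier in
this section can be written in the older ${stress?x}${stress:y}
form. The following is a minimal sketch, not a definitive
configuration: it assumes Postfix 2.5 through 2.11, and it covers
only the three parameters that are not marked above as added after
Postfix 2.6. The values are the ones shown above. </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    # Older syntax: ${stress?x} expands to x when stress is non-empty,
    # ${stress:y} expands to y when stress is empty.
    smtpd_timeout = ${stress?10}${stress:300}s
    smtpd_hard_error_limit = ${stress?1}${stress:20}
    smtpd_junk_command_limit = ${stress?1}${stress:100}
</pre>
</blockquote>

<p> Exactly one of the two expressions expands to a non-empty value,
so each parameter receives either its stress value or its normal
value. Issue a "postfix reload" command to make the change effective.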
</p>

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> This section and the ones that follow discuss permanent measures
against mail server overload. </p>

<p> One measure to avoid the "all server processes busy" condition
is to service more SMTP clients simultaneously. For this you need
to increase the number of Postfix SMTP server processes. This will
improve the
responsiveness for remote SMTP clients, as long as the server machine
has enough hardware and software resources to run the additional
processes, and as long as the file system can keep up with the
additional load. </p>

<ul>

<li> <p> You increase the number of SMTP server processes either
by increasing the default_process_limit in main.cf (line 3 below),
or by increasing the SMTP server's "maxproc" field in master.cf
(line 10 below). Either way, you need to issue a "postfix reload"
command to make the change effective. </p>

<li> <p> Process limits above 1000 require Postfix version 2.4 or
later, and an operating system that supports kernel-based event
filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
</p>

<li> <p> More processes use more memory. You can reduce the Postfix
memory footprint by using cdb:
lookup tables instead of Berkeley DB's hash: or btree: tables. </p>

<pre>
 1 /etc/postfix/main.cf:
 2     # Raise the global process limit, 100 since Postfix 2.0.
 3     default_process_limit = 200
 4
 5 /etc/postfix/master.cf:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>

<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
contain a mistake: they configure a fixed number of policy daemon
processes. When you raise the SMTP server's "maxproc" field in
master.cf, SMTP server processes will report problems when connecting
to policy server processes, because there aren't enough of them.
Examples of errors are "connection refused" or "operation timed
out". </p>

<p> To fix, edit master.cf and specify a zero "maxproc" field
in all policy server entries; see line 6 in the example below.
Issue a "postfix reload" command to make the change effective. </p>

<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7           user=nobody argv=/some/where/policy-server
</pre>

</ul>

<h2><a name="time"> Spend less time per SMTP client </a></h2>

<p> When increasing the number of SMTP server processes is not
practical, you can improve Postfix server responsiveness by eliminating
delays. When Postfix spends less time per SMTP session, the same
number of SMTP server processes can service more clients in a given
amount of time. </p>

<ul>

<li> <p> Eliminate non-functional RBL lookups (blocklists that are
no longer in operation). These lookups can degrade performance.
Postfix logs a warning when an RBL server does not respond. </p>

<li> <p> Eliminate redundant RBL lookups (people often use multiple
Spamhaus RBLs that include each other). To find out whether RBLs
include other RBLs, look up the websites that document the RBL's
policies.
</p>

<li> <p> Eliminate header_checks and body_checks, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail. See BACKSCATTER_README for examples of the latter. </p>

<li> <p> Group your header_checks and body_checks patterns to avoid
unnecessary pattern matching operations: </p>

<pre>
 1  /etc/postfix/header_checks:
 2      if /^Subject:/
 3      /^Subject: virus found in mail from you/ reject
 4      /^Subject: ..other../ reject
 5      endif
 6
 7      if /^Received:/
 8      /^Received: from (postfix\.org) / reject forged client name in received header: $1
 9      /^Received: from ..other../ reject ....
10      endif
</pre>

</ul>

<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>

<p> Under conditions of overload you can improve Postfix SMTP server
responsiveness by hanging up on suspicious clients, so that other
clients get a chance to talk to Postfix. </p>

<ul>

<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps. The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command. </p>

<li> <p> To hang up connections from denylisted zombies, you can
set specific Postfix SMTP server reject codes for specific RBLs,
and for individual responses from specific RBLs. We'll use
zen.spamhaus.org as an example; by the time you read this document,
details may have changed. Right now, their documents say that a
response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
address, which means that the machine is probably running a bot of
some kind.
To give a 521 response instead of the default 554
response, use something like: </p>

<pre>
 1 /etc/postfix/main.cf:
 2     smtpd_client_restrictions =
 3        permit_mynetworks
 4        reject_rbl_client zen.spamhaus.org=127.0.0.10
 5        reject_rbl_client zen.spamhaus.org=127.0.0.11
 6        reject_rbl_client zen.spamhaus.org
 7
 8     rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
 9
10 /etc/postfix/rbl_reply_maps:
11     # With Postfix 2.3-2.5 use "421" to hang up connections.
12     zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13        $rbl_class [$rbl_what] blocked using
14        $rbl_domain${rbl_reason?; $rbl_reason}
15
16     zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17        $rbl_class [$rbl_what] blocked using
18        $rbl_domain${rbl_reason?; $rbl_reason}
</pre>

<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so this does not affect
performance. </p>

<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
cause Postfix to disconnect). The down-side of replying with 421
is that it works only for zombies and other malware. If the client
is running a real MTA, then it may connect again several times until
the mail expires in its queue. When this is a problem, stick with
the default 554 reply, and use "smtpd_hard_error_limit = 1" as
described below. </p>

<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at http://www.postfix.org/download.html. Simply replace line 8
above with: </p>

<pre>
 8     rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>

</ul>

<p> More information about automatic stress-adaptive behavior is
in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
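</p>

<p> The first bullet in this section mentions non-RBL restrictions
such as SMTP access maps. The following is a minimal sketch, assuming
Postfix 2.6 or later; the table name client_access and the address
192.0.2.10 are hypothetical placeholders, not part of the RBL example
above. To combine both, add the check_client_access line to the
existing smtpd_client_restrictions list. </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    smtpd_client_restrictions =
       permit_mynetworks
       check_client_access hash:/etc/postfix/client_access

/etc/postfix/client_access:
    # Hypothetical client address; the 521 reply makes Postfix hang up.
    192.0.2.10    521 5.7.1 Service unavailable
</pre>
</blockquote>

<p> Run "postmap /etc/postfix/client_access" after editing the table,
so that Postfix can read the changes.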
</p>

<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>

<p> See the section "<a href="#adapt">Automatic stress-adaptive
behavior</a>" if you are running Postfix version 2.5 or later, or
if you have applied the source code patch for stress-adaptive
behavior from the mirrors listed at http://www.postfix.org/download.html.
</p>

<p> The following measures can be applied temporarily during overload.
They still allow <b>most</b> legitimate clients to connect and send
mail, but may affect some legitimate clients. </p>

<ul>

<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
postfix-users list from a variety of sysadmins shows that reducing
the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
clients. However, it is unlikely to become the Postfix default
because it's not RFC compliant. Setting smtpd_timeout to 10s (line
2 below) or even 5s under stress will still allow <b>most</b>
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active
users who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
100. This prevents clients from keeping idle connections open by
repeatedly sending NOOP or RSET commands.
</p>

</ul>

<blockquote>
<pre>
1 /etc/postfix/main.cf:
2     smtpd_timeout = 10
3     smtpd_hard_error_limit = 1
4     smtpd_junk_command_limit = 1
</pre>
</blockquote>

<p> With these measures, no mail should be lost, as long
as these measures are used only temporarily. The section
"<a href="#adapt">Automatic stress-adaptive behavior</a>" describes
a way to automate this process. </p>

<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>

<p> To find out if your Postfix installation supports stress-adaptive
behavior, use the "ps" command, and look for the smtpd processes.
Postfix has stress-adaptive support when you see "-o stress=" or
"-o stress=yes" command-line options. Remember that Postfix never
enables stress-adaptive behavior on servers that listen on local
addresses only. </p>

<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
and other System-V flavors, use "ps -ef" instead of "ps ax". </p>

<blockquote>
<pre>
$ ps ax|grep smtpd
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>

<p> You can't use postconf(1) to detect stress-adaptive support.
The postconf(1) command ignores the existence of the stress parameter
in main.cf, because the parameter has no effect there. Command-line
"-o parameter" settings always take precedence over main.cf parameter
settings. </p>

<p> If you configure stress-adaptive behavior in main.cf when it
isn't supported, nothing bad will happen. The processes will run
as if the stress parameter always has an empty value. </p>

<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>

<p> You can manually force stress-adaptive behavior on, by adding
a "-o stress=yes" command-line option in master.cf. This can be
useful for testing overrides on the SMTP service.
Issue "postfix
reload" to make the change effective. </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     #
6     smtp      inet  n       -       n       -       -       smtpd
7         -o stress=yes
8         -o . . .
</pre>
</blockquote>

<p> To permanently force stress-adaptive behavior off with a specific
service, specify "-o stress=" on its master.cf command line. This
may be desirable for the "submission" service. Issue "postfix reload"
to make the change effective. </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     #
6     submission inet n       -       n       -       -       smtpd
7         -o stress=
8         -o . . .
</pre>
</blockquote>

<h2><a name="other"> Other measures to off-load zombies </a> </h2>

<p> The postscreen(8) daemon, introduced with Postfix 2.8, provides
additional protection against mail server overload. One postscreen(8)
process handles multiple inbound SMTP connections, and decides which
clients may talk to a Postfix SMTP server process. By keeping
spambots away, postscreen(8) leaves more SMTP server processes
available for legitimate clients, and delays the onset of server
overload conditions. </p>

<h2><a name="credits"> Credits </a></h2>

<ul>

<li> Thanks to the postfix-users mailing list members for sharing
early experiences with the stress-adaptive feature.

<li> The RBL example and several other paragraphs of text were
adapted from postfix-users postings by Noel Jones.

<li> Wietse implemented stress-adaptive behavior as the smallest
possible patch while he should be working on other things.

</ul>

</body> </html>