1<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN" 2 "http://www.w3.org/TR/html4/loose.dtd"> 3 4<html> 5 6<head> 7 8<title> Postfix Debugging Howto </title> 9 10<meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> 11 12</head> 13 14<body> 15 16<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1> 17 18<hr> 19 20<h2>Purpose of this document</h2> 21 22<p> This document describes how to debug parts of the Postfix mail 23system when things do not work according to expectation. The methods 24vary from making Postfix log a lot of detail, to running some daemon 25processes under control of a call tracer or debugger. </p> 26 27<p> The text assumes that the Postfix <a href="postconf.5.html">main.cf</a> and <a href="master.5.html">master.cf</a> 28configuration files are stored in directory /etc/postfix. You can 29use the command "<b>postconf <a href="postconf.5.html#config_directory">config_directory</a></b>" to find out the 30actual location of this directory on your machine. </p> 31 32<p> Listed in order of increasing invasiveness, the debugging 33techniques are as follows: </p> 34 35<ul> 36 37<li><a href="#logging">Look for obvious signs of trouble</a> 38 39<li><a href="#trace_mail">Debugging Postfix from inside</a> 40 41<li><a href="#no_chroot">Try turning off chroot operation in 42master.cf</a> 43 44<li><a href="#debug_peer">Verbose logging for specific SMTP 45connections</a> 46 47<li><a href="#sniffer">Record the SMTP session with a network 48sniffer</a> 49 50<li><a href="#verbose">Making Postfix daemon programs more verbose</a> 51 52<li><a href="#man_trace">Manually tracing a Postfix daemon process</a> 53 54<li><a href="#auto_trace">Automatically tracing a Postfix daemon 55process</a> 56 57<li><a href="#ddd">Running daemon programs with the interactive 58ddd debugger</a> 59 60<li><a href="#screen">Running daemon programs with the interactive 61gdb debugger</a> 62 63<li><a href="#gdb">Running daemon programs under a non-interactive 64debugger</a> 65 66<li><a href="#unreasonable">Unreasonable behavior</a> 67 68<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a> 69 70</ul> 71 72<h2><a name="logging">Look for obvious signs of trouble</a></h2> 73 74<p> Postfix logs all failed and successful deliveries to a logfile. 75The file is usually called /var/log/maillog or /var/log/mail; the 76exact pathname is defined in the /etc/syslog.conf file. </p> 77 78<p> When Postfix does not receive or deliver mail, the first order 79of business is to look for errors that prevent Postfix from working 80properly: </p> 81 82<blockquote> 83<pre> 84% <b>egrep '(warning|error|fatal|panic):' /some/log/file | more</b> 85</pre> 86</blockquote> 87 88<p> Note: the most important message is near the BEGINNING of the 89output. Error messages that come later are less useful. </p> 90 91<p> The nature of each problem is indicated as follows: </p> 92 93<ul> 94 95<li> <p> "<b>panic</b>" indicates a problem in the software itself 96that only a programmer can fix. Postfix cannot proceed until this 97is fixed. </p> 98 99<li> <p> "<b>fatal</b>" is the result of missing files, incorrect 100permissions, incorrect configuration file settings that you can 101fix. Postfix cannot proceed until this is fixed. </p> 102 103<li> <p> "<b>error</b>" reports an error condition. For safety 104reasons, a Postfix process will terminate when more than 13 of these 105happen. </p> 106 107<li> <p> "<b>warning</b>" indicates a non-fatal error. These are 108problems that you may not be able to fix (such as a broken DNS 109server elsewhere on the network) but may also indicate local 110configuration errors that could become a problem later. </p> 111 112</ul> 113 114<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2> 115 116<p> Postfix version 2.1 and later can 117produce mail delivery reports for debugging purposes. These reports 118not only show sender/recipient addresses after address rewriting 119and alias expansion or forwarding, they also show information about 120delivery to mailbox, delivery to non-Postfix command, responses 121from remote SMTP servers, and so on. 122</p> 123 124<p> Postfix can produce two types of mail delivery reports for 125debugging: </p> 126 127<ul> 128 129<li> <p> What-if: report what would happen, but do not actually 130deliver mail. This mode of operation is requested with: </p> 131 132<pre> 133% <b>/usr/sbin/sendmail -bv address...</b> 134Mail Delivery Status Report will be mailed to <your login name>. 135</pre> 136 137<li> <p> What happened: deliver mail and report successes and/or 138failures, including replies from remote SMTP servers. This mode 139of operation is requested with: </p> 140 141<pre> 142% <b>/usr/sbin/sendmail -v address...</b> 143Mail Delivery Status Report will be mailed to <your login name>. 144</pre> 145 146</ul> 147 148<p> These reports contain information that is generated by Postfix 149delivery agents. Since these run as daemon processes that cannot 150interact with users directly, the result is sent as mail to the 151sender of the test message. The format of these reports is practically 152identical to that of ordinary non-delivery notifications. </p> 153 154<p> For a detailed example of a mail delivery status report, see 155the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a> 156section at the end of the <a href="ADDRESS_REWRITING_README.html">ADDRESS_REWRITING_README</a> document. </p> 157 158<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2> 159 160<p> A common mistake is to turn on chroot operation in the <a href="master.5.html">master.cf</a> 161file without going through all the necessary steps to set up a 162chroot environment. This causes Postfix daemon processes to fail 163due to all kinds of missing files. </p> 164 165<p> The example below shows an SMTP server that is configured with 166chroot turned off: </p> 167 168<blockquote> 169<pre> 170/etc/postfix/<a href="master.5.html">master.cf</a>: 171 # ============================================================= 172 # service type private unpriv <b>chroot</b> wakeup maxproc command 173 # (yes) (yes) <b>(yes)</b> (never) (100) 174 # ============================================================= 175 smtp inet n - <b>n</b> - - smtpd 176</pre> 177</blockquote> 178 179<p> Inspect <a href="master.5.html">master.cf</a> for any processes that have chroot operation 180not turned off. If you find any, save a copy of the <a href="master.5.html">master.cf</a> file, 181and edit the entries in question. After executing the command 182"<b>postfix reload</b>", see if the problem has gone away. </p> 183 184<p> If turning off chrooted operation made the problem go away, 185then congratulations. Leaving Postfix running in this way is 186adequate for most sites. If you prefer chrooted operation, see 187the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup"> 188BASIC_CONFIGURATION_README</a> file for information about how to 189prepare Postfix for chrooted operation. </p> 190 191<h2><a name="debug_peer">Verbose logging for specific SMTP 192connections</a></h2> 193 194<p> In /etc/postfix/<a href="postconf.5.html">main.cf</a>, list the remote site name or address 195in the <a href="postconf.5.html#debug_peer_list">debug_peer_list</a> parameter. For example, in order to make 196the software log a lot of information to the syslog daemon for 197connections from or to the loopback interface: </p> 198 199<blockquote> 200<pre> 201/etc/postfix/<a href="postconf.5.html">main.cf</a>: 202 <a href="postconf.5.html#debug_peer_list">debug_peer_list</a> = 127.0.0.1 203</pre> 204</blockquote> 205 206<p> You can specify one or more hosts, domains, addresses or 207net/masks. To make the change effective immediately, execute the 208command "<b>postfix reload</b>". </p> 209 210<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2> 211 212<p> This example uses <b>tcpdump</b>. In order to record a conversation 213you need to specify a large enough buffer with the "<b>-s</b>" 214option or else you will miss some or all of the packet payload. 215</p> 216 217<blockquote> 218<pre> 219# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b> 220</pre> 221</blockquote> 222 223<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case, 224use "<b>-s 2000</b>" instead. </p> 225 226<p> Run this for a while, stop with Ctrl-C when done. To view the 227data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>. 228</p> 229 230<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2> 231 232<p> Append one or more "<b>-v</b>" options to selected daemon 233definitions in /etc/postfix/<a href="master.5.html">master.cf</a> and type "<b>postfix reload</b>". 234This will cause a lot of activity to be logged to the syslog daemon. 235For example, to make the Postfix SMTP server process more verbose: </p> 236 237<blockquote> 238<pre> 239/etc/postfix/<a href="master.5.html">master.cf</a>: 240 smtp inet n - n - - smtpd -v 241</pre> 242</blockquote> 243 244<p> To diagnose problems with address rewriting specify a "<b>-v</b>" 245option for the <a href="cleanup.8.html">cleanup(8)</a> and/or <a href="trivial-rewrite.8.html">trivial-rewrite(8)</a> daemon, and to 246diagnose problems with mail delivery specify a "<b>-v</b>" 247option for the <a href="qmgr.8.html">qmgr(8)</a> or <a href="qmgr.8.html">oqmgr(8)</a> queue manager, or for the <a href="lmtp.8.html">lmtp(8)</a>, 248<a href="local.8.html">local(8)</a>, <a href="pipe.8.html">pipe(8)</a>, <a href="smtp.8.html">smtp(8)</a>, or <a href="virtual.8.html">virtual(8)</a> delivery agent. </p> 249 250<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2> 251 252<p> Many systems allow you to inspect a running process with a 253system call tracer. For example: </p> 254 255<blockquote> 256<pre> 257# <b>trace -p process-id</b> (SunOS 4) 258# <b>strace -p process-id</b> (Linux and many others) 259# <b>truss -p process-id</b> (Solaris, FreeBSD) 260# <b>ktrace -p process-id</b> (generic 4.4BSD) 261</pre> 262</blockquote> 263 264<p> Even more informative are traces of system library calls. 265Examples: </p> 266 267<blockquote> 268<pre> 269# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS) 270# <b>sotruss -p process-id</b> (Solaris) 271</pre> 272</blockquote> 273 274<p> See your system documentation for details. </p> 275 276<p> Tracing a running process can give valuable information about 277what a process is attempting to do. This is as much information as 278you can get without running an interactive debugger program, as 279described in a later section. </p> 280 281<h2><a name="auto_trace">Automatically tracing a Postfix daemon 282process</a></h2> 283 284<p> Postfix can attach a call tracer whenever a daemon process 285starts. Call tracers come in several kinds. </p> 286 287<ol> 288 289<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>, 290<b>strace</b>, or <b>ktrace</b>. These show the communication 291between the process and the kernel. </p> 292 293<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>. 294These show calls of library routines, and give a better idea of 295what is going on within the process. </p> 296 297</ol> 298 299<p> Append a <b>-D</b> option to the suspect command in 300/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p> 301 302<blockquote> 303<pre> 304/etc/postfix/<a href="master.5.html">master.cf</a>: 305 smtp inet n - n - - smtpd -D 306</pre> 307</blockquote> 308 309<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a> 310so that it invokes the call tracer of your choice, for example: 311</p> 312 313<blockquote> 314<pre> 315/etc/postfix/<a href="postconf.5.html">main.cf</a>: 316 <a href="postconf.5.html#debugger_command">debugger_command</a> = 317 PATH=/bin:/usr/bin:/usr/local/bin; 318 (truss -p $<a href="postconf.5.html#process_id">process_id</a> 2>&1 | logger -p mail.info) & sleep 5 319</pre> 320</blockquote> 321 322<p> Type "<b>postfix reload</b>" and watch the logfile. </p> 323 324<h2><a name="ddd">Running daemon programs with the interactive 325ddd debugger</a></h2> 326 327<p> If you have X Windows installed on the Postfix machine, then 328an interactive debugger such as <b>ddd</b> can be convenient. 329</p> 330 331<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a> 332so that it invokes <b>ddd</b>: </p> 333 334<blockquote> 335<pre> 336/etc/postfix/<a href="postconf.5.html">main.cf</a>: 337 <a href="postconf.5.html#debugger_command">debugger_command</a> = 338 PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin 339 ddd $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> $<a href="postconf.5.html#process_id">process_id</a> & sleep 5 340</pre> 341</blockquote> 342 343<p> Be sure that <b>gdb</b> is in the command search path, and 344export <b>XAUTHORITY</b> so that X access control works, for example: 345</p> 346 347<blockquote> 348<pre> 349% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax) 350$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax) 351</pre> 352</blockquote> 353 354<p> Append a <b>-D</b> option to the suspect daemon definition in 355/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p> 356 357<blockquote> 358<pre> 359/etc/postfix/<a href="master.5.html">master.cf</a>: 360 smtp inet n - n - - smtpd -D 361</pre> 362</blockquote> 363 364<p> Stop and start the Postfix system. This is necessary so that 365Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b> 366settings. </p> 367 368<p> Whenever the suspect daemon process is started, a debugger 369window pops up and you can watch in detail what happens. </p> 370 371<h2><a name="screen">Running daemon programs with the interactive 372gdb debugger</a></h2> 373 374<p> If you have the screen command installed on the Postfix machine, then 375you can run an interactive debugger such as <b>gdb</b> as follows. </p> 376 377<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a> 378so that it runs <b>gdb</b> inside a detached <b>screen</b> session: 379</p> 380 381<blockquote> 382<pre> 383/etc/postfix/<a href="postconf.5.html">main.cf</a>: 384 <a href="postconf.5.html#debugger_command">debugger_command</a> = 385 PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root; 386 export HOME; screen -e^tt -dmS $<a href="postconf.5.html#process_name">process_name</a> gdb 387 $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> $<a href="postconf.5.html#process_id">process_id</a> & sleep 2 388</pre> 389</blockquote> 390 391<p> Be sure that <b>gdb</b> is in the command search path. </p> 392 393<p> Append a <b>-D</b> option to the suspect daemon definition in 394/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p> 395 396<blockquote> 397<pre> 398/etc/postfix/<a href="master.5.html">master.cf</a>: 399 smtp inet n - n - - smtpd -D 400</pre> 401</blockquote> 402 403<p> Execute the command "<b>postfix reload</b>" and wait until a 404daemon process is started (you can see this in the maillog file). 405</p> 406 407<p> Then attach to the screen, and debug away: </p> 408 409<blockquote> 410<pre> 411# HOME=/root screen -r 412gdb) continue 413gdb) where 414</pre> 415</blockquote> 416 417<h2><a name="gdb">Running daemon programs under a non-interactive 418debugger</a></h2> 419 420<p> If you do not have X Windows installed on the Postfix machine, 421or if you are not familiar with interactive debuggers, then you 422can try to run <b>gdb</b> in non-interactive mode, and have it 423print a stack trace when the process crashes. </p> 424 425<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a> 426so that it invokes the <b>gdb</b> debugger: </p> 427 428<blockquote> 429<pre> 430/etc/postfix/<a href="postconf.5.html">main.cf</a>: 431 <a href="postconf.5.html#debugger_command">debugger_command</a> = 432 PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo 433 where; sleep 8640000) | gdb $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> 434 $<a href="postconf.5.html#process_id">process_id</a> 2>&1 435 >$<a href="postconf.5.html#config_directory">config_directory</a>/$<a href="postconf.5.html#process_name">process_name</a>.$<a href="postconf.5.html#process_id">process_id</a>.log & sleep 5 436</pre> 437</blockquote> 438 439<p> Append a <b>-D</b> option to the suspect daemon in 440/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p> 441 442<blockquote> 443<pre> 444/etc/postfix/<a href="master.5.html">master.cf</a>: 445 smtp inet n - n - - smtpd -D 446</pre> 447</blockquote> 448 449<p> Type "<b>postfix reload</b>" to make the configuration changes 450effective. </p> 451 452<p> Whenever a suspect daemon process is started, an output file 453is created, named after the daemon and process ID (for example, 454smtpd.12345.log). When the process crashes, a stack trace (with 455output from the "<b>where</b>" command) is written to its logfile. 456</p> 457 458<h2><a name="unreasonable">Unreasonable behavior</a></h2> 459 460<p> Sometimes the behavior exhibited by Postfix just does not match the 461source code. Why can a program deviate from the instructions given 462by its author? There are two possibilities. </p> 463 464<ul> 465 466<li> <p> The compiler has erred. This rarely happens. </p> 467 468<li> <p> The hardware has erred. Does the machine have ECC memory? </p> 469 470</ul> 471 472<p> In both cases, the program being executed is not the program 473that was supposed to be executed, so anything could happen. </p> 474 475<p> There is a third possibility: </p> 476 477<ul> 478 479<li> <p> Bugs in system software (kernel or libraries). </p> 480 481</ul> 482 483<p> Hardware-related failures usually do not reproduce in exactly 484the same way after power cycling and rebooting the system. There's 485little Postfix can do about bad hardware. Be sure to use hardware 486that at the very least can detect memory errors. Otherwise, Postfix 487will just be waiting to be hit by a bit error. Critical systems 488deserve real hardware. </p> 489 490<p> When a compiler makes an error, the problem can be reproduced 491whenever the resulting program is run. Compiler errors are most 492likely to happen in the code optimizer. If a problem is reproducible 493across power cycles and system reboots, it can be worthwhile to 494rebuild Postfix with optimization disabled, and to see if optimization 495makes a difference. </p> 496 497<p> In order to compile Postfix with optimizations turned off: </p> 498 499<blockquote> 500<pre> 501% <b>make tidy</b> 502% <b>make makefiles OPT=</b> 503</pre> 504</blockquote> 505 506<p> This produces a set of Makefiles that do not request compiler 507optimization. </p> 508 509<p> Once the makefiles are set up, build the software: </p> 510 511<blockquote> 512<pre> 513% <b>make</b> 514% <b>su</b> 515Password: 516# <b>make install</b> 517</pre> 518</blockquote> 519 520<p> If the problem goes away, then it is time to ask your vendor 521for help. </p> 522 523<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2> 524 525<p> The people who participate on postfix-users@postfix.org 526are very helpful, especially if YOU provide them with sufficient 527information. Remember, these volunteers are willing to help, but 528their time is limited. </p> 529 530<p> When reporting a problem, be sure to include the following 531information. </p> 532 533<ul> 534 535<li> <p> A summary of the problem. Please do not just send some 536logging without explanation of what YOU believe is wrong. </p> 537 538<li> <p> Complete error messages. Please use cut-and-paste, or use 539attachments, instead of reciting information from memory. 540</p> 541 542<li> <p> Postfix logging. See the text at the top of the <a href="DEBUG_README.html">DEBUG_README</a> 543document to find out where logging is stored. Please do not frustrate 544the helpers by word wrapping the logging. If the logging is more 545than a few kbytes of text, consider posting an URL on a web or ftp 546site. </p> 547 548<li> <p> Consider using a test email address so that you don't have 549to reveal email addresses or passwords of innocent people. </p> 550 551<li> <p> If you can't use a test email address, please anonymize 552email addresses and host names consistently. Replace each letter 553by "A", each digit 554by "D" so that the helpers can still recognize syntactical errors. 555</p> 556 557<li> <p> Output from "<b>postconf -n</b>". Please do not send your 558<a href="postconf.5.html">main.cf</a> file, or 500+ lines of <b>postconf</b> output. </p> 559 560<li> <p> Better, provide output from the <b>postfinger</b> tool. 561This can be found at <a href="http://ftp.wl0.org/SOURCES/postfinger">http://ftp.wl0.org/SOURCES/postfinger</a>. </p> 562 563<li> <p> If the problem is SASL related, consider including the 564output from the <b>saslfinger</b> tool. This can be found at 565<a href="http://postfix.state-of-mind.de/patrick.koetter/saslfinger/">http://postfix.state-of-mind.de/patrick.koetter/saslfinger/</a>. </p> 566 567<li> <p> If the problem is about too much mail in the queue, consider 568including output from the <b>qshape</b> tool, as described in the 569<a href="QSHAPE_README.html">QSHAPE_README</a> file. </p> 570 571<li> <p> If the problem is protocol related (connections time out, 572or an SMTP server complains about syntax errors etc.) consider 573recording a session with <b>tcpdump</b>, as described in the <a 574href="#sniffer">DEBUG_README</a> document. </ul> 575 576</body> 577 578</html> 579