1<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN" 2 "http://www.w3.org/TR/html4/loose.dtd"> 3 4<html> 5 6<head> 7 8<title> Postfix Debugging Howto </title> 9 10<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 11 12</head> 13 14<body> 15 16<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1> 17 18<hr> 19 20<h2>Purpose of this document</h2> 21 22<p> This document describes how to debug parts of the Postfix mail 23system when things do not work according to expectation. The methods 24vary from making Postfix log a lot of detail, to running some daemon 25processes under control of a call tracer or debugger. </p> 26 27<p> The text assumes that the Postfix main.cf and master.cf 28configuration files are stored in directory /etc/postfix. You can 29use the command "<b>postconf config_directory</b>" to find out the 30actual location of this directory on your machine. </p> 31 32<p> Listed in order of increasing invasiveness, the debugging 33techniques are as follows: </p> 34 35<ul> 36 37<li><a href="#logging">Look for obvious signs of trouble</a> 38 39<li><a href="#trace_mail">Debugging Postfix from inside</a> 40 41<li><a href="#no_chroot">Try turning off chroot operation in 42master.cf</a> 43 44<li><a href="#debug_peer">Verbose logging for specific SMTP 45connections</a> 46 47<li><a href="#sniffer">Record the SMTP session with a network 48sniffer</a> 49 50<li><a href="#verbose">Making Postfix daemon programs more verbose</a> 51 52<li><a href="#man_trace">Manually tracing a Postfix daemon process</a> 53 54<li><a href="#auto_trace">Automatically tracing a Postfix daemon 55process</a> 56 57<li><a href="#ddd">Running daemon programs with the interactive 58ddd debugger</a> 59 60<li><a href="#screen">Running daemon programs with the interactive 61gdb debugger</a> 62 63<li><a href="#gdb">Running daemon programs under a non-interactive 64debugger</a> 65 66<li><a href="#unreasonable">Unreasonable behavior</a> 67 68<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a> 69 70</ul> 71 72<h2><a name="logging">Look for obvious signs of trouble</a></h2> 73 74<p> Postfix logs all failed and successful deliveries to a logfile. </p> 75 76<ul> 77 78<li> <p> When Postfix uses syslog logging (the default), the file 79is usually called /var/log/maillog, /var/log/mail, or something 80similar; the exact pathname is configured in a file called 81/etc/syslog.conf, /etc/rsyslog.conf, or something similar. </p> 82 83<li> <p> When Postfix uses its own logging system (see MAILLOG_README), 84the location of the logfile is configured with the Postfix maillog_file 85parameter. </p> 86 87</ul> 88 89<p> When Postfix does not receive or deliver mail, the first order 90of business is to look for errors that prevent Postfix from working 91properly: </p> 92 93<blockquote> 94<pre> 95% <b>egrep '(warning|error|fatal|panic):' /some/log/file | more</b> 96</pre> 97</blockquote> 98 99<p> Note: the most important message is near the BEGINNING of the 100output. Error messages that come later are less useful. </p> 101 102<p> The nature of each problem is indicated as follows: </p> 103 104<ul> 105 106<li> <p> "<b>panic</b>" indicates a problem in the software itself 107that only a programmer can fix. Postfix cannot proceed until this 108is fixed. </p> 109 110<li> <p> "<b>fatal</b>" is the result of missing files, incorrect 111permissions, incorrect configuration file settings that you can 112fix. Postfix cannot proceed until this is fixed. </p> 113 114<li> <p> "<b>error</b>" reports an error condition. For safety 115reasons, a Postfix process will terminate when more than 13 of these 116happen. </p> 117 118<li> <p> "<b>warning</b>" indicates a non-fatal error. These are 119problems that you may not be able to fix (such as a broken DNS 120server elsewhere on the network) but may also indicate local 121configuration errors that could become a problem later. </p> 122 123</ul> 124 125<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2> 126 127<p> Postfix version 2.1 and later can 128produce mail delivery reports for debugging purposes. These reports 129not only show sender/recipient addresses after address rewriting 130and alias expansion or forwarding, they also show information about 131delivery to mailbox, delivery to non-Postfix command, responses 132from remote SMTP servers, and so on. 133</p> 134 135<p> Postfix can produce two types of mail delivery reports for 136debugging: </p> 137 138<ul> 139 140<li> <p> What-if: report what would happen, but do not actually 141deliver mail. This mode of operation is requested with: </p> 142 143<pre> 144% <b>/usr/sbin/sendmail -bv address...</b> 145Mail Delivery Status Report will be mailed to <your login name>. 146</pre> 147 148<li> <p> What happened: deliver mail and report successes and/or 149failures, including replies from remote SMTP servers. This mode 150of operation is requested with: </p> 151 152<pre> 153% <b>/usr/sbin/sendmail -v address...</b> 154Mail Delivery Status Report will be mailed to <your login name>. 155</pre> 156 157</ul> 158 159<p> These reports contain information that is generated by Postfix 160delivery agents. Since these run as daemon processes that cannot 161interact with users directly, the result is sent as mail to the 162sender of the test message. The format of these reports is practically 163identical to that of ordinary non-delivery notifications. </p> 164 165<p> For a detailed example of a mail delivery status report, see 166the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a> 167section at the end of the ADDRESS_REWRITING_README document. </p> 168 169<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2> 170 171<p> A common mistake is to turn on chroot operation in the master.cf 172file without going through all the necessary steps to set up a 173chroot environment. This causes Postfix daemon processes to fail 174due to all kinds of missing files. </p> 175 176<p> The example below shows an SMTP server that is configured with 177chroot turned off: </p> 178 179<blockquote> 180<pre> 181/etc/postfix/master.cf: 182 # ============================================================= 183 # service type private unpriv <b>chroot</b> wakeup maxproc command 184 # (yes) (yes) <b>(yes)</b> (never) (100) 185 # ============================================================= 186 smtp inet n - <b>n</b> - - smtpd 187</pre> 188</blockquote> 189 190<p> Inspect master.cf for any processes that have chroot operation 191not turned off. If you find any, save a copy of the master.cf file, 192and edit the entries in question. After executing the command 193"<b>postfix reload</b>", see if the problem has gone away. </p> 194 195<p> If turning off chrooted operation made the problem go away, 196then congratulations. Leaving Postfix running in this way is 197adequate for most sites. If you prefer chrooted operation, see 198the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup"> 199BASIC_CONFIGURATION_README</a> file for information about how to 200prepare Postfix for chrooted operation. </p> 201 202<h2><a name="debug_peer">Verbose logging for specific SMTP 203connections</a></h2> 204 205<p> In /etc/postfix/main.cf, list the remote site name or address 206in the debug_peer_list parameter. For example, in order to make 207the software log a lot of information to the syslog daemon for 208connections from or to the loopback interface: </p> 209 210<blockquote> 211<pre> 212/etc/postfix/main.cf: 213 debug_peer_list = 127.0.0.1 214</pre> 215</blockquote> 216 217<p> You can specify one or more hosts, domains, addresses or 218net/masks. To make the change effective immediately, execute the 219command "<b>postfix reload</b>". </p> 220 221<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2> 222 223<p> This example uses <b>tcpdump</b>. In order to record a conversation 224you need to specify a large enough buffer with the "<b>-s</b>" 225option or else you will miss some or all of the packet payload. 226</p> 227 228<blockquote> 229<pre> 230# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b> 231</pre> 232</blockquote> 233 234<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case, 235use "<b>-s 2000</b>" instead. </p> 236 237<p> Run this for a while, stop with Ctrl-C when done. To view the 238data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>. 239</p> 240 241<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2> 242 243<p> Append one or more "<b>-v</b>" options to selected daemon 244definitions in /etc/postfix/master.cf and type "<b>postfix reload</b>". 245This will cause a lot of activity to be logged to the syslog daemon. 246For example, to make the Postfix SMTP server process more verbose: </p> 247 248<blockquote> 249<pre> 250/etc/postfix/master.cf: 251 smtp inet n - n - - smtpd -v 252</pre> 253</blockquote> 254 255<p> To diagnose problems with address rewriting specify a "<b>-v</b>" 256option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to 257diagnose problems with mail delivery specify a "<b>-v</b>" 258option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8), 259local(8), pipe(8), smtp(8), or virtual(8) delivery agent. </p> 260 261<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2> 262 263<p> Many systems allow you to inspect a running process with a 264system call tracer. For example: </p> 265 266<blockquote> 267<pre> 268# <b>trace -p process-id</b> (SunOS 4) 269# <b>strace -p process-id</b> (Linux and many others) 270# <b>truss -p process-id</b> (Solaris, FreeBSD) 271# <b>ktrace -p process-id</b> (generic 4.4BSD) 272</pre> 273</blockquote> 274 275<p> Even more informative are traces of system library calls. 276Examples: </p> 277 278<blockquote> 279<pre> 280# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS) 281# <b>sotruss -p process-id</b> (Solaris) 282</pre> 283</blockquote> 284 285<p> See your system documentation for details. </p> 286 287<p> Tracing a running process can give valuable information about 288what a process is attempting to do. This is as much information as 289you can get without running an interactive debugger program, as 290described in a later section. </p> 291 292<h2><a name="auto_trace">Automatically tracing a Postfix daemon 293process</a></h2> 294 295<p> Postfix can attach a call tracer whenever a daemon process 296starts. Call tracers come in several kinds. </p> 297 298<ol> 299 300<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>, 301<b>strace</b>, or <b>ktrace</b>. These show the communication 302between the process and the kernel. </p> 303 304<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>. 305These show calls of library routines, and give a better idea of 306what is going on within the process. </p> 307 308</ol> 309 310<p> Append a <b>-D</b> option to the suspect command in 311/etc/postfix/master.cf, for example: </p> 312 313<blockquote> 314<pre> 315/etc/postfix/master.cf: 316 smtp inet n - n - - smtpd -D 317</pre> 318</blockquote> 319 320<p> Edit the debugger_command definition in /etc/postfix/main.cf 321so that it invokes the call tracer of your choice, for example: 322</p> 323 324<blockquote> 325<pre> 326/etc/postfix/main.cf: 327 debugger_command = 328 PATH=/bin:/usr/bin:/usr/local/bin; 329 (truss -p $process_id 2>&1 | logger -p mail.info) & sleep 5 330</pre> 331</blockquote> 332 333<p> Type "<b>postfix reload</b>" and watch the logfile. </p> 334 335<h2><a name="ddd">Running daemon programs with the interactive 336ddd debugger</a></h2> 337 338<p> If you have X Windows installed on the Postfix machine, then 339an interactive debugger such as <b>ddd</b> can be convenient. 340</p> 341 342<p> Edit the debugger_command definition in /etc/postfix/main.cf 343so that it invokes <b>ddd</b>: </p> 344 345<blockquote> 346<pre> 347/etc/postfix/main.cf: 348 debugger_command = 349 PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin 350 ddd $daemon_directory/$process_name $process_id & sleep 5 351</pre> 352</blockquote> 353 354<p> Be sure that <b>gdb</b> is in the command search path, and 355export <b>XAUTHORITY</b> so that X access control works, for example: 356</p> 357 358<blockquote> 359<pre> 360% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax) 361$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax) 362</pre> 363</blockquote> 364 365<p> Append a <b>-D</b> option to the suspect daemon definition in 366/etc/postfix/master.cf, for example: </p> 367 368<blockquote> 369<pre> 370/etc/postfix/master.cf: 371 smtp inet n - n - - smtpd -D 372</pre> 373</blockquote> 374 375<p> Stop and start the Postfix system. This is necessary so that 376Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b> 377settings. </p> 378 379<p> Whenever the suspect daemon process is started, a debugger 380window pops up and you can watch in detail what happens. </p> 381 382<h2><a name="screen">Running daemon programs with the interactive 383gdb debugger</a></h2> 384 385<p> If you have the screen command installed on the Postfix machine, then 386you can run an interactive debugger such as <b>gdb</b> as follows. </p> 387 388<p> Edit the debugger_command definition in /etc/postfix/main.cf 389so that it runs <b>gdb</b> inside a detached <b>screen</b> session: 390</p> 391 392<blockquote> 393<pre> 394/etc/postfix/main.cf: 395 debugger_command = 396 PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root; 397 export HOME; screen -e^tt -dmS $process_name gdb 398 $daemon_directory/$process_name $process_id & sleep 2 399</pre> 400</blockquote> 401 402<p> Be sure that <b>gdb</b> is in the command search path. </p> 403 404<p> Append a <b>-D</b> option to the suspect daemon definition in 405/etc/postfix/master.cf, for example: </p> 406 407<blockquote> 408<pre> 409/etc/postfix/master.cf: 410 smtp inet n - n - - smtpd -D 411</pre> 412</blockquote> 413 414<p> Execute the command "<b>postfix reload</b>" and wait until a 415daemon process is started (you can see this in the maillog file). 416</p> 417 418<p> Then attach to the screen, and debug away: </p> 419 420<blockquote> 421<pre> 422# HOME=/root screen -r 423gdb) continue 424gdb) where 425</pre> 426</blockquote> 427 428<h2><a name="gdb">Running daemon programs under a non-interactive 429debugger</a></h2> 430 431<p> If you do not have X Windows installed on the Postfix machine, 432or if you are not familiar with interactive debuggers, then you 433can try to run <b>gdb</b> in non-interactive mode, and have it 434print a stack trace when the process crashes. </p> 435 436<p> Edit the debugger_command definition in /etc/postfix/main.cf 437so that it invokes the <b>gdb</b> debugger: </p> 438 439<blockquote> 440<pre> 441/etc/postfix/main.cf: 442 debugger_command = 443 PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo 444 where; sleep 8640000) | gdb $daemon_directory/$process_name 445 $process_id 2>&1 446 >$config_directory/$process_name.$process_id.log & sleep 5 447</pre> 448</blockquote> 449 450<p> Append a <b>-D</b> option to the suspect daemon in 451/etc/postfix/master.cf, for example: </p> 452 453<blockquote> 454<pre> 455/etc/postfix/master.cf: 456 smtp inet n - n - - smtpd -D 457</pre> 458</blockquote> 459 460<p> Type "<b>postfix reload</b>" to make the configuration changes 461effective. </p> 462 463<p> Whenever a suspect daemon process is started, an output file 464is created, named after the daemon and process ID (for example, 465smtpd.12345.log). When the process crashes, a stack trace (with 466output from the "<b>where</b>" command) is written to its logfile. 467</p> 468 469<h2><a name="unreasonable">Unreasonable behavior</a></h2> 470 471<p> Sometimes the behavior exhibited by Postfix just does not match the 472source code. Why can a program deviate from the instructions given 473by its author? There are two possibilities. </p> 474 475<ul> 476 477<li> <p> The compiler has erred. This rarely happens. </p> 478 479<li> <p> The hardware has erred. Does the machine have ECC memory? </p> 480 481</ul> 482 483<p> In both cases, the program being executed is not the program 484that was supposed to be executed, so anything could happen. </p> 485 486<p> There is a third possibility: </p> 487 488<ul> 489 490<li> <p> Bugs in system software (kernel or libraries). </p> 491 492</ul> 493 494<p> Hardware-related failures usually do not reproduce in exactly 495the same way after power cycling and rebooting the system. There's 496little Postfix can do about bad hardware. Be sure to use hardware 497that at the very least can detect memory errors. Otherwise, Postfix 498will just be waiting to be hit by a bit error. Critical systems 499deserve real hardware. </p> 500 501<p> When a compiler makes an error, the problem can be reproduced 502whenever the resulting program is run. Compiler errors are most 503likely to happen in the code optimizer. If a problem is reproducible 504across power cycles and system reboots, it can be worthwhile to 505rebuild Postfix with optimization disabled, and to see if optimization 506makes a difference. </p> 507 508<p> In order to compile Postfix with optimizations turned off: </p> 509 510<blockquote> 511<pre> 512% <b>make tidy</b> 513% <b>make makefiles OPT=</b> 514</pre> 515</blockquote> 516 517<p> This produces a set of Makefiles that do not request compiler 518optimization. </p> 519 520<p> Once the makefiles are set up, build the software: </p> 521 522<blockquote> 523<pre> 524% <b>make</b> 525% <b>su</b> 526Password: 527# <b>make install</b> 528</pre> 529</blockquote> 530 531<p> If the problem goes away, then it is time to ask your vendor 532for help. </p> 533 534<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2> 535 536<p> The people who participate on postfix-users@postfix.org 537are very helpful, especially if YOU provide them with sufficient 538information. Remember, these volunteers are willing to help, but 539their time is limited. </p> 540 541<p> When reporting a problem, be sure to include the following 542information. </p> 543 544<ul> 545 546<li> <p> A summary of the problem. Please do not just send some 547logging without explanation of what YOU believe is wrong. </p> 548 549<li> <p> Complete error messages. Please use cut-and-paste, or use 550attachments, instead of reciting information from memory. 551</p> 552 553<li> <p> Postfix logging. See the text at the top of the DEBUG_README 554document to find out where logging is stored. Please do not frustrate 555the helpers by word wrapping the logging. If the logging is more 556than a few kbytes of text, consider posting an URL on a web or ftp 557site. </p> 558 559<li> <p> Consider using a test email address so that you don't have 560to reveal email addresses or passwords of innocent people. </p> 561 562<li> <p> If you can't use a test email address, please anonymize 563email addresses and host names consistently. Replace each letter 564by "A", each digit 565by "D" so that the helpers can still recognize syntactical errors. 566</p> 567 568<li> <p> Command output from:</p> 569 570<ul> 571 572<li> <p> "<b>postconf -n</b>". Please do not send your main.cf file, 573or 1000+ lines of <b>postconf</b> command output. </p> 574 575<li> <p> "<b>postconf -Mf</b>" (Postfix 2.9 or later). </p> 576 577</ul> 578 579<li> <p> Better, provide output from the <b>postfinger</b> tool. 580This can be found at https://github.com/ford--prefect/postfinger. </p> 581 582<li> <p> If the problem is SASL related, consider including the 583output from the <b>saslfinger</b> tool. This can be found at 584https://packages.debian.org/search?keywords=sasl2-bin. </p> 585 586<li> <p> If the problem is about too much mail in the queue, consider 587including output from the <b>qshape</b> tool, as described in the 588QSHAPE_README file. </p> 589 590<li> <p> If the problem is protocol related (connections time out, 591or an SMTP server complains about syntax errors etc.) consider 592recording a session with <b>tcpdump</b>, as described in the <a 593href="#sniffer">DEBUG_README</a> document. </ul> 594 595</body> 596 597</html> 598