1<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN" 2 "http://www.w3.org/TR/html4/loose.dtd"> 3 4<html> 5 6<head> 7 8<title> Postfix Debugging Howto </title> 9 10<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 11<link rel='stylesheet' type='text/css' href='postfix-doc.css'> 12 13</head> 14 15<body> 16 17<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1> 18 19<hr> 20 21<h2>Purpose of this document</h2> 22 23<p> This document describes how to debug parts of the Postfix mail 24system when things do not work according to expectation. The methods 25vary from making Postfix log a lot of detail, to running some daemon 26processes under control of a call tracer or debugger. </p> 27 28<p> The text assumes that the Postfix main.cf and master.cf 29configuration files are stored in directory /etc/postfix. You can 30use the command "<b>postconf config_directory</b>" to find out the 31actual location of this directory on your machine. </p> 32 33<p> Listed in order of increasing invasiveness, the debugging 34techniques are as follows: </p> 35 36<ul> 37 38<li><a href="#logging">Look for obvious signs of trouble</a> 39 40<li><a href="#trace_mail">Debugging Postfix from inside</a> 41 42<li><a href="#no_chroot">Try turning off chroot operation in 43master.cf</a> 44 45<li><a href="#debug_peer">Verbose logging for specific SMTP 46connections</a> 47 48<li><a href="#sniffer">Record the SMTP session with a network 49sniffer</a> 50 51<li><a href="#verbose">Making Postfix daemon programs more verbose</a> 52 53<li><a href="#man_trace">Manually tracing a Postfix daemon process</a> 54 55<li><a href="#auto_trace">Automatically tracing a Postfix daemon 56process</a> 57 58<li><a href="#ddd">Running daemon programs with the interactive 59ddd debugger</a> 60 61<li><a href="#screen">Running daemon programs with the interactive 62gdb debugger</a> 63 64<li><a href="#gdb">Running daemon programs under a non-interactive 65debugger</a> 66 67<li><a href="#unreasonable">Unreasonable behavior</a> 68 69<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a> 70 71</ul> 72 73<h2><a name="logging">Look for obvious signs of trouble</a></h2> 74 75<p> Postfix logs all failed and successful deliveries to a logfile. </p> 76 77<ul> 78 79<li> <p> When Postfix uses syslog logging (the default), the file 80is usually called /var/log/maillog, /var/log/mail, or something 81similar; the exact pathname is configured in a file called 82/etc/syslog.conf, /etc/rsyslog.conf, or something similar. </p> 83 84<li> <p> When Postfix uses its own logging system (see MAILLOG_README), 85the location of the logfile is configured with the Postfix maillog_file 86parameter. </p> 87 88</ul> 89 90<p> When Postfix does not receive or deliver mail, the first order 91of business is to look for errors that prevent Postfix from working 92properly: </p> 93 94<blockquote> 95<pre> 96% <b>grep -E '(warning|error|fatal|panic):' /some/log/file | more</b> 97</pre> 98</blockquote> 99 100<p> Note: the most important message is near the BEGINNING of the 101output. Error messages that come later are less useful. </p> 102 103<p> The nature of each problem is indicated as follows: </p> 104 105<ul> 106 107<li> <p> "<b>panic</b>" indicates a problem in the software itself 108that only a programmer can fix. Postfix cannot proceed until this 109is fixed. </p> 110 111<li> <p> "<b>fatal</b>" is the result of missing files, incorrect 112permissions, incorrect configuration file settings that you can 113fix. Postfix cannot proceed until this is fixed. </p> 114 115<li> <p> "<b>error</b>" reports an error condition. For safety 116reasons, a Postfix process will terminate when more than 13 of these 117happen. </p> 118 119<li> <p> "<b>warning</b>" indicates a non-fatal error. These are 120problems that you may not be able to fix (such as a broken DNS 121server elsewhere on the network) but may also indicate local 122configuration errors that could become a problem later. </p> 123 124</ul> 125 126<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2> 127 128<p> Postfix version 2.1 and later can 129produce mail delivery reports for debugging purposes. These reports 130not only show sender/recipient addresses after address rewriting 131and alias expansion or forwarding, they also show information about 132delivery to mailbox, delivery to non-Postfix command, responses 133from remote SMTP servers, and so on. 134</p> 135 136<p> Postfix can produce two types of mail delivery reports for 137debugging: </p> 138 139<ul> 140 141<li> <p> What-if: report what would happen, but do not actually 142deliver mail. This mode of operation is requested with: </p> 143 144<pre> 145% <b>/usr/sbin/sendmail -bv address...</b> 146Mail Delivery Status Report will be mailed to <your login name>. 147</pre> 148 149<li> <p> What happened: deliver mail and report successes and/or 150failures, including replies from remote SMTP servers. This mode 151of operation is requested with: </p> 152 153<pre> 154% <b>/usr/sbin/sendmail -v address...</b> 155Mail Delivery Status Report will be mailed to <your login name>. 156</pre> 157 158</ul> 159 160<p> These reports contain information that is generated by Postfix 161delivery agents. Since these run as daemon processes that cannot 162interact with users directly, the result is sent as mail to the 163sender of the test message. The format of these reports is practically 164identical to that of ordinary non-delivery notifications. </p> 165 166<p> For a detailed example of a mail delivery status report, see 167the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a> 168section at the end of the ADDRESS_REWRITING_README document. </p> 169 170<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2> 171 172<p> A common mistake is to turn on chroot operation in the master.cf 173file without going through all the necessary steps to set up a 174chroot environment. This causes Postfix daemon processes to fail 175due to all kinds of missing files. </p> 176 177<p> The example below shows an SMTP server that is configured with 178chroot turned off: </p> 179 180<blockquote> 181<pre> 182/etc/postfix/master.cf: 183 # ============================================================= 184 # service type private unpriv <b>chroot</b> wakeup maxproc command 185 # (yes) (yes) <b>(yes)</b> (never) (100) 186 # ============================================================= 187 smtp inet n - <b>n</b> - - smtpd 188</pre> 189</blockquote> 190 191<p> Inspect master.cf for any processes that have chroot operation 192not turned off. If you find any, save a copy of the master.cf file, 193and edit the entries in question. After executing the command 194"<b>postfix reload</b>", see if the problem has gone away. </p> 195 196<p> If turning off chrooted operation made the problem go away, 197then congratulations. Leaving Postfix running in this way is 198adequate for most sites. If you prefer chrooted operation, see 199the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup"> 200BASIC_CONFIGURATION_README</a> file for information about how to 201prepare Postfix for chrooted operation. </p> 202 203<h2><a name="debug_peer">Verbose logging for specific SMTP 204connections</a></h2> 205 206<p> In /etc/postfix/main.cf, list the remote site name or address 207in the debug_peer_list parameter. For example, in order to make 208the software log a lot of information to the syslog daemon for 209connections from or to the loopback interface: </p> 210 211<blockquote> 212<pre> 213/etc/postfix/main.cf: 214 debug_peer_list = 127.0.0.1 215</pre> 216</blockquote> 217 218<p> You can specify one or more hosts, domains, addresses or 219net/masks. To make the change effective immediately, execute the 220command "<b>postfix reload</b>". </p> 221 222<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2> 223 224<p> This example uses <b>tcpdump</b>. In order to record a conversation 225you need to specify a large enough buffer with the "<b>-s</b>" 226option or else you will miss some or all of the packet payload. 227</p> 228 229<blockquote> 230<pre> 231# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b> 232</pre> 233</blockquote> 234 235<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case, 236use "<b>-s 2000</b>" instead. </p> 237 238<p> Run this for a while, stop with Ctrl-C when done. To view the 239data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>. 240</p> 241 242<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2> 243 244<p> Append one or more "<b>-v</b>" options to selected daemon 245definitions in /etc/postfix/master.cf and type "<b>postfix reload</b>". 246This will cause a lot of activity to be logged to the syslog daemon. 247For example, to make the Postfix SMTP server process more verbose: </p> 248 249<blockquote> 250<pre> 251/etc/postfix/master.cf: 252 smtp inet n - n - - smtpd -v 253</pre> 254</blockquote> 255 256<p> To diagnose problems with address rewriting specify a "<b>-v</b>" 257option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to 258diagnose problems with mail delivery specify a "<b>-v</b>" 259option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8), 260local(8), pipe(8), smtp(8), or virtual(8) delivery agent. </p> 261 262<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2> 263 264<p> Many systems allow you to inspect a running process with a 265system call tracer. For example: </p> 266 267<blockquote> 268<pre> 269# <b>trace -p process-id</b> (SunOS 4) 270# <b>strace -p process-id</b> (Linux and many others) 271# <b>truss -p process-id</b> (Solaris, FreeBSD) 272# <b>ktrace -p process-id</b> (generic 4.4BSD) 273</pre> 274</blockquote> 275 276<p> Even more informative are traces of system library calls. 277Examples: </p> 278 279<blockquote> 280<pre> 281# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS) 282# <b>sotruss -p process-id</b> (Solaris) 283</pre> 284</blockquote> 285 286<p> See your system documentation for details. </p> 287 288<p> Tracing a running process can give valuable information about 289what a process is attempting to do. This is as much information as 290you can get without running an interactive debugger program, as 291described in a later section. </p> 292 293<h2><a name="auto_trace">Automatically tracing a Postfix daemon 294process</a></h2> 295 296<p> Postfix can attach a call tracer whenever a daemon process 297starts. Call tracers come in several kinds. </p> 298 299<ol> 300 301<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>, 302<b>strace</b>, or <b>ktrace</b>. These show the communication 303between the process and the kernel. </p> 304 305<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>. 306These show calls of library routines, and give a better idea of 307what is going on within the process. </p> 308 309</ol> 310 311<p> Append a <b>-D</b> option to the suspect command in 312/etc/postfix/master.cf, for example: </p> 313 314<blockquote> 315<pre> 316/etc/postfix/master.cf: 317 smtp inet n - n - - smtpd -D 318</pre> 319</blockquote> 320 321<p> Edit the debugger_command definition in /etc/postfix/main.cf 322so that it invokes the call tracer of your choice, for example: 323</p> 324 325<blockquote> 326<pre> 327/etc/postfix/main.cf: 328 debugger_command = 329 PATH=/bin:/usr/bin:/usr/local/bin; 330 (truss -p $process_id 2>&1 | logger -p mail.info) & sleep 5 331</pre> 332</blockquote> 333 334<p> Type "<b>postfix reload</b>" and watch the logfile. </p> 335 336<h2><a name="ddd">Running daemon programs with the interactive 337ddd debugger</a></h2> 338 339<p> If you have X Windows installed on the Postfix machine, then 340an interactive debugger such as <b>ddd</b> can be convenient. 341</p> 342 343<p> Edit the debugger_command definition in /etc/postfix/main.cf 344so that it invokes <b>ddd</b>: </p> 345 346<blockquote> 347<pre> 348/etc/postfix/main.cf: 349 debugger_command = 350 PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin 351 ddd $daemon_directory/$process_name $process_id & sleep 5 352</pre> 353</blockquote> 354 355<p> Be sure that <b>gdb</b> is in the command search path, and 356export <b>XAUTHORITY</b> so that X access control works, for example: 357</p> 358 359<blockquote> 360<pre> 361% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax) 362$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax) 363</pre> 364</blockquote> 365 366<p> Append a <b>-D</b> option to the suspect daemon definition in 367/etc/postfix/master.cf, for example: </p> 368 369<blockquote> 370<pre> 371/etc/postfix/master.cf: 372 smtp inet n - n - - smtpd -D 373</pre> 374</blockquote> 375 376<p> Stop and start the Postfix system. This is necessary so that 377Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b> 378settings. </p> 379 380<p> Whenever the suspect daemon process is started, a debugger 381window pops up and you can watch in detail what happens. </p> 382 383<h2><a name="screen">Running daemon programs with the interactive 384gdb debugger</a></h2> 385 386<p> If you have the screen command installed on the Postfix machine, then 387you can run an interactive debugger such as <b>gdb</b> as follows. </p> 388 389<p> Edit the debugger_command definition in /etc/postfix/main.cf 390so that it runs <b>gdb</b> inside a detached <b>screen</b> session: 391</p> 392 393<blockquote> 394<pre> 395/etc/postfix/main.cf: 396 debugger_command = 397 PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root; 398 export HOME; screen -e^tt -dmS $process_name gdb 399 $daemon_directory/$process_name $process_id & sleep 2 400</pre> 401</blockquote> 402 403<p> Be sure that <b>gdb</b> is in the command search path. </p> 404 405<p> Append a <b>-D</b> option to the suspect daemon definition in 406/etc/postfix/master.cf, for example: </p> 407 408<blockquote> 409<pre> 410/etc/postfix/master.cf: 411 smtp inet n - n - - smtpd -D 412</pre> 413</blockquote> 414 415<p> Execute the command "<b>postfix reload</b>" and wait until a 416daemon process is started (you can see this in the maillog file). 417</p> 418 419<p> Then attach to the screen, and debug away: </p> 420 421<blockquote> 422<pre> 423# HOME=/root screen -r 424gdb) continue 425gdb) where 426</pre> 427</blockquote> 428 429<h2><a name="gdb">Running daemon programs under a non-interactive 430debugger</a></h2> 431 432<p> If you do not have X Windows installed on the Postfix machine, 433or if you are not familiar with interactive debuggers, then you 434can try to run <b>gdb</b> in non-interactive mode, and have it 435print a stack trace when the process crashes. </p> 436 437<p> Edit the debugger_command definition in /etc/postfix/main.cf 438so that it invokes the <b>gdb</b> debugger: </p> 439 440<blockquote> 441<pre> 442/etc/postfix/main.cf: 443 debugger_command = 444 PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo 445 where; sleep 8640000) | gdb $daemon_directory/$process_name 446 $process_id 2>&1 447 >$config_directory/$process_name.$process_id.log & sleep 5 448</pre> 449</blockquote> 450 451<p> Append a <b>-D</b> option to the suspect daemon in 452/etc/postfix/master.cf, for example: </p> 453 454<blockquote> 455<pre> 456/etc/postfix/master.cf: 457 smtp inet n - n - - smtpd -D 458</pre> 459</blockquote> 460 461<p> Type "<b>postfix reload</b>" to make the configuration changes 462effective. </p> 463 464<p> Whenever a suspect daemon process is started, an output file 465is created, named after the daemon and process ID (for example, 466smtpd.12345.log). When the process crashes, a stack trace (with 467output from the "<b>where</b>" command) is written to its logfile. 468</p> 469 470<h2><a name="unreasonable">Unreasonable behavior</a></h2> 471 472<p> Sometimes the behavior exhibited by Postfix just does not match the 473source code. Why can a program deviate from the instructions given 474by its author? There are two possibilities. </p> 475 476<ul> 477 478<li> <p> The compiler has erred. This rarely happens. </p> 479 480<li> <p> The hardware has erred. Does the machine have ECC memory? </p> 481 482</ul> 483 484<p> In both cases, the program being executed is not the program 485that was supposed to be executed, so anything could happen. </p> 486 487<p> There is a third possibility: </p> 488 489<ul> 490 491<li> <p> Bugs in system software (kernel or libraries). </p> 492 493</ul> 494 495<p> Hardware-related failures usually do not reproduce in exactly 496the same way after power cycling and rebooting the system. There's 497little Postfix can do about bad hardware. Be sure to use hardware 498that at the very least can detect memory errors. Otherwise, Postfix 499will just be waiting to be hit by a bit error. Critical systems 500deserve real hardware. </p> 501 502<p> When a compiler makes an error, the problem can be reproduced 503whenever the resulting program is run. Compiler errors are most 504likely to happen in the code optimizer. If a problem is reproducible 505across power cycles and system reboots, it can be worthwhile to 506rebuild Postfix with optimization disabled, and to see if optimization 507makes a difference. </p> 508 509<p> In order to compile Postfix with optimizations turned off: </p> 510 511<blockquote> 512<pre> 513% <b>make tidy</b> 514% <b>make makefiles OPT=</b> 515</pre> 516</blockquote> 517 518<p> This produces a set of Makefiles that do not request compiler 519optimization. </p> 520 521<p> Once the makefiles are set up, build the software: </p> 522 523<blockquote> 524<pre> 525% <b>make</b> 526% <b>su</b> 527Password: 528# <b>make install</b> 529</pre> 530</blockquote> 531 532<p> If the problem goes away, then it is time to ask your vendor 533for help. </p> 534 535<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2> 536 537<p> The people who participate on postfix-users@postfix.org 538are very helpful, especially if YOU provide them with sufficient 539information. Remember, these volunteers are willing to help, but 540their time is limited. </p> 541 542<p> When reporting a problem, be sure to include the following 543information. </p> 544 545<ul> 546 547<li> <p> A summary of the problem. Please do not just send some 548logging without explanation of what YOU believe is wrong. </p> 549 550<li> <p> Complete error messages. Please use cut-and-paste, or use 551attachments, instead of reciting information from memory. 552</p> 553 554<li> <p> Postfix logging. See the text at the top of the DEBUG_README 555document to find out where logging is stored. Please do not frustrate 556the helpers by word wrapping the logging. If the logging is more 557than a few kbytes of text, consider posting an URL on a web or ftp 558site. </p> 559 560<li> <p> Consider using a test email address so that you don't have 561to reveal email addresses or passwords of innocent people. </p> 562 563<li> <p> If you can't use a test email address, please anonymize 564email addresses and host names consistently. Replace each letter 565by "A", each digit 566by "D" so that the helpers can still recognize syntactical errors. 567</p> 568 569<li> <p> Command output from:</p> 570 571<ul> 572 573<li> <p> "<b>postconf -n</b>". Please do not send your main.cf file, 574or 1000+ lines of <b>postconf</b> command output. </p> 575 576<li> <p> "<b>postconf -Mf</b>" (Postfix 2.9 or later). </p> 577 578</ul> 579 580<li> <p> Better, provide output from the <b>postfinger</b> tool. 581This can be found at https://github.com/ford--prefect/postfinger. </p> 582 583<li> <p> If the problem is SASL related, consider including the 584output from the <b>saslfinger</b> tool. This can be found at 585https://packages.debian.org/search?keywords=sasl2-bin. </p> 586 587<li> <p> If the problem is about too much mail in the queue, consider 588including output from the <b>qshape</b> tool, as described in the 589QSHAPE_README file. </p> 590 591<li> <p> If the problem is protocol related (connections time out, 592or an SMTP server complains about syntax errors etc.) consider 593recording a session with <b>tcpdump</b>, as described in the <a 594href="#sniffer">DEBUG_README</a> document. </ul> 595 596</body> 597 598</html> 599