xref: /netbsd-src/external/ibm-public/postfix/dist/proto/DEBUG_README.html (revision 059c16a85b0b39d60ad6d18f53c09510815afa2b)
1<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
2        "http://www.w3.org/TR/html4/loose.dtd">
3
4<html>
5
6<head>
7
8<title> Postfix Debugging Howto </title>
9
10<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
11<link rel='stylesheet' type='text/css' href='postfix-doc.css'>
12
13</head>
14
15<body>
16
17<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1>
18
19<hr>
20
21<h2>Purpose of this document</h2>
22
23<p> This document describes how to debug parts of the Postfix mail
24system when things do not work according to expectation. The methods
25vary from making Postfix log a lot of detail, to running some daemon
26processes under control of a call tracer or debugger. </p>
27
28<p> The text assumes that the Postfix main.cf and master.cf
29configuration files are stored in directory /etc/postfix. You can
30use the command "<b>postconf config_directory</b>" to find out the
31actual location of this directory on your machine. </p>
32
33<p> Listed in order of increasing invasiveness, the debugging
34techniques are as follows: </p>
35
36<ul>
37
38<li><a href="#logging">Look for obvious signs of trouble</a>
39
40<li><a href="#trace_mail">Debugging Postfix from inside</a>
41
42<li><a href="#no_chroot">Try turning off chroot operation in
43master.cf</a>
44
45<li><a href="#debug_peer">Verbose logging for specific SMTP
46connections</a>
47
48<li><a href="#sniffer">Record the SMTP session with a network
49sniffer</a>
50
51<li><a href="#verbose">Making Postfix daemon programs more verbose</a>
52
53<li><a href="#man_trace">Manually tracing a Postfix daemon process</a>
54
55<li><a href="#auto_trace">Automatically tracing a Postfix daemon
56process</a>
57
58<li><a href="#ddd">Running daemon programs with the interactive
59ddd debugger</a>
60
61<li><a href="#screen">Running daemon programs with the interactive
62gdb debugger</a>
63
64<li><a href="#gdb">Running daemon programs under a non-interactive
65debugger</a>
66
67<li><a href="#unreasonable">Unreasonable behavior</a>
68
69<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a>
70
71</ul>
72
73<h2><a name="logging">Look for obvious signs of trouble</a></h2>
74
75<p> Postfix logs all failed and successful deliveries to a logfile. </p>
76
77<ul>
78
79<li> <p> When Postfix uses syslog logging (the default), the file
80is usually called /var/log/maillog, /var/log/mail, or something
81similar; the exact pathname is configured in a file called
82/etc/syslog.conf, /etc/rsyslog.conf, or something similar. </p>
83
84<li> <p> When Postfix uses its own logging system (see MAILLOG_README),
85the location of the logfile is configured with the Postfix maillog_file
86parameter. </p>
87
88</ul>
89
90<p> When Postfix does not receive or deliver mail, the first order
91of business is to look for errors that prevent Postfix from working
92properly:  </p>
93
94<blockquote>
95<pre>
96% <b>grep -E '(warning|error|fatal|panic):' /some/log/file | more</b>
97</pre>
98</blockquote>
99
100<p> Note: the most important message is near the BEGINNING of the
101output.  Error messages that come later are less useful. </p>
102
103<p> The nature of each problem is indicated as follows: </p>
104
105<ul>
106
107<li> <p> "<b>panic</b>" indicates a problem in the software itself
108that only a programmer can fix. Postfix cannot proceed until this
109is fixed. </p>
110
111<li> <p> "<b>fatal</b>" is the result of missing files, incorrect
112permissions, incorrect configuration file settings that you can
113fix.  Postfix cannot proceed until this is fixed. </p>
114
115<li> <p> "<b>error</b>" reports an error condition. For safety
116reasons, a Postfix process will terminate when more than 13 of these
117happen. </p>
118
119<li> <p> "<b>warning</b>" indicates a non-fatal error. These are
120problems that you may not be able to fix (such as a broken DNS
121server elsewhere on the network) but may also indicate local
122configuration errors that could become a problem later. </p>
123
124</ul>
125
126<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2>
127
128<p> Postfix version 2.1 and later can
129produce mail delivery reports for debugging purposes. These reports
130not only show sender/recipient addresses after address rewriting
131and alias expansion or forwarding, they also show information about
132delivery to mailbox, delivery to non-Postfix command, responses
133from remote SMTP servers, and so on.
134</p>
135
136<p> Postfix can produce two types of mail delivery reports for
137debugging: </p>
138
139<ul>
140
141<li> <p> What-if: report what would happen, but do not actually
142deliver mail. This mode of operation is requested with: </p>
143
144<pre>
145% <b>/usr/sbin/sendmail -bv address...</b>
146Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
147</pre>
148
149<li> <p> What happened: deliver mail and report successes and/or
150failures, including replies from remote SMTP servers.  This mode
151of operation is requested with: </p>
152
153<pre>
154% <b>/usr/sbin/sendmail -v address...</b>
155Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
156</pre>
157
158</ul>
159
160<p> These reports contain information that is generated by Postfix
161delivery agents. Since these run as daemon processes that cannot
162interact with users directly, the result is sent as mail to the
163sender of the test message. The format of these reports is practically
164identical to that of ordinary non-delivery notifications. </p>
165
166<p> For a detailed example of a mail delivery status report, see
167the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a>
168section at the end of the ADDRESS_REWRITING_README document.  </p>
169
170<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2>
171
172<p> A common mistake is to turn on chroot operation in the master.cf
173file without going through all the necessary steps to set up a
174chroot environment. This causes Postfix daemon processes to fail
175due to all kinds of missing files. </p>
176
177<p> The example below shows an SMTP server that is configured with
178chroot turned off: </p>
179
180<blockquote>
181<pre>
182/etc/postfix/master.cf:
183    # =============================================================
184    # service type  private unpriv  <b>chroot</b>  wakeup  maxproc command
185    #               (yes)   (yes)   <b>(yes)</b>   (never) (100)
186    # =============================================================
187    smtp      inet  n       -       <b>n</b>       -       -       smtpd
188</pre>
189</blockquote>
190
191<p> Inspect master.cf for any processes that have chroot operation
192not turned off. If you find any, save a copy of the master.cf file,
193and edit the entries in question.  After executing the command
194"<b>postfix reload</b>", see if the problem has gone away. </p>
195
196<p> If turning off chrooted operation made the problem go away,
197then congratulations.  Leaving Postfix running in this way is
198adequate for most sites.  If you prefer chrooted operation, see
199the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup">
200BASIC_CONFIGURATION_README</a> file for information about how to
201prepare Postfix for chrooted operation. </p>
202
203<h2><a name="debug_peer">Verbose logging for specific SMTP
204connections</a></h2>
205
206<p> In /etc/postfix/main.cf, list the remote site name or address
207in the debug_peer_list parameter. For example, in order to make
208the software log a lot of information to the syslog daemon for
209connections from or to the loopback interface: </p>
210
211<blockquote>
212<pre>
213/etc/postfix/main.cf:
214    debug_peer_list = 127.0.0.1
215</pre>
216</blockquote>
217
218<p> You can specify one or more hosts, domains, addresses or
219net/masks.  To make the change effective immediately, execute the
220command "<b>postfix reload</b>". </p>
221
222<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2>
223
224<p> This example uses <b>tcpdump</b>. In order to record a conversation
225you need to specify a large enough buffer with the "<b>-s</b>"
226option or else you will miss some or all of the packet payload.
227</p>
228
229<blockquote>
230<pre>
231# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b>
232</pre>
233</blockquote>
234
235<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case,
236use "<b>-s 2000</b>" instead.  </p>
237
238<p> Run this for a while, stop with Ctrl-C when done. To view the
239data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>.
240</p>
241
242<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2>
243
244<p> Append one or more "<b>-v</b>" options to selected daemon
245definitions in /etc/postfix/master.cf and type "<b>postfix reload</b>".
246This will cause a lot of activity to be logged to the syslog daemon.
247For example, to make the Postfix SMTP server process more verbose: </p>
248
249<blockquote>
250<pre>
251/etc/postfix/master.cf:
252    smtp      inet  n       -       n       -       -       smtpd -v
253</pre>
254</blockquote>
255
256<p> To diagnose problems with address rewriting specify a "<b>-v</b>"
257option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to
258diagnose problems with mail delivery specify a "<b>-v</b>"
259option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8),
260local(8), pipe(8), smtp(8), or virtual(8) delivery agent.  </p>
261
262<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2>
263
264<p> Many systems allow you to inspect a running process with a
265system call tracer. For example: </p>
266
267<blockquote>
268<pre>
269# <b>trace -p process-id</b> (SunOS 4)
270# <b>strace -p process-id</b> (Linux and many others)
271# <b>truss -p process-id</b> (Solaris, FreeBSD)
272# <b>ktrace -p process-id</b> (generic 4.4BSD)
273</pre>
274</blockquote>
275
276<p> Even more informative are traces of system library calls.
277Examples: </p>
278
279<blockquote>
280<pre>
281# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS)
282# <b>sotruss -p process-id</b> (Solaris)
283</pre>
284</blockquote>
285
286<p> See your system documentation for details. </p>
287
288<p> Tracing a running process can give valuable information about
289what a process is attempting to do. This is as much information as
290you can get without running an interactive debugger program, as
291described in a later section. </p>
292
293<h2><a name="auto_trace">Automatically tracing a Postfix daemon
294process</a></h2>
295
296<p> Postfix can attach a call tracer whenever a daemon process
297starts.  Call tracers come in several kinds. </p>
298
299<ol>
300
301<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>,
302<b>strace</b>, or <b>ktrace</b>.  These show the communication
303between the process and the kernel. </p>
304
305<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>.
306These show calls of library routines, and give a better idea of
307what is going on within the process. </p>
308
309</ol>
310
311<p> Append a <b>-D</b> option to the suspect command in
312/etc/postfix/master.cf, for example: </p>
313
314<blockquote>
315<pre>
316/etc/postfix/master.cf:
317    smtp      inet  n       -       n       -       -       smtpd -D
318</pre>
319</blockquote>
320
321<p> Edit the debugger_command definition in /etc/postfix/main.cf
322so that it invokes the call tracer of your choice, for example:
323</p>
324
325<blockquote>
326<pre>
327/etc/postfix/main.cf:
328    debugger_command =
329         PATH=/bin:/usr/bin:/usr/local/bin;
330         (truss -p $process_id 2&gt;&amp;1 | logger -p mail.info) &amp; sleep 5
331</pre>
332</blockquote>
333
334<p> Type "<b>postfix reload</b>" and watch the logfile. </p>
335
336<h2><a name="ddd">Running daemon programs with the interactive
337ddd debugger</a></h2>
338
339<p> If you have X Windows installed on the Postfix machine, then
340an interactive debugger such as <b>ddd</b> can be convenient.
341</p>
342
343<p> Edit the debugger_command definition in /etc/postfix/main.cf
344so that it invokes <b>ddd</b>: </p>
345
346<blockquote>
347<pre>
348/etc/postfix/main.cf:
349    debugger_command =
350         PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
351         ddd $daemon_directory/$process_name $process_id &amp; sleep 5
352</pre>
353</blockquote>
354
355<p> Be sure that <b>gdb</b> is in the command search path, and
356export <b>XAUTHORITY</b> so that X access control works, for example:
357</p>
358
359<blockquote>
360<pre>
361% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax)
362$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax)
363</pre>
364</blockquote>
365
366<p> Append a <b>-D</b> option to the suspect daemon definition in
367/etc/postfix/master.cf, for example: </p>
368
369<blockquote>
370<pre>
371/etc/postfix/master.cf:
372    smtp      inet  n       -       n       -       -       smtpd -D
373</pre>
374</blockquote>
375
376<p> Stop and start the Postfix system.  This is necessary so that
377Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b>
378settings. </p>
379
380<p> Whenever the suspect daemon process is started, a debugger
381window pops up and you can watch in detail what happens. </p>
382
383<h2><a name="screen">Running daemon programs with the interactive
384gdb debugger</a></h2>
385
386<p> If you have the screen command installed on the Postfix machine, then
387you can run an interactive debugger such as <b>gdb</b> as follows. </p>
388
389<p> Edit the debugger_command definition in /etc/postfix/main.cf
390so that it runs <b>gdb</b> inside a detached <b>screen</b> session:
391</p>
392
393<blockquote>
394<pre>
395/etc/postfix/main.cf:
396    debugger_command =
397        PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root;
398        export HOME; screen -e^tt -dmS $process_name gdb
399        $daemon_directory/$process_name $process_id &amp; sleep 2
400</pre>
401</blockquote>
402
403<p> Be sure that <b>gdb</b> is in the command search path. </p>
404
405<p> Append a <b>-D</b> option to the suspect daemon definition in
406/etc/postfix/master.cf, for example: </p>
407
408<blockquote>
409<pre>
410/etc/postfix/master.cf:
411    smtp      inet  n       -       n       -       -       smtpd -D
412</pre>
413</blockquote>
414
415<p> Execute the command "<b>postfix reload</b>" and wait until a
416daemon process is started (you can see this in the maillog file).
417</p>
418
419<p> Then attach to the screen, and debug away: </p>
420
421<blockquote>
422<pre>
423# HOME=/root screen -r
424gdb) continue
425gdb) where
426</pre>
427</blockquote>
428
429<h2><a name="gdb">Running daemon programs under a non-interactive
430debugger</a></h2>
431
432<p> If you do not have X Windows installed on the Postfix machine,
433or if you are not familiar with interactive debuggers, then you
434can try to run <b>gdb</b> in non-interactive mode, and have it
435print a stack trace when the process crashes.  </p>
436
437<p> Edit the debugger_command definition in /etc/postfix/main.cf
438so that it invokes the <b>gdb</b> debugger: </p>
439
440<blockquote>
441<pre>
442/etc/postfix/main.cf:
443    debugger_command =
444        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo
445        where; sleep 8640000) | gdb $daemon_directory/$process_name
446        $process_id 2&gt&amp;1
447        &gt;$config_directory/$process_name.$process_id.log &amp; sleep 5
448</pre>
449</blockquote>
450
451<p> Append a <b>-D</b> option to the suspect daemon in
452/etc/postfix/master.cf, for example: </p>
453
454<blockquote>
455<pre>
456/etc/postfix/master.cf:
457    smtp      inet  n       -       n       -       -       smtpd -D
458</pre>
459</blockquote>
460
461<p> Type "<b>postfix reload</b>" to make the configuration changes
462effective.  </p>
463
464<p> Whenever a suspect daemon process is started, an output file
465is created, named after the daemon and process ID (for example,
466smtpd.12345.log). When the process crashes, a stack trace (with
467output from the "<b>where</b>" command) is written to its logfile.
468</p>
469
470<h2><a name="unreasonable">Unreasonable behavior</a></h2>
471
472<p> Sometimes the behavior exhibited by Postfix just does not match the
473source code. Why can a program deviate from the instructions given
474by its author? There are two possibilities. </p>
475
476<ul>
477
478<li> <p> The compiler has erred. This rarely happens. </p>
479
480<li> <p> The hardware has erred. Does the machine have ECC memory? </p>
481
482</ul>
483
484<p> In both cases, the program being executed is not the program
485that was supposed to be executed, so anything could happen. </p>
486
487<p> There is a third possibility: </p>
488
489<ul>
490
491<li> <p> Bugs in system software (kernel or libraries). </p>
492
493</ul>
494
495<p> Hardware-related failures usually do not reproduce in exactly
496the same way after power cycling and rebooting the system.  There's
497little Postfix can do about bad hardware.  Be sure to use hardware
498that at the very least can detect memory errors.  Otherwise, Postfix
499will just be waiting to be hit by a bit error.  Critical systems
500deserve real hardware. </p>
501
502<p> When a compiler makes an error, the problem can be reproduced
503whenever the resulting program is run. Compiler errors are most
504likely to happen in the code optimizer. If a problem is reproducible
505across power cycles and system reboots, it can be worthwhile to
506rebuild Postfix with optimization disabled, and to see if optimization
507makes a difference. </p>
508
509<p> In order to compile Postfix with optimizations turned off: </p>
510
511<blockquote>
512<pre>
513% <b>make tidy</b>
514% <b>make makefiles OPT=</b>
515</pre>
516</blockquote>
517
518<p> This produces a set of Makefiles that do not request compiler
519optimization.  </p>
520
521<p> Once the makefiles are set up, build the software: </p>
522
523<blockquote>
524<pre>
525% <b>make</b>
526% <b>su</b>
527Password:
528# <b>make install</b>
529</pre>
530</blockquote>
531
532<p> If the problem goes away, then it is time to ask your vendor
533for help. </p>
534
535<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2>
536
537<p> The people who participate on postfix-users@postfix.org
538are very helpful, especially if YOU provide them with sufficient
539information.  Remember, these volunteers are willing to help, but
540their time is limited. </p>
541
542<p> When reporting a problem, be sure to include the following
543information. </p>
544
545<ul>
546
547<li> <p> A summary of the problem. Please do not just send some
548logging without explanation of what YOU believe is wrong. </p>
549
550<li> <p> Complete error messages. Please use cut-and-paste, or use
551attachments, instead of reciting information from memory.
552</p>
553
554<li> <p> Postfix logging. See the text at the top of the DEBUG_README
555document to find out where logging is stored. Please do not frustrate
556the helpers by word wrapping the logging. If the logging is more
557than a few kbytes of text, consider posting an URL on a web or ftp
558site. </p>
559
560<li> <p> Consider using a test email address so that you don't have
561to reveal email addresses or passwords of innocent people. </p>
562
563<li> <p> If you can't use a test email address, please anonymize
564email addresses and host names consistently. Replace each letter
565by "A", each digit
566by "D" so that the helpers can still recognize syntactical errors.
567</p>
568
569<li> <p> Command output from:</p>
570
571<ul>
572
573<li> <p> "<b>postconf -n</b>". Please do not send your main.cf file,
574or 1000+ lines of <b>postconf</b> command output. </p>
575
576<li> <p> "<b>postconf -Mf</b>" (Postfix 2.9 or later). </p>
577
578</ul>
579
580<li> <p> Better, provide output from the <b>postfinger</b> tool.
581This can be found at https://github.com/ford--prefect/postfinger.  </p>
582
583<li> <p> If the problem is SASL related, consider including the
584output from the <b>saslfinger</b> tool. This can be found at
585https://packages.debian.org/search?keywords=sasl2-bin. </p>
586
587<li> <p> If the problem is about too much mail in the queue, consider
588including output from the <b>qshape</b> tool, as described in the
589QSHAPE_README file. </p>
590
591<li> <p> If the problem is protocol related (connections time out,
592or an SMTP server complains about syntax errors etc.) consider
593recording a session with <b>tcpdump</b>, as described in the <a
594href="#sniffer">DEBUG_README</a> document.  </ul>
595
596</body>
597
598</html>
599