xref: /netbsd-src/external/ibm-public/postfix/dist/html/DEBUG_README.html (revision 413d532bcc3f62d122e56d92e13ac64825a40baf)
1<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
2        "http://www.w3.org/TR/html4/loose.dtd">
3
4<html>
5
6<head>
7
8<title> Postfix Debugging Howto </title>
9
10<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
11
12</head>
13
14<body>
15
16<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1>
17
18<hr>
19
20<h2>Purpose of this document</h2>
21
22<p> This document describes how to debug parts of the Postfix mail
23system when things do not work according to expectation. The methods
24vary from making Postfix log a lot of detail, to running some daemon
25processes under control of a call tracer or debugger. </p>
26
27<p> The text assumes that the Postfix <a href="postconf.5.html">main.cf</a> and <a href="master.5.html">master.cf</a>
28configuration files are stored in directory /etc/postfix. You can
29use the command "<b>postconf <a href="postconf.5.html#config_directory">config_directory</a></b>" to find out the
30actual location of this directory on your machine. </p>
31
32<p> Listed in order of increasing invasiveness, the debugging
33techniques are as follows: </p>
34
35<ul>
36
37<li><a href="#logging">Look for obvious signs of trouble</a>
38
39<li><a href="#trace_mail">Debugging Postfix from inside</a>
40
41<li><a href="#no_chroot">Try turning off chroot operation in
42master.cf</a>
43
44<li><a href="#debug_peer">Verbose logging for specific SMTP
45connections</a>
46
47<li><a href="#sniffer">Record the SMTP session with a network
48sniffer</a>
49
50<li><a href="#verbose">Making Postfix daemon programs more verbose</a>
51
52<li><a href="#man_trace">Manually tracing a Postfix daemon process</a>
53
54<li><a href="#auto_trace">Automatically tracing a Postfix daemon
55process</a>
56
57<li><a href="#ddd">Running daemon programs with the interactive
58ddd debugger</a>
59
60<li><a href="#screen">Running daemon programs with the interactive
61gdb debugger</a>
62
63<li><a href="#gdb">Running daemon programs under a non-interactive
64debugger</a>
65
66<li><a href="#unreasonable">Unreasonable behavior</a>
67
68<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a>
69
70</ul>
71
72<h2><a name="logging">Look for obvious signs of trouble</a></h2>
73
74<p> Postfix logs all failed and successful deliveries to a logfile.
75The file is usually called /var/log/maillog or /var/log/mail; the
76exact pathname is defined in the /etc/syslog.conf file. </p>
77
78<p> When Postfix does not receive or deliver mail, the first order
79of business is to look for errors that prevent Postfix from working
80properly:  </p>
81
82<blockquote>
83<pre>
84% <b>egrep '(warning|error|fatal|panic):' /some/log/file | more</b>
85</pre>
86</blockquote>
87
88<p> Note: the most important message is near the BEGINNING of the
89output.  Error messages that come later are less useful. </p>
90
91<p> The nature of each problem is indicated as follows: </p>
92
93<ul>
94
95<li> <p> "<b>panic</b>" indicates a problem in the software itself
96that only a programmer can fix. Postfix cannot proceed until this
97is fixed. </p>
98
99<li> <p> "<b>fatal</b>" is the result of missing files, incorrect
100permissions, incorrect configuration file settings that you can
101fix.  Postfix cannot proceed until this is fixed. </p>
102
103<li> <p> "<b>error</b>" reports an error condition. For safety
104reasons, a Postfix process will terminate when more than 13 of these
105happen. </p>
106
107<li> <p> "<b>warning</b>" indicates a non-fatal error. These are
108problems that you may not be able to fix (such as a broken DNS
109server elsewhere on the network) but may also indicate local
110configuration errors that could become a problem later. </p>
111
112</ul>
113
114<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2>
115
116<p> Postfix version 2.1 and later can
117produce mail delivery reports for debugging purposes. These reports
118not only show sender/recipient addresses after address rewriting
119and alias expansion or forwarding, they also show information about
120delivery to mailbox, delivery to non-Postfix command, responses
121from remote SMTP servers, and so on.
122</p>
123
124<p> Postfix can produce two types of mail delivery reports for
125debugging: </p>
126
127<ul>
128
129<li> <p> What-if: report what would happen, but do not actually
130deliver mail. This mode of operation is requested with: </p>
131
132<pre>
133% <b>/usr/sbin/sendmail -bv address...</b>
134Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
135</pre>
136
137<li> <p> What happened: deliver mail and report successes and/or
138failures, including replies from remote SMTP servers.  This mode
139of operation is requested with: </p>
140
141<pre>
142% <b>/usr/sbin/sendmail -v address...</b>
143Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
144</pre>
145
146</ul>
147
148<p> These reports contain information that is generated by Postfix
149delivery agents. Since these run as daemon processes that cannot
150interact with users directly, the result is sent as mail to the
151sender of the test message. The format of these reports is practically
152identical to that of ordinary non-delivery notifications. </p>
153
154<p> For a detailed example of a mail delivery status report, see
155the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a>
156section at the end of the <a href="ADDRESS_REWRITING_README.html">ADDRESS_REWRITING_README</a> document.  </p>
157
158<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2>
159
160<p> A common mistake is to turn on chroot operation in the <a href="master.5.html">master.cf</a>
161file without going through all the necessary steps to set up a
162chroot environment. This causes Postfix daemon processes to fail
163due to all kinds of missing files. </p>
164
165<p> The example below shows an SMTP server that is configured with
166chroot turned off: </p>
167
168<blockquote>
169<pre>
170/etc/postfix/<a href="master.5.html">master.cf</a>:
171    # =============================================================
172    # service type  private unpriv  <b>chroot</b>  wakeup  maxproc command
173    #               (yes)   (yes)   <b>(yes)</b>   (never) (100)
174    # =============================================================
175    smtp      inet  n       -       <b>n</b>       -       -       smtpd
176</pre>
177</blockquote>
178
179<p> Inspect <a href="master.5.html">master.cf</a> for any processes that have chroot operation
180not turned off. If you find any, save a copy of the <a href="master.5.html">master.cf</a> file,
181and edit the entries in question.  After executing the command
182"<b>postfix reload</b>", see if the problem has gone away. </p>
183
184<p> If turning off chrooted operation made the problem go away,
185then congratulations.  Leaving Postfix running in this way is
186adequate for most sites.  If you prefer chrooted operation, see
187the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup">
188BASIC_CONFIGURATION_README</a> file for information about how to
189prepare Postfix for chrooted operation. </p>
190
191<h2><a name="debug_peer">Verbose logging for specific SMTP
192connections</a></h2>
193
194<p> In /etc/postfix/<a href="postconf.5.html">main.cf</a>, list the remote site name or address
195in the <a href="postconf.5.html#debug_peer_list">debug_peer_list</a> parameter. For example, in order to make
196the software log a lot of information to the syslog daemon for
197connections from or to the loopback interface: </p>
198
199<blockquote>
200<pre>
201/etc/postfix/<a href="postconf.5.html">main.cf</a>:
202    <a href="postconf.5.html#debug_peer_list">debug_peer_list</a> = 127.0.0.1
203</pre>
204</blockquote>
205
206<p> You can specify one or more hosts, domains, addresses or
207net/masks.  To make the change effective immediately, execute the
208command "<b>postfix reload</b>". </p>
209
210<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2>
211
212<p> This example uses <b>tcpdump</b>. In order to record a conversation
213you need to specify a large enough buffer with the "<b>-s</b>"
214option or else you will miss some or all of the packet payload.
215</p>
216
217<blockquote>
218<pre>
219# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b>
220</pre>
221</blockquote>
222
223<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case,
224use "<b>-s 2000</b>" instead.  </p>
225
226<p> Run this for a while, stop with Ctrl-C when done. To view the
227data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>.
228</p>
229
230<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2>
231
232<p> Append one or more "<b>-v</b>" options to selected daemon
233definitions in /etc/postfix/<a href="master.5.html">master.cf</a> and type "<b>postfix reload</b>".
234This will cause a lot of activity to be logged to the syslog daemon.
235For example, to make the Postfix SMTP server process more verbose: </p>
236
237<blockquote>
238<pre>
239/etc/postfix/<a href="master.5.html">master.cf</a>:
240    smtp      inet  n       -       n       -       -       smtpd -v
241</pre>
242</blockquote>
243
244<p> To diagnose problems with address rewriting specify a "<b>-v</b>"
245option for the <a href="cleanup.8.html">cleanup(8)</a> and/or <a href="trivial-rewrite.8.html">trivial-rewrite(8)</a> daemon, and to
246diagnose problems with mail delivery specify a "<b>-v</b>"
247option for the <a href="qmgr.8.html">qmgr(8)</a> or <a href="qmgr.8.html">oqmgr(8)</a> queue manager, or for the <a href="lmtp.8.html">lmtp(8)</a>,
248<a href="local.8.html">local(8)</a>, <a href="pipe.8.html">pipe(8)</a>, <a href="smtp.8.html">smtp(8)</a>, or <a href="virtual.8.html">virtual(8)</a> delivery agent.  </p>
249
250<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2>
251
252<p> Many systems allow you to inspect a running process with a
253system call tracer. For example: </p>
254
255<blockquote>
256<pre>
257# <b>trace -p process-id</b> (SunOS 4)
258# <b>strace -p process-id</b> (Linux and many others)
259# <b>truss -p process-id</b> (Solaris, FreeBSD)
260# <b>ktrace -p process-id</b> (generic 4.4BSD)
261</pre>
262</blockquote>
263
264<p> Even more informative are traces of system library calls.
265Examples: </p>
266
267<blockquote>
268<pre>
269# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS)
270# <b>sotruss -p process-id</b> (Solaris)
271</pre>
272</blockquote>
273
274<p> See your system documentation for details. </p>
275
276<p> Tracing a running process can give valuable information about
277what a process is attempting to do. This is as much information as
278you can get without running an interactive debugger program, as
279described in a later section. </p>
280
281<h2><a name="auto_trace">Automatically tracing a Postfix daemon
282process</a></h2>
283
284<p> Postfix can attach a call tracer whenever a daemon process
285starts.  Call tracers come in several kinds. </p>
286
287<ol>
288
289<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>,
290<b>strace</b>, or <b>ktrace</b>.  These show the communication
291between the process and the kernel. </p>
292
293<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>.
294These show calls of library routines, and give a better idea of
295what is going on within the process. </p>
296
297</ol>
298
299<p> Append a <b>-D</b> option to the suspect command in
300/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p>
301
302<blockquote>
303<pre>
304/etc/postfix/<a href="master.5.html">master.cf</a>:
305    smtp      inet  n       -       n       -       -       smtpd -D
306</pre>
307</blockquote>
308
309<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a>
310so that it invokes the call tracer of your choice, for example:
311</p>
312
313<blockquote>
314<pre>
315/etc/postfix/<a href="postconf.5.html">main.cf</a>:
316    <a href="postconf.5.html#debugger_command">debugger_command</a> =
317         PATH=/bin:/usr/bin:/usr/local/bin;
318         (truss -p $<a href="postconf.5.html#process_id">process_id</a> 2&gt;&amp;1 | logger -p mail.info) &amp; sleep 5
319</pre>
320</blockquote>
321
322<p> Type "<b>postfix reload</b>" and watch the logfile. </p>
323
324<h2><a name="ddd">Running daemon programs with the interactive
325ddd debugger</a></h2>
326
327<p> If you have X Windows installed on the Postfix machine, then
328an interactive debugger such as <b>ddd</b> can be convenient.
329</p>
330
331<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a>
332so that it invokes <b>ddd</b>: </p>
333
334<blockquote>
335<pre>
336/etc/postfix/<a href="postconf.5.html">main.cf</a>:
337    <a href="postconf.5.html#debugger_command">debugger_command</a> =
338         PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
339         ddd $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> $<a href="postconf.5.html#process_id">process_id</a> &amp; sleep 5
340</pre>
341</blockquote>
342
343<p> Be sure that <b>gdb</b> is in the command search path, and
344export <b>XAUTHORITY</b> so that X access control works, for example:
345</p>
346
347<blockquote>
348<pre>
349% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax)
350$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax)
351</pre>
352</blockquote>
353
354<p> Append a <b>-D</b> option to the suspect daemon definition in
355/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p>
356
357<blockquote>
358<pre>
359/etc/postfix/<a href="master.5.html">master.cf</a>:
360    smtp      inet  n       -       n       -       -       smtpd -D
361</pre>
362</blockquote>
363
364<p> Stop and start the Postfix system.  This is necessary so that
365Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b>
366settings. </p>
367
368<p> Whenever the suspect daemon process is started, a debugger
369window pops up and you can watch in detail what happens. </p>
370
371<h2><a name="screen">Running daemon programs with the interactive
372gdb debugger</a></h2>
373
374<p> If you have the screen command installed on the Postfix machine, then
375you can run an interactive debugger such as <b>gdb</b> as follows. </p>
376
377<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a>
378so that it runs <b>gdb</b> inside a detached <b>screen</b> session:
379</p>
380
381<blockquote>
382<pre>
383/etc/postfix/<a href="postconf.5.html">main.cf</a>:
384    <a href="postconf.5.html#debugger_command">debugger_command</a> =
385        PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root;
386        export HOME; screen -e^tt -dmS $<a href="postconf.5.html#process_name">process_name</a> gdb
387        $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> $<a href="postconf.5.html#process_id">process_id</a> &amp; sleep 2
388</pre>
389</blockquote>
390
391<p> Be sure that <b>gdb</b> is in the command search path. </p>
392
393<p> Append a <b>-D</b> option to the suspect daemon definition in
394/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p>
395
396<blockquote>
397<pre>
398/etc/postfix/<a href="master.5.html">master.cf</a>:
399    smtp      inet  n       -       n       -       -       smtpd -D
400</pre>
401</blockquote>
402
403<p> Execute the command "<b>postfix reload</b>" and wait until a
404daemon process is started (you can see this in the maillog file).
405</p>
406
407<p> Then attach to the screen, and debug away: </p>
408
409<blockquote>
410<pre>
411# HOME=/root screen -r
412gdb) continue
413gdb) where
414</pre>
415</blockquote>
416
417<h2><a name="gdb">Running daemon programs under a non-interactive
418debugger</a></h2>
419
420<p> If you do not have X Windows installed on the Postfix machine,
421or if you are not familiar with interactive debuggers, then you
422can try to run <b>gdb</b> in non-interactive mode, and have it
423print a stack trace when the process crashes.  </p>
424
425<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a>
426so that it invokes the <b>gdb</b> debugger: </p>
427
428<blockquote>
429<pre>
430/etc/postfix/<a href="postconf.5.html">main.cf</a>:
431    <a href="postconf.5.html#debugger_command">debugger_command</a> =
432        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo
433        where; sleep 8640000) | gdb $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a>
434        $<a href="postconf.5.html#process_id">process_id</a> 2&gt&amp;1
435        &gt;$<a href="postconf.5.html#config_directory">config_directory</a>/$<a href="postconf.5.html#process_name">process_name</a>.$<a href="postconf.5.html#process_id">process_id</a>.log &amp; sleep 5
436</pre>
437</blockquote>
438
439<p> Append a <b>-D</b> option to the suspect daemon in
440/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p>
441
442<blockquote>
443<pre>
444/etc/postfix/<a href="master.5.html">master.cf</a>:
445    smtp      inet  n       -       n       -       -       smtpd -D
446</pre>
447</blockquote>
448
449<p> Type "<b>postfix reload</b>" to make the configuration changes
450effective.  </p>
451
452<p> Whenever a suspect daemon process is started, an output file
453is created, named after the daemon and process ID (for example,
454smtpd.12345.log). When the process crashes, a stack trace (with
455output from the "<b>where</b>" command) is written to its logfile.
456</p>
457
458<h2><a name="unreasonable">Unreasonable behavior</a></h2>
459
460<p> Sometimes the behavior exhibited by Postfix just does not match the
461source code. Why can a program deviate from the instructions given
462by its author? There are two possibilities. </p>
463
464<ul>
465
466<li> <p> The compiler has erred. This rarely happens. </p>
467
468<li> <p> The hardware has erred. Does the machine have ECC memory? </p>
469
470</ul>
471
472<p> In both cases, the program being executed is not the program
473that was supposed to be executed, so anything could happen. </p>
474
475<p> There is a third possibility: </p>
476
477<ul>
478
479<li> <p> Bugs in system software (kernel or libraries). </p>
480
481</ul>
482
483<p> Hardware-related failures usually do not reproduce in exactly
484the same way after power cycling and rebooting the system.  There's
485little Postfix can do about bad hardware.  Be sure to use hardware
486that at the very least can detect memory errors.  Otherwise, Postfix
487will just be waiting to be hit by a bit error.  Critical systems
488deserve real hardware. </p>
489
490<p> When a compiler makes an error, the problem can be reproduced
491whenever the resulting program is run. Compiler errors are most
492likely to happen in the code optimizer. If a problem is reproducible
493across power cycles and system reboots, it can be worthwhile to
494rebuild Postfix with optimization disabled, and to see if optimization
495makes a difference. </p>
496
497<p> In order to compile Postfix with optimizations turned off: </p>
498
499<blockquote>
500<pre>
501% <b>make tidy</b>
502% <b>make makefiles OPT=</b>
503</pre>
504</blockquote>
505
506<p> This produces a set of Makefiles that do not request compiler
507optimization.  </p>
508
509<p> Once the makefiles are set up, build the software: </p>
510
511<blockquote>
512<pre>
513% <b>make</b>
514% <b>su</b>
515Password:
516# <b>make install</b>
517</pre>
518</blockquote>
519
520<p> If the problem goes away, then it is time to ask your vendor
521for help. </p>
522
523<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2>
524
525<p> The people who participate on postfix-users@postfix.org
526are very helpful, especially if YOU provide them with sufficient
527information.  Remember, these volunteers are willing to help, but
528their time is limited. </p>
529
530<p> When reporting a problem, be sure to include the following
531information. </p>
532
533<ul>
534
535<li> <p> A summary of the problem. Please do not just send some
536logging without explanation of what YOU believe is wrong. </p>
537
538<li> <p> Complete error messages. Please use cut-and-paste, or use
539attachments, instead of reciting information from memory.
540</p>
541
542<li> <p> Postfix logging. See the text at the top of the <a href="DEBUG_README.html">DEBUG_README</a>
543document to find out where logging is stored. Please do not frustrate
544the helpers by word wrapping the logging. If the logging is more
545than a few kbytes of text, consider posting an URL on a web or ftp
546site. </p>
547
548<li> <p> Consider using a test email address so that you don't have
549to reveal email addresses or passwords of innocent people. </p>
550
551<li> <p> If you can't use a test email address, please anonymize
552email addresses and host names consistently. Replace each letter
553by "A", each digit
554by "D" so that the helpers can still recognize syntactical errors.
555</p>
556
557<li> <p> Output from "<b>postconf -n</b>". Please do not send your
558<a href="postconf.5.html">main.cf</a> file, or 500+ lines of <b>postconf</b> output. </p>
559
560<li> <p> Better, provide output from the <b>postfinger</b> tool.
561This can be found at <a href="http://ftp.wl0.org/SOURCES/postfinger">http://ftp.wl0.org/SOURCES/postfinger</a>.  </p>
562
563<li> <p> If the problem is SASL related, consider including the
564output from the <b>saslfinger</b> tool. This can be found at
565<a href="http://postfix.state-of-mind.de/patrick.koetter/saslfinger/">http://postfix.state-of-mind.de/patrick.koetter/saslfinger/</a>. </p>
566
567<li> <p> If the problem is about too much mail in the queue, consider
568including output from the <b>qshape</b> tool, as described in the
569<a href="QSHAPE_README.html">QSHAPE_README</a> file. </p>
570
571<li> <p> If the problem is protocol related (connections time out,
572or an SMTP server complains about syntax errors etc.) consider
573recording a session with <b>tcpdump</b>, as described in the <a
574href="#sniffer">DEBUG_README</a> document.  </ul>
575
576</body>
577
578</html>
579