xref: /netbsd-src/external/ibm-public/postfix/dist/proto/DEBUG_README.html (revision d16b7486a53dcb8072b60ec6fcb4373a2d0c27b7)
1<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
2        "http://www.w3.org/TR/html4/loose.dtd">
3
4<html>
5
6<head>
7
8<title> Postfix Debugging Howto </title>
9
10<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
11
12</head>
13
14<body>
15
16<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1>
17
18<hr>
19
20<h2>Purpose of this document</h2>
21
22<p> This document describes how to debug parts of the Postfix mail
23system when things do not work according to expectation. The methods
24vary from making Postfix log a lot of detail, to running some daemon
25processes under control of a call tracer or debugger. </p>
26
27<p> The text assumes that the Postfix main.cf and master.cf
28configuration files are stored in directory /etc/postfix. You can
29use the command "<b>postconf config_directory</b>" to find out the
30actual location of this directory on your machine. </p>
31
32<p> Listed in order of increasing invasiveness, the debugging
33techniques are as follows: </p>
34
35<ul>
36
37<li><a href="#logging">Look for obvious signs of trouble</a>
38
39<li><a href="#trace_mail">Debugging Postfix from inside</a>
40
41<li><a href="#no_chroot">Try turning off chroot operation in
42master.cf</a>
43
44<li><a href="#debug_peer">Verbose logging for specific SMTP
45connections</a>
46
47<li><a href="#sniffer">Record the SMTP session with a network
48sniffer</a>
49
50<li><a href="#verbose">Making Postfix daemon programs more verbose</a>
51
52<li><a href="#man_trace">Manually tracing a Postfix daemon process</a>
53
54<li><a href="#auto_trace">Automatically tracing a Postfix daemon
55process</a>
56
57<li><a href="#ddd">Running daemon programs with the interactive
58ddd debugger</a>
59
60<li><a href="#screen">Running daemon programs with the interactive
61gdb debugger</a>
62
63<li><a href="#gdb">Running daemon programs under a non-interactive
64debugger</a>
65
66<li><a href="#unreasonable">Unreasonable behavior</a>
67
68<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a>
69
70</ul>
71
72<h2><a name="logging">Look for obvious signs of trouble</a></h2>
73
74<p> Postfix logs all failed and successful deliveries to a logfile. </p>
75
76<ul>
77
78<li> <p> When Postfix uses syslog logging (the default), the file
79is usually called /var/log/maillog, /var/log/mail, or something
80similar; the exact pathname is configured in a file called
81/etc/syslog.conf, /etc/rsyslog.conf, or something similar. </p>
82
83<li> <p> When Postfix uses its own logging system (see MAILLOG_README),
84the location of the logfile is configured with the Postfix maillog_file
85parameter. </p>
86
87</ul>
88
89<p> When Postfix does not receive or deliver mail, the first order
90of business is to look for errors that prevent Postfix from working
91properly:  </p>
92
93<blockquote>
94<pre>
95% <b>egrep '(warning|error|fatal|panic):' /some/log/file | more</b>
96</pre>
97</blockquote>
98
99<p> Note: the most important message is near the BEGINNING of the
100output.  Error messages that come later are less useful. </p>
101
102<p> The nature of each problem is indicated as follows: </p>
103
104<ul>
105
106<li> <p> "<b>panic</b>" indicates a problem in the software itself
107that only a programmer can fix. Postfix cannot proceed until this
108is fixed. </p>
109
110<li> <p> "<b>fatal</b>" is the result of missing files, incorrect
111permissions, incorrect configuration file settings that you can
112fix.  Postfix cannot proceed until this is fixed. </p>
113
114<li> <p> "<b>error</b>" reports an error condition. For safety
115reasons, a Postfix process will terminate when more than 13 of these
116happen. </p>
117
118<li> <p> "<b>warning</b>" indicates a non-fatal error. These are
119problems that you may not be able to fix (such as a broken DNS
120server elsewhere on the network) but may also indicate local
121configuration errors that could become a problem later. </p>
122
123</ul>
124
125<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2>
126
127<p> Postfix version 2.1 and later can
128produce mail delivery reports for debugging purposes. These reports
129not only show sender/recipient addresses after address rewriting
130and alias expansion or forwarding, they also show information about
131delivery to mailbox, delivery to non-Postfix command, responses
132from remote SMTP servers, and so on.
133</p>
134
135<p> Postfix can produce two types of mail delivery reports for
136debugging: </p>
137
138<ul>
139
140<li> <p> What-if: report what would happen, but do not actually
141deliver mail. This mode of operation is requested with: </p>
142
143<pre>
144% <b>/usr/sbin/sendmail -bv address...</b>
145Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
146</pre>
147
148<li> <p> What happened: deliver mail and report successes and/or
149failures, including replies from remote SMTP servers.  This mode
150of operation is requested with: </p>
151
152<pre>
153% <b>/usr/sbin/sendmail -v address...</b>
154Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
155</pre>
156
157</ul>
158
159<p> These reports contain information that is generated by Postfix
160delivery agents. Since these run as daemon processes that cannot
161interact with users directly, the result is sent as mail to the
162sender of the test message. The format of these reports is practically
163identical to that of ordinary non-delivery notifications. </p>
164
165<p> For a detailed example of a mail delivery status report, see
166the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a>
167section at the end of the ADDRESS_REWRITING_README document.  </p>
168
169<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2>
170
171<p> A common mistake is to turn on chroot operation in the master.cf
172file without going through all the necessary steps to set up a
173chroot environment. This causes Postfix daemon processes to fail
174due to all kinds of missing files. </p>
175
176<p> The example below shows an SMTP server that is configured with
177chroot turned off: </p>
178
179<blockquote>
180<pre>
181/etc/postfix/master.cf:
182    # =============================================================
183    # service type  private unpriv  <b>chroot</b>  wakeup  maxproc command
184    #               (yes)   (yes)   <b>(yes)</b>   (never) (100)
185    # =============================================================
186    smtp      inet  n       -       <b>n</b>       -       -       smtpd
187</pre>
188</blockquote>
189
190<p> Inspect master.cf for any processes that have chroot operation
191not turned off. If you find any, save a copy of the master.cf file,
192and edit the entries in question.  After executing the command
193"<b>postfix reload</b>", see if the problem has gone away. </p>
194
195<p> If turning off chrooted operation made the problem go away,
196then congratulations.  Leaving Postfix running in this way is
197adequate for most sites.  If you prefer chrooted operation, see
198the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup">
199BASIC_CONFIGURATION_README</a> file for information about how to
200prepare Postfix for chrooted operation. </p>
201
202<h2><a name="debug_peer">Verbose logging for specific SMTP
203connections</a></h2>
204
205<p> In /etc/postfix/main.cf, list the remote site name or address
206in the debug_peer_list parameter. For example, in order to make
207the software log a lot of information to the syslog daemon for
208connections from or to the loopback interface: </p>
209
210<blockquote>
211<pre>
212/etc/postfix/main.cf:
213    debug_peer_list = 127.0.0.1
214</pre>
215</blockquote>
216
217<p> You can specify one or more hosts, domains, addresses or
218net/masks.  To make the change effective immediately, execute the
219command "<b>postfix reload</b>". </p>
220
221<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2>
222
223<p> This example uses <b>tcpdump</b>. In order to record a conversation
224you need to specify a large enough buffer with the "<b>-s</b>"
225option or else you will miss some or all of the packet payload.
226</p>
227
228<blockquote>
229<pre>
230# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b>
231</pre>
232</blockquote>
233
234<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case,
235use "<b>-s 2000</b>" instead.  </p>
236
237<p> Run this for a while, stop with Ctrl-C when done. To view the
238data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>.
239</p>
240
241<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2>
242
243<p> Append one or more "<b>-v</b>" options to selected daemon
244definitions in /etc/postfix/master.cf and type "<b>postfix reload</b>".
245This will cause a lot of activity to be logged to the syslog daemon.
246For example, to make the Postfix SMTP server process more verbose: </p>
247
248<blockquote>
249<pre>
250/etc/postfix/master.cf:
251    smtp      inet  n       -       n       -       -       smtpd -v
252</pre>
253</blockquote>
254
255<p> To diagnose problems with address rewriting specify a "<b>-v</b>"
256option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to
257diagnose problems with mail delivery specify a "<b>-v</b>"
258option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8),
259local(8), pipe(8), smtp(8), or virtual(8) delivery agent.  </p>
260
261<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2>
262
263<p> Many systems allow you to inspect a running process with a
264system call tracer. For example: </p>
265
266<blockquote>
267<pre>
268# <b>trace -p process-id</b> (SunOS 4)
269# <b>strace -p process-id</b> (Linux and many others)
270# <b>truss -p process-id</b> (Solaris, FreeBSD)
271# <b>ktrace -p process-id</b> (generic 4.4BSD)
272</pre>
273</blockquote>
274
275<p> Even more informative are traces of system library calls.
276Examples: </p>
277
278<blockquote>
279<pre>
280# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS)
281# <b>sotruss -p process-id</b> (Solaris)
282</pre>
283</blockquote>
284
285<p> See your system documentation for details. </p>
286
287<p> Tracing a running process can give valuable information about
288what a process is attempting to do. This is as much information as
289you can get without running an interactive debugger program, as
290described in a later section. </p>
291
292<h2><a name="auto_trace">Automatically tracing a Postfix daemon
293process</a></h2>
294
295<p> Postfix can attach a call tracer whenever a daemon process
296starts.  Call tracers come in several kinds. </p>
297
298<ol>
299
300<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>,
301<b>strace</b>, or <b>ktrace</b>.  These show the communication
302between the process and the kernel. </p>
303
304<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>.
305These show calls of library routines, and give a better idea of
306what is going on within the process. </p>
307
308</ol>
309
310<p> Append a <b>-D</b> option to the suspect command in
311/etc/postfix/master.cf, for example: </p>
312
313<blockquote>
314<pre>
315/etc/postfix/master.cf:
316    smtp      inet  n       -       n       -       -       smtpd -D
317</pre>
318</blockquote>
319
320<p> Edit the debugger_command definition in /etc/postfix/main.cf
321so that it invokes the call tracer of your choice, for example:
322</p>
323
324<blockquote>
325<pre>
326/etc/postfix/main.cf:
327    debugger_command =
328         PATH=/bin:/usr/bin:/usr/local/bin;
329         (truss -p $process_id 2&gt;&amp;1 | logger -p mail.info) &amp; sleep 5
330</pre>
331</blockquote>
332
333<p> Type "<b>postfix reload</b>" and watch the logfile. </p>
334
335<h2><a name="ddd">Running daemon programs with the interactive
336ddd debugger</a></h2>
337
338<p> If you have X Windows installed on the Postfix machine, then
339an interactive debugger such as <b>ddd</b> can be convenient.
340</p>
341
342<p> Edit the debugger_command definition in /etc/postfix/main.cf
343so that it invokes <b>ddd</b>: </p>
344
345<blockquote>
346<pre>
347/etc/postfix/main.cf:
348    debugger_command =
349         PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
350         ddd $daemon_directory/$process_name $process_id &amp; sleep 5
351</pre>
352</blockquote>
353
354<p> Be sure that <b>gdb</b> is in the command search path, and
355export <b>XAUTHORITY</b> so that X access control works, for example:
356</p>
357
358<blockquote>
359<pre>
360% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax)
361$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax)
362</pre>
363</blockquote>
364
365<p> Append a <b>-D</b> option to the suspect daemon definition in
366/etc/postfix/master.cf, for example: </p>
367
368<blockquote>
369<pre>
370/etc/postfix/master.cf:
371    smtp      inet  n       -       n       -       -       smtpd -D
372</pre>
373</blockquote>
374
375<p> Stop and start the Postfix system.  This is necessary so that
376Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b>
377settings. </p>
378
379<p> Whenever the suspect daemon process is started, a debugger
380window pops up and you can watch in detail what happens. </p>
381
382<h2><a name="screen">Running daemon programs with the interactive
383gdb debugger</a></h2>
384
385<p> If you have the screen command installed on the Postfix machine, then
386you can run an interactive debugger such as <b>gdb</b> as follows. </p>
387
388<p> Edit the debugger_command definition in /etc/postfix/main.cf
389so that it runs <b>gdb</b> inside a detached <b>screen</b> session:
390</p>
391
392<blockquote>
393<pre>
394/etc/postfix/main.cf:
395    debugger_command =
396        PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root;
397        export HOME; screen -e^tt -dmS $process_name gdb
398        $daemon_directory/$process_name $process_id &amp; sleep 2
399</pre>
400</blockquote>
401
402<p> Be sure that <b>gdb</b> is in the command search path. </p>
403
404<p> Append a <b>-D</b> option to the suspect daemon definition in
405/etc/postfix/master.cf, for example: </p>
406
407<blockquote>
408<pre>
409/etc/postfix/master.cf:
410    smtp      inet  n       -       n       -       -       smtpd -D
411</pre>
412</blockquote>
413
414<p> Execute the command "<b>postfix reload</b>" and wait until a
415daemon process is started (you can see this in the maillog file).
416</p>
417
418<p> Then attach to the screen, and debug away: </p>
419
420<blockquote>
421<pre>
422# HOME=/root screen -r
423gdb) continue
424gdb) where
425</pre>
426</blockquote>
427
428<h2><a name="gdb">Running daemon programs under a non-interactive
429debugger</a></h2>
430
431<p> If you do not have X Windows installed on the Postfix machine,
432or if you are not familiar with interactive debuggers, then you
433can try to run <b>gdb</b> in non-interactive mode, and have it
434print a stack trace when the process crashes.  </p>
435
436<p> Edit the debugger_command definition in /etc/postfix/main.cf
437so that it invokes the <b>gdb</b> debugger: </p>
438
439<blockquote>
440<pre>
441/etc/postfix/main.cf:
442    debugger_command =
443        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo
444        where; sleep 8640000) | gdb $daemon_directory/$process_name
445        $process_id 2&gt&amp;1
446        &gt;$config_directory/$process_name.$process_id.log &amp; sleep 5
447</pre>
448</blockquote>
449
450<p> Append a <b>-D</b> option to the suspect daemon in
451/etc/postfix/master.cf, for example: </p>
452
453<blockquote>
454<pre>
455/etc/postfix/master.cf:
456    smtp      inet  n       -       n       -       -       smtpd -D
457</pre>
458</blockquote>
459
460<p> Type "<b>postfix reload</b>" to make the configuration changes
461effective.  </p>
462
463<p> Whenever a suspect daemon process is started, an output file
464is created, named after the daemon and process ID (for example,
465smtpd.12345.log). When the process crashes, a stack trace (with
466output from the "<b>where</b>" command) is written to its logfile.
467</p>
468
469<h2><a name="unreasonable">Unreasonable behavior</a></h2>
470
471<p> Sometimes the behavior exhibited by Postfix just does not match the
472source code. Why can a program deviate from the instructions given
473by its author? There are two possibilities. </p>
474
475<ul>
476
477<li> <p> The compiler has erred. This rarely happens. </p>
478
479<li> <p> The hardware has erred. Does the machine have ECC memory? </p>
480
481</ul>
482
483<p> In both cases, the program being executed is not the program
484that was supposed to be executed, so anything could happen. </p>
485
486<p> There is a third possibility: </p>
487
488<ul>
489
490<li> <p> Bugs in system software (kernel or libraries). </p>
491
492</ul>
493
494<p> Hardware-related failures usually do not reproduce in exactly
495the same way after power cycling and rebooting the system.  There's
496little Postfix can do about bad hardware.  Be sure to use hardware
497that at the very least can detect memory errors.  Otherwise, Postfix
498will just be waiting to be hit by a bit error.  Critical systems
499deserve real hardware. </p>
500
501<p> When a compiler makes an error, the problem can be reproduced
502whenever the resulting program is run. Compiler errors are most
503likely to happen in the code optimizer. If a problem is reproducible
504across power cycles and system reboots, it can be worthwhile to
505rebuild Postfix with optimization disabled, and to see if optimization
506makes a difference. </p>
507
508<p> In order to compile Postfix with optimizations turned off: </p>
509
510<blockquote>
511<pre>
512% <b>make tidy</b>
513% <b>make makefiles OPT=</b>
514</pre>
515</blockquote>
516
517<p> This produces a set of Makefiles that do not request compiler
518optimization.  </p>
519
520<p> Once the makefiles are set up, build the software: </p>
521
522<blockquote>
523<pre>
524% <b>make</b>
525% <b>su</b>
526Password:
527# <b>make install</b>
528</pre>
529</blockquote>
530
531<p> If the problem goes away, then it is time to ask your vendor
532for help. </p>
533
534<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2>
535
536<p> The people who participate on postfix-users@postfix.org
537are very helpful, especially if YOU provide them with sufficient
538information.  Remember, these volunteers are willing to help, but
539their time is limited. </p>
540
541<p> When reporting a problem, be sure to include the following
542information. </p>
543
544<ul>
545
546<li> <p> A summary of the problem. Please do not just send some
547logging without explanation of what YOU believe is wrong. </p>
548
549<li> <p> Complete error messages. Please use cut-and-paste, or use
550attachments, instead of reciting information from memory.
551</p>
552
553<li> <p> Postfix logging. See the text at the top of the DEBUG_README
554document to find out where logging is stored. Please do not frustrate
555the helpers by word wrapping the logging. If the logging is more
556than a few kbytes of text, consider posting an URL on a web or ftp
557site. </p>
558
559<li> <p> Consider using a test email address so that you don't have
560to reveal email addresses or passwords of innocent people. </p>
561
562<li> <p> If you can't use a test email address, please anonymize
563email addresses and host names consistently. Replace each letter
564by "A", each digit
565by "D" so that the helpers can still recognize syntactical errors.
566</p>
567
568<li> <p> Command output from:</p>
569
570<ul>
571
572<li> <p> "<b>postconf -n</b>". Please do not send your main.cf file,
573or 1000+ lines of <b>postconf</b> command output. </p>
574
575<li> <p> "<b>postconf -Mf</b>" (Postfix 2.9 or later). </p>
576
577</ul>
578
579<li> <p> Better, provide output from the <b>postfinger</b> tool.
580This can be found at https://github.com/ford--prefect/postfinger.  </p>
581
582<li> <p> If the problem is SASL related, consider including the
583output from the <b>saslfinger</b> tool. This can be found at
584https://packages.debian.org/search?keywords=sasl2-bin. </p>
585
586<li> <p> If the problem is about too much mail in the queue, consider
587including output from the <b>qshape</b> tool, as described in the
588QSHAPE_README file. </p>
589
590<li> <p> If the problem is protocol related (connections time out,
591or an SMTP server complains about syntax errors etc.) consider
592recording a session with <b>tcpdump</b>, as described in the <a
593href="#sniffer">DEBUG_README</a> document.  </ul>
594
595</body>
596
597</html>
598