xref: /minix3/minix/usr.bin/trace/NOTES (revision 521fa314e2aaec3c192c15f2aaa4c677a544e62a)
1*521fa314SDavid van MoolenbroekDeveloper notes regarding trace(1), by David van Moolenbroek.
2*521fa314SDavid van Moolenbroek
3*521fa314SDavid van Moolenbroek
4*521fa314SDavid van MoolenbroekOVERALL CODE STRUCTURE
5*521fa314SDavid van Moolenbroek
6*521fa314SDavid van MoolenbroekThe general tracing engine is in trace.c.  It passes IPC-level system call
7*521fa314SDavid van Moolenbroekenter and leave events off to call.c, which handles IPC-level system call
8*521fa314SDavid van Moolenbroekprinting and passes off system calls to be interpreted by a service-specific
9*521fa314SDavid van Moolenbroeksystem call handler whenever possible.  All the service-specific code is in the
10*521fa314SDavid van Moolenbroekservice/ subdirectory, grouped by destination service.  IOCTLs are a special
11*521fa314SDavid van Moolenbroekcase, which are handled in ioctl.c and passed on to driver-type-grouped IOCTL
12*521fa314SDavid van Moolenbroekhandlers in the ioctl/ subdirectory (this grouping is not strict).  Some of the
13*521fa314SDavid van Moolenbroekgenerated output goes through the formatting code in format.c, and all of it
14*521fa314SDavid van Moolenbroekends up in output.c.  The remaining source files contain support code.
15*521fa314SDavid van Moolenbroek
16*521fa314SDavid van Moolenbroek
17*521fa314SDavid van MoolenbroekADDING A SYSTEM CALL HANDLER
18*521fa314SDavid van Moolenbroek
19*521fa314SDavid van MoolenbroekIn principle, every system call stops the traced process twice: once when the
20*521fa314SDavid van Moolenbroeksystem call is started (the call-enter event) and once when the system call
21*521fa314SDavid van Moolenbroekreturns (the call-leave event).  The tracer uses the call-enter event to print
22*521fa314SDavid van Moolenbroekthe request being made, and the call-leave event to print the result of the
23*521fa314SDavid van Moolenbroekcall.  The output format is supposed to mimic largely what the system call
24*521fa314SDavid van Moolenbroeklooks like from a C program, although with additional information where that
25*521fa314SDavid van Moolenbroekmakes sense.  The general output format for system calls is:
26*521fa314SDavid van Moolenbroek
27*521fa314SDavid van Moolenbroek  name(parameters) = result
28*521fa314SDavid van Moolenbroek
29*521fa314SDavid van Moolenbroek..where "name" is the name of the system call, "parameters" is a list of system
30*521fa314SDavid van Moolenbroekcall parameters, and "result" is the result of the system call.  If possible,
31*521fa314SDavid van Moolenbroekthe part up to and including the equals sign is printed from the call-enter
32*521fa314SDavid van Moolenbroekevent, and the result is printed from the call-leave event.  However, many
33*521fa314SDavid van Moolenbroeksystem calls actually pass a pointer to a block of memory that is filled with
34*521fa314SDavid van Moolenbroekmeaningful content as part of the system call.  For that reason, it is also
35*521fa314SDavid van Moolenbroekpossible that the call-enter event stops printing somewhere inside the
36*521fa314SDavid van Moolenbroekparameters block, and the call-leave event prints the rest of the parameters,
37*521fa314SDavid van Moolenbroekas well as the equals sign and the result after it.  The place in the printed
38*521fa314SDavid van Moolenbroeksystem call where the call-enter printer stops and the call-leave printer is
39*521fa314SDavid van Moolenbroeksupposed to pick up again, is referred to as the "call split".
40*521fa314SDavid van Moolenbroek
41*521fa314SDavid van MoolenbroekThe tracer has to a handler structure for every system call that can be made by
42*521fa314SDavid van Moolenbroeka user program to any of the the MINIX3 services.  This handler structure
43*521fa314SDavid van Moolenbroekprovides three elements: the name of the system call, an "out" function that
44*521fa314SDavid van Moolenbroekhandles printing of the call-enter part of the system call, and an "in"
45*521fa314SDavid van Moolenbroekfunction that handles printing of the call-leave part of the system call.  The
46*521fa314SDavid van Moolenbroek"out" function is expected to print zero or more call parameters, and then
47*521fa314SDavid van Moolenbroekreturn a call type, which indicates whether all parameters have been printed
48*521fa314SDavid van Moolenbroekyet, or not.  In fact, there are three call types, shown here with an example
49*521fa314SDavid van Moolenbroekwhich has a "|" pipe symbol added to indicate the call split:
50*521fa314SDavid van Moolenbroek
51*521fa314SDavid van Moolenbroek  CT_DONE:       write(5, "foo", 3) = |3
52*521fa314SDavid van Moolenbroek  CT_NOTDONE:    read(5, |"foo", 1024) = 3
53*521fa314SDavid van Moolenbroek  CT_NORETURN:   execve("foo", ["foo"], []")| = -1 [ENOENT]
54*521fa314SDavid van Moolenbroek
55*521fa314SDavid van MoolenbroekThe CT_DONE call type indicates that the handler is done printing all the
56*521fa314SDavid van Moolenbroekparameters during the call-enter event, and the call split will be after the
57*521fa314SDavid van Moolenbroekequals sign.  The CT_NOTDONE call type indicates that the handler is not done
58*521fa314SDavid van Moolenbroekprinting all parameters yet, thus yielding a call split in the middle of the
59*521fa314SDavid van Moolenbroekparameters block (or even right after the opening parenthesis).  The no-return
60*521fa314SDavid van Moolenbroek(CT_NORETURN) call type is used for a small number of functions that do not
61*521fa314SDavid van Moolenbroekreturn on success.  Currently, these are the exit(), execve(), and sigreturn()
62*521fa314SDavid van Moolenbroeksystem calls.  For these calls, no result will be printed at all, unless such
63*521fa314SDavid van Moolenbroeka call fails, in which case a failure result is printed after all.  The call
64*521fa314SDavid van Moolenbroeksplit is such that the entire parameters block is printed upon entering the
65*521fa314SDavid van Moolenbroekcall, but the equals sign and result are printed only if the call does return.
66*521fa314SDavid van Moolenbroek
67*521fa314SDavid van MoolenbroekNow more about the handler structure for the system call.  First of all, each
68*521fa314SDavid van Moolenbroeksystem call has a name, which must be a static string.  It may be supplied
69*521fa314SDavid van Moolenbroekeither as a string, or as a function that returns a name string.  The latter is
70*521fa314SDavid van Moolenbroekfor cases where one message-level system call is used to implement multiple
71*521fa314SDavid van MoolenbroekC-level system calls (such as setitimer() and getitimer() both going through
72*521fa314SDavid van MoolenbroekPM_ITIMER).  The name function has the following prototype:
73*521fa314SDavid van Moolenbroek
74*521fa314SDavid van Moolenbroek  const char *svc_syscall_name(const message *m_out);
75*521fa314SDavid van Moolenbroek
76*521fa314SDavid van Moolenbroek..where "m_out" is a local copy of the request message, which the name function
77*521fa314SDavid van Moolenbroekcan use to decide what string to return for the system call.  As a sidenote,
78*521fa314SDavid van Moolenbroekin the future, the system call name will be used to implement call filtering.
79*521fa314SDavid van Moolenbroek
80*521fa314SDavid van MoolenbroekAn "out" printer function has the following prototype:
81*521fa314SDavid van Moolenbroek
82*521fa314SDavid van Moolenbroek  int svc_syscall_out(struct trace_proc *proc, const message *m_out);
83*521fa314SDavid van Moolenbroek
84*521fa314SDavid van MoolenbroekHere, "proc" is a pointer to the process structure containing information about
85*521fa314SDavid van Moolenbroekthe process making the system call; proc->pid returns the process PID, but the
86*521fa314SDavid van Moolenbroekfunction should not access any other fields of this structure directly.
87*521fa314SDavid van MoolenbroekInstead, many of the output primitive and helper functions (which are all
88*521fa314SDavid van Moolenbroekprefixed with "put_") take this pointer as part of the call.  "m_out" is a
89*521fa314SDavid van Moolenbroeklocal copy of the request message, and the printer may access its fields as it
90*521fa314SDavid van Moolenbroeksees fit.
91*521fa314SDavid van Moolenbroek
92*521fa314SDavid van MoolenbroekThe printer function should simply print parameters.  The call name and the
93*521fa314SDavid van Moolenbroekopening parenthesis are printed by the main output routine.
94*521fa314SDavid van Moolenbroek
95*521fa314SDavid van MoolenbroekAll simple call parameters should be printed using the put_field() and
96*521fa314SDavid van Moolenbroekput_value() functions.  The former prints a parameter or field name as flat
97*521fa314SDavid van Moolenbroektext; the latter is a printf-like interface to the former.  By default, call
98*521fa314SDavid van Moolenbroekparamaters are simply printed as "value", but if printing all names is enabled,
99*521fa314SDavid van Moolenbroekcall parameters are printed as "name=value".  Thus, all parameters should be
100*521fa314SDavid van Moolenbroekgiven a name, even if this name does not show up by default.  Either way, these
101*521fa314SDavid van Moolenbroektwo functions take care of deciding whether to print the name, as well as of
102*521fa314SDavid van Moolenbroekprinting separators between the parameters.  More about printing more complex
103*521fa314SDavid van Moolenbroekparameters (such as structures) in a bit.
104*521fa314SDavid van Moolenbroek
105*521fa314SDavid van MoolenbroekThe out printer function must return one of the three CT_ call type values.  If
106*521fa314SDavid van Moolenbroekit returns CT_DONE, the main output routine will immediately print the closing
107*521fa314SDavid van Moolenbroekparenthesis and equals sign.  If it returns CF_NORETURN, a closing parenthesis
108*521fa314SDavid van Moolenbroekwill be printed.  If it return CF_NOTDONE, only a parameter field separator
109*521fa314SDavid van Moolenbroek(that is, a comma and a space) will be printed--after all, it can be assumed
110*521fa314SDavid van Moolenbroekthat more parameters will be printed later.
111*521fa314SDavid van Moolenbroek
112*521fa314SDavid van MoolenbroekAn "in" printer function has the following prototype:
113*521fa314SDavid van Moolenbroek
114*521fa314SDavid van Moolenbroek  void svc_syscall_in(struct trace_proc *proc, const message *m_out,
115*521fa314SDavid van Moolenbroek          const message *m_in, int failed);
116*521fa314SDavid van Moolenbroek
117*521fa314SDavid van MoolenbroekAgain, "proc" is the traced process of which its current system call has now
118*521fa314SDavid van Moolenbroekreturned.  "m_out" is again the request message, guaranteed to be unchanged
119*521fa314SDavid van Moolenbroeksince the "out" call.  "m_in" is the reply message from the service.  "failed"
120*521fa314SDavid van Moolenbroekis either 0 to indicate that the call appears to have succeeded, or PF_FAILED
121*521fa314SDavid van Moolenbroekto indicate that the call definitely failed.  If PF_FAILED is set, the call
122*521fa314SDavid van Moolenbroekhas failed either at the IPC level or at the system call level (or for another,
123*521fa314SDavid van Moolenbroekless common reason).  In that case, the contents of "m_in" may be garbage and
124*521fa314SDavid van Moolenbroek"m_in" must not be used at all.
125*521fa314SDavid van Moolenbroek
126*521fa314SDavid van MoolenbroekFor CF_NOTDONE type calls, the in printer function should first print the
127*521fa314SDavid van Moolenbroekremaining parameters.  Here especially, it is important to consider that the
128*521fa314SDavid van Moolenbroekentire call may fail.  In that case, the parameters of which the contents were
129*521fa314SDavid van Moolenbroekstill going to be printed may also contain garbage, since they were never
130*521fa314SDavid van Moolenbroekfilled.  The expected behavior is to print such parameters as pointer or "&.."
131*521fa314SDavid van Moolenbroekor something else to indicate that their actual contents are not valid.
132*521fa314SDavid van Moolenbroek
133*521fa314SDavid van MoolenbroekEither way, once a CF_NOTDONE type call function is done printing the remaining
134*521fa314SDavid van Moolenbroekparameters, it must call put_equals(proc) to print the closing parenthesis of
135*521fa314SDavid van Moolenbroekthe call and the equals sign.  CF_NORETURN calls must also use put_equals(proc)
136*521fa314SDavid van Moolenbroekto print the equals sign.
137*521fa314SDavid van Moolenbroek
138*521fa314SDavid van MoolenbroekThen comes the result part.  If the call failed, the in printer function *must*
139*521fa314SDavid van Moolenbroekuse put_result(proc) to print the failure result.  This call not only takes
140*521fa314SDavid van Moolenbroekcare of converting negative error codes from m_in->m_type into "-1 [ECODE]" but
141*521fa314SDavid van Moolenbroekalso prints appropriate failure codes for IPC-level and other exceptional
142*521fa314SDavid van Moolenbroekfailures.  Only if the system call did not fail, may the in printer function
143*521fa314SDavid van Moolenbroekchoose to not call put_result(proc), which on success simply prints
144*521fa314SDavid van Moolenbroekm_in->m_type as an integer.  Similarly, if the system call succeeded, the in
145*521fa314SDavid van Moolenbroekprinter function may print extended results after the primary result, generally
146*521fa314SDavid van Moolenbroekin parentheses.  For example, getpid() and getppid() share the same system call
147*521fa314SDavid van Moolenbroekand thus the tracer prints both return values, one as the primary result of the
148*521fa314SDavid van Moolenbroekactual call and one in parentheses with a clarifying name as extended result:
149*521fa314SDavid van Moolenbroek
150*521fa314SDavid van Moolenbroek  getpid() = 3 (ppid=1)
151*521fa314SDavid van Moolenbroek
152*521fa314SDavid van MoolenbroekIt should now be clear that printing extended results makes no sense if the
153*521fa314SDavid van Moolenbroeksystem call failed.
154*521fa314SDavid van Moolenbroek
155*521fa314SDavid van MoolenbroekBesidse put_equals and put_result, the following more or less generic support
156*521fa314SDavid van Moolenbroekfunctions are available to print the various parts of the requests and replies.
157*521fa314SDavid van Moolenbroek
158*521fa314SDavid van Moolenbroek  put_field - output a parameter, structure field, and so on; this function
159*521fa314SDavid van Moolenbroek              should be used for just about every actual value
160*521fa314SDavid van Moolenbroek  put_value - printf-like version of put_field
161*521fa314SDavid van Moolenbroek  put_text  - output plain text; for call handlers, this should be used only to
162*521fa314SDavid van Moolenbroek              to add things right after a put_field call, never on its own
163*521fa314SDavid van Moolenbroek  put_fmt   - printf-like version of put_text, should generally not be used
164*521fa314SDavid van Moolenbroek              from call handlers at all
165*521fa314SDavid van Moolenbroek  put_open  - open a nested block of fields, surrounded by parentheses,
166*521fa314SDavid van Moolenbroek              brackets, or something like that; this is used for structures,
167*521fa314SDavid van Moolenbroek              arrays, and any other similar nontrivial case of nesting
168*521fa314SDavid van Moolenbroek  put_close - close a previously opened block of fields; the nesting depth is
169*521fa314SDavid van Moolenbroek              actually tracked (to keep per-level separators etc), so each
170*521fa314SDavid van Moolenbroek              put_open call must have a corresponding put_close call
171*521fa314SDavid van Moolenbroek  put_open_struct  - perform several tasks necessary to start printing the
172*521fa314SDavid van Moolenbroek                     fields of a structure; note that this function may fail!
173*521fa314SDavid van Moolenbroek  put_close_struct - end successful printing of a structure
174*521fa314SDavid van Moolenbroek  put_ptr   - print a pointer in the traced process
175*521fa314SDavid van Moolenbroek  put_buf   - print a buffer or string
176*521fa314SDavid van Moolenbroek  put_flags - print a bitwise flags field
177*521fa314SDavid van Moolenbroek  put_tail  - helper function for printing the continuation part of an array
178*521fa314SDavid van Moolenbroek
179*521fa314SDavid van MoolenbroekMany of these support functions take a flags field which takes PF_-prefixed
180*521fa314SDavid van Moolenbroekflags to modify the output they generate.  The value of 'failed' in the in
181*521fa314SDavid van Moolenbroekprinter function may actually be passed (bitwise-OR'ed in) as the PF_FAILED
182*521fa314SDavid van Moolenbroekflag to these support functions, and they will do the right thing.  For
183*521fa314SDavid van Moolenbroekexample, a call to put_open_struct with the PF_FAILED flag will end up simply
184*521fa314SDavid van Moolenbroekprinting the pointer to the structure, and not allow printing of the contents
185*521fa314SDavid van Moolenbroekof the structure.
186*521fa314SDavid van Moolenbroek
187*521fa314SDavid van MoolenbroekThe above support functions are documented (at a basic level) within the code,
188*521fa314SDavid van Moolenbroekbut in many cases, it may be useful to look up how they are used in practice by
189*521fa314SDavid van Moolenbroekthe existing handlers.  The same goes for various less clear cases; while there
190*521fa314SDavid van Moolenbroekis basic support for printing structures, support for printing arrays must be
191*521fa314SDavid van Moolenbroekcoded fully by hand, as has been done for many places.  A serious attempt has
192*521fa314SDavid van Moolenbroekbeen made to make the output consistent across the board (mainly thanks to the
193*521fa314SDavid van Moolenbroekoutput format of strace, on which the output of this tracer has been based,
194*521fa314SDavid van Moolenbroeksometimes very strictly and sometimes more loosely, but that aside) so it is
195*521fa314SDavid van Moolenbroekalways advisable to follow the ways of the existing handlers.  Also keep in
196*521fa314SDavid van Moolenbroekmind that there are already printer functions for several generic structures,
197*521fa314SDavid van Moolenbroekand these should be used whenever possible (e.g., see the put_fd() comment).
198*521fa314SDavid van Moolenbroek
199*521fa314SDavid van MoolenbroekFinally, the default_out and default_in functions may be used as printer
200*521fa314SDavid van Moolenbroekfunctions for call with no parameters, and for functions which need no more
201*521fa314SDavid van Moolenbroekthan put_result() to print their system call result, respectively.
202*521fa314SDavid van Moolenbroek
203*521fa314SDavid van Moolenbroek
204*521fa314SDavid van MoolenbroekINTERNALS: MULTIPROCESS OUTPUT AND PREEMPTION
205*521fa314SDavid van Moolenbroek
206*521fa314SDavid van MoolenbroekThings get interesting when multiple processes are traced at once.  Due to the
207*521fa314SDavid van Moolenbroeknature of process scheduling, system calls may end up being preempted between
208*521fa314SDavid van Moolenbroekthe call-enter and call-leave phases.  This means that the output of a system
209*521fa314SDavid van Moolenbroekcall has to be suspended to give way to an event from another traced process.
210*521fa314SDavid van MoolenbroekSuch preemption may occur with literally all calls; not just "blocking" calls.
211*521fa314SDavid van Moolenbroek
212*521fa314SDavid van MoolenbroekThe tracer goes through some lengths to aid the user in following the output in
213*521fa314SDavid van Moolenbroekthe light of preemtion.  The most important aspect is that the output of the
214*521fa314SDavid van Moolenbroekcall-enter phase is recorded, so that in the case of preemption, the call-leave
215*521fa314SDavid van Moolenbroekphase can start by replaying the record.  As a result, the user gets to see the
216*521fa314SDavid van Moolenbroekwhole system call on a single line, instead of just the second half.  Such
217*521fa314SDavid van Moolenbroeksystem call resumptions are marked with a "*" in their prefix, to show that
218*521fa314SDavid van Moolenbroekthe call was not just entered.  The output therefore looks like this:
219*521fa314SDavid van Moolenbroek
220*521fa314SDavid van Moolenbroek      2| syscall() = <..>
221*521fa314SDavid van Moolenbroek      3| othercall() = 0
222*521fa314SDavid van Moolenbroek      2|*syscall() = 0
223*521fa314SDavid van Moolenbroek
224*521fa314SDavid van MoolenbroekSignals that arrive during a call will cause a resumption of the call as well.
225*521fa314SDavid van MoolenbroekAs a result, a call may be resumed multiple times:
226*521fa314SDavid van Moolenbroek
227*521fa314SDavid van Moolenbroek      2| syscall() = <..>
228*521fa314SDavid van Moolenbroek      3| othercall() = 0
229*521fa314SDavid van Moolenbroek      2|*syscall() = ** SIGUSR1 ** ** SIGUSR2 ** <..>
230*521fa314SDavid van Moolenbroek      3| othercall() = -1 [EBUSY]
231*521fa314SDavid van Moolenbroek      2|*syscall() = ** SIGHUP ** <..>
232*521fa314SDavid van Moolenbroek      3| othercall() = 0
233*521fa314SDavid van Moolenbroek      2|*syscall() = 0
234*521fa314SDavid van Moolenbroek
235*521fa314SDavid van MoolenbroekThis entire scenario shows one single system call from process 2.
236*521fa314SDavid van Moolenbroek
237*521fa314SDavid van MoolenbroekIn the current implementation, the output that should be recorded and/or cause
238*521fa314SDavid van Moolenbroekthe "<..>" preemption marker, as well as the cases where the recorded text must
239*521fa314SDavid van Moolenbroekbe replayed, are marked by the code explicitly.  Replay takes place in three
240*521fa314SDavid van Moolenbroekcases: upon the call-leave event (obviously), upon receiving a signal (as shown
241*521fa314SDavid van Moolenbroekabove), and when it is required that a suspended no-return call is shown as
242*521fa314SDavid van Moolenbroekcompleted before continuing with other output.  The last case applies to exit()
243*521fa314SDavid van Moolenbroekand execve(), and both are documented in the code quite extensively.  Generally
244*521fa314SDavid van Moolenbroekspeaking, in all output lines where no recording or replay actions are
245*521fa314SDavid van Moolenbroekperformed, the recording will not be replayed but also not removed.  This
246*521fa314SDavid van Moolenbroekallows for intermediate lines for that process in the output.  Practically
247*521fa314SDavid van Moolenbroekspeaking, future support for job control could even print when a process get
248*521fa314SDavid van Moolenbroekstopped and continued, for that process, while preempting the output for the
249*521fa314SDavid van Moolenbroekongoing system call for that same process.
250*521fa314SDavid van Moolenbroek
251*521fa314SDavid van MoolenbroekIt is possible that the output of the call-enter phase exhausts the recording
252*521fa314SDavid van Moolenbroekbuffer for its process.  In this case, a new, shorter text is generated upon
253*521fa314SDavid van Moolenbroekprocess resumption.  There are many other aspects to proper output formatting
254*521fa314SDavid van Moolenbroekin the light of preemption, but most of them should be documented as part of
255*521fa314SDavid van Moolenbroekthe code reasonably well.
256