Developer notes regarding trace(1), by David van Moolenbroek.


OVERALL CODE STRUCTURE

The general tracing engine is in trace.c.  It passes IPC-level system call
enter and leave events off to call.c, which handles IPC-level system call
printing and passes off system calls to be interpreted by a service-specific
system call handler whenever possible.  All the service-specific code is in the
service/ subdirectory, grouped by destination service.  IOCTLs are a special
case, which are handled in ioctl.c and passed on to driver-type-grouped IOCTL
handlers in the ioctl/ subdirectory (this grouping is not strict).  Some of the
generated output goes through the formatting code in format.c, and all of it
ends up in output.c.  The remaining source files contain support code.


ADDING A SYSTEM CALL HANDLER

In principle, every system call stops the traced process twice: once when the
system call is started (the call-enter event) and once when the system call
returns (the call-leave event).  The tracer uses the call-enter event to print
the request being made, and the call-leave event to print the result of the
call.  The output format is supposed to mimic largely what the system call
looks like from a C program, although with additional information where that
makes sense.  The general output format for system calls is:

  name(parameters) = result

..where "name" is the name of the system call, "parameters" is a list of system
call parameters, and "result" is the result of the system call.  If possible,
the part up to and including the equals sign is printed from the call-enter
event, and the result is printed from the call-leave event.  However, many
system calls actually pass a pointer to a block of memory that is filled with
meaningful content as part of the system call.  For that reason, it is also
possible that the call-enter event stops printing somewhere inside the
parameters block, and the call-leave event prints the rest of the parameters,
as well as the equals sign and the result after it.  The place in the printed
system call where the call-enter printer stops and the call-leave printer is
supposed to pick up again, is referred to as the "call split".

The tracer has a handler structure for every system call that can be made by
a user program to any of the MINIX3 services.  This handler structure
provides three elements: the name of the system call, an "out" function that
handles printing of the call-enter part of the system call, and an "in"
function that handles printing of the call-leave part of the system call.  The
"out" function is expected to print zero or more call parameters, and then
return a call type, which indicates whether all parameters have been printed
yet, or not.  In fact, there are three call types, shown here with an example
which has a "|" pipe symbol added to indicate the call split:

  CT_DONE:       write(5, "foo", 3) = |3
  CT_NOTDONE:    read(5, |"foo", 1024) = 3
  CT_NORETURN:   execve("foo", ["foo"], [])| = -1 [ENOENT]

The CT_DONE call type indicates that the handler is done printing all the
parameters during the call-enter event, and the call split will be after the
equals sign.  The CT_NOTDONE call type indicates that the handler is not done
printing all parameters yet, thus yielding a call split in the middle of the
parameters block (or even right after the opening parenthesis).  The no-return
(CT_NORETURN) call type is used for a small number of functions that do not
return on success.  Currently, these are the exit(), execve(), and sigreturn()
system calls.  For these calls, no result will be printed at all, unless such
a call fails, in which case a failure result is printed after all.  The call
split is such that the entire parameters block is printed upon entering the
call, but the equals sign and result are printed only if the call does return.

Now more about the handler structure for the system call.  First of all, each
system call has a name, which must be a static string.  It may be supplied
either as a string, or as a function that returns a name string.  The latter is
for cases where one message-level system call is used to implement multiple
C-level system calls (such as setitimer() and getitimer() both going through
PM_ITIMER).  The name function has the following prototype:

  const char *svc_syscall_name(const message *m_out);

..where "m_out" is a local copy of the request message, which the name function
can use to decide what string to return for the system call.  As a sidenote,
in the future, the system call name will be used to implement call filtering.

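As a minimal sketch of such a name function, consider the setitimer() and
getitimer() case.  The message layout below is a simplified stand-in and not
the real "message" type; the "new_value" field, the rule used to tell the two
calls apart, and the demo_itimer_name name are assumptions made only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request message; the real "message" type differs. */
typedef struct {
	int m_type;
	const void *new_value;	/* assumed: NULL when no new timer value is set */
} message;

/* Name function: pick the C-level call name from the request contents. */
const char *
demo_itimer_name(const message *m_out)
{
	assert(m_out != NULL);

	/* Both names are static strings, as required for call names. */
	return (m_out->new_value != NULL) ? "setitimer" : "getitimer";
}
```

The function only inspects the request and returns one of two string
literals, so the returned name remains valid for as long as the tracer runs.
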
An "out" printer function has the following prototype:

  int svc_syscall_out(struct trace_proc *proc, const message *m_out);

Here, "proc" is a pointer to the process structure containing information about
the process making the system call; proc->pid returns the process PID, but the
function should not access any other fields of this structure directly.
Instead, many of the output primitive and helper functions (which are all
prefixed with "put_") take this pointer as part of the call.  "m_out" is a
local copy of the request message, and the printer may access its fields as it
sees fit.

The printer function should simply print parameters.  The call name and the
opening parenthesis are printed by the main output routine.

All simple call parameters should be printed using the put_field() and
put_value() functions.  The former prints a named parameter or field as flat
text; the latter is a printf-like interface to the former.  By default, call
parameters are simply printed as "value", but if printing all names is enabled,
call parameters are printed as "name=value".  Thus, all parameters should be
given a name, even if this name does not show up by default.  Either way, these
two functions take care of deciding whether to print the name, as well as of
printing separators between the parameters.  More about printing more complex
parameters (such as structures) in a bit.

The out printer function must return one of the three CT_ call type values.  If
it returns CT_DONE, the main output routine will immediately print the closing
parenthesis and equals sign.  If it returns CT_NORETURN, a closing parenthesis
will be printed.  If it returns CT_NOTDONE, only a parameter field separator
(that is, a comma and a space) will be printed--after all, it can be assumed
that more parameters will be printed later.

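As a rough illustration of the out side, the following self-contained sketch
mocks a small part of the output layer.  The struct trace_proc layout, the
message fields, the emit() and vemit() helpers, the CT_ values, and all demo_
names are assumptions made for this example and do not reflect the real
tracer code; only the put_value() calling convention and the text appended per
call type follow the description above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

enum { CT_DONE, CT_NOTDONE, CT_NORETURN };	/* values are arbitrary here */

/* Hypothetical, simplified request message; the real "message" type differs. */
typedef struct { int m_type; int fd; const char *buf; unsigned int len; } message;

/* Simplified stand-in for the tracer's per-process state. */
struct trace_proc {
	char line[256];		/* output accumulated for the current call */
	int nfields;		/* number of fields printed so far */
	int print_names;	/* nonzero: print "name=value" */
};

/* Append formatted text to the process's output line (mock helper). */
static void
vemit(struct trace_proc *proc, const char *fmt, va_list ap)
{
	size_t len = strlen(proc->line);

	vsnprintf(proc->line + len, sizeof(proc->line) - len, fmt, ap);
}

static void
emit(struct trace_proc *proc, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vemit(proc, fmt, ap);
	va_end(ap);
}

/* Mock of put_value(): handles separators and the optional "name=" part. */
static void
put_value(struct trace_proc *proc, const char *name, const char *fmt, ...)
{
	va_list ap;

	if (proc->nfields++ > 0)
		emit(proc, ", ");
	if (proc->print_names)
		emit(proc, "%s=", name);
	va_start(ap, fmt);
	vemit(proc, fmt, ap);
	va_end(ap);
}

/* "out" printer for a write-like call: prints parameters only. */
int
demo_write_out(struct trace_proc *proc, const message *m_out)
{
	put_value(proc, "fd", "%d", m_out->fd);
	put_value(proc, "buf", "\"%s\"", m_out->buf);
	put_value(proc, "len", "%u", m_out->len);
	return CT_DONE;
}

/* Mock main output routine: name, "(", parameters, then the call type tail. */
void
demo_call_enter(struct trace_proc *proc, const char *name,
	int (*out)(struct trace_proc *, const message *), const message *m_out)
{
	assert(out != NULL);
	proc->line[0] = '\0';
	proc->nfields = 0;
	emit(proc, "%s(", name);
	switch (out(proc, m_out)) {
	case CT_DONE:     emit(proc, ") = "); break;
	case CT_NORETURN: emit(proc, ")"); break;
	case CT_NOTDONE:  emit(proc, ", "); break;
	}
}
```

Note that the handler itself never prints the call name, parentheses, or
separators; it only names and prints each parameter and reports its call type.
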
An "in" printer function has the following prototype:

  void svc_syscall_in(struct trace_proc *proc, const message *m_out,
          const message *m_in, int failed);

Again, "proc" is the traced process whose current system call has now
returned.  "m_out" is again the request message, guaranteed to be unchanged
since the "out" call.  "m_in" is the reply message from the service.  "failed"
is either 0 to indicate that the call appears to have succeeded, or PF_FAILED
to indicate that the call definitely failed.  If PF_FAILED is set, the call
has failed either at the IPC level or at the system call level (or for another,
less common reason).  In that case, the contents of "m_in" may be garbage and
"m_in" must not be used at all.

For CT_NOTDONE type calls, the in printer function should first print the
remaining parameters.  Here especially, it is important to consider that the
entire call may fail.  In that case, the parameters whose contents were still
to be printed may also contain garbage, since they were never filled.  The
expected behavior is to print such parameters as a pointer, as "&..", or as
something else that indicates that their actual contents are not valid.

Either way, once a CT_NOTDONE type call function is done printing the remaining
parameters, it must call put_equals(proc) to print the closing parenthesis of
the call and the equals sign.  CT_NORETURN calls must also use put_equals(proc)
to print the equals sign.

Then comes the result part.  If the call failed, the in printer function *must*
use put_result(proc) to print the failure result.  This call not only takes
care of converting negative error codes from m_in->m_type into "-1 [ECODE]" but
also prints appropriate failure codes for IPC-level and other exceptional
failures.  Only if the system call did not fail, may the in printer function
choose to not call put_result(proc), which on success simply prints
m_in->m_type as an integer.  Similarly, if the system call succeeded, the in
printer function may print extended results after the primary result, generally
in parentheses.  For example, getpid() and getppid() share the same system call
and thus the tracer prints both return values, one as the primary result of the
actual call and one in parentheses with a clarifying name as extended result:

  getpid() = 3 (ppid=1)

It should now be clear that printing extended results makes no sense if the
system call failed.

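The in side of a CT_NOTDONE call can be sketched along the same lines.  This
is a self-contained mock, not the tracer's code: the message fields, the
struct trace_proc layout, the emit() helper, the PF_FAILED value, and the
demo_read_in name are assumptions for illustration.  In particular, the mock
put_result() prints a numeric error, whereas the real function prints a
symbolic name such as ENOENT.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define PF_FAILED 0x01	/* value is arbitrary in this sketch */

/* Hypothetical, simplified message; the real "message" layout differs. */
typedef struct {
	int m_type;		/* reply: byte count, or negative error code */
	unsigned long addr;	/* request: buffer address in the traced process */
	unsigned int len;	/* request: buffer size */
} message;

/* Simplified per-process state; the mocks below use it as an output line. */
struct trace_proc {
	char line[128];
	int result;		/* copy of the reply's m_type, for put_result() */
	const char *data;	/* stand-in for memory of the traced process */
};

static void
emit(struct trace_proc *proc, const char *fmt, ...)
{
	size_t len = strlen(proc->line);
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(proc->line + len, sizeof(proc->line) - len, fmt, ap);
	va_end(ap);
}

/* Mock of put_equals(): close the parameter list, print the equals sign. */
static void
put_equals(struct trace_proc *proc)
{
	emit(proc, ") = ");
}

/* Mock of put_result(); the real one prints symbolic names like ENOENT. */
static void
put_result(struct trace_proc *proc)
{
	if (proc->result < 0)
		emit(proc, "-1 [error %d]", -proc->result);
	else
		emit(proc, "%d", proc->result);
}

/* "in" printer for a read-like CT_NOTDONE call: buffer, length, result. */
void
demo_read_in(struct trace_proc *proc, const message *m_out,
	const message *m_in, int failed)
{
	assert(failed == 0 || failed == PF_FAILED);

	if (failed)	/* buffer was never filled; m_in must not be used */
		emit(proc, "&0x%lx", m_out->addr);
	else
		emit(proc, "\"%.*s\"", m_in->m_type, proc->data);
	emit(proc, ", %u", m_out->len);
	put_equals(proc);
	put_result(proc);
}
```

On failure the handler falls back to printing the buffer as a pointer and
still calls put_result(), which is the one call that must not be skipped.
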
Besides put_equals and put_result, the following more or less generic support
functions are available to print the various parts of the requests and replies.

  put_field - output a parameter, structure field, and so on; this function
              should be used for just about every actual value
  put_value - printf-like version of put_field
  put_text  - output plain text; for call handlers, this should be used only
              to add things right after a put_field call, never on its own
  put_fmt   - printf-like version of put_text, should generally not be used
              from call handlers at all
  put_open  - open a nested block of fields, surrounded by parentheses,
              brackets, or something like that; this is used for structures,
              arrays, and any other similar nontrivial case of nesting
  put_close - close a previously opened block of fields; the nesting depth is
              actually tracked (to keep per-level separators etc), so each
              put_open call must have a corresponding put_close call
  put_open_struct  - perform several tasks necessary to start printing the
                     fields of a structure; note that this function may fail!
  put_close_struct - end successful printing of a structure
  put_ptr   - print a pointer in the traced process
  put_buf   - print a buffer or string
  put_flags - print a bitwise flags field
  put_tail  - helper function for printing the continuation part of an array

Many of these support functions take a flags field which takes PF_-prefixed
flags to modify the output they generate.  The value of 'failed' in the in
printer function may actually be passed (bitwise-OR'ed in) as the PF_FAILED
flag to these support functions, and they will do the right thing.  For
example, a call to put_open_struct with the PF_FAILED flag will end up simply
printing the pointer to the structure, and not allow printing of the contents
of the structure.

The above support functions are documented (at a basic level) within the code,
but in many cases, it may be useful to look up how they are used in practice by
the existing handlers.  The same goes for various less clear cases; while there
is basic support for printing structures, support for printing arrays must be
coded fully by hand, as has been done in many places.  A serious attempt has
been made to make the output consistent across the board (mainly thanks to the
output format of strace, on which the output of this tracer has been based,
sometimes very strictly and sometimes more loosely, but that aside) so it is
always advisable to follow the ways of the existing handlers.  Also keep in
mind that there are already printer functions for several generic structures,
and these should be used whenever possible (e.g., see the put_fd() comment).

Finally, the default_out and default_in functions may be used as printer
functions for calls with no parameters, and for functions which need no more
than put_result() to print their system call result, respectively.


ADDING AN IOCTL HANDLER

There are many IOCTL requests, and many have their own associated data types.
Like with system calls, the idea is to provide an actual implementation for any
IOCTLs that can actually occur in the wild.  This consists of printing the full
IOCTL name, as well as its argument.  This section first describes how IOCTL
handling is grouped into files in the ioctl subdirectory, and then the actual
procedure by which IOCTLs are handled.

Grouping of IOCTL handling in the ioctl subdirectory is currently based on the
IOCTLs' associated device type.  This is not a performance optimization: for
any given IOCTL, there is no way for the main IOCTL code (in ioctl.c) to know
which group, if any, contains a handler for the IOCTL, so it simply queries all
groups.  The grouping is there only to keep down the size of individual source
files, and as such not even strict: for example, networking IOCTLs are
technically a subset of character IOCTLs, and kept separate only because there
are so many of them.  The point here is mainly that the separation is not at
all set in stone.  However, the svrctl group is an exception: svrctl(2)
requests are very much like IOCTLs, and thus also treated as such, but they are
in a different namespace.  Thus, their handlers are in a separate file.

As per the ioctl_table structure, each group has a function to return the name
of an IOCTL it knows (typically <group>_ioctl_name), and a function to handle
IOCTL arguments (typically <group>_ioctl_arg).  Whenever an IOCTL system call
is made, each group's name function is queried.  This function has the
following prototype:

  const char *group_ioctl_name(unsigned long req);

The "req" parameter contains the IOCTL request code.  The function is to return
a static non-NULL string if it knows the name for the request code, or NULL
otherwise.  If the function returns a non-NULL string, that name will be used
for the IOCTL.  In addition, if the IOCTL has an argument at all, i.e. it is
not of the basic _IO() type, that group (and only that group!) will be queried
about the IOCTL argument, by calling the group's IOCTL argument function.  The
IOCTL argument function has the following prototype:

  int group_ioctl_arg(struct trace_proc *proc, unsigned long req, void *ptr,
          int dir);

For a single IOCTL, this function may be called up to three times.  The first
time, "ptr" will be NULL, and based on the same IOCTL request code "req", the
function must return any bitwise combination of two flags: IF_OUT and IF_IN.

The returned flags determine whether and how the IOCTL's argument will be
printed: before and/or after performing the IOCTL system call.  These two flags
effectively correspond to the "write" and "read" argument directions of IOCTLs:
IF_OUT indicates that the argument should be printed before the IOCTL request,
and this is to be used only for IOCTLs of type _IOW() and _IOWR().  IF_IN
indicates that the argument should be printed after the IOCTL request (but if
it was successful only), and is to be used only for IOCTLs of type _IOR() and
_IOWR().

The returned flag combination determines how the IOCTL is formatted.  The
following possible return values result in the following output formats, again
with the "|" indicating the call split, "out" being the IOCTL argument contents
printed before the IOCTL call, and "in" being the IOCTL argument printed after
the IOCTL call:

  0:             ioctl(3, IOCFOO, &0xaddress) = |0
  IF_OUT:        ioctl(3, IOCFOO, {out}) = |0
  IF_OUT|IF_IN:  ioctl(3, IOCFOO, {out}) = |0 {in}
  IF_IN:         ioctl(3, IOCFOO, |{in}) = 0

Both IF_ flags are optional, mainly because it is not always needed to print
both sides for an _IOWR() request.  However, using the wrong flag (e.g., IF_OUT
for an _IOR() request, which simply makes no sense) will trigger an assert.
Also, the function should basically never return 0 for an IOCTL it recognizes.
Again, for IOCTLs of type _IO(), which have no argument, the argument function
is not called at all.

Now the important part.  For each flag that is returned on the initial call to
the argument function, the argument function will be called again, this time to
perform actual printing of the argument.  For these subsequent calls, "ptr"
will point to the argument data which has been copied to the local address
space, and "dir" will contain one of the returned flags (that is, either IF_OUT
or IF_IN) to indicate whether the function is called before or after the IOCTL
call.  As should now be obvious, if the first call returned IF_OUT | IF_IN, the
function will be called again with "dir" set to IF_OUT, and if the IOCTL call
did not fail, once more (for the third time), now with "dir" set to IF_IN.

For these calls with an actual "ptr" value and a direction, the function should
indeed print the argument as appropriate, using "proc" as process pointer for
use in calls to the printing functions.  The general approach is to print
non-structure arguments as single values with no field name, and structure
arguments by printing their fields with their field names.  The main code (in
ioctl.c) ensures that the output is enclosed in curly brackets, thus making the
output look like a structure anyway.

For these subsequent calls, the argument function's return value should be
IF_ALL if all parts of the IOCTL argument have been printed, or 0 otherwise.
In the latter case, the main code will add a final ".." field to indicate to
the user that not all parts of the argument have been printed, very much like
the "all" parameter of put_close_struct.

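The calling protocol described above can be sketched as follows.  This is a
self-contained mock under stated assumptions: DEMOIOCGSIZE is a made-up
read-only request, the IF_ values are arbitrary, struct trace_proc is reduced
to an output buffer, and demo_trace_ioctl() only imitates the order in which
the main IOCTL code would call the argument function for a successful request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Flag and request values are arbitrary in this sketch. */
#define IF_OUT		0x1
#define IF_IN		0x2
#define IF_ALL		0x4
#define DEMOIOCGSIZE	0x1234UL	/* hypothetical _IOR()-style request */

struct trace_proc { char line[64]; };	/* simplified stand-in */

/* Name function: return a static name for known requests, NULL otherwise. */
const char *
demo_ioctl_name(unsigned long req)
{
	return (req == DEMOIOCGSIZE) ? "DEMOIOCGSIZE" : NULL;
}

/* Argument function: the first call reports directions, later calls print. */
int
demo_ioctl_arg(struct trace_proc *proc, unsigned long req, void *ptr, int dir)
{
	if (req != DEMOIOCGSIZE)
		return 0;
	if (ptr == NULL)
		return IF_IN;	/* read-only argument: print after the call */

	assert(dir == IF_IN);
	/* A non-structure argument is printed as a single unnamed value. */
	snprintf(proc->line, sizeof(proc->line), "%u", *(unsigned int *)ptr);
	return IF_ALL;		/* everything has been printed */
}

/* Mock driver: returns how often the argument function was called. */
int
demo_trace_ioctl(struct trace_proc *proc, unsigned long req, void *arg)
{
	int flags, calls = 1;

	flags = demo_ioctl_arg(proc, req, NULL, 0);
	if (flags & IF_OUT) {
		(void)demo_ioctl_arg(proc, req, arg, IF_OUT);
		calls++;
	}
	/* The IOCTL itself would be performed here; assume it succeeded. */
	if (flags & IF_IN) {
		(void)demo_ioctl_arg(proc, req, arg, IF_IN);
		calls++;
	}
	return calls;
}
```

For this request the argument function is called twice: once with a NULL
"ptr" to obtain the flags, and once with "dir" set to IF_IN to print the value.
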
If no name can be found for the IOCTL request code, the argument will simply be
printed as a pointer.  The same happens in error cases, for example if copying
in the IOCTL data resulted in an error.

There is no support for dealing with multiple IOCTLs with the exact same
request code--something that should not, but sadly does, occur in practice.
For now, the preferred approach would be to implement only support for the
IOCTL that is most likely to be found in practice, and possibly to put a horse
head in the bed of whoever introduced the duplicate request code.


INTERNALS: MULTIPROCESS OUTPUT AND PREEMPTION

Things get interesting when multiple processes are traced at once.  Due to the
nature of process scheduling, system calls may end up being preempted between
the call-enter and call-leave phases.  This means that the output of a system
call has to be suspended to give way to an event from another traced process.
Such preemption may occur with literally all calls; not just "blocking" calls.

The tracer goes to some lengths to aid the user in following the output in
the light of preemption.  The most important aspect is that the output of the
call-enter phase is recorded, so that in the case of preemption, the call-leave
phase can start by replaying the record.  As a result, the user gets to see the
whole system call on a single line, instead of just the second half.  Such
system call resumptions are marked with a "*" in their prefix, to show that
the call was not just entered.  The output therefore looks like this:

      2| syscall() = <..>
      3| othercall() = 0
      2|*syscall() = 0

Signals that arrive during a call will cause a resumption of the call as well.
As a result, a call may be resumed multiple times:

      2| syscall() = <..>
      3| othercall() = 0
      2|*syscall() = ** SIGUSR1 ** ** SIGUSR2 ** <..>
      3| othercall() = -1 [EBUSY]
      2|*syscall() = ** SIGHUP ** <..>
      3| othercall() = 0
      2|*syscall() = 0

This entire scenario shows one single system call from process 2.

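The record-and-replay idea behind the first example can be sketched in a few
lines.  This is a deliberately reduced model and not the tracer's actual
implementation: the demo_ functions, the demo_out buffer, the "last_pid"
bookkeeping, and the "%5d|" prefix format are assumptions, and signals,
no-return calls, and record buffer exhaustion are left out entirely.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Simplified per-process state: the recorded call-enter text. */
struct trace_proc {
	int pid;
	char record[64];	/* e.g. "syscall() = " */
};

char demo_out[256];		/* all output ends up here in this sketch */
static int last_pid = -1;	/* process whose line is still open, or -1 */

static void
emit(const char *fmt, ...)
{
	size_t len = strlen(demo_out);
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(demo_out + len, sizeof(demo_out) - len, fmt, ap);
	va_end(ap);
}

/* Call-enter: print and record the first half; the line stays open. */
void
demo_enter(struct trace_proc *proc, const char *text)
{
	if (last_pid != -1)
		emit("<..>\n");		/* preempt the other open call */
	snprintf(proc->record, sizeof(proc->record), "%s", text);
	emit("%5d| %s", proc->pid, proc->record);
	last_pid = proc->pid;
}

/* Call-leave: replay the record with a "*" prefix if we were preempted. */
void
demo_leave(struct trace_proc *proc, const char *result)
{
	assert(proc->record[0] != '\0');	/* leave requires an enter */

	if (last_pid != proc->pid) {
		if (last_pid != -1)
			emit("<..>\n");
		emit("%5d|*%s", proc->pid, proc->record);
	}
	emit("%s\n", result);
	last_pid = -1;
}
```

Entering a call for process 2, then entering and leaving a call for process 3,
and finally leaving the call for process 2 reproduces the three-line example
shown earlier, minus its two-space indentation.
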
In the current implementation, the output that should be recorded and/or cause
the "<..>" preemption marker, as well as the cases where the recorded text must
be replayed, are marked by the code explicitly.  Replay takes place in three
cases: upon the call-leave event (obviously), upon receiving a signal (as shown
above), and when it is required that a suspended no-return call is shown as
completed before continuing with other output.  The last case applies to exit()
and execve(), and both are documented in the code quite extensively.  Generally
speaking, in all output lines where no recording or replay actions are
performed, the recording will not be replayed but also not removed.  This
allows for intermediate lines for that process in the output.  Practically
speaking, future support for job control could even print when a process gets
stopped and continued, for that process, while preempting the output for the
ongoing system call for that same process.

It is possible that the output of the call-enter phase exhausts the recording
buffer for its process.  In this case, a new, shorter text is generated upon
process resumption.  There are many other aspects to proper output formatting
in the light of preemption, but most of them should be documented as part of
the code reasonably well.