1*521fa314SDavid van MoolenbroekDeveloper notes regarding trace(1), by David van Moolenbroek. 2*521fa314SDavid van Moolenbroek 3*521fa314SDavid van Moolenbroek 4*521fa314SDavid van MoolenbroekOVERALL CODE STRUCTURE 5*521fa314SDavid van Moolenbroek 6*521fa314SDavid van MoolenbroekThe general tracing engine is in trace.c. It passes IPC-level system call 7*521fa314SDavid van Moolenbroekenter and leave events off to call.c, which handles IPC-level system call 8*521fa314SDavid van Moolenbroekprinting and passes off system calls to be interpreted by a service-specific 9*521fa314SDavid van Moolenbroeksystem call handler whenever possible. All the service-specific code is in the 10*521fa314SDavid van Moolenbroekservice/ subdirectory, grouped by destination service. IOCTLs are a special 11*521fa314SDavid van Moolenbroekcase, which are handled in ioctl.c and passed on to driver-type-grouped IOCTL 12*521fa314SDavid van Moolenbroekhandlers in the ioctl/ subdirectory (this grouping is not strict). Some of the 13*521fa314SDavid van Moolenbroekgenerated output goes through the formatting code in format.c, and all of it 14*521fa314SDavid van Moolenbroekends up in output.c. The remaining source files contain support code. 15*521fa314SDavid van Moolenbroek 16*521fa314SDavid van Moolenbroek 17*521fa314SDavid van MoolenbroekADDING A SYSTEM CALL HANDLER 18*521fa314SDavid van Moolenbroek 19*521fa314SDavid van MoolenbroekIn principle, every system call stops the traced process twice: once when the 20*521fa314SDavid van Moolenbroeksystem call is started (the call-enter event) and once when the system call 21*521fa314SDavid van Moolenbroekreturns (the call-leave event). The tracer uses the call-enter event to print 22*521fa314SDavid van Moolenbroekthe request being made, and the call-leave event to print the result of the 23*521fa314SDavid van Moolenbroekcall. The output format is supposed to mimic largely what the system call 24*521fa314SDavid van Moolenbroeklooks like from a C program, although with additional information where that 25*521fa314SDavid van Moolenbroekmakes sense. The general output format for system calls is: 26*521fa314SDavid van Moolenbroek 27*521fa314SDavid van Moolenbroek name(parameters) = result 28*521fa314SDavid van Moolenbroek 29*521fa314SDavid van Moolenbroek..where "name" is the name of the system call, "parameters" is a list of system 30*521fa314SDavid van Moolenbroekcall parameters, and "result" is the result of the system call. If possible, 31*521fa314SDavid van Moolenbroekthe part up to and including the equals sign is printed from the call-enter 32*521fa314SDavid van Moolenbroekevent, and the result is printed from the call-leave event. However, many 33*521fa314SDavid van Moolenbroeksystem calls actually pass a pointer to a block of memory that is filled with 34*521fa314SDavid van Moolenbroekmeaningful content as part of the system call. For that reason, it is also 35*521fa314SDavid van Moolenbroekpossible that the call-enter event stops printing somewhere inside the 36*521fa314SDavid van Moolenbroekparameters block, and the call-leave event prints the rest of the parameters, 37*521fa314SDavid van Moolenbroekas well as the equals sign and the result after it. The place in the printed 38*521fa314SDavid van Moolenbroeksystem call where the call-enter printer stops and the call-leave printer is 39*521fa314SDavid van Moolenbroeksupposed to pick up again, is referred to as the "call split". 40*521fa314SDavid van Moolenbroek 41*521fa314SDavid van MoolenbroekThe tracer has to a handler structure for every system call that can be made by 42*521fa314SDavid van Moolenbroeka user program to any of the the MINIX3 services. This handler structure 43*521fa314SDavid van Moolenbroekprovides three elements: the name of the system call, an "out" function that 44*521fa314SDavid van Moolenbroekhandles printing of the call-enter part of the system call, and an "in" 45*521fa314SDavid van Moolenbroekfunction that handles printing of the call-leave part of the system call. The 46*521fa314SDavid van Moolenbroek"out" function is expected to print zero or more call parameters, and then 47*521fa314SDavid van Moolenbroekreturn a call type, which indicates whether all parameters have been printed 48*521fa314SDavid van Moolenbroekyet, or not. In fact, there are three call types, shown here with an example 49*521fa314SDavid van Moolenbroekwhich has a "|" pipe symbol added to indicate the call split: 50*521fa314SDavid van Moolenbroek 51*521fa314SDavid van Moolenbroek CT_DONE: write(5, "foo", 3) = |3 52*521fa314SDavid van Moolenbroek CT_NOTDONE: read(5, |"foo", 1024) = 3 53*521fa314SDavid van Moolenbroek CT_NORETURN: execve("foo", ["foo"], []")| = -1 [ENOENT] 54*521fa314SDavid van Moolenbroek 55*521fa314SDavid van MoolenbroekThe CT_DONE call type indicates that the handler is done printing all the 56*521fa314SDavid van Moolenbroekparameters during the call-enter event, and the call split will be after the 57*521fa314SDavid van Moolenbroekequals sign. The CT_NOTDONE call type indicates that the handler is not done 58*521fa314SDavid van Moolenbroekprinting all parameters yet, thus yielding a call split in the middle of the 59*521fa314SDavid van Moolenbroekparameters block (or even right after the opening parenthesis). The no-return 60*521fa314SDavid van Moolenbroek(CT_NORETURN) call type is used for a small number of functions that do not 61*521fa314SDavid van Moolenbroekreturn on success. Currently, these are the exit(), execve(), and sigreturn() 62*521fa314SDavid van Moolenbroeksystem calls. For these calls, no result will be printed at all, unless such 63*521fa314SDavid van Moolenbroeka call fails, in which case a failure result is printed after all. The call 64*521fa314SDavid van Moolenbroeksplit is such that the entire parameters block is printed upon entering the 65*521fa314SDavid van Moolenbroekcall, but the equals sign and result are printed only if the call does return. 66*521fa314SDavid van Moolenbroek 67*521fa314SDavid van MoolenbroekNow more about the handler structure for the system call. First of all, each 68*521fa314SDavid van Moolenbroeksystem call has a name, which must be a static string. It may be supplied 69*521fa314SDavid van Moolenbroekeither as a string, or as a function that returns a name string. The latter is 70*521fa314SDavid van Moolenbroekfor cases where one message-level system call is used to implement multiple 71*521fa314SDavid van MoolenbroekC-level system calls (such as setitimer() and getitimer() both going through 72*521fa314SDavid van MoolenbroekPM_ITIMER). The name function has the following prototype: 73*521fa314SDavid van Moolenbroek 74*521fa314SDavid van Moolenbroek const char *svc_syscall_name(const message *m_out); 75*521fa314SDavid van Moolenbroek 76*521fa314SDavid van Moolenbroek..where "m_out" is a local copy of the request message, which the name function 77*521fa314SDavid van Moolenbroekcan use to decide what string to return for the system call. As a sidenote, 78*521fa314SDavid van Moolenbroekin the future, the system call name will be used to implement call filtering. 79*521fa314SDavid van Moolenbroek 80*521fa314SDavid van MoolenbroekAn "out" printer function has the following prototype: 81*521fa314SDavid van Moolenbroek 82*521fa314SDavid van Moolenbroek int svc_syscall_out(struct trace_proc *proc, const message *m_out); 83*521fa314SDavid van Moolenbroek 84*521fa314SDavid van MoolenbroekHere, "proc" is a pointer to the process structure containing information about 85*521fa314SDavid van Moolenbroekthe process making the system call; proc->pid returns the process PID, but the 86*521fa314SDavid van Moolenbroekfunction should not access any other fields of this structure directly. 87*521fa314SDavid van MoolenbroekInstead, many of the output primitive and helper functions (which are all 88*521fa314SDavid van Moolenbroekprefixed with "put_") take this pointer as part of the call. "m_out" is a 89*521fa314SDavid van Moolenbroeklocal copy of the request message, and the printer may access its fields as it 90*521fa314SDavid van Moolenbroeksees fit. 91*521fa314SDavid van Moolenbroek 92*521fa314SDavid van MoolenbroekThe printer function should simply print parameters. The call name and the 93*521fa314SDavid van Moolenbroekopening parenthesis are printed by the main output routine. 94*521fa314SDavid van Moolenbroek 95*521fa314SDavid van MoolenbroekAll simple call parameters should be printed using the put_field() and 96*521fa314SDavid van Moolenbroekput_value() functions. The former prints a parameter or field name as flat 97*521fa314SDavid van Moolenbroektext; the latter is a printf-like interface to the former. By default, call 98*521fa314SDavid van Moolenbroekparamaters are simply printed as "value", but if printing all names is enabled, 99*521fa314SDavid van Moolenbroekcall parameters are printed as "name=value". Thus, all parameters should be 100*521fa314SDavid van Moolenbroekgiven a name, even if this name does not show up by default. Either way, these 101*521fa314SDavid van Moolenbroektwo functions take care of deciding whether to print the name, as well as of 102*521fa314SDavid van Moolenbroekprinting separators between the parameters. More about printing more complex 103*521fa314SDavid van Moolenbroekparameters (such as structures) in a bit. 104*521fa314SDavid van Moolenbroek 105*521fa314SDavid van MoolenbroekThe out printer function must return one of the three CT_ call type values. If 106*521fa314SDavid van Moolenbroekit returns CT_DONE, the main output routine will immediately print the closing 107*521fa314SDavid van Moolenbroekparenthesis and equals sign. If it returns CF_NORETURN, a closing parenthesis 108*521fa314SDavid van Moolenbroekwill be printed. If it return CF_NOTDONE, only a parameter field separator 109*521fa314SDavid van Moolenbroek(that is, a comma and a space) will be printed--after all, it can be assumed 110*521fa314SDavid van Moolenbroekthat more parameters will be printed later. 111*521fa314SDavid van Moolenbroek 112*521fa314SDavid van MoolenbroekAn "in" printer function has the following prototype: 113*521fa314SDavid van Moolenbroek 114*521fa314SDavid van Moolenbroek void svc_syscall_in(struct trace_proc *proc, const message *m_out, 115*521fa314SDavid van Moolenbroek const message *m_in, int failed); 116*521fa314SDavid van Moolenbroek 117*521fa314SDavid van MoolenbroekAgain, "proc" is the traced process of which its current system call has now 118*521fa314SDavid van Moolenbroekreturned. "m_out" is again the request message, guaranteed to be unchanged 119*521fa314SDavid van Moolenbroeksince the "out" call. "m_in" is the reply message from the service. "failed" 120*521fa314SDavid van Moolenbroekis either 0 to indicate that the call appears to have succeeded, or PF_FAILED 121*521fa314SDavid van Moolenbroekto indicate that the call definitely failed. If PF_FAILED is set, the call 122*521fa314SDavid van Moolenbroekhas failed either at the IPC level or at the system call level (or for another, 123*521fa314SDavid van Moolenbroekless common reason). In that case, the contents of "m_in" may be garbage and 124*521fa314SDavid van Moolenbroek"m_in" must not be used at all. 125*521fa314SDavid van Moolenbroek 126*521fa314SDavid van MoolenbroekFor CF_NOTDONE type calls, the in printer function should first print the 127*521fa314SDavid van Moolenbroekremaining parameters. Here especially, it is important to consider that the 128*521fa314SDavid van Moolenbroekentire call may fail. In that case, the parameters of which the contents were 129*521fa314SDavid van Moolenbroekstill going to be printed may also contain garbage, since they were never 130*521fa314SDavid van Moolenbroekfilled. The expected behavior is to print such parameters as pointer or "&.." 131*521fa314SDavid van Moolenbroekor something else to indicate that their actual contents are not valid. 132*521fa314SDavid van Moolenbroek 133*521fa314SDavid van MoolenbroekEither way, once a CF_NOTDONE type call function is done printing the remaining 134*521fa314SDavid van Moolenbroekparameters, it must call put_equals(proc) to print the closing parenthesis of 135*521fa314SDavid van Moolenbroekthe call and the equals sign. CF_NORETURN calls must also use put_equals(proc) 136*521fa314SDavid van Moolenbroekto print the equals sign. 137*521fa314SDavid van Moolenbroek 138*521fa314SDavid van MoolenbroekThen comes the result part. If the call failed, the in printer function *must* 139*521fa314SDavid van Moolenbroekuse put_result(proc) to print the failure result. This call not only takes 140*521fa314SDavid van Moolenbroekcare of converting negative error codes from m_in->m_type into "-1 [ECODE]" but 141*521fa314SDavid van Moolenbroekalso prints appropriate failure codes for IPC-level and other exceptional 142*521fa314SDavid van Moolenbroekfailures. Only if the system call did not fail, may the in printer function 143*521fa314SDavid van Moolenbroekchoose to not call put_result(proc), which on success simply prints 144*521fa314SDavid van Moolenbroekm_in->m_type as an integer. Similarly, if the system call succeeded, the in 145*521fa314SDavid van Moolenbroekprinter function may print extended results after the primary result, generally 146*521fa314SDavid van Moolenbroekin parentheses. For example, getpid() and getppid() share the same system call 147*521fa314SDavid van Moolenbroekand thus the tracer prints both return values, one as the primary result of the 148*521fa314SDavid van Moolenbroekactual call and one in parentheses with a clarifying name as extended result: 149*521fa314SDavid van Moolenbroek 150*521fa314SDavid van Moolenbroek getpid() = 3 (ppid=1) 151*521fa314SDavid van Moolenbroek 152*521fa314SDavid van MoolenbroekIt should now be clear that printing extended results makes no sense if the 153*521fa314SDavid van Moolenbroeksystem call failed. 154*521fa314SDavid van Moolenbroek 155*521fa314SDavid van MoolenbroekBesidse put_equals and put_result, the following more or less generic support 156*521fa314SDavid van Moolenbroekfunctions are available to print the various parts of the requests and replies. 157*521fa314SDavid van Moolenbroek 158*521fa314SDavid van Moolenbroek put_field - output a parameter, structure field, and so on; this function 159*521fa314SDavid van Moolenbroek should be used for just about every actual value 160*521fa314SDavid van Moolenbroek put_value - printf-like version of put_field 161*521fa314SDavid van Moolenbroek put_text - output plain text; for call handlers, this should be used only to 162*521fa314SDavid van Moolenbroek to add things right after a put_field call, never on its own 163*521fa314SDavid van Moolenbroek put_fmt - printf-like version of put_text, should generally not be used 164*521fa314SDavid van Moolenbroek from call handlers at all 165*521fa314SDavid van Moolenbroek put_open - open a nested block of fields, surrounded by parentheses, 166*521fa314SDavid van Moolenbroek brackets, or something like that; this is used for structures, 167*521fa314SDavid van Moolenbroek arrays, and any other similar nontrivial case of nesting 168*521fa314SDavid van Moolenbroek put_close - close a previously opened block of fields; the nesting depth is 169*521fa314SDavid van Moolenbroek actually tracked (to keep per-level separators etc), so each 170*521fa314SDavid van Moolenbroek put_open call must have a corresponding put_close call 171*521fa314SDavid van Moolenbroek put_open_struct - perform several tasks necessary to start printing the 172*521fa314SDavid van Moolenbroek fields of a structure; note that this function may fail! 173*521fa314SDavid van Moolenbroek put_close_struct - end successful printing of a structure 174*521fa314SDavid van Moolenbroek put_ptr - print a pointer in the traced process 175*521fa314SDavid van Moolenbroek put_buf - print a buffer or string 176*521fa314SDavid van Moolenbroek put_flags - print a bitwise flags field 177*521fa314SDavid van Moolenbroek put_tail - helper function for printing the continuation part of an array 178*521fa314SDavid van Moolenbroek 179*521fa314SDavid van MoolenbroekMany of these support functions take a flags field which takes PF_-prefixed 180*521fa314SDavid van Moolenbroekflags to modify the output they generate. The value of 'failed' in the in 181*521fa314SDavid van Moolenbroekprinter function may actually be passed (bitwise-OR'ed in) as the PF_FAILED 182*521fa314SDavid van Moolenbroekflag to these support functions, and they will do the right thing. For 183*521fa314SDavid van Moolenbroekexample, a call to put_open_struct with the PF_FAILED flag will end up simply 184*521fa314SDavid van Moolenbroekprinting the pointer to the structure, and not allow printing of the contents 185*521fa314SDavid van Moolenbroekof the structure. 186*521fa314SDavid van Moolenbroek 187*521fa314SDavid van MoolenbroekThe above support functions are documented (at a basic level) within the code, 188*521fa314SDavid van Moolenbroekbut in many cases, it may be useful to look up how they are used in practice by 189*521fa314SDavid van Moolenbroekthe existing handlers. The same goes for various less clear cases; while there 190*521fa314SDavid van Moolenbroekis basic support for printing structures, support for printing arrays must be 191*521fa314SDavid van Moolenbroekcoded fully by hand, as has been done for many places. A serious attempt has 192*521fa314SDavid van Moolenbroekbeen made to make the output consistent across the board (mainly thanks to the 193*521fa314SDavid van Moolenbroekoutput format of strace, on which the output of this tracer has been based, 194*521fa314SDavid van Moolenbroeksometimes very strictly and sometimes more loosely, but that aside) so it is 195*521fa314SDavid van Moolenbroekalways advisable to follow the ways of the existing handlers. Also keep in 196*521fa314SDavid van Moolenbroekmind that there are already printer functions for several generic structures, 197*521fa314SDavid van Moolenbroekand these should be used whenever possible (e.g., see the put_fd() comment). 198*521fa314SDavid van Moolenbroek 199*521fa314SDavid van MoolenbroekFinally, the default_out and default_in functions may be used as printer 200*521fa314SDavid van Moolenbroekfunctions for call with no parameters, and for functions which need no more 201*521fa314SDavid van Moolenbroekthan put_result() to print their system call result, respectively. 202*521fa314SDavid van Moolenbroek 203*521fa314SDavid van Moolenbroek 204*521fa314SDavid van MoolenbroekINTERNALS: MULTIPROCESS OUTPUT AND PREEMPTION 205*521fa314SDavid van Moolenbroek 206*521fa314SDavid van MoolenbroekThings get interesting when multiple processes are traced at once. Due to the 207*521fa314SDavid van Moolenbroeknature of process scheduling, system calls may end up being preempted between 208*521fa314SDavid van Moolenbroekthe call-enter and call-leave phases. This means that the output of a system 209*521fa314SDavid van Moolenbroekcall has to be suspended to give way to an event from another traced process. 210*521fa314SDavid van MoolenbroekSuch preemption may occur with literally all calls; not just "blocking" calls. 211*521fa314SDavid van Moolenbroek 212*521fa314SDavid van MoolenbroekThe tracer goes through some lengths to aid the user in following the output in 213*521fa314SDavid van Moolenbroekthe light of preemtion. The most important aspect is that the output of the 214*521fa314SDavid van Moolenbroekcall-enter phase is recorded, so that in the case of preemption, the call-leave 215*521fa314SDavid van Moolenbroekphase can start by replaying the record. As a result, the user gets to see the 216*521fa314SDavid van Moolenbroekwhole system call on a single line, instead of just the second half. Such 217*521fa314SDavid van Moolenbroeksystem call resumptions are marked with a "*" in their prefix, to show that 218*521fa314SDavid van Moolenbroekthe call was not just entered. The output therefore looks like this: 219*521fa314SDavid van Moolenbroek 220*521fa314SDavid van Moolenbroek 2| syscall() = <..> 221*521fa314SDavid van Moolenbroek 3| othercall() = 0 222*521fa314SDavid van Moolenbroek 2|*syscall() = 0 223*521fa314SDavid van Moolenbroek 224*521fa314SDavid van MoolenbroekSignals that arrive during a call will cause a resumption of the call as well. 225*521fa314SDavid van MoolenbroekAs a result, a call may be resumed multiple times: 226*521fa314SDavid van Moolenbroek 227*521fa314SDavid van Moolenbroek 2| syscall() = <..> 228*521fa314SDavid van Moolenbroek 3| othercall() = 0 229*521fa314SDavid van Moolenbroek 2|*syscall() = ** SIGUSR1 ** ** SIGUSR2 ** <..> 230*521fa314SDavid van Moolenbroek 3| othercall() = -1 [EBUSY] 231*521fa314SDavid van Moolenbroek 2|*syscall() = ** SIGHUP ** <..> 232*521fa314SDavid van Moolenbroek 3| othercall() = 0 233*521fa314SDavid van Moolenbroek 2|*syscall() = 0 234*521fa314SDavid van Moolenbroek 235*521fa314SDavid van MoolenbroekThis entire scenario shows one single system call from process 2. 236*521fa314SDavid van Moolenbroek 237*521fa314SDavid van MoolenbroekIn the current implementation, the output that should be recorded and/or cause 238*521fa314SDavid van Moolenbroekthe "<..>" preemption marker, as well as the cases where the recorded text must 239*521fa314SDavid van Moolenbroekbe replayed, are marked by the code explicitly. Replay takes place in three 240*521fa314SDavid van Moolenbroekcases: upon the call-leave event (obviously), upon receiving a signal (as shown 241*521fa314SDavid van Moolenbroekabove), and when it is required that a suspended no-return call is shown as 242*521fa314SDavid van Moolenbroekcompleted before continuing with other output. The last case applies to exit() 243*521fa314SDavid van Moolenbroekand execve(), and both are documented in the code quite extensively. Generally 244*521fa314SDavid van Moolenbroekspeaking, in all output lines where no recording or replay actions are 245*521fa314SDavid van Moolenbroekperformed, the recording will not be replayed but also not removed. This 246*521fa314SDavid van Moolenbroekallows for intermediate lines for that process in the output. Practically 247*521fa314SDavid van Moolenbroekspeaking, future support for job control could even print when a process get 248*521fa314SDavid van Moolenbroekstopped and continued, for that process, while preempting the output for the 249*521fa314SDavid van Moolenbroekongoing system call for that same process. 250*521fa314SDavid van Moolenbroek 251*521fa314SDavid van MoolenbroekIt is possible that the output of the call-enter phase exhausts the recording 252*521fa314SDavid van Moolenbroekbuffer for its process. In this case, a new, shorter text is generated upon 253*521fa314SDavid van Moolenbroekprocess resumption. There are many other aspects to proper output formatting 254*521fa314SDavid van Moolenbroekin the light of preemption, but most of them should be documented as part of 255*521fa314SDavid van Moolenbroekthe code reasonably well. 256