#	@(#)TOUR	5.1 (Berkeley) 03/07/91

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES:  The subdirectory bltin contains commands which can
be compiled stand-alone.  The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
programs that generate source code.  A complete list of these
programs is:

        program         input files         generates
        -------         -----------         ---------
        mkbuiltins      builtins            builtins.h builtins.c
        mkinit          *.c                 init.c
        mknodes         nodetypes           nodes.h nodes.c
        mksignames          -               signames.h signames.c
        mksyntax            -               syntax.h syntax.c
        mktokens            -               token.def
        bltin/mkexpr    unary_op binary_op  operators.h operators.c

There are undoubtedly too many of these.  Mkinit searches all the
C source files for entries looking like:

        INIT {
              x = 1;    /* executed during initialization */
        }

        RESET {
              x = 2;    /* executed when the shell does a longjmp
                           back to the main command loop */
        }

        SHELLPROC {
              x = 3;    /* executed when the shell runs a shell procedure */
        }

It pulls this code out into routines which are called when
particular events occur.  The intent is to improve modularity by
isolating the information about which modules need to be
explicitly initialized/reset within the modules themselves.
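
A minimal sketch of what the generated init.c might contain for the blocks above, written in ash's six-space style. The routine names init, reset and initshellproc, and the second contributing module (a hypothetical eval.c whose RESET block clears funcnest), are assumptions for illustration; only the idea that each kind of block is gathered into one routine comes from the text.

```c
/*
 * Sketch of a generated init.c.  Assumption: two hypothetical modules,
 * a.c (the x example above) and eval.c (funcnest), contributed blocks,
 * and mkinit pasted each kind of block, in file order, into one routine.
 */

/* Declarations that the modules marked MKINIT. */
int x;
int funcnest;           /* depth of function calls */

/* Called once at startup: every INIT block. */
void init(void) {
      /* from a.c */
      {
            x = 1;
      }
}

/* Called after a longjmp back to the main command loop: every RESET block. */
void reset(void) {
      /* from a.c */
      {
            x = 2;
      }
      /* from eval.c (hypothetical RESET block) */
      {
            funcnest = 0;
      }
}

/* Called before the shell runs a shell procedure: every SHELLPROC block. */
void initshellproc(void) {
      /* from a.c */
      {
            x = 3;
      }
}
```

Because each block stays in the module whose state it touches, adding a module that needs resetting only means writing a RESET block in that module.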

Mkinit recognizes several constructs for placing declarations in
the init.c file.
        INCLUDE "file.h"
includes a file.  The storage class MKINIT makes a declaration
available in the init.c file, for example:
        MKINIT int funcnest;    /* depth of function calls */
MKINIT alone on a line introduces a structure or union
declaration:
        MKINIT
        struct redirtab {
              short renamed[10];
        };
Preprocessor #define statements are copied to init.c without any
special action to request this.

INDENTATION:  The ash source is indented in multiples of six
spaces.  The only study that I have heard of on the subject
concluded that the optimal amount to indent is in the range of
four to six spaces.  I use six spaces since it is not too big a
jump from the widely used eight spaces.  If you really hate
six-space indentation, use the adjind program (source included)
to change it to something else.

EXCEPTIONS:  Code for dealing with exceptions appears in
exceptions.c.  The C language doesn't include exception handling,
so I implement it using setjmp and longjmp.  The global variable
exception contains the type of exception.  EXERROR is raised by
calling error.  EXINT is an interrupt.  EXSHELLPROC is an
exception which is raised when a shell procedure is invoked.  The
purpose of EXSHELLPROC is to perform the cleanup actions
associated with other exceptions.  After these cleanup actions,
the shell can interpret a shell procedure itself without exec'ing
a new copy of the shell.
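
A minimal sketch of setjmp/longjmp exceptions along these lines. The handler variable, the struct jmploc wrapper, the numeric exception codes and the names exraise, sh_error and catchexception are assumptions; sh_error stands in for ash's error routine and is renamed here to avoid clashing with the error function some C libraries provide.

```c
#include <setjmp.h>
#include <stdio.h>

/* Exception types from the text; the numeric values are assumed. */
#define EXINT       0     /* interrupt */
#define EXERROR     1     /* raised by calling error */
#define EXSHELLPROC 2     /* about to run a shell procedure */

struct jmploc {
      jmp_buf loc;
};

struct jmploc *handler;   /* innermost handler; NULL if none */
int exception;            /* type of the exception being raised */

/* Record the exception type and unwind to the innermost handler. */
void exraise(int e) {
      exception = e;
      longjmp(handler->loc, 1);
}

/* Hypothetical stand-in for ash's error routine. */
void sh_error(const char *msg) {
      fprintf(stderr, "sh: %s\n", msg);
      exraise(EXERROR);
}

/*
 * Run fn with a handler installed.  Returns -1 if fn returned
 * normally, otherwise the exception type.  The previous handler is
 * restored in both cases, which is what lets handlers nest.
 */
int catchexception(void (*fn)(void)) {
      struct jmploc jmploc;
      struct jmploc *savehandler = handler;
      int result = -1;

      if (setjmp(jmploc.loc) == 0) {
            handler = &jmploc;
            fn();
      } else {
            result = exception;
      }
      handler = savehandler;
      return result;
}

/* Example callers. */
void succeeds(void) { }
void fails(void) { sh_error("command not found"); }
```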

INTERRUPTS:  In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.  (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.)  The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections.  Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery.  INTOFF and INTON can be nested.

MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left.  It also defines a
stack-oriented memory allocation scheme.  Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs, all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer.  The stack is implemented using a
linked list of blocks.
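
A minimal sketch of a stack allocator built from a linked list of blocks, with marks that release everything allocated after a given point in one step (what an exception handler would do). The names stalloc, setstackmark, popstackmark and stackblocks, the 504-byte block size and the 8-byte alignment are assumptions, not taken from memalloc.c.

```c
#include <stdlib.h>

#define MINSIZE 504                        /* assumed block payload size */
#define ALIGN(n) (((n) + 7) & ~(size_t)7)  /* assumes 8-byte alignment suffices */

struct stack_block {
      struct stack_block *prev;   /* older block, or NULL */
      char space[];               /* storage handed out by stalloc */
};

struct stackmark {
      struct stack_block *stackp;
      char *stacknxt;
      size_t stacknleft;
};

static struct stack_block *stackp;   /* newest block */
static char *stacknxt;               /* next free byte in it */
static size_t stacknleft;            /* bytes left in it */

/* Allocate nbytes from the stack, starting a new block if needed. */
void *stalloc(size_t nbytes) {
      void *p;

      nbytes = ALIGN(nbytes);
      if (stackp == NULL || nbytes > stacknleft) {
            size_t size = nbytes > MINSIZE ? nbytes : MINSIZE;
            struct stack_block *sp = malloc(sizeof *sp + size);
            if (sp == NULL)
                  abort();              /* ash would call error() */
            sp->prev = stackp;
            stackp = sp;
            stacknxt = sp->space;
            stacknleft = size;
      }
      p = stacknxt;
      stacknxt += nbytes;
      stacknleft -= nbytes;
      return p;
}

/* Remember the current top of the stack. */
void setstackmark(struct stackmark *mark) {
      mark->stackp = stackp;
      mark->stacknxt = stacknxt;
      mark->stacknleft = stacknleft;
}

/* Free everything allocated since the mark was set. */
void popstackmark(const struct stackmark *mark) {
      while (stackp != mark->stackp) {
            struct stack_block *sp = stackp;
            stackp = sp->prev;
            free(sp);
      }
      stacknxt = mark->stacknxt;
      stacknleft = mark->stacknleft;
}

/* Number of blocks in use (for demonstration only). */
int stackblocks(void) {
      int n = 0;
      for (struct stack_block *sp = stackp; sp != NULL; sp = sp->prev)
            n++;
      return n;
}
```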

STPUTC:  If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);
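
A simplified sketch of the idea behind STPUTC: check for the end of the space before each store, and grow (possibly moving the string) when it is reached. To stay short it keeps the string in one realloc'd buffer rather than on the block-list stack. The macro names come from the text, but their bodies and the helpers startstackstr, growstackstr and stackcopy are hypothetical, and this grabstackstr also NUL-terminates the string and hands it to the caller.

```c
#include <stdlib.h>
#include <string.h>

static char *stackbase;   /* start of the string being built */
static char *stackend;    /* one past the last usable byte */

/* Begin a new string; return the write pointer. */
static char *startstackstr(void) {
      if (stackbase == NULL) {
            stackbase = malloc(16);
            if (stackbase == NULL)
                  abort();            /* ash would call error() */
            stackend = stackbase + 16;
      }
      return stackbase;
}

/* Double the space; the string may move, so return the new p. */
static char *growstackstr(char *p) {
      size_t used = (size_t)(p - stackbase);
      size_t newsize = 2 * (size_t)(stackend - stackbase);
      char *nb = realloc(stackbase, newsize);

      if (nb == NULL)
            abort();
      stackbase = nb;
      stackend = nb + newsize;
      return nb + used;
}

/* Terminate the string and hand it to the caller, who frees it. */
static char *grabstackstr(char *p) {
      char *s;

      if (p == stackend)
            p = growstackstr(p);
      *p = '\0';
      s = stackbase;
      stackbase = stackend = NULL;
      return s;
}

#define STARTSTACKSTR(p)  ((p) = startstackstr())
#define STPUTC(c, p) \
      do { if ((p) == stackend) (p) = growstackstr(p); *(p)++ = (c); } while (0)

/* Example: copy s without knowing its length in advance. */
char *stackcopy(const char *s) {
      char *p;

      STARTSTACKSTR(p);
      while (*s)
            STPUTC(*s++, p);
      return grabstackstr(p);
}
```

Note that p must be reassigned after growing; any other pointers into the string would be left dangling, which is why the macros take p by name.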

We now start a top-down look at the code:

MAIN.C:  The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop.  Cmdloop
repeatedly parses and executes commands.

OPTIONS.C:  This file contains the option processing code.  It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin.  The -i and -j
options (the latter turns on job control) require changes in
signal handling.  The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING:  The parser code is all in parser.c.  A recursive
descent parser is used.  Syntax tables (generated by mksyntax)
are used to classify characters during lexical analysis.  There
are three tables:  one for normal use, one for use when inside
single quotes, and one for use when inside double quotes.  The
tables are machine dependent because they are indexed by
character variables and the range of a char varies from machine
to machine (a char may be signed or unsigned).

PARSE OUTPUT:  The output of the parser consists of a tree of
nodes.  The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents.  An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance.  It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word.  The text consists of ordinary characters and a number of
special codes defined in parser.h.  The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.  The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes.  The name of the variable
comes next, terminated by an equals sign.  If the type is not
VSNORMAL, then the alternative text of the substitution follows,
terminated by a CTLENDVAR byte.
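
A sketch of building and reading back the variable-substitution encoding described above. The byte values of CTLVAR and CTLENDVAR and the numeric values of the VS* codes and flags are assumptions (parser.h holds the real ones), and encodevar and decodevar are hypothetical helpers.

```c
#include <string.h>

/* Byte values and flag bits below are assumptions for illustration. */
#define CTLVAR     '\202'
#define CTLENDVAR  '\203'

#define VSTYPE     0x0f   /* mask selecting the substitution type */
#define VSNUL      0x10   /* colon form, e.g. ${var:-text} */
#define VSQUOTE    0x40   /* substitution is inside double quotes */

#define VSNORMAL   1      /* $var */
#define VSMINUS    2      /* ${var-text} */
#define VSPLUS     3      /* ${var+text} */
#define VSQUESTION 4      /* ${var?text} */
#define VSASSIGN   5      /* ${var=text} */

/* Write CTLVAR type name '=' [ text CTLENDVAR ] at p; return the new end. */
char *encodevar(char *p, int type, const char *name, const char *text) {
      *p++ = CTLVAR;
      *p++ = (char)type;
      while (*name)
            *p++ = *name++;
      *p++ = '=';
      if ((type & VSTYPE) != VSNORMAL) {
            while (*text)
                  *p++ = *text++;
            *p++ = CTLENDVAR;
      }
      *p = '\0';
      return p;
}

/* Render the substitution at p back into shell syntax in out and
 * return a pointer just past it.  (VSQUOTE is ignored when rendering.) */
const char *decodevar(const char *p, char *out) {
      static const char ops[] = "  -+?=";   /* indexed by type & VSTYPE */
      int type;

      p++;                                  /* skip CTLVAR */
      type = (unsigned char)*p++;
      *out++ = '$';
      if ((type & VSTYPE) == VSNORMAL) {
            while (*p != '=')
                  *out++ = *p++;
            p++;                            /* skip '=' */
      } else {
            *out++ = '{';
            while (*p != '=')
                  *out++ = *p++;
            p++;                            /* skip '=' */
            if (type & VSNUL)
                  *out++ = ':';
            *out++ = ops[type & VSTYPE];
            while (*p != CTLENDVAR)
                  *out++ = *p++;
            p++;                            /* skip CTLENDVAR */
            *out++ = '}';
      }
      *out = '\0';
      return p;
}
```

Since the name is terminated by '=' and the alternative text by CTLENDVAR, an '=' inside the alternative text (as in ${y:=a=b}) causes no ambiguity.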

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.  CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right.  In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).  Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase.  Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name generation phase.
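
A sketch of the two CTLESC jobs mentioned above: hiding quoted glob characters from file name generation, and being stripped out afterwards. The CTLESC value and the function names hasglob and rmescapes are assumptions.

```c
#include <string.h>

#define CTLESC '\201'   /* assumed value */

/* True if s contains a '*', '?' or '[' not escaped by CTLESC, i.e. the
 * word should go through file name generation. */
int hasglob(const char *s) {
      for (; *s; s++) {
            if (*s == CTLESC) {
                  if (*++s == '\0')
                        break;
                  continue;         /* the escaped byte is literal */
            }
            if (*s == '*' || *s == '?' || *s == '[')
                  return 1;
      }
      return 0;
}

/* Remove CTLESC bytes in place, keeping the byte each one protects. */
void rmescapes(char *str) {
      char *p = str, *q = str;

      while (*p) {
            if (*p == CTLESC) {
                  p++;              /* skip the escape ... */
                  if (*p == '\0')
                        break;      /* stray CTLESC at the end: drop it */
            }
            *q++ = *p++;            /* ... and keep what it protected */
      }
      *q = '\0';
}
```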

EXECUTION:  Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table.  Called from expand.c.

EVAL.C:  Evaltree recursively executes a parse tree.  The exit
status is returned in the global variable exitstatus.  The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.  It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command
and connects the standard output of the child to a pipe.
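
A sketch of the non-builtin path of evalbackcmd: fork, make the child's standard output the write end of a pipe, and read everything the child writes. Running a C function pointer in place of evaluating a parse tree, and the names backcmd and sayhello, are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run fn in a child whose standard output is the write end of a pipe
 * and collect everything it writes.  Returns a malloc'd NUL-terminated
 * string (or NULL on failure) and stores the child's exit status.
 */
char *backcmd(void (*fn)(void), int *status) {
      int pip[2], wstatus;
      pid_t pid;
      size_t len = 0, size = 64;
      char *buf;
      ssize_t n;

      if (pipe(pip) < 0)
            return NULL;
      fflush(stdout);                 /* don't duplicate buffered output */
      if ((pid = fork()) < 0) {
            close(pip[0]);
            close(pip[1]);
            return NULL;
      }
      if (pid == 0) {                 /* child */
            close(pip[0]);
            if (pip[1] != 1) {
                  dup2(pip[1], 1);
                  close(pip[1]);
            }
            fn();
            fflush(stdout);           /* _exit does not flush stdio */
            _exit(0);
      }
      close(pip[1]);                  /* parent: read until EOF */
      buf = malloc(size);
      while (buf != NULL && (n = read(pip[0], buf + len, size - len - 1)) > 0) {
            len += (size_t)n;
            if (len + 1 == size) {
                  char *nb = realloc(buf, size * 2);
                  if (nb == NULL) {
                        free(buf);
                        buf = NULL;
                  } else {
                        buf = nb;
                        size *= 2;
                  }
            }
      }
      close(pip[0]);
      waitpid(pid, &wstatus, 0);
      if (status != NULL)
            *status = WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : 128;
      if (buf != NULL)
            buf[len] = '\0';
      return buf;
}

/* Example child action, standing in for evaluating a parsed command. */
void sayhello(void) {
      fputs("hello\n", stdout);
}
```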

JOBS.C:  To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process.  Waitforjob waits for a job
to complete.  These routines take care of process groups if job
control is defined.

REDIR.C:  Ash allows file descriptors to be redirected and then
restored without forking off a child process.  This is
accomplished by duplicating the original file descriptors.  The
redirtab structure records where the file descriptors have been
duplicated to.
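
A minimal sketch of redirect-and-restore without forking, using a redirtab like the one shown in the mkinit section. The helper names initredirtab, redirout and popredir, the EMPTY marker, and saving copies at descriptors 10 and above are assumptions; this version also requires the descriptor being redirected to be open already.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

#define EMPTY (-1)

/* renamed[fd] holds the descriptor where fd's original file was saved. */
struct redirtab {
      short renamed[10];
};

void initredirtab(struct redirtab *rt) {
      for (int i = 0; i < 10; i++)
            rt->renamed[i] = EMPTY;
}

/* Redirect fd (0-9, must be open) to write to path, saving the original. */
int redirout(struct redirtab *rt, int fd, const char *path) {
      int newfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);

      if (newfd < 0)
            return -1;
      if (rt->renamed[fd] == EMPTY) {
            /* Save the original above 9 so it can't collide with user fds. */
            int saved = fcntl(fd, F_DUPFD, 10);
            if (saved < 0) {
                  close(newfd);
                  return -1;
            }
            fcntl(saved, F_SETFD, FD_CLOEXEC);   /* hide it from children */
            rt->renamed[fd] = (short)saved;
      }
      if (newfd != fd) {
            dup2(newfd, fd);
            close(newfd);
      }
      return 0;
}

/* Undo the redirections recorded in rt. */
void popredir(struct redirtab *rt) {
      for (int fd = 0; fd < 10; fd++) {
            if (rt->renamed[fd] != EMPTY) {
                  dup2(rt->renamed[fd], fd);
                  close(rt->renamed[fd]);
                  rt->renamed[fd] = EMPTY;
            }
      }
}
```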

EXEC.C:  The routine find_command locates a command, and enters
the command in the hash table if it is not already there.  The
third argument specifies whether it is to print an error message
if the command is not found.  (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process.  But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C:  Arguments are processed in three passes.  The first
(performed by the routine argstr) performs variable and command
substitution.  The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.  If the
"/u" directory is simulated, then when "/u/username" is replaced
by the user's home directory, the flag "didudir" is set.  This
tells the cd command that it should print out the directory name,
just as it would if the "/u" directory were implemented using
symbolic links.
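
A rough sketch of the word-splitting pass (the job of ifsbreakup), applied to text that has already been through argstr. The function ifssplit is hypothetical and simplified: it treats every IFS character like white space and ignores quoting, whereas POSIX field splitting makes non-white-space IFS characters delimit empty fields (with IFS=:, "a::b" gives three fields).

```c
#include <string.h>

/*
 * Split s in place at runs of IFS characters, storing up to maxwords
 * pointers in words[].  Returns the number of words found.
 */
int ifssplit(char *s, const char *ifs, char **words, int maxwords) {
      int n = 0;

      for (;;) {
            while (*s != '\0' && strchr(ifs, *s) != NULL)
                  s++;                        /* skip separators */
            if (*s == '\0' || n == maxwords)
                  break;
            words[n++] = s;                   /* start of a word */
            while (*s != '\0' && strchr(ifs, *s) == NULL)
                  s++;
            if (*s != '\0')
                  *s++ = '\0';                /* terminate it */
      }
      return n;
}
```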

VAR.C:  Variables are stored in a hash table.  Probably we should
switch to extensible hashing.  The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.  Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.
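
A sketch of a hash table that stores each variable as one "name=value" string, so that building an environment only means collecting pointers. The table size, the hash function, and the names setvareq, lookupvar and environment are assumptions here, and unlike a real shell it treats every variable as exported.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39        /* assumed table size */

struct var {
      struct var *next;    /* hash chain */
      char *text;          /* "name=value" in a single string */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part of "name=value" (or of a bare name). */
static unsigned hashvar(const char *p) {
      unsigned h = 0;
      while (*p != '\0' && *p != '=')
            h = h * 31 + (unsigned char)*p++;
      return h % VTABSIZE;
}

/* Do the name parts of a and b match? */
static int varequal(const char *a, const char *b) {
      while (*a == *b && *a != '\0' && *a != '=') {
            a++;
            b++;
      }
      return (*a == '\0' || *a == '=') && (*b == '\0' || *b == '=');
}

/* Set a variable from "name=value" (s must contain '='); keeps a copy. */
void setvareq(const char *s) {
      unsigned h = hashvar(s);
      struct var *vp;
      char *copy = strdup(s);

      if (copy == NULL)
            abort();
      for (vp = vartab[h]; vp != NULL; vp = vp->next) {
            if (varequal(vp->text, s)) {
                  free(vp->text);
                  vp->text = copy;
                  return;
            }
      }
      if ((vp = malloc(sizeof *vp)) == NULL)
            abort();
      vp->text = copy;
      vp->next = vartab[h];
      vartab[h] = vp;
}

/* Return the value of name, or NULL if it is unset. */
const char *lookupvar(const char *name) {
      for (struct var *vp = vartab[hashvar(name)]; vp != NULL; vp = vp->next)
            if (varequal(vp->text, name))
                  return strchr(vp->text, '=') + 1;
      return NULL;
}

/* Build an environment array pointing at the stored strings; no
 * string is copied.  The caller frees only the array. */
char **environment(void) {
      size_t n = 0;
      char **env, **ep;

      for (int i = 0; i < VTABSIZE; i++)
            for (struct var *vp = vartab[i]; vp != NULL; vp = vp->next)
                  n++;
      if ((env = malloc((n + 1) * sizeof *env)) == NULL)
            abort();
      ep = env;
      for (int i = 0; i < VTABSIZE; i++)
            for (struct var *vp = vartab[i]; vp != NULL; vp = vp->next)
                  *ep++ = vp->text;
      *ep = NULL;
      return env;
}
```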

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.
There are two consequences of this.  First, if an assignment to
PATH precedes the command, the value of PATH before the
assignment must be remembered and passed to shellexec.  Second,
if the program turns out to be a shell procedure, the strings
from the environment variables which preceded the command must
be pulled out of the table and replaced with strings obtained
from malloc, since the former will automatically be freed when
the stack (see the entry on memalloc.c) is emptied.

BUILTIN COMMANDS:  The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate.  They can be recognized because their
names always end in "cmd".  The mapping from names to procedures
is specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program.  A builtin command is allowed to overwrite its
arguments.  Builtin routines can call nextopt to do option
parsing.  This is kind of like getopt, but you don't pass argc
and argv to it.  Builtin routines can also call error.  This
routine normally terminates the shell (or returns to the main
command loop if the shell is interactive), but when called from a
builtin command it causes the builtin command to terminate with
an exit status of 2.
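
A sketch of a getopt-like routine that, like nextopt, works from a global argument pointer instead of argc and argv. The globals argptr, optionarg and optptr, the helper setargs, and returning '?' on errors (rather than calling error) are assumptions.

```c
#include <string.h>

char **argptr;            /* next argument to look at */
char *optionarg;          /* argument of the option just returned */
static char *optptr;      /* rest of a cluster such as "-vn" */

/* Hypothetical setup; a shell would do this before running a builtin. */
void setargs(char **argv) {
      argptr = argv + 1;
      optptr = NULL;
}

/*
 * Return the next option letter, '\0' when the options are exhausted
 * (argptr then points at the first operand), or '?' for an unknown
 * option or a missing option argument.  A ':' after a letter in
 * optstring means that option takes an argument.
 */
int nextopt(const char *optstring) {
      char *p = optptr;
      const char *q;
      int c;

      if (p == NULL || *p == '\0') {
            p = *argptr;
            if (p == NULL || *p != '-' || *++p == '\0')
                  return '\0';            /* operand, or a lone "-" */
            argptr++;
            if (p[0] == '-' && p[1] == '\0')
                  return '\0';            /* "--" ends the options */
      }
      c = *p++;
      if (c == ':' || (q = strchr(optstring, c)) == NULL) {
            optptr = p;
            return '?';
      }
      if (q[1] == ':') {
            if (*p == '\0') {
                  if (*argptr == NULL) {
                        optptr = NULL;
                        return '?';
                  }
                  p = *argptr++;
            }
            optionarg = p;
            p = NULL;
      }
      optptr = p;
      return c;
}
```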

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons.  The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash.  The header file bltin.h
takes care of most of the differences between the ash environment
and the stand-alone environment.  The user should call the main
routine "main", and #define main to be the name of the routine to
use when the program is linked into ash.  This #define should
appear before bltin.h is included; bltin.h will #undef main if
the program is to be compiled stand-alone.

CD.C:  This file defines the cd and pwd builtins.  The pwd
command runs /bin/pwd the first time it is invoked (unless the
user has already done a cd to an absolute pathname), but then
remembers the current directory and updates it when the cd
command is run, so subsequent pwd commands run very fast.  The
main complication in the cd command is in the docd routine, which
resolves symbolic links into actual names and tells users where
they ended up if they crossed a symbolic link.

SIGNALS:  Trap.c implements the trap command.  The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately.  When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag.  The routine
dotrap is called at appropriate points to actually handle the
signal.  When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.
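
A sketch of the set-a-flag-now, handle-later scheme: the handler only records the signal, and dotrap runs the trap at a safe point. It uses sigaction, the POSIX successor to signal; the table bound MAXSIG, settrap, and an evaltrap that only counts instead of evaluating the command are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

#define MAXSIG 32            /* assumed bound on signal numbers handled */

static volatile sig_atomic_t gotsig[MAXSIG];  /* set by onsig */
static volatile sig_atomic_t pendingsigs;     /* some gotsig[] is set */
static const char *trap[MAXSIG];              /* trap command text, or NULL */
static int trapsrun;                          /* traps executed so far */

/* Hypothetical stand-in for evaluating a trap's command string. */
static void evaltrap(const char *cmd) {
      (void)cmd;
      trapsrun++;
}

/* Signal handler: only note that the signal arrived. */
static void onsig(int signo) {
      if (signo > 0 && signo < MAXSIG) {
            gotsig[signo] = 1;
            pendingsigs = 1;
      }
}

/* Install cmd as the trap for signo. */
int settrap(int signo, const char *cmd) {
      struct sigaction sa;

      if (signo <= 0 || signo >= MAXSIG)
            return -1;
      trap[signo] = cmd;
      memset(&sa, 0, sizeof sa);
      sa.sa_handler = onsig;
      sigemptyset(&sa.sa_mask);
      return sigaction(signo, &sa, NULL);
}

/* Run pending traps; called at safe points such as between commands. */
void dotrap(void) {
      while (pendingsigs) {
            pendingsigs = 0;
            for (int i = 1; i < MAXSIG; i++) {
                  if (gotsig[i]) {
                        gotsig[i] = 0;
                        if (trap[i] != NULL)
                              evaltrap(trap[i]);
                  }
            }
      }
}
```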

OUTPUT:  Ash uses its own output routines.  There are three
output structures allocated.  "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory.  This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating
system.  The variables out1 and out2 normally point to output and
errout, respectively, but they are set to point to memout when
appropriate inside backquotes.
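
A sketch of output structures that either buffer for a file descriptor or collect into memory, so a builtin in backquotes can write through out1 without any I/O. The field layout and the names makeroom, outstr, flushout and grabmemout are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct output {
      char *buf;          /* buffered bytes */
      size_t len;         /* bytes in buf */
      size_t size;        /* bytes allocated */
      int fd;             /* descriptor to flush to; -1 means memory only */
};

struct output output = { NULL, 0, 0, 1 };
struct output errout = { NULL, 0, 0, 2 };
struct output memout = { NULL, 0, 0, -1 };
struct output *out1 = &output;
struct output *out2 = &errout;

/* Make room for need more bytes in dest. */
static void makeroom(struct output *dest, size_t need) {
      size_t newsize = dest->size ? dest->size : 64;
      char *nb;

      if (dest->len + need <= dest->size)
            return;
      while (newsize < dest->len + need)
            newsize *= 2;
      if ((nb = realloc(dest->buf, newsize)) == NULL)
            abort();
      dest->buf = nb;
      dest->size = newsize;
}

/* Append a string to an output structure. */
void outstr(const char *s, struct output *dest) {
      size_t n = strlen(s);

      if (n == 0)
            return;
      makeroom(dest, n);
      memcpy(dest->buf + dest->len, s, n);
      dest->len += n;
}

/* Write out buffered bytes; memout is never flushed to a descriptor. */
void flushout(struct output *dest) {
      size_t off = 0;

      if (dest->fd < 0)
            return;
      while (off < dest->len) {
            ssize_t n = write(dest->fd, dest->buf + off, dest->len - off);
            if (n <= 0)
                  break;      /* a real shell would report the error */
            off += (size_t)n;
      }
      dest->len = 0;
}

/* Take memout's contents as a NUL-terminated string (caller frees). */
char *grabmemout(void) {
      char *s;

      makeroom(&memout, 1);
      memout.buf[memout.len] = '\0';
      s = memout.buf;
      memout.buf = NULL;
      memout.len = memout.size = 0;
      return s;
}
```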

INPUT:  The basic input routine is pgetc, which reads from the
current input file.  There is a stack of input files; the current
input file is the top file on this stack.  The code allows the
input to come from a string rather than a file.  (This is for the
-c option and the "." and eval builtin commands.)  The global
variable plinno is saved and restored when files are pushed and
popped from the stack.  The parser routines store the number of
the current line in this variable.
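
A sketch of an input stack whose entries are strings (file entries are left out), showing plinno being saved on push and restored on pop. The struct layout, the PEOF value and the names pushstring and popfile are assumptions.

```c
#include <stdlib.h>

#define PEOF (-1)

struct parsefile {
      struct parsefile *prev;   /* input source underneath this one */
      const char *nextc;        /* next character to read */
      int linno;                /* plinno of the source underneath */
};

int plinno = 1;                                    /* current line number */
static struct parsefile basepf = { NULL, "", 0 };  /* empty bottom entry */
static struct parsefile *parsefile = &basepf;

/* Make string s the current input, as for -c, "." or eval. */
void pushstring(const char *s) {
      struct parsefile *pf = malloc(sizeof *pf);

      if (pf == NULL)
            abort();
      pf->prev = parsefile;
      pf->nextc = s;
      pf->linno = plinno;       /* save the outer line number */
      plinno = 1;
      parsefile = pf;
}

/* Return to the previous input source, restoring its line number. */
void popfile(void) {
      struct parsefile *pf = parsefile;

      if (pf == &basepf)
            return;
      plinno = pf->linno;
      parsefile = pf->prev;
      free(pf);
}

/* Next character of the current input, or PEOF at its end.  For brevity
 * this sketch counts lines here; in ash the parser updates plinno. */
int pgetc(void) {
      int c;

      if (*parsefile->nextc == '\0')
            return PEOF;
      c = (unsigned char)*parsefile->nextc++;
      if (c == '\n')
            plinno++;
      return c;
}
```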

DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.  Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses.  Example:
"TRACE(("n=%d\n", n))".  The double parentheses are necessary
because the preprocessor can't handle macros with a variable
number of arguments.  Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal.  The tracing
code is in show.c.
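
A sketch of the double-parenthesis trick. The trace function and the use of an arbitrary FILE * in place of $HOME/trace are assumptions; the macro itself is the usual pre-C99 idiom.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEBUG 1

static FILE *tracefile;   /* ash uses $HOME/trace; any stream works here */

/* printf-style function that TRACE forwards its argument list to. */
void trace(const char *fmt, ...) {
      va_list ap;

      if (tracefile == NULL)
            return;
      va_start(ap, fmt);
      vfprintf(tracefile, fmt, ap);
      va_end(ap);
      fflush(tracefile);
}

/*
 * The inner parentheses make the whole printf argument list a single
 * macro argument, so TRACE(("n=%d\n", n)) becomes trace ("n=%d\n", n).
 */
#ifdef DEBUG
#define TRACE(param)  trace param
#else
#define TRACE(param)
#endif
```

Since C99 the same effect is available directly with a variadic macro such as #define TRACE(...) trace(__VA_ARGS__).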