# @(#)TOUR 5.1 (Berkeley) 03/07/91

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES: The subdirectory bltin contains commands which can
be compiled stand-alone. The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS: Files whose names begin with "mk" are
programs that generate source code. A complete list of these
programs is:

        program         input files         generates
        -------         ------------        ---------
        mkbuiltins      builtins            builtins.h builtins.c
        mkinit          *.c                 init.c
        mknodes         nodetypes           nodes.h nodes.c
        mksignames      -                   signames.h signames.c
        mksyntax        -                   syntax.h syntax.c
        mktokens        -                   token.def
        bltin/mkexpr    unary_op binary_op  operators.h operators.c

There are undoubtedly too many of these. Mkinit searches all the
C source files for entries looking like:

        INIT {
              x = 1;    /* executed during initialization */
        }

        RESET {
              x = 2;    /* executed when the shell does a longjmp
                           back to the main command loop */
        }

        SHELLPROC {
              x = 3;    /* executed when the shell runs a shell procedure */
        }

It pulls this code out into routines which are called when
particular events occur. The intent is to improve modularity by
isolating the information about which modules need to be explicitly
initialized/reset within the modules themselves.
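
As a rough illustration only (this is not the actual mkinit output), the
three example blocks above could end up collected into routines like
the following. The routine names init, reset and initshellproc are
assumptions made for this sketch, and x stands in for state owned by
some module.

```c
/* Stand-in for a variable owned by one of the modules (hypothetical). */
int x;

/* Would gather the bodies of all INIT { ... } blocks. */
void init(void) {
      x = 1;
}

/* Would gather the bodies of all RESET { ... } blocks. */
void reset(void) {
      x = 2;
}

/* Would gather the bodies of all SHELLPROC { ... } blocks. */
void initshellproc(void) {
      x = 3;
}
```

The point of the arrangement is that each module keeps its own
initialization code next to the data it initializes, while the callers
only need to know the three generated entry points.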

Mkinit recognizes several constructs for placing declarations in
the init.c file.
        INCLUDE "file.h"
includes a file. The storage class MKINIT makes a declaration
available in the init.c file, for example:
        MKINIT int funcnest;    /* depth of function calls */
MKINIT alone on a line introduces a structure or union
declaration:
        MKINIT
        struct redirtab {
              short renamed[10];
        };
Preprocessor #define statements are copied to init.c without any
special action to request this.

INDENTATION: The ash source is indented in multiples of six
spaces. The only study that I have heard of on the subject
concluded that the optimal amount to indent is in the range of four
to six spaces. I use six spaces since it is not too big a jump
from the widely used eight spaces. If you really hate six space
indentation, use the adjind (source included) program to change
it to something else.

EXCEPTIONS: Code for dealing with exceptions appears in
exceptions.c. The C language doesn't include exception handling,
so I implement it using setjmp and longjmp. The global variable
exception contains the type of exception. EXERROR is raised by
calling error. EXINT is an interrupt. EXSHELLPROC is an
exception which is raised when a shell procedure is invoked. The
purpose of EXSHELLPROC is to perform the cleanup actions associated
with other exceptions. After these cleanup actions, the shell
can interpret a shell procedure itself without exec'ing a new
copy of the shell.
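
The setjmp/longjmp technique can be sketched roughly as follows. This
is a minimal illustration, not the ash code: the numeric values of the
exception types, the jmploc structure, the handler pointer, and the
helpers exraise, run_protected, ok_command and failing_command are all
assumptions made for the example.

```c
#include <setjmp.h>
#include <stddef.h>

/* Exception types named in the text; the numeric values are assumed. */
#define EXINT       0
#define EXERROR     1
#define EXSHELLPROC 2

struct jmploc {
      jmp_buf loc;
};

struct jmploc *handler;       /* innermost active handler, or NULL */
volatile int exception;       /* type of the exception being raised */

/* Raise an exception: record its type and jump to the handler.
   Assumes a handler has been installed. */
void exraise(int e) {
      exception = e;
      longjmp(handler->loc, 1);
}

/* Hypothetical helper: run fn with a handler installed.
   Returns -1 if fn returned normally, else the exception type. */
int run_protected(void (*fn)(void)) {
      struct jmploc jmploc;
      struct jmploc *saved = handler;   /* not modified after setjmp */

      if (setjmp(jmploc.loc)) {
            handler = saved;            /* restore the outer handler */
            return exception;
      }
      handler = &jmploc;
      fn();
      handler = saved;
      return -1;
}

/* Two stand-in "commands" for demonstration. */
void ok_command(void) { }
void failing_command(void) { exraise(EXERROR); }
```

With these definitions, run_protected(ok_command) returns -1 and
run_protected(failing_command) returns EXERROR. Saving and restoring
the handler pointer is what lets handlers nest.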

INTERRUPTS: In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop. (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.) The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections. Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery. INTOFF and INTON can be nested.

MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left. It also defines a
stack oriented memory allocation scheme. Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer. The stack is implemented using a
linked list of blocks.
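
A stack allocator built on a linked list of blocks might look roughly
like the sketch below. It is written for illustration under stated
assumptions and is not copied from memalloc.c: the names stalloc,
setstackmark, popstackmark and stack_block_count, the block size, and
the use of abort in place of error are all choices made for the
example.

```c
#include <stdlib.h>
#include <stddef.h>

#define BLOCKSIZE 512               /* assumed minimum block size */

struct stack_block {
      struct stack_block *prev;     /* previously allocated block */
      size_t size;                  /* usable bytes in space[] */
      char space[];                 /* storage handed out by stalloc */
};

struct stackmark {                  /* a saved "stack pointer" */
      struct stack_block *block;
      char *next;
      size_t left;
};

struct stack_block *curblock;       /* top block, or NULL if empty */
char *stacknxt;                     /* next free byte in curblock */
size_t stacknleft;                  /* free bytes left in curblock */

/* Allocate n bytes off the stack, adding a block if needed. */
void *stalloc(size_t n) {
      void *p;

      /* Round up to pointer size; enough alignment for this sketch. */
      n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
      if (n > stacknleft) {
            size_t size = n > BLOCKSIZE ? n : BLOCKSIZE;
            struct stack_block *b = malloc(sizeof *b + size);
            if (b == NULL)
                  abort();          /* the shell would call error here */
            b->prev = curblock;
            b->size = size;
            curblock = b;
            stacknxt = b->space;
            stacknleft = size;
      }
      p = stacknxt;
      stacknxt += n;
      stacknleft -= n;
      return p;
}

/* Remember the current top of the stack. */
void setstackmark(struct stackmark *m) {
      m->block = curblock;
      m->next = stacknxt;
      m->left = stacknleft;
}

/* Free everything allocated since the mark was set. */
void popstackmark(const struct stackmark *m) {
      while (curblock != m->block) {
            struct stack_block *b = curblock;
            curblock = b->prev;
            free(b);
      }
      stacknxt = m->next;
      stacknleft = m->left;
}

/* Helper for experiments: number of blocks currently in the list. */
int stack_block_count(void) {
      int n = 0;
      struct stack_block *b;
      for (b = curblock; b != NULL; b = b->prev)
            n++;
      return n;
}
```

Recovering from an exception then amounts to one popstackmark call
with a mark taken before the failed operation began, no matter how
many individual allocations were made in between.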

STPUTC: If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;               /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);           /* repeated as many times as needed */
        grabstackstr(p);

We now start a top-down look at the code:

MAIN.C: The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop. Cmdloop
repeatedly parses and executes commands.

OPTIONS.C: This file contains the option processing code. It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin. The -i and -j
options (the latter turns on job control) require changes in signal
handling. The routines setjobctl (in jobs.c) and setinteractive
(in trap.c) are called to handle changes to these options.

PARSING: The parser code is all in parser.c. A recursive
descent parser is used. Syntax tables (generated by mksyntax) are
used to classify characters during lexical analysis. There are
three tables: one for normal use, one for use when inside single
quotes, and one for use when inside double quotes.
The tables are machine dependent because they are indexed by
character variables and the range of a char varies from machine
to machine.

PARSE OUTPUT: The output of the parser consists of a tree of
nodes. The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents. An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance. It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word. The text consists of ordinary characters and a number of
special codes defined in parser.h.
The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution. The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes. The name of the variable
comes next, terminated by an equals sign. If the type is not
VSNORMAL, then the text field in the substitution follows,
terminated by a CTLENDVAR byte.

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ+CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.
CTLESC is also used to escape '*', '?', '[', and '!' characters
which were quoted by the user and thus should not be used for file
name generation.

CTLESC characters have proved to be particularly tricky to get
right. In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing). Other here
documents, and words which are not subject to splitting and file name
generation, have the CTLESC characters removed during the
variable and command substitution phase. Words which are subject to
splitting and file name generation have the CTLESC characters
removed as part of the file name phase.

EXECUTION: Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table. Called from expand.c.

EVAL.C: Evaltree recursively executes a parse tree. The exit
status is returned in the global variable exitstatus. The
alternative entry evalbackcmd is called to evaluate commands in back
quotes. It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command and
connects the standard output of the child to a pipe.
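
The fork-and-pipe case can be sketched with ordinary POSIX calls as
shown below. This is an illustration of the general technique, not the
code of evalbackcmd; the function name capture_output and its
interface are invented for the example, and it runs an external
program rather than a parse tree.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] in a child with its standard output connected to a
   pipe, and collect up to cap-1 bytes of that output in buf.
   Returns the number of bytes stored, or -1 on failure. */
ssize_t capture_output(char *const argv[], char *buf, size_t cap) {
      int pfd[2];
      pid_t pid;
      size_t len = 0;
      ssize_t n;
      int status;

      if (cap == 0 || pipe(pfd) < 0)
            return -1;
      pid = fork();
      if (pid < 0) {
            close(pfd[0]);
            close(pfd[1]);
            return -1;
      }
      if (pid == 0) {               /* child: stdout becomes the pipe */
            close(pfd[0]);
            if (pfd[1] != STDOUT_FILENO) {
                  dup2(pfd[1], STDOUT_FILENO);
                  close(pfd[1]);
            }
            execvp(argv[0], argv);
            _exit(127);             /* exec failed */
      }
      close(pfd[1]);                /* parent: read until EOF or full */
      while (len < cap - 1 &&
             (n = read(pfd[0], buf + len, cap - 1 - len)) > 0)
            len += (size_t)n;
      close(pfd[0]);
      buf[len] = '\0';
      waitpid(pid, &status, 0);
      return (ssize_t)len;
}
```

The parent must close its copy of the write end before reading, since
otherwise read would never see end of file.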

JOBS.C: To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process. Waitforjob waits for a job
to complete. These routines take care of process groups if job
control is defined.

REDIR.C: Ash allows file descriptors to be redirected and then
restored without forking off a child process. This is
accomplished by duplicating the original file descriptors. The
redirtab structure records where the file descriptors have been
duplicated to.

EXEC.C: The routine find_command locates a command, and enters
the command in the hash table if it is not already there. The
third argument specifies whether it is to print an error message
if the command is not found. (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the hash
table of the parent process. But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C: Arguments are processed in three passes. The first
(performed by the routine argstr) performs variable and command
substitution. The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.
If the "/u" directory is simulated, then when "/u/username" is
replaced by the user's home directory, the flag "didudir" is set. This
tells the cd command that it should print out the directory name,
just as it would if the "/u" directory were implemented using
symbolic links.

VAR.C: Variables are stored in a hash table. Probably we should
switch to extensible hashing. The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command. Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.
There are two consequences of this. First, if an assignment to
PATH precedes the command, the value of PATH before the
assignment must be remembered and passed to shellexec. Second, if the
program turns out to be a shell procedure, the strings from the
environment variables which preceded the command must be pulled
out of the table and replaced with strings obtained from malloc,
since the former will automatically be freed when the stack (see
the entry on memalloc.c) is emptied.
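
The "name=value" storage scheme can be illustrated with the sketch
below. It is a simplified model, not var.c itself: the table size, the
hash function, and the exact interfaces of setvareq, lookupvar and
environment are assumptions made for the example, and every string
passed to setvareq is assumed to contain an equals sign and to outlive
the table entry.

```c
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39                 /* arbitrary table size */

struct var {
      struct var *next;             /* next entry in the hash chain */
      const char *text;             /* the whole "name=value" string */
};

struct var *vartab[VTABSIZE];

static unsigned hashname(const char *s, size_t n) {
      unsigned h = 0;
      while (n-- > 0)
            h = h * 31 + (unsigned char)*s++;
      return h % VTABSIZE;
}

/* Find the entry whose name is the first n bytes of name. */
static struct var *findvar(const char *name, size_t n) {
      struct var *vp;
      for (vp = vartab[hashname(name, n)]; vp != NULL; vp = vp->next)
            if (strncmp(vp->text, name, n) == 0 && vp->text[n] == '=')
                  return vp;
      return NULL;
}

/* Enter "name=value" in the table, replacing any earlier value.
   The string is stored by reference, not copied. */
void setvareq(const char *text) {
      size_t n = strcspn(text, "=");
      struct var *vp = findvar(text, n);

      if (vp == NULL) {
            unsigned h = hashname(text, n);
            vp = malloc(sizeof *vp);
            if (vp == NULL)
                  abort();
            vp->next = vartab[h];
            vartab[h] = vp;
      }
      vp->text = text;
}

/* Return a pointer to the value part, or NULL if name is unset. */
const char *lookupvar(const char *name) {
      size_t n = strlen(name);
      struct var *vp = findvar(name, n);
      return vp != NULL ? vp->text + n + 1 : NULL;
}

/* Build a NULL terminated environment vector. Only the vector is
   allocated; its elements point at the stored strings. */
const char **environment(void) {
      size_t count = 0, i = 0;
      int h;
      struct var *vp;
      const char **env;

      for (h = 0; h < VTABSIZE; h++)
            for (vp = vartab[h]; vp != NULL; vp = vp->next)
                  count++;
      env = malloc((count + 1) * sizeof *env);
      if (env == NULL)
            abort();
      for (h = 0; h < VTABSIZE; h++)
            for (vp = vartab[h]; vp != NULL; vp = vp->next)
                  env[i++] = vp->text;
      env[i] = NULL;
      return env;
}
```

Because a second setvareq for the same name replaces the first entry,
entering the assignments that precede a command before calling
environment strips duplicates as a side effect.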

BUILTIN COMMANDS: The procedures for handling these are
scattered throughout the code, depending on which location appears
most appropriate. They can be recognized because their names
always end in "cmd". The mapping from names to procedures is
specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program. A builtin command is allowed to overwrite its
arguments. Builtin routines can call nextopt to do option
parsing. This is kind of like getopt, but you don't pass argc and
argv to it. Builtin routines can also call error. This routine
normally terminates the shell (or returns to the main command
loop if the shell is interactive), but when called from a builtin
command it causes the builtin command to terminate with an exit
status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons. The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash. The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment. The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash.
This #define should appear before bltin.h is included; bltin.h
will #undef main if the program is to be compiled stand-alone.

CD.C: This file defines the cd and pwd builtins. The pwd
command runs /bin/pwd the first time it is invoked (unless the user
has already done a cd to an absolute pathname), but then
remembers the current directory and updates it when the cd
command is run, so subsequent pwd commands run very fast. The main
complication in the cd command is in the docd command, which
resolves symbolic links into actual names and informs the user
where the user ended up if he crossed a symbolic link.

SIGNALS: Trap.c implements the trap command. The routine
setsignal figures out what action should be taken when a signal is
received and invokes the signal system call to set the signal
action appropriately. When a signal that a user has set a trap for
is caught, the routine "onsig" sets a flag. The routine dotrap
is called at appropriate points to actually handle the signal.
When an interrupt is caught and no trap has been set for that
signal, the routine "onint" in error.c is called.

OUTPUT: Ash uses its own output routines. There are three
output structures allocated. "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory. This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating system.
The variables out1 and out2 normally point to output and errout,
respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT: The basic input routine is pgetc, which reads from the
current input file. There is a stack of input files; the current
input file is the top file on this stack. The code allows the
input to come from a string rather than a file. (This is for the
-c option and the "." and eval builtin commands.) The global
variable plinno is saved and restored when files are pushed and
popped from the stack. The parser routines store the number of
the current line in this variable.

DEBUGGING: If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace. Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses. Example:
"TRACE(("n=%d\n", n))". The double parentheses are necessary
because the preprocessor can't handle functions with a variable
number of arguments. Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal. The tracing
code is in show.c.
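
The double parenthesis trick can be shown in a few lines. The sketch
below is an illustration under assumptions, not the contents of
show.c: the trace routine, the tracefile variable and the exact macro
definitions are invented for the example, and the caller is expected
to open tracefile itself.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1               /* per the text, shell.h would define this */

#ifdef DEBUG
/* The macro takes ONE argument, which is itself a parenthesized
   argument list: TRACE(("n=%d\n", n)) becomes trace ("n=%d\n", n). */
#define TRACE(args) trace args
#else
#define TRACE(args)           /* expands to nothing */
#endif

FILE *tracefile;              /* opened by the caller; NULL disables output */

/* Write a printf style message to the trace file. */
void trace(const char *fmt, ...) {
      va_list ap;

      if (tracefile == NULL)
            return;
      va_start(ap, fmt);
      vfprintf(tracefile, fmt, ap);
      va_end(ap);
      fflush(tracefile);
}
```

When DEBUG is not defined, the whole parenthesized list is discarded
by the preprocessor, so the trace arguments are never evaluated.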