#	@(#)TOUR	8.1 (Berkeley) 05/31/93

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell.  It is provided
anyway since it gives helpful information about how the shell is
structured, but be warned that things have changed -- the current
shell is still under development.

================================================================

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES:  The subdirectory bltin contains commands which can
be compiled stand-alone.  The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
programs that generate source code.  A complete list of these
programs is:

        program         input files         generates
        -------         -----------         ---------
        mkbuiltins      builtins            builtins.h builtins.c
        mkinit          *.c                 init.c
        mknodes         nodetypes           nodes.h nodes.c
        mksignames          -               signames.h signames.c
        mksyntax            -               syntax.h syntax.c
        mktokens            -               token.def
        bltin/mkexpr    unary_op binary_op  operators.h operators.c

There are undoubtedly too many of these.
Mkinit searches all the C source files for entries looking like:

        INIT {
              x = 1;    /* executed during initialization */
        }

        RESET {
              x = 2;    /* executed when the shell does a longjmp
                           back to the main command loop */
        }

        SHELLPROC {
              x = 3;    /* executed when the shell runs a shell procedure */
        }

It pulls this code out into routines which are called when
particular events occur.  The intent is to improve modularity by
isolating the information about which modules need to be explicitly
initialized/reset within the modules themselves.

Mkinit recognizes several constructs for placing declarations in
the init.c file.
        INCLUDE "file.h"
includes a file.  The storage class MKINIT makes a declaration
available in the init.c file, for example:
        MKINIT int funcnest;    /* depth of function calls */
MKINIT alone on a line introduces a structure or union declaration:
        MKINIT
        struct redirtab {
              short renamed[10];
        };
Preprocessor #define statements are copied to init.c without any
special action to request this.

INDENTATION:  The ash source is indented in multiples of six
spaces.  The only study that I have heard of on the subject
concluded that the optimal amount to indent is in the range of four
to six spaces.  I use six spaces since it is not too big a jump
from the widely used eight spaces.  If you really hate six-space
indentation, use the adjind program (source included) to change
it to something else.

EXCEPTIONS:  Code for dealing with exceptions appears in
exceptions.c.  The C language doesn't include exception handling,
so I implement it using setjmp and longjmp.  The global variable
exception contains the type of exception.  EXERROR is raised by
calling error.  EXINT is an interrupt.  EXSHELLPROC is an exception
which is raised when a shell procedure is invoked.  The purpose of
EXSHELLPROC is to perform the cleanup actions associated with other
exceptions.  After these cleanup actions, the shell can interpret a
shell procedure itself without exec'ing a new copy of the shell.

INTERRUPTS:  In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.  (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.)  The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections.  Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery.  INTOFF and INTON can be nested.

MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left.  It also defines a
stack oriented memory allocation scheme.  Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer.  The stack is implemented using a
linked list of blocks.
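
The stack oriented allocation scheme described above can be sketched
roughly as follows.  This is an illustration only, not the code in
memalloc.c: the names st_alloc, st_mark, st_release, MIN_BLOCK and
align_t are made up for the example, and abort stands in for the call
to error.  It shows why cleanup after an exception is cheap: releasing
everything allocated since a mark means freeing any blocks added since
then and restoring the saved stack pointer.

```c
#include <stddef.h>
#include <stdlib.h>

#define MIN_BLOCK 512            /* hypothetical minimum payload per block */

/* Alignment unit; assumed adequate for the types a shell stores. */
typedef union { long l; double d; void *p; } align_t;

struct stack_block {
    struct stack_block *prev;    /* block that was current before this one */
    align_t space[];             /* payload (C99 flexible array member) */
};

struct stack_mark {              /* snapshot of the allocator state */
    struct stack_block *block;
    char *next;
    size_t left;
};

static struct stack_block *cur_block;  /* head of the linked list of blocks */
static char *stack_next;               /* the "stack pointer" */
static size_t stack_left;              /* bytes left in the current block */

/* Allocate n bytes off the stack, adding a new block if needed. */
void *st_alloc(size_t n)
{
    if (n == 0)
        n = 1;
    /* Round up so every allocation stays aligned. */
    n = (n + sizeof(align_t) - 1) / sizeof(align_t) * sizeof(align_t);
    if (n > stack_left) {
        size_t size = n > MIN_BLOCK ? n : MIN_BLOCK;
        struct stack_block *b = malloc(sizeof *b + size);
        if (b == NULL)
            abort();             /* ash would call error() instead */
        b->prev = cur_block;
        cur_block = b;
        stack_next = (char *)b->space;
        stack_left = size;
    }
    void *p = stack_next;
    stack_next += n;
    stack_left -= n;
    return p;
}

/* Record the current state so it can be restored later. */
void st_mark(struct stack_mark *m)
{
    m->block = cur_block;
    m->next = stack_next;
    m->left = stack_left;
}

/* Free everything allocated since the mark was taken. */
void st_release(const struct stack_mark *m)
{
    while (cur_block != m->block) {
        struct stack_block *b = cur_block;
        cur_block = b->prev;
        free(b);
    }
    stack_next = m->next;
    stack_left = m->left;
}
```

A caller would take a mark with st_mark before starting a command and
call st_release from the exception handler, without having to track
the individual allocations made in between.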

STPUTC:  If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;               /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);           /* repeated as many times as needed */
        grabstackstr(p);

We now start a top-down look at the code:

MAIN.C:  The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop.  Cmdloop
repeatedly parses and executes commands.

OPTIONS.C:  This file contains the option processing code.  It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin.  The -i and -j
options (the latter turns on job control) require changes in signal
handling.  The routines setjobctl (in jobs.c) and setinteractive
(in trap.c) are called to handle changes to these options.

PARSING:  The parser code is all in parser.c.  A recursive descent
parser is used.  Syntax tables (generated by mksyntax) are
used to classify characters during lexical analysis.  There are
three tables:  one for normal use, one for use when inside single
quotes, and one for use when inside double quotes.  The tables
are machine dependent because they are indexed by character
variables and the range of a char varies from machine to machine.
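
The machine dependence of the syntax tables can be illustrated with a
small sketch.  Everything here is hypothetical (the class names, the
table names, TABLE_BASE, and filling the tables at run time rather
than generating them with mksyntax); the point is only that a table
indexed by a plain char needs an offset of -CHAR_MIN, which is 0 where
char is unsigned and typically 128 where it is signed.

```c
#include <limits.h>

/* Hypothetical character classes; C_WORD is 0 so that zero-initialized
   table entries default to "ordinary character". */
enum char_class {
    C_WORD, C_SPACE, C_NEWLINE, C_SQUOTE, C_DQUOTE, C_VAR, C_BACKSLASH
};

#define TABLE_BASE (-CHAR_MIN)               /* offset for a negative char */
#define TABLE_SIZE (CHAR_MAX - CHAR_MIN + 1)

static unsigned char base_table[TABLE_SIZE];    /* normal use */
static unsigned char squote_table[TABLE_SIZE];  /* inside single quotes */
static unsigned char dquote_table[TABLE_SIZE];  /* inside double quotes */

/* Fill in the three tables; a generator program would emit these
   as initialized arrays instead. */
void init_syntax(void)
{
    base_table[TABLE_BASE + ' ']  = C_SPACE;
    base_table[TABLE_BASE + '\t'] = C_SPACE;
    base_table[TABLE_BASE + '\n'] = C_NEWLINE;
    base_table[TABLE_BASE + '\''] = C_SQUOTE;
    base_table[TABLE_BASE + '"']  = C_DQUOTE;
    base_table[TABLE_BASE + '$']  = C_VAR;
    base_table[TABLE_BASE + '\\'] = C_BACKSLASH;

    /* Inside single quotes only the closing quote is special. */
    squote_table[TABLE_BASE + '\''] = C_SQUOTE;

    /* Inside double quotes, blanks are ordinary but $ and \ are not. */
    dquote_table[TABLE_BASE + '"']  = C_DQUOTE;
    dquote_table[TABLE_BASE + '$']  = C_VAR;
    dquote_table[TABLE_BASE + '\\'] = C_BACKSLASH;
}

/* Classify c under the given table; valid for any value of char. */
int classify(const unsigned char *table, char c)
{
    return table[TABLE_BASE + c];
}
```

The lexer would switch tables as it enters and leaves quotes, so the
same character can be special in one context and ordinary in another.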

PARSE OUTPUT:  The output of the parser consists of a tree of
nodes.  The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents.  An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance.  It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word.  The text consists of ordinary characters and a number of
special codes defined in parser.h.  The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.
The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes.  The name of the variable
comes next, terminated by an equals sign.  If the type is not
VSNORMAL, then the text field in the substitution follows,
terminated by a CTLENDVAR byte.

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.  CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right.  In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).
Other here documents, and words which are not subject to splitting
and file name generation, have the CTLESC characters removed during
the variable and command substitution phase.  Words which are
subject to splitting and file name generation have the CTLESC
characters removed as part of the file name phase.

EXECUTION:  Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table.  Called from expand.c.

EVAL.C:  Evaltree recursively executes a parse tree.  The exit
status is returned in the global variable exitstatus.  The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.  It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command and
connects the standard output of the child to a pipe.

JOBS.C:  To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process.  Waitforjob waits for a job
to complete.  These routines take care of process groups if job
control is defined.

REDIR.C:  Ash allows file descriptors to be redirected and then
restored without forking off a child process.  This is
accomplished by duplicating the original file descriptors.
The redirtab structure records where the file descriptors have
been duplicated to.

EXEC.C:  The routine find_command locates a command, and enters
the command in the hash table if it is not already there.  The
third argument specifies whether it is to print an error message
if the command is not found.  (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the hash
table of the parent process.  But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C:  Arguments are processed in three passes.  The first
(performed by the routine argstr) performs variable and command
substitution.  The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.  If the
"/u" directory is simulated, then when "/u/username" is replaced
by the user's home directory, the flag "didudir" is set.  This
tells the cd command that it should print out the directory name,
just as it would if the "/u" directory were implemented using
symbolic links.

VAR.C:  Variables are stored in a hash table.  Probably we should
switch to extensible hashing.  The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.
Variables which the shell references internally are preallocated
so that the shell can reference the values of these variables
without doing a lookup.

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.
There are two consequences of this.  First, if an assignment to
PATH precedes the command, the value of PATH before the
assignment must be remembered and passed to shellexec.  Second, if
the program turns out to be a shell procedure, the strings from the
environment variables which preceded the command must be pulled
out of the table and replaced with strings obtained from malloc,
since the former will automatically be freed when the stack (see
the entry on memalloc.c) is emptied.

BUILTIN COMMANDS:  The procedures for handling these are
scattered throughout the code, depending on which location appears
most appropriate.  They can be recognized because their names
always end in "cmd".  The mapping from names to procedures is
specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program.  A builtin command is allowed to overwrite its
arguments.  Builtin routines can call nextopt to do option
parsing.  This is kind of like getopt, but you don't pass argc and
argv to it.  Builtin routines can also call error.
This routine normally terminates the shell (or returns to the main
command loop if the shell is interactive), but when called from a
builtin command it causes the builtin command to terminate with an
exit status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons.  The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash.  The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment.  The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash.  This #define should appear
before bltin.h is included; bltin.h will #undef main if the
program is to be compiled stand-alone.

CD.C:  This file defines the cd and pwd builtins.  The pwd
command runs /bin/pwd the first time it is invoked (unless the user
has already done a cd to an absolute pathname), but then
remembers the current directory and updates it when the cd
command is run, so subsequent pwd commands run very fast.  The main
complication in the cd command is in the docd routine, which
resolves symbolic links into actual names and informs the user
of the resulting directory if a symbolic link was crossed.

SIGNALS:  Trap.c implements the trap command.
The routine setsignal figures out what action should be taken when
a signal is received and invokes the signal system call to set the
signal action appropriately.  When a signal that a user has set a
trap for is caught, the routine "onsig" sets a flag.  The routine
dotrap is called at appropriate points to actually handle the
signal.  When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.

OUTPUT:  Ash uses its own output routines.  There are three
output structures allocated.  "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory.  This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating system.
The variables out1 and out2 normally point to output and errout,
respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT:  The basic input routine is pgetc, which reads from the
current input file.  There is a stack of input files; the current
input file is the top file on this stack.  The code allows the
input to come from a string rather than a file.  (This is for the
-c option and the "." and eval builtin commands.)  The global
variable plinno is saved and restored when files are pushed and
popped from the stack.  The parser routines store the number of
the current line in this variable.
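
The input stack described above might look something like the
following sketch.  Only pgetc and plinno are names taken from the
text; struct input_source, push_string, push_file, pop_input,
MAX_DEPTH and PEOF are invented for the example, and the real input
code is certainly more involved (buffering, for one thing).  The
sketch shows a string and a file being read through the same routine,
and plinno being saved on push and restored on pop.

```c
#include <stdio.h>

#define MAX_DEPTH 16             /* hypothetical limit on nested inputs */
#define PEOF (-1)                /* hypothetical end-of-input value */

struct input_source {
    FILE *fp;                    /* non-NULL when reading from a file */
    const char *str;             /* non-NULL when reading from a string */
    int saved_plinno;            /* line number of the suspended input */
};

static struct input_source input_stack[MAX_DEPTH];
static int depth;                /* number of inputs on the stack */
int plinno = 1;                  /* current line, maintained by the parser */

static int push_source(FILE *fp, const char *str)
{
    if (depth == MAX_DEPTH)
        return -1;
    input_stack[depth].fp = fp;
    input_stack[depth].str = str;
    input_stack[depth].saved_plinno = plinno;
    depth++;
    plinno = 1;                  /* the new input starts at line 1 */
    return 0;
}

/* Read from a string, as for the -c option or the eval builtin. */
int push_string(const char *s) { return push_source(NULL, s); }

/* Read from an open file, as for the "." builtin. */
int push_file(FILE *fp) { return push_source(fp, NULL); }

/* Drop the top input and resume the one underneath it. */
void pop_input(void)
{
    if (depth > 0) {
        depth--;
        plinno = input_stack[depth].saved_plinno;
    }
}

/* Return the next character of the current input, or PEOF. */
int pgetc(void)
{
    struct input_source *top;
    int c;

    if (depth == 0)
        return PEOF;
    top = &input_stack[depth - 1];
    if (top->str != NULL) {
        if (*top->str == '\0')
            return PEOF;
        return (unsigned char)*top->str++;
    }
    c = getc(top->fp);
    return c == EOF ? PEOF : c;
}
```

In this sketch the caller is responsible for incrementing plinno when
it reads a newline, mirroring the statement that the parser routines
store the current line number there.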

DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.  Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses.  Example:
        TRACE(("n=%d\n", n));
The double parentheses are necessary because the preprocessor
can't handle macros with a variable number of arguments.
Defining DEBUG also causes the shell to generate a core dump if it
is sent a quit signal.  The tracing code is in show.c.
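
The double parenthesis trick can be sketched as below.  This is an
assumption about how such a macro is typically written, not a copy of
the code in show.c: the helper name trace and the buffer trace_buf are
hypothetical, and the message is formatted into memory here instead of
being written to $HOME/trace so that the example stays self-contained.
The macro takes a single argument, the whole parenthesized argument
list, and pastes it after the name of a variadic function.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1                  /* normally defined in shell.h */

static char trace_buf[256];      /* hypothetical: holds the last message */

/* Hypothetical helper taking printf-style arguments.  The real shell
   writes to a trace file; this sketch formats into trace_buf. */
static void trace(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(trace_buf, sizeof trace_buf, fmt, ap);
    va_end(ap);
}

#ifdef DEBUG
/* TRACE(("n=%d\n", n)) expands to: trace ("n=%d\n", n) */
#define TRACE(args) trace args
#else
/* With DEBUG off the call and its arguments vanish entirely. */
#define TRACE(args) ((void)0)
#endif
```

Because the inner parentheses make the argument list a single macro
argument, this works even with preprocessors that have no support for
variadic macros, and the tracing calls cost nothing when DEBUG is not
defined.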