# $NetBSD: TOUR,v 1.11 2016/10/25 13:01:59 abhinav Exp $
# @(#)TOUR 8.1 (Berkeley) 5/31/93

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell. It is provided
anyway because it gives helpful information about how the shell is
structured, but be warned that things have changed -- the current
shell is still under development.

================================================================

                        A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES: The subdirectory bltin contains commands which can
be compiled stand-alone. The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS: Files whose names begin with "mk" are
programs that generate source code. A complete list of these
programs is:

        program         input files          generates
        -------         ------------         ---------
        mkbuiltins      builtins             builtins.h builtins.c
        mkinit          *.c                  init.c
        mknodes         nodetypes            nodes.h nodes.c
        mksignames      -                    signames.h signames.c
        mksyntax        -                    syntax.h syntax.c
        mktokens        -                    token.h
        bltin/mkexpr    unary_op binary_op   operators.h operators.c

There are undoubtedly too many of these.
Mkinit searches all the C source files for entries looking like:

        INIT {
              x = 1;    /* executed during initialization */
        }

        RESET {
              x = 2;    /* executed when the shell does a longjmp
                           back to the main command loop */
        }

        SHELLPROC {
              x = 3;    /* executed when the shell runs a shell procedure */
        }

It pulls this code out into routines which are called when
particular events occur. The intent is to improve modularity by
isolating the information about which modules need to be
explicitly initialized/reset within the modules themselves.

Mkinit recognizes several constructs for placing declarations in
the init.c file.
        INCLUDE "file.h"
includes a file. The storage class MKINIT makes a declaration
available in the init.c file, for example:
        MKINIT int funcnest;    /* depth of function calls */
MKINIT alone on a line introduces a structure or union declaration:
        MKINIT
        struct redirtab {
              short renamed[10];
        };
Preprocessor #define statements are copied to init.c without any
special action to request this.

INDENTATION: The ash source is indented in multiples of six
spaces. The only study that I have heard of on the subject
concluded that the optimal amount to indent is in the range of
four to six spaces. I use six spaces since it is not too big a
jump from the widely used eight spaces. If you really hate
six-space indentation, use the adjind (source included) program
to change it to something else.
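
To make the INIT, RESET and SHELLPROC description concrete, here is a hand-written sketch of the kind of init.c that mkinit might produce: each generated routine simply strings together the matching blocks from every module. The module names, the variable x, the routine name initshellproc for the SHELLPROC event, and the choice to clear funcnest on reset are assumptions for illustration, not mkinit's actual output.

```c
/*
 * Hand-written sketch of a generated init.c (hypothetical layout).
 * Declarations marked MKINIT in a module are copied here so the
 * collected blocks can refer to them.
 */

/* Copied from a module that declared "MKINIT int funcnest;" */
int funcnest;               /* depth of function calls */

/* Declared in a hypothetical module a.c */
int x;

/* Bodies of all INIT blocks: run once when the shell starts. */
void init(void)
{
    /* from a.c */
    x = 1;
}

/* Bodies of all RESET blocks: run after a longjmp to the command loop. */
void reset(void)
{
    /* from a.c */
    x = 2;
    /* from a hypothetical eval.c: forget any nested function calls */
    funcnest = 0;
}

/* Bodies of all SHELLPROC blocks: run before a shell procedure starts. */
void initshellproc(void)
{
    /* from a.c */
    x = 3;
}
```

In this arrangement main calls init() once and the command loop calls reset() after each exception, so neither has to know which modules hold state.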

EXCEPTIONS: Code for dealing with exceptions appears in
exceptions.c. The C language doesn't include exception handling,
so I implement it using setjmp and longjmp. The global variable
exception contains the type of exception. EXERROR is raised by
calling error. EXINT is an interrupt. EXSHELLPROC is an exception
which is raised when a shell procedure is invoked. The purpose of
EXSHELLPROC is to perform the cleanup actions associated with
other exceptions. After these cleanup actions, the shell can
interpret a shell procedure itself without exec'ing a new copy of
the shell.

INTERRUPTS: In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop. (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.) The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections. Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery. INTOFF and INTON can be nested.

MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left. It also defines a
stack oriented memory allocation scheme. Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer. The stack is implemented using a
linked list of blocks.
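
The EXCEPTIONS and MEMALLOC.C entries fit together as sketched below: error() records EXERROR and longjmps to the innermost handler, and that handler frees everything allocated since it was installed by restoring a saved stack mark. The names handler, exception, error, stalloc, setstackmark and popstackmark follow the text or ash's naming, but the structures, the 4096-byte block size and the helper run_protected are simplified assumptions, not the shell's real code; INTOFF/INTON protection is omitted.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define EXERROR 1                   /* exception type raised by error() */

static jmp_buf *handler;            /* innermost active exception handler */
static int exception;               /* type of the exception being raised */

/* The allocation stack: a linked list of blocks, newest first. */
struct stack_block {
    struct stack_block *prev;       /* next older block */
    size_t used;                    /* bytes handed out from space[] */
    size_t size;                    /* capacity of space[] */
    char space[];                   /* C99 flexible array member */
};
static struct stack_block *stackp;  /* newest block, or NULL */

struct stackmark {                  /* a saved "stack pointer" */
    struct stack_block *block;
    size_t used;
};

static void error(const char *msg)
{
    assert(handler != NULL);        /* the sketch always runs under a handler */
    fprintf(stderr, "error: %s\n", msg);
    exception = EXERROR;
    longjmp(*handler, 1);
}

static void *stalloc(size_t n)
{
    n = (n + 7) & ~(size_t)7;       /* round up; 8-byte alignment assumed */
    if (stackp == NULL || stackp->size - stackp->used < n) {
        size_t size = n > 4096 ? n : 4096;
        struct stack_block *b = malloc(sizeof *b + size);
        if (b == NULL)
            error("out of space");  /* does not return */
        b->prev = stackp;
        b->used = 0;
        b->size = size;
        stackp = b;
    }
    void *p = stackp->space + stackp->used;
    stackp->used += n;
    return p;
}

static void setstackmark(struct stackmark *mark)
{
    mark->block = stackp;
    mark->used = stackp != NULL ? stackp->used : 0;
}

/* Free blocks pushed since the mark, then rewind the marked block. */
static void popstackmark(const struct stackmark *mark)
{
    while (stackp != mark->block) {
        struct stack_block *old = stackp;
        stackp = old->prev;
        free(old);
    }
    if (stackp != NULL)
        stackp->used = mark->used;
}

/* Run cmd under a handler; return 0 if it finished, 1 if it raised. */
static int run_protected(void (*cmd)(void))
{
    jmp_buf jb;
    jmp_buf *saved = handler;
    struct stackmark mark;

    setstackmark(&mark);
    if (setjmp(jb) != 0) {
        handler = saved;
        popstackmark(&mark);        /* reclaims everything cmd allocated */
        return 1;
    }
    handler = &jb;
    cmd();
    handler = saved;
    popstackmark(&mark);
    return 0;
}

static void failing_command(void)
{
    char *buf = stalloc(10000);     /* larger than a 4096-byte block */
    strcpy(buf, "scratch");
    error("something went wrong");
}
```

Whether cmd returns normally or is unwound by error(), a single popstackmark call reclaims its memory, which is the advantage the MEMALLOC.C entry describes.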

STPUTC: If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);

We now start a top-down look at the code:

MAIN.C: The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop. Cmdloop
repeatedly parses and executes commands.

OPTIONS.C: This file contains the option processing code. It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin. The -i and -j
options (the latter turns on job control) require changes in
signal handling. The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING: The parser code is all in parser.c. A recursive descent
parser is used. Syntax tables (generated by mksyntax) are used to
classify characters during lexical analysis. There are three
tables: one for normal use, one for use when inside single
quotes, and one for use when inside double quotes. The tables
are machine dependent because they are indexed by character
variables and the range of a char varies from machine to machine
(plain char is signed on some machines and unsigned on others).
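
As a rough illustration of table-driven character classification, the sketch below builds two of the three tables (normal text and single-quoted text) at run time. Instead of relying on tables generated by mksyntax for a particular machine, it sidesteps the signedness problem by converting each char through unsigned char before indexing. The class names and the functions init_syntax and syntax_class are hypothetical, not ash's.

```c
#include <limits.h>

/* Illustrative character classes (not ash's actual class names). */
enum { C_WORD, C_SPACE, C_NEWLINE, C_SQUOTE, C_DQUOTE, C_DOLLAR, C_BACKSLASH };

/* One entry per possible byte value. */
static unsigned char basesyntax[UCHAR_MAX + 1];   /* normal text */
static unsigned char sqsyntax[UCHAR_MAX + 1];     /* inside '...' */

static void init_syntax(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++) {
        basesyntax[i] = C_WORD;     /* ordinary word character by default */
        sqsyntax[i] = C_WORD;
    }
    basesyntax[' '] = C_SPACE;
    basesyntax['\t'] = C_SPACE;
    basesyntax['\n'] = C_NEWLINE;
    basesyntax['\''] = C_SQUOTE;
    basesyntax['"'] = C_DQUOTE;
    basesyntax['$'] = C_DOLLAR;
    basesyntax['\\'] = C_BACKSLASH;
    /* Inside single quotes, only the closing quote is special. */
    sqsyntax['\''] = C_SQUOTE;
}

/*
 * Plain char may be signed or unsigned. Converting through unsigned
 * char keeps the subscript in 0..UCHAR_MAX either way, so a byte
 * such as 0xE9 can never index before the start of the table.
 */
static int syntax_class(const unsigned char *table, char c)
{
    return table[(unsigned char)c];
}
```

A lexer would pick the table according to its current quoting state and switch on the returned class; a double-quote table would be built the same way.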

PARSE OUTPUT: The output of the parser consists of a tree of
nodes. The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents. An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance. It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word. The text consists of ordinary characters and a number of
special codes defined in parser.h. The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.
The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes. The name of the variable
comes next, terminated by an equals sign. If the type is not
VSNORMAL, then the alternative text of the substitution follows,
terminated by a CTLENDVAR byte.

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently. CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right. In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).
Other here documents, and words which are not subject to
splitting and file name generation, have the CTLESC characters
removed during the variable and command substitution phase.
Words which are subject to splitting and file name generation
have the CTLESC characters removed as part of the file name
phase.

EXECUTION: Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table. Called from expand.c.

EVAL.C: Evaltree recursively executes a parse tree. The exit
status is returned in the global variable exitstatus. The
alternative entry evalbackcmd is called to evaluate commands in
back quotes. It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command
and connects the standard output of the child to a pipe.

JOBS.C: To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process. Waitforjob waits for a job
to complete. These routines take care of process groups if job
control is defined.

REDIR.C: Ash allows file descriptors to be redirected and then
restored without forking off a child process. This is
accomplished by duplicating the original file descriptors.
The redirtab structure records where the file descriptors have
been duplicated to.

EXEC.C: The routine find_command locates a command, and enters
the command in the hash table if it is not already there. The
third argument specifies whether it is to print an error message
if the command is not found. (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process. But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C: Arguments are processed in three passes. The first
(performed by the routine argstr) performs variable and command
substitution. The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation. If the
"/u" directory is simulated, then when "/u/username" is replaced
by the user's home directory, the flag "didudir" is set. This
tells the cd command that it should print out the directory name,
just as it would if the "/u" directory were implemented using
symbolic links.

VAR.C: Variables are stored in a hash table. Probably we should
switch to extensible hashing. The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.
Variables which the shell references internally are preallocated
so that the shell can reference the values of these variables
without doing a lookup.

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.
There are two consequences of this. First, if an assignment to
PATH precedes the command, the value of PATH before the
assignment must be remembered and passed to shellexec. Second,
if the program turns out to be a shell procedure, the strings
from the environment variables which preceded the command must be
pulled out of the table and replaced with strings obtained from
malloc, since the former will automatically be freed when the
stack (see the entry on memalloc.c) is emptied.

BUILTIN COMMANDS: The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate. They can be recognized because their
names always end in "cmd". The mapping from names to procedures
is specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program. A builtin command is allowed to overwrite its
arguments. Builtin routines can call nextopt to do option
parsing. This is kind of like getopt, but you don't pass argc
and argv to it. Builtin routines can also call error.
This routine normally terminates the shell (or returns to the
main command loop if the shell is interactive), but when called
from a builtin command it causes the builtin command to terminate
with an exit status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons. The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash. The header file bltin.h
takes care of most of the differences between the ash and
stand-alone environments. The author of such a command should
call its main routine "main", and #define main to be the name of
the routine to use when the program is linked into ash. This
#define should appear before bltin.h is included; bltin.h will
#undef main if the program is to be compiled stand-alone.

CD.C: This file defines the cd and pwd builtins. The pwd command
runs /bin/pwd the first time it is invoked (unless the user has
already done a cd to an absolute pathname), but then remembers
the current directory and updates it when the cd command is run,
so subsequent pwd commands run very fast. The main complication
in the cd command is in the docd routine, which resolves symbolic
links into actual names and tells the user where they ended up if
they crossed a symbolic link.

SIGNALS: Trap.c implements the trap command.
The routine setsignal figures out what action should be taken
when a signal is received and invokes the signal system call to
set the signal action appropriately. When a signal that a user
has set a trap for is caught, the routine "onsig" sets a flag.
The routine dotrap is called at appropriate points to actually
handle the signal. When an interrupt is caught and no trap has
been set for that signal, the routine "onint" in error.c is
called.

OUTPUT: Ash uses its own output routines. There are three output
structures allocated. "Output" represents the standard output,
"errout" the standard error, and "memout" contains output which
is to be stored in memory. This last is used when a builtin
command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating
system. The variables out1 and out2 normally point to output and
errout, respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT: The basic input routine is pgetc, which reads from the
current input file. There is a stack of input files; the current
input file is the top file on this stack. The code allows the
input to come from a string rather than a file. (This is for the
-c option and the "." and eval builtin commands.) The global
variable plinno is saved and restored when files are pushed onto
and popped from the stack. The parser routines store the number
of the current line in this variable.
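
A minimal sketch of the input stack, assuming just two kinds of source: strings (as for -c and eval) and stdio streams (as for a file read by "."). The routine name pgetc and the variable plinno come from the text; struct input_source, push_source and pop_source are hypothetical, the use of stdio is a simplification, and restarting line numbering at 1 for each new source is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One entry on the input stack: a string or a stdio stream. */
struct input_source {
    struct input_source *prev;  /* source underneath this one */
    const char *str;            /* if non-NULL, read from this string */
    FILE *fp;                   /* otherwise read from this stream */
    int saved_plinno;           /* caller's line number, restored on pop */
};

static struct input_source *input_top;  /* current input: top of the stack */
int plinno = 1;                         /* current line, kept by the parser */

/* Make str (or, if str is NULL, fp) the current input source. */
static void push_source(const char *str, FILE *fp)
{
    struct input_source *src = malloc(sizeof *src);

    if (src == NULL) {
        perror("malloc");
        exit(2);
    }
    src->prev = input_top;
    src->str = str;
    src->fp = fp;
    src->saved_plinno = plinno;
    input_top = src;
    plinno = 1;                 /* assumption: numbering restarts per source */
}

/* Discard the current source and restore the caller's line number. */
static void pop_source(void)
{
    struct input_source *src = input_top;

    assert(src != NULL);        /* popping an empty stack is a caller bug */
    plinno = src->saved_plinno;
    input_top = src->prev;
    free(src);
}

/* Return the next character of the current source, or EOF. */
static int pgetc(void)
{
    if (input_top == NULL)
        return EOF;
    if (input_top->str != NULL) {
        if (*input_top->str == '\0')
            return EOF;
        return (unsigned char)*input_top->str++;
    }
    return getc(input_top->fp);
}
```

Because each entry remembers plinno, an error reported while running an eval string or a "." file can be followed by an accurate line number once control returns to the outer source.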

DEBUGGING: If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace. Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses. Example:
"TRACE(("n=%d\n", n))". The double parentheses are necessary
because the preprocessor can't handle macros with a variable
number of arguments. Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal. The tracing
code is in show.c.
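
The double-parentheses trick can be sketched as follows: the whole parenthesized argument list is passed as a single macro parameter and pasted after the function name, so the macro itself takes only one argument. The trace function, the tracefile variable and the flush after every call are assumptions of this sketch rather than a copy of show.c.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1             /* in ash this would come from shell.h */

#ifdef DEBUG
/* TRACE((fmt, args...)) expands to trace(fmt, args...). */
#define TRACE(param) trace param
#else
#define TRACE(param)        /* compiles to nothing */
#endif

static FILE *tracefile;     /* ash writes $HOME/trace; any stream works here */

static void trace(const char *fmt, ...)
{
    va_list ap;

    if (tracefile == NULL)
        return;
    va_start(ap, fmt);
    vfprintf(tracefile, fmt, ap);
    va_end(ap);
    fflush(tracefile);      /* keep the trace current if the shell dies */
}
```

Since C99 the same effect is available with a variadic macro, #define TRACE(...) trace(__VA_ARGS__), but the double-parentheses form works with any C preprocessor and still disappears completely when DEBUG is not defined.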