# @(#)POSIX 5.5 (Berkeley) 08/24/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. Historic implementations of sed strip the text arguments of the
    a, c and i commands of their initial blanks, i.e.

    #!/bin/sed -f
    a\
        foo\
        bar

    produces:

    foo
    bar

    POSIX does not specify this behavior. This implementation follows
    historic practice.

 2. Historical versions of sed required that the w flag be the last
    flag to an s command as it takes an additional argument. This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed displayed two
    digit octal numbers, too, not three as specified by POSIX. POSIX
    is a cleanup, and is followed by this implementation.

 6. The POSIX specification for ! does not specify that for a single
    command the command must not contain an address specification
    whereas the command list can contain address specifications. The
    specification for ! implies that "3!/hello/p" works, and it never
    has, historically. (Note, "3!{ /hello/p }" does work.)

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behaviour. (It seems logical that each
    one might reverse the behaviour.) This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated
    by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
    three lines of a file. This is not specified by POSIX.
    Note, the ; command separator is not allowed for the commands
    a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
    command. This implementation follows historic practice and
    implements the ; separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

    sed -e '
    n
    i\
    hello
    ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.

10. POSIX does not specify that the q command causes all lines that
    have been appended to be output and that the pattern space is
    printed before exiting. This implementation follows historic
    practice.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose second line number
    is less than the first (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program, with the input "one\ntwo\nthree\nfour\nfive" can behave
    in different ways depending on whether the /one/,/three/c
    command is triggered at the third line.

    2,4b
    /one/,/three/c\
    append some text

    Historic implementations of sed, for the above example, would
    output the text after the "branch" no longer applied, but would
    then quit without further processing. This implementation has
    the more intuitive behavior of never outputting the text at all.
    This is based on the belief that it would be reasonable to want
    to output some text if the pattern /one/,/three/ occurs but only
    if it occurs outside of the range of lines 2 to 4.

13. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command. Historic sed implementations behave differently for
    "ls | sed", where they produce no output, and "ls | sed -e#", where
    they behave like cat. This implementation behaves like cat in both
    cases.

15. The POSIX requirement to open all wfiles from the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. This implementation
    follows historic practice and POSIX, by default, and provides the
    -a option which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated. This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless. A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
    As historic sed implementations always discarded the backslash,
    this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
    that constructs like ",d" or "1,d" and ",5d" are allowed. This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ". This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails. Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.

21. POSIX does not specify if the "Nth occurrence" of an RE in a
    substitute command is an overlapping or a non-overlapping one,
    i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
    Historical practice is to drop core or only do non-overlapping
    RE's. This implementation only does non-overlapping RE's.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes. This is not specified in POSIX. This
    implementation follows historic practice.

23. Historic implementations handle empty RE's in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere. POSIX does not document this
    behavior. For example the command:

    sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc. The semantics of "the last
    RE" can be defined in two different ways:

    1. The last RE encountered when compiling (lexical/static scope).
    2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behaviour. This implementation also uses dynamic scoping, as
    this seems the natural way to interact with an editor, and in order
    to remain consistent with historical practice.
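
    The empty-RE example in item 23 can be tried with a small shell
    sketch. It only exercises the simple case in which the lexical and
    dynamic readings agree, so it does not distinguish between the two
    scoping rules. The function name empty_re_demo and the sample input
    are made up for illustration and do not come from any sed
    distribution.

```shell
#!/bin/sh
# Illustrative sketch only: empty_re_demo and its input are hypothetical.
# The empty RE in "s//XXX/" reuses the most recent RE, which here is the
# /abc/ address on the same command.
empty_re_demo() {
    printf 'abc def\nxyz\n' | sed -e '/abc/s//XXX/'
}

empty_re_demo
```

    With a sed that follows the behavior described in item 23, the first
    input line becomes "XXX def" and the second line, which does not
    match /abc/, is passed through unchanged.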