#	@(#)POSIX	5.6 (Berkeley) 08/26/92

Comments on the IEEE P1003.2 Draft 12
Part 2: Shell and Utilities
Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. Historic implementations of sed strip the text arguments of the
    a, c and i commands of their initial blanks, i.e.

        #!/bin/sed -f
        a\
                foo\
                bar

    produces:

        foo
        bar

    POSIX does not specify this behavior. This implementation follows
    historic practice.

 2. Historical versions of sed required that the w flag be the last
    flag to an s command as it takes an additional argument. This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.

 4.
    Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed displayed two
    digit octal numbers, too, not three as specified by POSIX. POSIX
    is a cleanup, and is followed by this implementation.

 6. The POSIX specification for ! does not specify that for a single
    command the command must not contain an address specification
    whereas the command list can contain address specifications. The
    specification for ! implies that "3!/hello/p" works, and it never
    has, historically. Note,

        3!{
                /hello/p
        }

    does work.

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behaviour. (It seems logical that each
    one might reverse the behaviour.) This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated
    by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
    three lines of a file. This is not specified by POSIX.
    Note, the ; command separator is not allowed for the commands
    a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
    command.
    This implementation follows historic practice and
    implements the ; separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.

10. POSIX does not specify that the q command causes all lines that
    have been appended to be output and that the pattern space is
    printed before exiting. This implementation follows historic
    practice.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program, with the input "one\ntwo\nthree\nfour\nfive" can behave
    in different ways depending on whether the /one/,/three/c
    command is triggered at the third line.

        2,4b
        /one/,/three/c\
        append some text

    Historic implementations of sed, for the above example, would
    output the text after the "branch" no longer applied, but would
    then quit without further processing.
    This implementation has
    the more intuitive behavior of never outputting the text at all.
    This is based on the belief that it would be reasonable to want
    to output some text if the pattern /one/,/three/ occurs but only
    if it occurs outside of the range of lines 2 to 4.

13. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command. Historic sed implementations behave differently for
    "ls | sed", where they produce no output, and "ls | sed -e#", where
    they behave like cat. This implementation behaves like cat in both
    cases.

15. The POSIX requirement to open all wfiles from the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. This implementation
    follows historic practice and POSIX, by default, and provides the
    -a option which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated. This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless.
    A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
    As historic sed implementations always discarded the backslash,
    this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
    that constructs like ",d" or "1,d" and ",5d" are allowed. This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ". This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts. This implementation
    follows historic practice.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails. Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.

21. POSIX does not specify if the "Nth occurrence" of an RE in a
    substitute command is an overlapping or a non-overlapping one,
    i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
    Historical practice is to drop core or only do non-overlapping
    RE's. This implementation only does non-overlapping RE's.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes. This is not specified in POSIX. This
    implementation follows historic practice.

23. Historic implementations handle empty RE's in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere. POSIX does not document this
    behavior. For example the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc. The semantics of "the last
    RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behaviour. This implementation also uses dynamic scoping, as
    this seems the most useful and in order to remain consistent with
    historical practice.
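
The empty-RE behavior described in item 23 can be tried with a short shell
sketch. This is only an illustration, not part of the original notes: the
variable names (simple, scoped) and the second script are assumptions made
up for the example. The first command is the simple case from item 23. The
second is built so that the last RE compiled (/b/) differs from the last RE
executed (/a/), so its output depends on which scoping rule the sed under
test uses.

```shell
#!/bin/sh
# Simple case from item 23: the empty RE in s//XXX/ reuses the address /abc/.
simple=$(printf 'abc def\n' | sed -e '/abc/s//XXX/')
echo "$simple"

# Hypothetical scope probe (not from the original notes).
# For the input line "ab", the branch skips the /b/ address, so /b/ is the
# last RE compiled before s//X/ but /a/ is the last RE actually executed.
# Dynamic scoping would rewrite the "a"; static scoping would rewrite the "b".
scoped=$(printf 'ab\n' | sed -e '/a/b end' -e '/b/p' -e ':end' -e 's//X/')
echo "$scoped"
```

Comparing the second line of output across sed implementations shows which
definition of "the last RE" each one follows.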