#	@(#)POSIX	5.7 (Berkeley) 08/27/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.  Historic implementations of sed strip the text arguments of the
     a, c and i commands of their initial blanks, i.e.

             #!/bin/sed -f
             a\
                     foo\
                     bar

     produces:

             foo
             bar

     POSIX does not specify this behavior.  This implementation follows
     historic practice.

 2.  Historical versions of sed required that the w flag be the last
     flag to an s command as it takes an additional argument.  This
     is obvious, but not specified in POSIX.

 3.  Historical versions of sed required that whitespace follow a w
     flag to an s command.  This is not specified in POSIX.  This
     implementation permits whitespace but does not require it.

 4.  Historical versions of sed permitted any number of whitespace
     characters to follow the w command.  This is not specified in
     POSIX.  This implementation permits whitespace but does not
     require it.

 5.  The rule for the l command differs from historic practice.  Table
     2-15 includes the various ANSI C escape sequences, including \\
     for backslash.  Some historical versions of sed also displayed
     two-digit octal numbers, not three-digit ones as specified by
     POSIX.  POSIX is a cleanup, and is followed by this implementation.

 6.  The POSIX specification for ! does not specify that, for a single
     command, the command must not contain an address specification,
     whereas the command list can contain address specifications.  The
     specification for ! implies that "3!/hello/p" works, and it never
     has, historically.  Note,

             3!{
                     /hello/p
             }

     does work.

 7.  POSIX does not specify what happens with consecutive ! commands
     (e.g. /foo/!!!p).  Historic implementations allow any number of
     !'s without changing the behaviour.  (It seems logical that each
     one might reverse the behaviour.)  This implementation follows
     historic practice.

 8.  Historic versions of sed permitted commands to be separated
     by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
     three lines of a file.  This is not specified by POSIX.
     Note, the ; command separator is not allowed for the commands
     a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
     command.  This implementation follows historic practice and
     implements the ; separator.

 9.  Historic versions of sed terminated the script if EOF was reached
     during the execution of the 'n' command, i.e.:

             sed -e '
             n
             i\
             hello
             ' </dev/null

     did not produce any output.  POSIX does not specify this behavior.
     This implementation follows historic practice.

10.  POSIX does not specify that the q command causes all lines that
     have been appended to be output and that the pattern space is
     printed before exiting.  This implementation follows historic
     practice.

11.  Historical implementations do not output the change text of a c
     command in the case of an address range whose first line number
     is greater than the second (e.g. 3,1).  POSIX requires that the
     text be output.  Since the historic behavior doesn't seem to have
     any particular purpose, this implementation follows the POSIX
     behavior.

12.  POSIX does not specify whether address ranges are checked and
     reset if a command is not executed due to a jump.  The following
     program will behave in different ways depending on whether the
     'c' command is triggered at the third line, i.e. will the text
     be output even though that command is never logically encountered
     in the script by line 3.

             2,4b
             1,3c\
                     text

     Historic implementations of sed, for the above example, would
     never output the text.  There was a bug, however, that if the
     "1,3" was replaced by a RE address they would output the text
     after the branch no longer applied, but would then quit without
     further processing.  For example:

             2,4b
             /one/,/three/c\
                     text

     with the input:

             one
             two
             three
             four
             five
             six

     would output:

             two
             three
             four
             text

     This implementation never outputs the text, for either example.
     This is based on the belief that it would be reasonable to want
     to output some text if the pattern /one/,/three/ occurs but only
     if it occurs outside of the range of lines 2 to 4.

13.  Historical implementations allow an output suppressing #n at the
     beginning of -e arguments as well as in a script file.  POSIX
     does not specify this.  This implementation follows historical
     practice.

14.  POSIX does not explicitly specify how sed behaves if no script is
     specified.  Since the sed Synopsis permits this form of the command,
     and the language in the Description section states that the input
     is output, it seems reasonable that it behave like the cat(1)
     command.  Historic sed implementations behave differently for "ls |
     sed", where they produce no output, and "ls | sed -e#", where they
     behave like cat.  This implementation behaves like cat in both cases.

15.  The POSIX requirement to open all wfiles from the beginning makes
     sed behave nonintuitively when the w commands are preceded by
     addresses or are within conditional blocks.  This implementation
     follows historic practice and POSIX, by default, and provides the
     -a option which opens the files only when they are needed.

16.  POSIX does not specify how escape sequences other than \n and \D
     (where D is the delimiter character) are to be treated.  This is
     reasonable; however, it also doesn't state that the backslash is
     to be discarded from the output regardless.  A strict reading of
     POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
     As historic sed implementations always discarded the backslash,
     this implementation does as well.

17.  POSIX specifies that an address can be "empty".  This implies
     that constructs like ",d" or "1,d" and ",5d" are allowed.  This
     is not true for historic implementations or this implementation
     of sed.

18.  The b, t and : commands are documented in POSIX to ignore leading
     white space, but no mention is made of trailing white space.
     Historic implementations of sed assigned different locations to
     the labels "x" and "x ".  This is not useful, and leads to subtle
     programming errors, but it is historic practice and changing it
     could theoretically break working scripts.  This implementation
     follows historic practice.

19.  Although POSIX specifies that reading from files that do not exist
     from within the script must not terminate the script, it does not
     specify what happens if a write command fails.  Historic practice
     is to fail immediately if the file cannot be opened or written.
     This implementation follows historic practice.

20.  Historic practice is that the \n construct can be used for either
     string1 or string2 of the y command.  This is not specified by
     POSIX.  This implementation follows historic practice.

21.  POSIX does not specify if the "Nth occurrence" of an RE in a
     substitute command is an overlapping or a non-overlapping one,
     i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
     Historical practice is to drop core or only do non-overlapping
     RE's.  This implementation only does non-overlapping RE's.

22.  Historic implementations of sed ignore the RE delimiter characters
     within character classes.  This is not specified in POSIX.  This
     implementation follows historic practice.

23.  Historic implementations handle empty RE's in a special way: the
     empty RE is interpreted as if it were the last RE encountered,
     whether in an address or elsewhere.  POSIX does not document this
     behavior.  For example the command:

             sed -e /abc/s//XXX/

     substitutes XXX for the pattern abc.  The semantics of "the last
     RE" can be defined in two different ways:

             1. The last RE encountered when compiling (lexical/static scope).
             2. The last RE encountered while running (dynamic scope).

     While many historical implementations fail on programs depending
     on scope differences, the SunOS version exhibited dynamic scope
     behaviour.  This implementation also uses dynamic scoping, as
     this seems the most useful and remains consistent with
     historical practice.
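
The empty-RE behaviour in item 23 can be tried with a short shell
sketch.  The two-line sample input is made up for illustration, and the
sketch assumes the sed on PATH reuses the last RE as described above.
It exercises only the simple case, where static and dynamic scoping
agree, so it says nothing about which of the two a given sed uses.

```shell
# The address /abc/ is the most recent RE, so the empty RE in s//XXX/
# stands for "abc" (assumes the historic behaviour from item 23).
# The input lines are hypothetical sample data.
out=$(printf 'abc\nxyz\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$out"
```

Only the line matching /abc/ is rewritten; the other input line passes
through unchanged.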