#	@(#)POSIX	5.4 (Berkeley) 08/24/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. Historic implementations of sed strip the text arguments of the
    a, c and i commands of their initial blanks, i.e.

        #!/bin/sed -f
        a\
                foo\
                bar

    produces:

        foo
        bar

    POSIX does not specify this behavior. This implementation follows
    historic practice.

 2. Historical versions of sed required that the w flag be the last
    flag to an s command as it takes an additional argument. This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed also displayed
    two-digit octal numbers, not three-digit ones as specified by
    POSIX. POSIX is a cleanup, and is followed by this implementation.

 6. The POSIX specification for ! does not specify that for a single
    command the command must not contain an address specification
    whereas the command list can contain address specifications. The
    specification for ! implies that "3!/hello/p" works, and it never
    has, historically. (Note, "3!{ /hello/p }" does work.)

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behaviour. (It seems logical that each
    one might reverse the behaviour.) This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated
    by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
    two lines of a file, since q prints nothing under -n. This is
    not specified by POSIX.
    Note, the ; command separator is not allowed for the commands
    a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
    command. This implementation follows historic practice and
    implements the ; separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.

10. POSIX does not specify that the q command causes all lines that
    have been appended to be output and that the pattern space is
    printed before exiting. This implementation follows historic
    practice.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose second line number
    is less than the first (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program, with the input "one\ntwo\nthree\nfour\nfive" can behave
    in different ways depending on whether the /one/,/three/c
    command is triggered at the third line.

        2,4b
        /one/,/three/c\
        append some text

    Historic implementations of sed, for the above example, would
    output the text after the "branch" no longer applied, but would
    then quit without further processing. This implementation has
    the more intuitive behavior of never outputting the text at all.
    This is based on the belief that it would be reasonable to want
    to output some text if the pattern /one/,/three/ occurs but only
    if it occurs outside of the range of lines 2 to 4.

13. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command. Historic sed implementations behave differently for "ls |
    sed", where they produce no output, and "ls | sed -e#", where they
    behave like cat. This implementation behaves like cat in both cases.

15. The POSIX requirement to open all wfiles from the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. This implementation
    follows historic practice and POSIX, by default, and provides the
    -a option which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated. This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless. A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
    As historic sed implementations always discarded the backslash,
    this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
    that constructs like ",d" or "1,d" and ",5d" are allowed. This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ". This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails. Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.

21. POSIX does not specify if the "Nth occurrence" of an RE in a
    substitute command is an overlapping or a non-overlapping one,
    i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
    Historical practice is to drop core or only do non-overlapping
    RE's. This implementation only does non-overlapping RE's.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes. This is not specified in POSIX. This
    implementation follows historic practice.
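
Several of the behaviors described in items 5, 8, 10, 20 and 21 can be
checked from the shell. What follows is only a minimal sketch, assuming
a POSIX-style sed is first on the PATH. The function names are
hypothetical and exist purely for illustration, and the outputs noted in
the comments are those expected under the behavior described in these
notes; other versions of sed may differ.

```shell
#!/bin/sh
# Illustrative checks only; the demo_* names are hypothetical helpers.
# Use the C locale so the l command's escapes do not depend on the
# user's environment.
LC_ALL=C
export LC_ALL

# Item 5: l writes \t for tab, \\ for backslash and three-digit octal
# for other non-printable bytes, and ends each line with $.
# Expected: a\tb\\$  then  \001$
demo_list() {
    printf 'a\tb\\\n\001\n' | sed -n 'l'
}

# Item 8: ';' used as a command separator.  Under -n the q command
# prints nothing, so only lines 1 and 2 are expected.
demo_semicolons() {
    printf '1\n2\n3\n4\n' | sed -n '1p;2p;3q'
}

# Item 10: without -n, q prints the pattern space before exiting.
# Expected: first
demo_quit() {
    printf 'first\nsecond\n' | sed 'q'
}

# Item 20: \n used in string2 of the y command.
# Expected: a  then  b
demo_y_newline() {
    printf 'a b\n' | sed 'y/ /\n/'
}

# Item 21: "Nth occurrence" counted over non-overlapping matches.
# The matches of aa in aaaaa are at offsets 0 and 2, so the second
# one is replaced.  Expected: aaAa (overlapping would give aAaa).
demo_nth() {
    printf 'aaaaa\n' | sed 's/aa/A/2'
}
```

After sourcing the file, each function can be run on its own; for
example, demo_nth should print aaAa on a sed that counts only
non-overlapping matches.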