# @(#)POSIX 5.3 (Berkeley) 08/24/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. Historic implementations of sed strip the text arguments of the
    a, c and i commands of their initial blanks, i.e.

        #!/bin/sed -f
        a\
                foo\
                bar

    produces:

        foo
        bar

    POSIX does not specify this behavior.  This implementation follows
    historic practice.

 2. Historic implementations ignore comments in the text of the i
    and a commands.  This implementation follows historic practice.

TK  I can't duplicate this -- the BSD version of sed doesn't, i.e.
TK      i\
TK      foo\
TK      #comment\
TK      bar
TK  prints
TK
TK      foo
TK      #comment
TK      bar

 3. Historical versions of sed required that the w flag be the last
    flag to an s command as it takes an additional argument.  This
    is obvious, but not specified in POSIX.

 4. Historical versions of sed required that whitespace follow a w
    flag to an s command.  This is not specified in POSIX.  This
    implementation permits whitespace but does not require it.

 5. Historical versions of sed permitted any number of whitespace
    characters to follow the w command.  This is not specified in
    POSIX.  This implementation permits whitespace but does not
    require it.

 6. The rule for the l command differs from historic practice.  Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash.  Some historical versions of sed displayed two-digit
    octal numbers, too, not three as specified by POSIX.  The
    POSIX specification is a cleanup, and this implementation follows
    it.

 7. The specification for ! does not state that, for a single
    command, the command must not contain an address specification,
    whereas a command list can contain address specifications.

TK  I think this is wrong: the script:
TK
TK      3!p
TK
TK  works fine.  Am I misunderstanding your point?
DDS Yes.  By the POSIX definition of a command, 3!/hello/p should work
DDS just as 3!{/hello/p} does.  The current implementation follows
DDS historic practice and does not implement it.
TK  I *still* don't understand....  Would you please try to explain
TK  it one more time?  Thanks...

 8. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p).  Historic implementations allow any number of
    !'s without changing the behavior.  (It seems logical that each
    one might reverse the behavior.)  This implementation follows
    historic practice.

 9. Historic versions of sed permitted commands to be separated
    by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
    three lines of a file.  This is not specified by POSIX.
    Note, the ; command separator is not allowed for the commands
    a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
    command.  This implementation follows historic practice and
    implements the ; separator.

10. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output.  POSIX does not specify this behavior.
    This implementation follows historic practice.

11. POSIX does not specify that the q command causes all lines that
    have been appended to be output and that the pattern space is
    printed before exiting.  This implementation follows historic
    practice.

12. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1).  POSIX requires that the
    text be output.  Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

13. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump.  The following
    program, with the input "one\ntwo\nthree\nfour\nfive" can behave
    in different ways depending on whether the /one/,/three/c
    command is triggered at the third line.

        2,4b
        /one/,/three/c\
                append some text

    Historic implementations of sed, for the above example, would
    output the text after the "branch" no longer applied, but would
    then quit without further processing.  This implementation has
    the more intuitive behavior of never outputting the text at all.
    This is based on the belief that it would be reasonable to want
    to output some text if the pattern /one/,/three/ occurs but only
    if it occurs outside of the range of lines 2 to 4.

14. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well as in a script file.  POSIX
    does not specify this.  This implementation follows historical
    practice.

15. POSIX does not specify whether more than one numeric flag is
    allowed on the s command.  Historic practice is to specify only
    a single flag.

TK  What's historic practice?  Currently we don't report an error or
TK  do all of the flags.
DDS Historic practice is a single flag.  We follow it.  POSIX
DDS should be more precise.
TK  It actually seems reasonable to do multiple flags, i.e. display
TK  two or more of the matched patterns.  Since it's unambiguous (only
TK  1-9 are allowed, so /19 *has* to be 1 and 9, not nineteen), we can't
TK  break any existing scripts.

16. POSIX does not explicitly specify how sed behaves if no script is
    specified.  Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command.  Historic sed implementations behave differently for "ls |
    sed" (no output) and "ls | sed -e#" (like cat).  This implementation
    behaves like cat in both cases.

17. The POSIX requirement to open all wfiles from the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks.  This implementation
    follows historic practice and POSIX, by default, and provides the
    -a option for more reasonable behavior.

18. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated.  This is
    reasonable; however, it doesn't state that the backslash is to be
    discarded from the output regardless.  A strict reading of POSIX
    would be that "echo xyz | sed s/./\a" would display "\ayz".  As
    historic sed implementations always discarded the backslash, this
    implementation does as well.

19. POSIX specifies that an address can be "empty".  This implies that
    constructs like ,d or 1,d and ,5d are allowed.  This is not true
    for historic implementations of sed.  This implementation follows
    historic practice.

20. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ".  This is not useful, and leads to subtle
    programming errors.  This implementation ignores trailing whitespace.

TK  I think that line 11347 points out that the synopsis shows
TK  which are valid.
DDS I am talking about _trailing_ white space.  In our implementation
DDS and historic implementation the label can contain _significant_
DDS white space at its end.  This is obscure and not explained in
DDS POSIX.
TK  I think we should delete trailing white space for the above
TK  reason.

21. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails.  Historic practice
    is to fail immediately if the file cannot be opened or written.  This
    implementation follows historic practice.

22. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command.  This is not specified by
    POSIX.  This implementation follows historic practice.

23. POSIX does not specify if the "Nth occurrence" of an RE in a
    substitute command is an overlapping or a non-overlapping one,
    i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
    Historical practice is to drop core or only do non-overlapping
    expressions.  This implementation follows historic practice.

24. Historic implementations of sed ignore the regular expression
    delimiter characters within character classes.  This is not
    specified in POSIX.  This implementation follows historic practice.
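
Several of the points above are easy to check against whatever sed is
installed locally.  The following is only a sketch, not part of the
implementation: the probe_* function names are made up for illustration,
and the probes cover just items 9, 22 and 24, assuming a sed that follows
historic practice on those three points.

```shell
#!/bin/sh
# Hypothetical probes for three of the behaviors discussed above.
# The function names are illustrative only; they are not part of sed.

# Item 9: commands separated by semi-colons on one line.
probe_semicolons() {
    printf 'one\ntwo\nthree\nfour\n' | sed -n '1p;3p'
}

# Item 22: \n used in string2 of the y command.
probe_newline_in_y() {
    printf 'a b\n' | sed 'y/ /\n/'
}

# Item 24: the RE delimiter appearing inside a character class.
probe_class_delimiter() {
    printf 'a/b\n' | sed 's/[/]/X/'
}

probe_semicolons
probe_newline_in_y
probe_class_delimiter
```

If the local sed follows historic practice on these points, the first
probe prints lines one and three, the second splits "a b" onto two lines,
and the third prints "aXb".  A sed that rejects any of the scripts will
report an error instead, which is itself a useful data point.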