# @(#)POSIX 5.2 (Berkeley) 08/24/92

Comments on the IEEE P1003.2 Draft 12
Part 2: Shell and Utilities
Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>

In the following paragraphs, `wrong' means `inconsistent with historic
practice'. Many of the comments refer to undocumented inconsistencies
between the historical versions of sed and the POSIX standard. All the
comments are notes taken while implementing a POSIX-compatible version
of sed, and should not be interpreted as official opinions or criticism
towards the POSIX committee. Some are insignificant, pedantic and even
wrong.

 1. For the text argument of the a command, it is not specified whether
    lines are stripped of their initial blanks. Historical practice,
    followed in this implementation, is to strip the blanks, i.e.:

        #!/bin/sed -f
        a\
            foo\
            bar

    produces:

        foo
        bar

 2. Historical versions of sed required that the w flag be the last
    flag to an s command, as it takes an additional argument. This is
    not specified in the standard.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in the standard.
    This implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command.
    This is not specified in the standard. This implementation permits
    whitespace but does not require it.

 5. The specification of the a command is wrong. With the current
    specification both of these scripts should produce the same
    output:

        #!/bin/sed -f
        d
        a\
        hello

        #!/bin/sed -f
        a\
        hello
        d

TK -- Diomidis, the current implementation looks wrong in this case.

 6. The specification of the c command in conjunction with the
    specification of the default operation (D2 11293-11299) is
    wrong. The default operation specifies that a newline is
    printed after the pattern space. This is not the case when
    the pattern space has been deleted by a c command.

TK Diomidis, the spec seems right to me -- the language in 11293
TK talks about copying the pattern space to stdout -- if the pattern space
TK is deleted, it can't be copied.

 7. The rule for the l command differs from historic practice.
    Table 2-15 includes the various ANSI C escape sequences,
    including \\ for backslash. Some historical versions of
    sed displayed two-digit octal numbers. The POSIX
    specification is a cleanup, and this implementation follows it.

 8. The specification for ! does not specify that for a single
    command the command must not contain an address specification,
    whereas the command list can contain address specifications.

TK I think this is wrong: the script:
TK
TK 3!p
TK
TK works fine. Am I misunderstanding your point?

 9. The standard does not specify what happens with consecutive
    ! commands (e.g. /foo/!!!p). Historic implementations
    allow any number of !'s without changing behaviour. (It
    seems logical that each one should reverse the default
    behaviour.) This implementation follows historic practice.

10. Historic versions of sed permitted commands to be separated
    by semicolons, e.g. sed -ne '1p;2p;3q' prints the first
    three lines of a file. This is not specified by POSIX.
    Note that the ; command separator is not allowed for the commands
    a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
    command. This implementation follows historic practice.

11. The standard does not specify that the program terminates if EOF
    is reached during the execution of the n command. For example,

        sed -e '
        n
        i\
        hello
        ' </dev/null

    will not produce any output. This implementation follows
    historic practice.

12. The standard does not specify that the q command causes all
    lines that have been appended to be output and that the pattern
    space is printed before exiting. This implementation follows
    historic practice.

13. Historic implementations ignore comments in the text of the i
    and a commands. This implementation follows historic practice.

14. Historic implementations do not consider the last line of a
    file to match $ if an empty file follows, e.g.

        sed -n -e '$p' /usr/dict/words /dev/null

    will not print anything. This is not mentioned in the POSIX
    specification and is almost certainly a bug. This implementation
    follows the POSIX specification.

TK Diomidis, I think we need to fix this, can you do it?
DDS We follow POSIX. You don't mean to do it buggy?
TK I see... (I didn't understand that problem until now.) I think
TK that we *should* print out the last line of the dictionary, in
TK the above example, but I can see how it would be hard. What do
TK you think?

15. Historical implementations do not output the change text
    of a c command in the case of an address range whose second
    line number is less than the first (e.g. 3,1). The POSIX
    standard requires that the text be output. Since the historic
    behavior doesn't seem to have any particular purpose, this
    implementation follows the POSIX behavior.

16. Historical implementations output the c text on EVERY line not
    included in the two-address range in the case of a negation '!'.

TK Diomidis, this seems reasonable, I don't see where the standard
TK conflicts with this.

17. The standard does not specify that the p flag of the s command
    writes the pattern space plus a newline to the standard output.

TK I think this is covered by the general language around 11293
TK that says that the pattern space is always followed by a newline
TK when output.

18. The standard does not specify whether address ranges are
    checked and reset if a command is not executed due to a
    jump. The following program can behave in two different
    ways depending on whether the range operator is reset at
    line 6 or not. This is important in the case of pattern
    matches.

        sed -n -e '
        4,8b
        s/^/XXX/p
        1,6 {
            p
        }'

TK I don't understand this -- can you explain further?
DDS The 1,6 operator will not be executed on line 6 (due to the 4,8b
DDS line) and thus it will not clear. In this case you can check for
DDS line > 6 in apply, but what if the 1,6 was /BEGIN/,/END/?
TK OK, I understand, now. Well, I think I do, anyhow. It seems to
TK me that applies() will never see the 1,6 line for lines 4 through 8
TK under any circumstances (even if it was /BEGIN/,/END/).
TK A nastier example, as you point out, is:
TK 2,4b
TK /one/,/three/c\
TK append some text
TK
TK The BSD sed appends the text after the "branch" no longer applies,
TK i.e. with the input: one\ntwo\nthree\nfour\nfive\nsix it displays
TK two\nthree\nfour\nappend some text BUT THEN IT STOPS!
TK Our sed, of course, simply never outputs "append some text". It
TK seems to me that our current approach is "right", because it would
TK be possible to have:
TK 1,4b
TK /one/,/five/c\
TK message
TK
TK where you only want to see "message" if the patterns "one" ... "five"
TK occur, but not in lines 1 to 4. What do you think?

18. Historical implementations allow an output-suppressing #n at the
    beginning of -e arguments as well. This implementation follows
    historical practice.

19. POSIX does not specify whether more than one numeric flag is
    allowed on the s command.

TK What's historic practice? Currently we don't report an error or
TK do all of the flags.

20. The standard does not specify whether a script is mandatory.
    Historic sed implementations behave differently with ls | sed
    (no output) and ls | sed - e'' (behaves like cat).

TK I don't understand what 'sed - e' does (it should be illegal,
TK right?) It seems to me that a script should be mandatory,
TK and sed should fail with an error if not given one.

21. The requirement to open all wfiles from the beginning makes sed
    behave nonintuitively when the w commands are preceded by addresses
    or are within conditional blocks. This implementation follows
    historic practice, by default, and provides a flag for more
    reasonable behavior.

TK I'll put it on my TODO list... ;-}

22. The rule specified in lines 11412-11413 of the standard does
    not seem consistent with existing practice. Historic sed
    implementations I tested copied the rfile to standard output
    every time the r command was executed and not before reading
    a line of input. The wording should be changed to be
    consistent with the 'a' command, i.e.

TK Something got dropped, here... Can you explain further what
TK historic versions did, what they should do, what we do?

23. The standard does not specify how escape sequences other
    than \n and \D (where D is the delimiter character) are to
    be treated. A strict interpretation would be that they
    should be treated literally. In the sed implementations I
    have tried, the \ is simply ignored.

TK I don't understand what you're saying, here. Can you explain?

24. The standard specifies that an address can be "empty". This
    implies that constructs like ,d or 1,d and ,5d are allowed.
    This is not true for historic implementations of sed. This
    implementation follows historic practice.

25. The b, t and : commands ignore leading white space, but not
    trailing white space. This is not specified in the standard.

TK I think that line 11347 points out that the synopsis shows
TK which are valid.

    Although the standard specifies that reading from files that
    do not exist from within the script must not terminate the
    script, it does not specify what happens if a write command
    fails.
    Historic practice is to fail immediately if the file
    cannot be opened or written. This implementation follows that
    practice.

26. Historic practice is that the \n construct can be used for
    either string1 or string2 of the y command. This is not
    specified by the standard. This implementation follows
    historic practice.

29. The standard does not specify if the "nth occurrence" of a
    regular expression in a substitute command is an overlapping
    or a non-overlapping one, e.g. what is the result of s/a*/A/2
    on the pattern "aaaaa aaaaa"? Historical practice is to dump
    core or to find non-overlapping matches. This implementation
    follows historic practice.

30. Historic implementations of sed ignore the regular expression
    delimiter characters within character classes (bracket
    expressions); e.g. in s/[/]/x/ the / inside the brackets does
    not terminate the regular expression. This is not specified in
    the standard. This implementation follows historic practice.