#       @(#)POSIX       5.9 (Berkeley) 08/28/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. 32V and BSD derived implementations of sed strip the text
    arguments of the a, c and i commands of their initial blanks,
    i.e.

        #!/bin/sed -f
        a\
            foo\
            \ indent\
            bar

    produces:

        foo
         indent
        bar

    POSIX does not specify this behavior as the System V versions of
    sed do not do this stripping. The argument against stripping is
    that it is difficult to write sed scripts that have leading blanks
    if they are stripped. The argument for stripping is that it is
    difficult to write readable sed scripts unless indentation is allowed
    and ignored, and leading whitespace is obtainable by entering a
    backslash in front of it. This implementation follows the BSD
    historic practice.

 2.
    Historical versions of sed required that the w flag be the last
    flag to an s command as it takes an additional argument. This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed displayed two
    digit octal numbers, too, not three as specified by POSIX. POSIX
    is a cleanup, and is followed by this implementation.

 6. The POSIX specification for ! does not specify that for a single
    command the command must not contain an address specification
    whereas the command list can contain address specifications. The
    specification for ! implies that "3!/hello/p" works, and it never
    has, historically. Note,

        3!{
            /hello/p
        }

    does work.

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behaviour. (It seems logical that each
    one might reverse the behaviour.) This implementation follows
    historic practice.
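
    The block form described in item 6 can be tried with a short shell
    sketch. The function name negated_block_demo and the sample input
    are illustrative assumptions, not part of the original notes. The
    sketch negates a line address and places the second address inside
    the command list, which is the form the notes say does work.

```shell
# Hypothetical demo (name and input are assumptions for illustration).
# With -n, print every line matching /hello/ except line 3, using the
# "address!{ command list }" form from item 6.
negated_block_demo() {
    printf 'hello 1\nhello 2\nhello 3\n' | sed -n '3!{
/hello/p
}'
}

negated_block_demo
```

    On the sed versions this was checked against in principle (any sed
    accepting the block form), lines 1 and 2 are printed and line 3 is
    skipped. The consecutive-! case from item 7 is deliberately left
    out, since implementations differ on whether they accept it.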

 8. Historic versions of sed permitted commands to be separated
    by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
    two lines of a file and quit at the third. This is not
    specified by POSIX. Note, the ; command separator is not
    allowed for the commands a, c, i, w, r, :, b, t, # and at the
    end of a w flag in the s command. This implementation follows
    historic practice and implements the ; separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.

10. POSIX does not specify that the q command causes all lines that
    have been appended to be output and that the pattern space is
    printed before exiting. This implementation follows historic
    practice.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program will behave in different ways depending on whether the
    'c' command is triggered at the third line, i.e.
    will the text be output even though line 3 of the input will
    never logically encounter that command.

        2,4b
        1,3c\
            text

    Historic implementations, and this implementation, do not output
    the text in the above example. The general rule, therefore,
    is that a range whose second address is never matched extends to
    the end of the input.

13. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command. Historic sed implementations behave differently for "ls |
    sed", where they produce no output, and "ls | sed -e#", where they
    behave like cat. This implementation behaves like cat in both cases.

15. The POSIX requirement to open all w files at the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. This implementation
    follows historic practice and POSIX, by default, and provides the
    -a option which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated.
    This is reasonable; however, it also doesn't state that the
    backslash is to be discarded from the output regardless. A strict
    reading of POSIX would be that "echo xyz | sed 's/./\a/'" would
    display "\ayz". As historic sed implementations always discarded
    the backslash, this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
    that constructs like ",d" or "1,d" and ",5d" are allowed. This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ". This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts. This implementation
    follows historic practice.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails. Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.

21.
    POSIX does not specify if the "Nth occurrence" of an RE in a
    substitute command is an overlapping or a non-overlapping one,
    i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
    Historical practice is to drop core or only do non-overlapping
    RE's. This implementation only does non-overlapping RE's.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes. This is not specified in POSIX. This
    implementation follows historic practice.

23. Historic implementations handle empty RE's in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere. POSIX does not document this
    behavior. For example the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc. The semantics of "the last
    RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behaviour. This implementation does dynamic scoping, as this seems
    the most useful and in order to remain consistent with historical
    practice.
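
    The empty-RE command from item 23 can be exercised with a small
    shell sketch. The function name empty_re_demo and the two input
    lines are assumptions made for illustration only. The example
    uses a single RE, so it gives the same result under either the
    static or the dynamic reading of "the last RE" and does not
    distinguish between them.

```shell
# Hypothetical demo (name and input are assumptions for illustration).
# The empty RE in s//XXX/ reuses /abc/ from the address, so only the
# matched text "abc" is replaced; lines without "abc" pass through.
empty_re_demo() {
    printf 'abc def\nxyz\n' | sed -e '/abc/s//XXX/'
}

empty_re_demo
```

    The first input line becomes "XXX def" and the second is copied
    unchanged. A script that separates the two scoping rules would
    need at least two different REs and a branch, and its output
    would depend on the sed being tested.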