# @(#)POSIX 8.1 (Berkeley) 06/06/93

Comments on the IEEE P1003.2 Draft 12
Part 2: Shell and Utilities
Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or as criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.  32V and BSD derived implementations of sed strip the text
     arguments of the a, c and i commands of their initial blanks,
     e.g.

         #!/bin/sed -f
         a\
             foo\
             \  indent\
             bar

     produces:

         foo
           indent
         bar

     POSIX does not specify this behavior as the System V versions of
     sed do not do this stripping. The argument against stripping is
     that it is difficult to write sed scripts whose text has leading
     blanks if they are stripped. The argument for stripping is that it
     is difficult to write readable sed scripts unless indentation is
     allowed and ignored, and leading whitespace is obtainable by
     entering a backslash in front of it. This implementation follows
     the BSD historic practice.

 2.  Historical versions of sed required that the w flag be the last
     flag to an s command as it takes an additional argument. This
     is obvious, but not specified in POSIX.

 3.  Historical versions of sed required that whitespace follow a w
     flag to an s command. This is not specified in POSIX. This
     implementation permits whitespace but does not require it.

 4.  Historical versions of sed permitted any number of whitespace
     characters to follow the w command. This is not specified in
     POSIX. This implementation permits whitespace but does not
     require it.

 5.  The rule for the l command differs from historic practice. Table
     2-15 includes the various ANSI C escape sequences, including \\
     for backslash. Some historical versions of sed displayed two
     digit octal numbers, too, not three as specified by POSIX. POSIX
     is a cleanup, and is followed by this implementation.

 6.  The POSIX specification for ! does not say that, when ! is
     followed by a single command, that command must not contain an
     address specification, whereas a command list can contain address
     specifications. The specification for ! implies that "3!/hello/p"
     works, and it never has, historically. Note,

         3!{
             /hello/p
         }

     does work.

 7.  POSIX does not specify what happens with consecutive ! commands
     (e.g. /foo/!!!p). Historic implementations allow any number of
     !'s without changing the behaviour. (It seems logical that each
     one might reverse the behaviour.) This implementation follows
     historic practice.
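
The grouped form from item 6 can be tried with a short shell sketch. The
input lines and the variable name "out" are made up for illustration; the
script itself is the one quoted in item 6, run with -n so that only the
explicit p produces output.

```shell
# Five made-up input lines; "hello" appears on lines 2, 3 and 5.
# 3!{...} applies the group to every line except line 3, and the
# /hello/ address inside the braces is legal there.
out=$(printf '%s\n' 'a' 'hello 2' 'hello 3' 'b' 'hello 5' | sed -n '3!{
/hello/p
}')
printf '%s\n' "$out"
# prints:
# hello 2
# hello 5
```

The single-command form "3!/hello/p" is deliberately left out of the
sketch, since item 6 notes that it has never worked historically.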

 8.  Historic versions of sed permitted commands to be separated
     by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
     two lines of a file and quit on reaching the third. This is not
     specified by POSIX. Note, the ; command separator is not allowed
     after the commands a, c, i, w, r, :, b, t and #, or at the end of
     a w flag in the s command. This implementation follows historic
     practice and implements the ; separator.

 9.  Historic versions of sed terminated the script if EOF was reached
     during the execution of the 'n' command, e.g.:

         sed -e '
             n
             i\
                 hello
         ' </dev/null

     did not produce any output. POSIX does not specify this behavior.
     This implementation follows historic practice.

10.  Deleted.

11.  Historical implementations do not output the change text of a c
     command in the case of an address range whose first line number
     is greater than the second (e.g. 3,1). POSIX requires that the
     text be output. Since the historic behavior doesn't seem to have
     any particular purpose, this implementation follows the POSIX
     behavior.

12.  POSIX does not specify whether address ranges are checked and
     reset if a command is not executed due to a jump. The following
     program will behave in different ways depending on whether the
     'c' command is triggered at the third line, that is, whether the
     text is output even though line 3 of the input never logically
     encounters that command.

         2,4b
         1,3c\
             text

     Historic implementations, and this implementation, do not output
     the text in the above example. The general rule, therefore,
     is that a range whose second address is never matched extends to
     the end of the input.

13.  Historical implementations allow an output suppressing #n at the
     beginning of -e arguments as well as in a script file. POSIX
     does not specify this. This implementation follows historical
     practice.

14.  POSIX does not explicitly specify how sed behaves if no script is
     specified. Since the sed Synopsis permits this form of the command,
     and the language in the Description section states that the input
     is output, it seems reasonable that it behave like the cat(1)
     command. Historic sed implementations behave differently for
     "ls | sed", where they produce no output, and "ls | sed -e#", where
     they behave like cat. This implementation behaves like cat in both
     cases.

15.  The POSIX requirement to open all w files at the beginning makes
     sed behave nonintuitively when the w commands are preceded by
     addresses or are within conditional blocks. This implementation
     follows historic practice and POSIX, by default, and provides the
     -a option which opens the files only when they are needed.

16.  POSIX does not specify how escape sequences other than \n and \D
     (where D is the delimiter character) are to be treated. This is
     reasonable; however, it also doesn't state that the backslash is
     to be discarded from the output regardless.
     A strict reading of POSIX would be that
     "echo xyz | sed 's/./\a/'" would display "\ayz".
     As historic sed implementations always discarded the backslash,
     this implementation does as well.

17.  POSIX specifies that an address can be "empty". This implies
     that constructs like ",d" or "1,d" and ",5d" are allowed. This
     is not true for historic implementations or this implementation
     of sed.

18.  The b, t and : commands are documented in POSIX to ignore leading
     white space, but no mention is made of trailing white space.
     Historic implementations of sed assigned different locations to
     the labels "x" and "x ". This is not useful, and leads to subtle
     programming errors, but it is historic practice and changing it
     could theoretically break working scripts. This implementation
     follows historic practice.

19.  Although POSIX specifies that reading from files that do not exist
     from within the script must not terminate the script, it does not
     specify what happens if a write command fails. Historic practice
     is to fail immediately if the file cannot be opened or written.
     This implementation follows historic practice.

20.  Historic practice is that the \n construct can be used for either
     string1 or string2 of the y command. This is not specified by
     POSIX. This implementation follows historic practice.

21.  Deleted.

22.  Historic implementations of sed ignore the RE delimiter characters
     within character classes. This is not specified in POSIX. This
     implementation follows historic practice.
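
Items 20 and 22 are easy to check from the shell. The following is only
a sketch: the input strings and the variable names "joined" and "dir" are
invented for illustration, and the expected output assumes a sed that
follows the historic practice described in those two items.

```shell
# Item 20: \n used in string1 of the y command.
# N appends the next input line to the pattern space, so the pattern
# space holds "a<newline>b"; y then maps that newline to a space.
joined=$(printf 'a\nb\n' | sed -e 'N' -e 'y/\n/ /')
printf '%s\n' "$joined"     # a b

# Item 22: the delimiter inside a character class does not end the RE.
# [^/]*$ matches the trailing run of non-slash characters, even though
# "/" is also the s command delimiter here.
dir=$(printf '%s\n' 'usr/lib/sed' | sed -e 's/[^/]*$//')
printf '%s\n' "$dir"        # usr/lib/
```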

23.  Historic implementations handle empty RE's in a special way: the
     empty RE is interpreted as if it were the last RE encountered,
     whether in an address or elsewhere. POSIX does not document this
     behavior. For example, the command:

         sed -e /abc/s//XXX/

     substitutes XXX for the pattern abc. The semantics of "the last
     RE" can be defined in two different ways:

     1. The last RE encountered when compiling (lexical/static scope).
     2. The last RE encountered while running (dynamic scope).

     While many historical implementations fail on programs depending
     on scope differences, the SunOS version exhibited dynamic scope
     behaviour. This implementation does dynamic scoping, as this seems
     the most useful and remains consistent with historical practice.
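
The two readings of "the last RE" in item 23 can be told apart with a
script in which the RE compiled last differs from the RE executed last.
This is a sketch only: the input "foo bar", the label "sub" and the
variable names are invented for illustration, and the output shown for
the second command assumes a sed with dynamic scoping.

```shell
# Simple case from item 23: the empty RE reuses /abc/.
simple=$(echo 'abc' | sed -e '/abc/s//XXX/')
printf '%s\n' "$simple"     # XXX

# Scope test: /foo/ matches, so control jumps to :sub and /bar/ is
# never executed for this line. The RE executed last is /foo/, while
# the RE compiled last before s// is /bar/.
scoped=$(echo 'foo bar' | sed '
/foo/b sub
/bar/b
:sub
s//X/
')
printf '%s\n' "$scoped"     # X bar under dynamic scope
```

A sed with static scoping would instead reuse /bar/ in the second
command and print "foo X".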