#	@(#)POSIX	5.9 (Berkeley) 08/28/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or as criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	32V and BSD-derived implementations of sed strip the text
	arguments of the a, c and i commands of their initial blanks,
	e.g.

	#!/bin/sed -f
	a\
		foo\
		\  indent\
		bar

	produces:

	foo
	  indent
	bar

	POSIX does not specify this behavior as the System V versions of
	sed do not do this stripping.  The argument against stripping is
	that it is difficult to write sed scripts that have leading blanks
	if they are stripped.  The argument for stripping is that it is
	difficult to write readable sed scripts unless indentation is allowed
	and ignored, and leading whitespace is obtainable by entering a
	backslash in front of it.  This implementation follows the BSD
	historic practice.

 2.	Historical versions of sed required that the w flag be the last
	flag to an s command as it takes an additional argument.  This
	is obvious, but not specified in POSIX.

 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.

 5.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed also displayed
	two-digit octal numbers, not the three digits specified by POSIX.
	POSIX is a cleanup, and is followed by this implementation.

 6.	The POSIX specification for ! does not state that, when ! is
	applied to a single command, that command must not itself carry
	an address specification, whereas a command list can contain
	address specifications.  The specification for ! implies that
	"3!/hello/p" works, and it never has, historically.  Note,

		3!{
			/hello/p
		}

	does work.
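
	A minimal shell sketch of the working form, assuming a POSIX-style
	sed on the PATH; the three input lines are made up for illustration.

```shell
# Hypothetical input: three lines that all contain "hello".
# 3!{ ... } applies the block to every line except line 3.
out=$(printf 'hello one\nhello two\nhello three\n' |
    sed -n -e '3!{' -e '/hello/p' -e '}')
printf '%s\n' "$out"
```

	Only the first two lines should be printed, since line 3 is
	excluded by the negated address.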

 7.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behaviour.  (It seems logical that each
	one might reverse the behaviour.)  This implementation follows
	historic practice.

 8.	Historic versions of sed permitted commands to be separated
	by semicolons, e.g. "sed -ne '1p;2p;3q'" printed the first
	two lines of a file (with -n, the q on line 3 exits without
	printing it).  This is not specified by POSIX.  Note, the ;
	command separator is not allowed for the commands a, c, i, w,
	r, :, b, t, # and at the end of a w flag in the s command.
	This implementation follows historic practice and implements
	the ; separator.
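
	The ; separator can be tried with a short pipeline.  This is only
	a sketch, assuming a sed that accepts the separator; the input
	(the numbers 1 to 4) is illustrative.

```shell
# Hypothetical input: the numbers 1 to 4, one per line.
# Two p commands are separated by ; inside a single -e argument.
out=$(printf '1\n2\n3\n4\n' | sed -n -e '1p;3p')
printf '%s\n' "$out"
```

	Lines 1 and 3 should be printed, as if the two p commands had
	been given on separate lines.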

 9.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, i.e.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.
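
	The same rule can be observed with non-empty input.  A sketch,
	assuming a sed that follows the historic practice described
	above; the input line and the s command are illustrative only.

```shell
# One input line: n prints it, finds no next line, and sed exits,
# so the s command that follows is never run.
out=$(printf 'only\n' | sed -e 'n' -e 's/.*/changed/')
printf '%s\n' "$out"
```

	Under historic behavior the line should come out unchanged,
	since the script ends at the n command.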

10.	POSIX does not specify that the q command causes all lines that
	have been appended to be output and that the pattern space is
	printed before exiting.  This implementation follows historic
	practice.
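
	A small sketch of the q behavior, assuming a sed that follows
	historic practice; the input and the appended text are made up.

```shell
# Append text after line 1, then quit on line 1.  Under historic
# practice q prints the pattern space and flushes the appended text.
out=$(printf 'one\ntwo\n' | sed -e '1a\
appended
1q')
printf '%s\n' "$out"
```

	The output should be the first input line followed by the
	appended text; the second input line is never read.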

11.	Historical implementations do not output the change text of a c
	command in the case of an address range whose first line number
	is greater than the second (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.

12.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program will behave in different ways depending on whether the
	'c' command is triggered at the third line, i.e. whether the text
	is output even though line 3 of the input will never logically
	encounter that command.

	2,4b
	1,3c\
		text

	Historic implementations, and this implementation, do not output
	the text in the above example.  The general rule, therefore,
	is that a range whose second address is never matched extends to
	the end of the input.

13.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.

14.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed", where they produce no output, and "ls | sed -e#", where they
	behave like cat.  This implementation behaves like cat in both cases.

15.	The POSIX requirement to open all w files at the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  This implementation
	follows historic practice and POSIX, by default, and provides the
	-a option which opens the files only when they are needed.

16.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it also doesn't state that the backslash is
	to be discarded from the output regardless.  A strict reading of
	POSIX would be that "echo xyz | sed 's/./\a/'" would display
	"\ayz".  As historic sed implementations always discarded the
	backslash, this implementation does as well.

17.	POSIX specifies that an address can be "empty".  This implies
	that constructs like ",d" or "1,d" and ",5d" are allowed.  This
	is not true for historic implementations or this implementation
	of sed.

18.	The b, t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors, but it is historic practice and changing it
	could theoretically break working scripts.  This implementation
	follows historic practice.

19.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.

20.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.
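
	A minimal sketch of \n in string1 of the y command, assuming a
	sed that follows historic practice; the two input lines are
	illustrative.

```shell
# N joins two input lines with an embedded newline; y then maps that
# newline (written \n in string1) to a comma.
out=$(printf 'a\nb\n' | sed -e 'N' -e 'y/\n/,/')
printf '%s\n' "$out"
```

	The two input lines should come out joined as "a,b".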

21.	POSIX does not specify if the "Nth occurrence" of an RE in a
	substitute command is an overlapping or a non-overlapping one,
	i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
	Historical practice is to drop core or only do non-overlapping
	RE's.  This implementation only does non-overlapping RE's.

22.	Historic implementations of sed ignore the RE delimiter characters
	within character classes.  This is not specified in POSIX.  This
	implementation follows historic practice.
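
	A short sketch of a delimiter inside a character class, assuming
	a sed that follows historic practice; the input is made up.

```shell
# The / inside [/] is part of the bracket expression and does not
# terminate the RE, so the s command still parses as s/RE/X/.
out=$(printf 'a/b\n' | sed -e 's/[/]/X/')
printf '%s\n' "$out"
```

	The slash in the input should be replaced, giving "aXb".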

23.	Historic implementations handle empty RE's in a special way: the
	empty RE is interpreted as if it were the last RE encountered,
	whether in an address or elsewhere.  POSIX does not document this
	behavior.  For example, the command:

		sed -e /abc/s//XXX/

	substitutes XXX for the pattern abc.  The semantics of "the last
	RE" can be defined in two different ways:

	1. The last RE encountered when compiling (lexical/static scope).
	2. The last RE encountered while running (dynamic scope).

	While many historical implementations fail on programs depending
	on scope differences, the SunOS version exhibited dynamic scope
	behaviour.  This implementation does dynamic scoping, as this seems
	the most useful and remains consistent with historical practice.
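
	The example command can be run as follows.  This is a sketch
	with made-up input; it shows the empty RE being reused, but it
	is too simple to tell static scope from dynamic scope.

```shell
# The empty RE in s//XXX/ reuses /abc/, the last RE encountered.
out=$(printf 'abc def\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$out"
```

	The result should be "XXX def".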