#	@(#)POSIX	8.1 (Berkeley) 06/06/93

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	32V and BSD-derived implementations of sed strip the text
	arguments of the a, c and i commands of their initial blanks,
	e.g.

	#!/bin/sed -f
	a\
		foo\
		\  indent\
		bar

	produces:

	foo
	  indent
	bar

	POSIX does not specify this behavior as the System V versions of
	sed do not do this stripping.  The argument against stripping is
	that it is difficult to write sed scripts that have leading blanks
	if they are stripped.  The argument for stripping is that it is
	difficult to write readable sed scripts unless indentation is allowed
	and ignored, and leading whitespace is obtainable by entering a
	backslash in front of it.  This implementation follows the BSD
	historic practice.

 2.	Historical versions of sed required that the w flag be the last
	flag to an s command as it takes an additional argument.  This
	is obvious, but not specified in POSIX.

 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.

 5.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed also displayed
	two-digit octal numbers, not the three digits specified by POSIX.
	POSIX is a cleanup, and is followed by this implementation.
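
	A minimal sketch of the escape forms described above, assuming
	a sed whose l command follows the POSIX rules.  The helper name
	show_escapes is made up for this illustration.

```shell
# Hypothetical helper: pass one line holding a tab, a backslash and a
# control character (octal 001) through the l command.
show_escapes() {
	printf 'a\tb\\c\001\n' | sed -n l
}

# On a sed that follows the POSIX rules this shows the tab as \t, the
# backslash as \\, the control character as three octal digits, and
# marks the end of the line with $.
show_escapes
```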

 6.	The POSIX specification for ! does not state that, when ! is
	followed by a single command, that command must not contain an
	address specification, whereas a command list can contain address
	specifications.  The specification for ! implies that "3!/hello/p"
	works, but it never has, historically.  Note,

		3!{
			/hello/p
		}

	does work.
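
	The block form can be tried as follows.  This is only a sketch;
	the input lines and the helper name negated_block are invented
	for the illustration.

```shell
# Hypothetical helper: on every line except line 3, print the line if
# it contains "hello".  -n suppresses the default output.
negated_block() {
	printf 'hello 1\nhello 2\nhello 3\nworld 4\n' | sed -n '3!{
/hello/p
}'
}

# Lines 1 and 2 are printed; line 3 is excluded by the negated
# address, and line 4 does not match /hello/.
negated_block
```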

 7.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behaviour.  (It seems logical that each
	one might reverse the behaviour.)  This implementation follows
	historic practice.

 8.	Historic versions of sed permitted commands to be separated
	by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
	two lines of a file and quit on reading the third.  This is
	not specified by POSIX.  Note, the ; command separator is not
	allowed for the commands a, c, i, w, r, :, b, t, # and at the
	end of a w flag in the s command.  This implementation follows
	historic practice and implements the ; separator.
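
	A small sketch of the ; separator, assuming a sed that accepts
	it.  The helper name semicolon_demo and the sample input are
	illustrative only.

```shell
# Hypothetical helper: two p commands separated by a semicolon in a
# single script argument; -n suppresses the default output.
semicolon_demo() {
	printf '1\n2\n3\n4\n' | sed -n '2p;4p'
}

# Prints the second and fourth input lines.
semicolon_demo
```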

 9.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, i.e.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.

10.	Deleted.

11.	Historical implementations do not output the change text of a c
	command in the case of an address range whose first line number
	is greater than the second (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.

12.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program will behave in different ways depending on whether the
	'c' command is triggered at the third line, that is, whether the
	text is output even though line 3 of the input never logically
	encounters that command.

	2,4b
	1,3c\
		text

	Historic implementations, and this implementation, do not output
	the text in the above example.  The general rule, therefore,
	is that a range whose second address is never matched extends to
	the end of the input.

13.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.

14.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed", where they produce no output, and "ls | sed -e#", where they
	behave like cat.  This implementation behaves like cat in both cases.

15.	The POSIX requirement to open all w files at the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  This implementation
	follows historic practice and POSIX, by default, and provides the
	-a option which opens the files only when they are needed.

16.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it also doesn't state that the backslash is
	to be discarded from the output regardless.  A strict reading of
	POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
	As historic sed implementations always discarded the backslash,
	this implementation does as well.
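
	A sketch of the backslash being discarded from the replacement.
	It uses \% rather than \a, on the assumption that some later
	sed implementations give \a a meaning of their own; the helper
	name drop_backslash is hypothetical.

```shell
# Hypothetical helper: the replacement is a backslash followed by a
# character that has no special meaning after a backslash.
drop_backslash() {
	echo xyz | sed 's/./\%/'
}

# With the historic behaviour the backslash is dropped, so the first
# character of the input is replaced by a plain %.
drop_backslash
```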

17.	POSIX specifies that an address can be "empty".  This implies
	that constructs like ",d", "1,d" and ",5d" are allowed.  This
	is not true for historic implementations or this implementation
	of sed.

18.	The b, t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors, but it is historic practice and changing it
	could theoretically break working scripts.  This implementation
	follows historic practice.

19.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.

20.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.
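
	A sketch of \n used in string2 of the y command, assuming a sed
	that follows the historic practice described above.  The helper
	name spaces_to_newlines is made up for this example.

```shell
# Hypothetical helper: transliterate every space into a newline.
spaces_to_newlines() {
	printf 'a b c\n' | sed 'y/ /\n/'
}

# Each word of the input ends up on a line of its own.
spaces_to_newlines
```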

21.	Deleted.

22.	Historic implementations of sed ignore the RE delimiter characters
	within character classes.  This is not specified in POSIX.  This
	implementation follows historic practice.
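
	A sketch of a delimiter appearing inside a character class,
	assuming a sed that follows the historic practice described
	above.  The helper name slash_in_class is illustrative only.

```shell
# Hypothetical helper: the / inside [ ] is part of the character
# class, not the end of the regular expression.
slash_in_class() {
	echo a/b | sed 's/[/]/X/'
}

# The slash in the input is replaced by X.
slash_in_class
```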

23.	Historic implementations handle empty RE's in a special way: the
	empty RE is interpreted as if it were the last RE encountered,
	whether in an address or elsewhere.  POSIX does not document this
	behavior.  For example, the command:

		sed -e /abc/s//XXX/

	substitutes XXX for the pattern abc.  The semantics of "the last
	RE" can be defined in two different ways:

	1. The last RE encountered when compiling (lexical/static scope).
	2. The last RE encountered while running (dynamic scope).

	While many historical implementations fail on programs depending
	on scope differences, the SunOS version exhibited dynamic scope
	behaviour.  This implementation does dynamic scoping, as this seems
	the most useful and in order to remain consistent with historical
	practice.
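
	The simple case above, where static and dynamic scope give the
	same answer, can be tried as follows.  The sample input and the
	helper name reuse_last_re are assumptions made for this sketch.

```shell
# Hypothetical helper: the empty RE in s//XXX/ reuses /abc/ from the
# address on the same command.
reuse_last_re() {
	echo 'one abc two' | sed -e '/abc/s//XXX/'
}

# The matched text abc is replaced by XXX.
reuse_last_re
```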