#	@(#)POSIX	5.4 (Berkeley) 08/24/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	Historic implementations of sed strip the text arguments of the
	a, c and i commands of their initial blanks, i.e.

	#!/bin/sed -f
	a\
		foo\
		bar

	produces:

	foo
	bar

	POSIX does not specify this behavior.  This implementation follows
	historic practice.

 2.	Historical versions of sed required that the w flag be the last
	flag to an s command as it takes an additional argument.  This
	is obvious, but not specified in POSIX.

 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.

 5.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed displayed two
	digit octal numbers, too, not three as specified by POSIX.  POSIX
	is a cleanup, and is followed by this implementation.
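
	A small sketch of the POSIX form of l, assuming a sed that
	follows the three digit rule.  The input bytes and the variable
	name l_out are only illustrative.

```shell
# One line holding the byte with octal value 001, one holding a backslash.
l_out=$(printf '\001\n\\\n' | sed -n l)

# Expect the control byte as a three digit octal escape and the
# backslash doubled, each followed by the end-of-line marker "$".
printf '%s\n' "$l_out"
```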

 6.	The POSIX specification for ! does not say that, when ! is
	applied to a single command, that command must not carry an
	address of its own, whereas a command list in braces may contain
	addresses.  As written, the specification implies that
	"3!/hello/p" works, and it never has, historically.  (Note,
	"3!{ /hello/p }" does work.)
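
	The two forms can be tried side by side.  This is a sketch, not
	a conformance test: the sample input and the variable names are
	made up, and the brace group is spread over several lines so
	that older parsers accept it.

```shell
input='hello
world
hello'

# Negation applied to a brace group: every line except line 3 that
# matches /hello/ is printed.
group_out=$(printf '%s\n' "$input" | sed -n '3!{
/hello/p
}')

# The same idea without braces; record whether sed accepts the script.
if printf '%s\n' "$input" | sed -n '3!/hello/p' >/dev/null 2>&1; then
    bare_status=accepted
else
    bare_status=rejected
fi

printf 'group form printed: %s\nbare form: %s\n' "$group_out" "$bare_status"
```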

 7.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behaviour.  (It seems logical that each
	one might reverse the behaviour.)  This implementation follows
	historic practice.

 8.	Historic versions of sed permitted commands to be separated
	by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
	two lines of a file and quit at the third.  This is not
	specified by POSIX.  Note, the ; command separator is not
	allowed after the commands a, c, i, w, r, :, b, t and #, nor
	at the end of a w flag in the s command.  This implementation
	follows historic practice and implements the ; separator.
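
	A minimal illustration of the ; separator, using only commands
	that may be followed by one.  The input words and the variable
	name are arbitrary.

```shell
# Two p commands in one script argument, separated by a semi-colon.
semi_out=$(printf '%s\n' one two three four | sed -n '1p;3p')

# Only the first and third input lines are selected.
printf '%s\n' "$semi_out"
```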

 9.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, i.e.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.
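
	A variant with one line of input, so that n is actually reached
	and then hits EOF.  This is a sketch under the assumption that
	the sed in use ends the script at that point, as described
	above; the input text is arbitrary.

```shell
# n finds no next line, so the i command after it is never executed.
n_out=$(printf 'only\n' | sed -e 'n
i\
hello')

# The pattern space is still written by the default output;
# the text "hello" should not appear.
printf '%s\n' "$n_out"
```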

10.	POSIX does not specify that the q command causes all lines that
	have been appended to be output and that the pattern space is
	printed before exiting.  This implementation follows historic
	practice.
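
	The historic behavior of q can be observed with a short script.
	The sketch assumes a sed that follows historic practice; the
	appended text and the variable name are made up.

```shell
# Queue some text with a on line 1, then quit on the same line.
q_out=$(printf '%s\n' one two three | sed '1a\
appended
1q')

# The pattern space is printed first, then the queued text;
# lines two and three are never read.
printf '%s\n' "$q_out"
```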

11.	Historical implementations do not output the change text of a c
	command in the case of an address range whose second line number
	is less than the first (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.

12.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program, with the input "one\ntwo\nthree\nfour\nfive" can behave
	in different ways depending on whether the /one/,/three/c
	command is triggered at the third line.

	2,4b
	/one/,/three/c\
		append some text

	Historic implementations of sed, for the above example, would
	output the text after the "branch" no longer applied, but would
	then quit without further processing.  This implementation has
	the more intuitive behavior of never outputting the text at all.
	This is based on the belief that it would be reasonable to want
	to output some text if the pattern /one/,/three/ occurs but only
	if it occurs outside of the range of lines 2 to 4.

13.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.
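
	A quick check of #n inside a -e argument, assuming a sed that
	follows historical practice here.  Without the #n line the
	script below would print every input line as well.

```shell
# "#n" as the first line of the -e script acts like the -n option.
hash_n_out=$(printf '%s\n' one two | sed -e '#n
1p')

# Only the explicit p on line 1 produces output.
printf '%s\n' "$hash_n_out"
```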

14.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed", where they produce no output, and "ls | sed -e#", where they
	behave like cat.  This implementation behaves like cat in both cases.
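
	Only the comment-only form is sketched here, with printf
	standing in for ls so the output is predictable.  The form with
	no script at all is left out, since other implementations may
	reject it outright.

```shell
# A script consisting of nothing but a comment copies input to output.
cat_out=$(printf '%s\n' one two | sed -e '#')

printf '%s\n' "$cat_out"
```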

15.	The POSIX requirement to open all wfiles from the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  This implementation
	follows historic practice and POSIX, by default, and provides the
	-a option which opens the files only when they are needed.
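
	The default behavior can be seen with a w command whose address
	never matches.  This sketch assumes mktemp -d is available and
	shows only the default; the -a option is not exercised.  The
	file name "out" and the pattern are arbitrary.

```shell
wdir=$(mktemp -d)

# The address never matches, so the w command itself never runs.
printf '%s\n' one two | sed -n "/nomatch/w $wdir/out"

# The wfile was opened before any input was read, so it exists, empty.
if [ -f "$wdir/out" ] && [ ! -s "$wdir/out" ]; then
    wfile_state='created, empty'
else
    wfile_state='other'
fi
printf '%s\n' "$wfile_state"

rm -rf "$wdir"
```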

16.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it also doesn't state that the backslash is
	to be discarded from the output regardless.  A strict reading of
	POSIX would be that "echo xyz | sed 's/./\a/'" would display
	"\ayz".  As historic sed implementations always discarded the
	backslash, this implementation does as well.
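
	The same point, sketched with \q in place of \a.  The letter is
	an arbitrary stand-in, chosen because some later implementations
	give \a a meaning of their own in the replacement.

```shell
# An escape with no defined meaning in the replacement string.
esc_out=$(echo xyz | sed 's/./\q/')

# Under historic practice the backslash is dropped and q is inserted.
printf '%s\n' "$esc_out"
```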

17.	POSIX specifies that an address can be "empty".  This implies
	that constructs like ",d" or "1,d" and ",5d" are allowed.  This
	is not true for historic implementations or this implementation
	of sed.
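
	Whether a given sed accepts these scripts is easy to probe.
	The loop below is a sketch that only counts rejections; it
	assumes a rejected script yields a non-zero exit status.

```shell
rejected=0
for script in ',d' '1,d' ',5d'; do
    # No input is needed; the script is parsed before any line is read.
    if ! sed -n "$script" </dev/null >/dev/null 2>&1; then
        rejected=$((rejected + 1))
    fi
done
printf 'rejected %d of 3 scripts\n' "$rejected"
```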

18.	The b, t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors, but it is historic practice and changing it
	could theoretically break working scripts.

19.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.
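
	Both halves can be sketched together.  The directory name below
	is hypothetical and is assumed not to exist on the system
	running the example.

```shell
missing=/nonexistent-dir-for-sed-demo   # assumed not to exist

# r on a missing file is skipped and the script carries on.
r_out=$(printf 'line\n' | sed "r $missing/in" 2>/dev/null)

# w to a file that cannot be opened makes sed fail.
if printf 'line\n' | sed -n "w $missing/out" >/dev/null 2>&1; then
    w_result=succeeded
else
    w_result=failed
fi

printf 'r output: %s\nw command: %s\n' "$r_out" "$w_result"
```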

20.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.
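
	For instance, \n in string2 turns each blank into a newline.
	The input is arbitrary.

```shell
# Map every space to a newline with y.
y_out=$(printf 'a b c\n' | sed 'y/ /\n/')

# Each word now sits on its own line.
printf '%s\n' "$y_out"
```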

21.	POSIX does not specify if the "Nth occurrence" of an RE in a
	substitute command is an overlapping or a non-overlapping one,
	i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
	Historical practice is to drop core or only do non-overlapping
	RE's.  This implementation only does non-overlapping RE's.
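
	A simpler case than the one above shows the difference without
	involving empty matches.  In "aaaa" the RE aa matches at offsets
	0 and 2 when matches may not overlap; counting overlapping
	matches would put the second one at offset 1.

```shell
# Replace the second non-overlapping occurrence of "aa".
nth_out=$(echo aaaa | sed 's/aa/X/2')

# Non-overlapping counting gives "aaX"; overlapping would give "aXa".
printf '%s\n' "$nth_out"
```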

22.	Historic implementations of sed ignore the RE delimiter characters
	within character classes.  This is not specified in POSIX.  This
	implementation follows historic practice.
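
	A one-line sketch, assuming a sed that follows historic practice:
	the / inside the brackets is part of the character class and
	does not end the RE.

```shell
# The delimiter appears unescaped inside a character class.
class_out=$(echo a/b | sed 's/[/]/X/')

printf '%s\n' "$class_out"
```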