#	@(#)POSIX	5.3 (Berkeley) 08/24/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or as criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	Historic implementations of sed strip the text arguments of the
	a, c and i commands of their initial blanks, e.g.

	#!/bin/sed -f
	a\
		foo\
		bar

	produces:

	foo
	bar

	POSIX does not specify this behavior.  This implementation follows
	historic practice.
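
	A minimal sketch in C of the historic stripping rule, assuming each
	line of a, c or i text is processed on its own and that "blanks"
	means spaces and tabs.  The name strip_text_blanks() is
	hypothetical and not taken from this implementation.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy one line of a, c or i text into out,
 * dropping its leading blanks (spaces and tabs) as historic sed did.
 * out must have room for strlen(line) + 1 bytes.
 */
void
strip_text_blanks(const char *line, char *out)
{
	assert(line != NULL && out != NULL);
	while (*line == ' ' || *line == '\t')
		line++;
	strcpy(out, line);
}
```

	Applied to the text line "\t\tfoo" of the script above (with the
	continuation backslash already removed), it yields "foo".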

 2.	Historic implementations ignore comments in the text of the i
	and a commands.  This implementation follows historic practice.

TK	I can't duplicate this -- the BSD version of sed doesn't, i.e.
TK		i\
TK		foo\
TK	#comment\
TK		bar
TK	prints
TK
TK		foo
TK		#comment
TK		bar

 3.	Historical versions of sed required that the w flag be the last
	flag to an s command, since it takes an additional argument.
	This is obvious, but not specified in POSIX.

 4.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 5.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.

 6.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed also displayed
	two-digit octal numbers rather than the three digits specified
	by POSIX.  The POSIX specification is a cleanup, and this
	implementation follows it.
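
	A C sketch of how the l command might render a single byte under
	the POSIX rule.  The escape set is assumed to be the file format
	notation of Table 2-15 (\\, \a, \b, \f, \n, \r, \t, \v); l_escape()
	is a hypothetical name, "printable" is whatever isprint() reports
	in the current locale, and line folding and the trailing $ are not
	shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render byte c as the l command might, into out
 * (at least 5 bytes).  Backslash and the control characters that have
 * ANSI C escapes are shown symbolically; other non-printable bytes
 * become three-digit octal escapes; printable bytes are copied as-is.
 */
void
l_escape(unsigned char c, char *out)
{
	static const char ctrl[] = "\\\a\b\f\n\r\t\v";
	static const char name[] = "\\abfnrtv";	/* parallel to ctrl */
	const char *p;

	assert(out != NULL);
	if (c != '\0' && (p = strchr(ctrl, c)) != NULL) {
		out[0] = '\\';
		out[1] = name[p - ctrl];
		out[2] = '\0';
	} else if (isprint(c)) {
		out[0] = (char)c;
		out[1] = '\0';
	} else
		(void)sprintf(out, "\\%03o", (unsigned int)c);
}
```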

 7.	The specification for ! does not specify that, for a single
	command, the command must not contain an address specification,
	whereas the command list can contain address specifications.

TK	I think this is wrong: the script:
TK
TK	3!p
TK
TK	works fine.  Am I misunderstanding your point?
DDS	Yes.  By the POSIX definition of a command, 3!/hello/p should work
DDS	just as 3!{/hello/p} does.  The current implementation follows
DDS	historic practice and does not implement it.
TK	I *still* don't understand...  Would you please try to explain
TK	it one more time?  Thanks...

 8.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behavior.  (It seems logical that each
	one might reverse the behavior.)  This implementation follows
	historic practice.
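
	The two readings can be contrasted with a small C sketch.
	parse_bangs() is a hypothetical helper: with toggle false, a run
	of !'s negates exactly once (historic practice, which this
	implementation follows); with toggle true, each ! flips the sense.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: consume a run of '!' following an address.
 * toggle == false: any number of '!' negates once (historic practice).
 * toggle == true:  each '!' flips the sense (the alternative reading).
 * Returns a pointer just past the run; *negated receives the result.
 */
const char *
parse_bangs(const char *p, bool toggle, bool *negated)
{
	assert(p != NULL && negated != NULL);
	*negated = false;
	for (; *p == '!'; p++)
		*negated = toggle ? !*negated : true;
	return p;
}
```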

 9.	Historic versions of sed permitted commands to be separated
	by semicolons, e.g. "sed -ne '1p;2p;3q'" printed the first
	three lines of a file.  This is not specified by POSIX.
	Note that the ; command separator is not recognized after the
	commands a, c, i, w, r, :, b, t and #, or after the w flag of the
	s command.  This implementation follows historic practice and
	implements the ; separator.
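
	A sketch of the rule as a hypothetical C predicate, assuming the
	command letter has already been parsed.  The listed commands take
	text, a file name, a label or a comment as their argument, so a ;
	there is part of that argument.  The w flag of s is assumed to be
	handled by the s flag parser and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical predicate: may a ';' terminate command cmd?
 * For a, c, i, r, w, b, t, : and # the ';' belongs to the argument.
 */
bool
semicolon_ends(char cmd)
{
	return cmd != '\0' && strchr("acirwbt:#", cmd) == NULL;
}
```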

10.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, e.g.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.

11.	POSIX does not specify that the q command causes all lines that
	have been appended to be output, and the pattern space to be
	printed, before exiting.  This implementation follows historic
	practice.

12.	Historical implementations do not output the change text of a c
	command in the case of an address range whose second line number
	is less than the first (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.

13.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program, with the input "one\ntwo\nthree\nfour\nfive", can behave
	in different ways depending on whether the /one/,/three/c
	command is triggered at the third line.

	2,4b
	/one/,/three/c\
		append some text

	Historic implementations of sed, for the above example, would
	output the text after the "branch" no longer applied, but would
	then quit without further processing.  This implementation has
	the more intuitive behavior of never outputting the text at all.
	This is based on the belief that it would be reasonable to want
	to output some text if the pattern /one/,/three/ occurs but only
	if it occurs outside of the range of lines 2 to 4.

14.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.

15.	POSIX does not specify whether more than one numeric flag is
	allowed on the s command.  Historic practice is to specify only
	a single flag.

TK	What's historic practice?  Currently we don't report an error or
TK	do all of the flags.
DDS	Historic practice is a single flag.  We follow it.  POSIX
DDS	should be more precise.
TK	It actually seems reasonable to do multiple flags, i.e. display
TK	two or more of the matched patterns.  Since it's unambiguous (only
TK	1-9 are allowed, so /19 *has* to be 1 and 9, not nineteen), we
TK	can't break any existing scripts.

16.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed" (no output) and "ls | sed -e#" (like cat).  This implementation
	behaves like cat in both cases.

17.	The POSIX requirement to open all wfiles from the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  This implementation
	follows historic practice and POSIX, by default, and provides the
	-a option for more reasonable behavior.

18.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it does not state that the backslash is to be
	discarded from the output regardless.  A strict reading of POSIX
	would be that "echo xyz | sed 's/./\a/'" would display "\ayz".  As
	historic sed implementations always discarded the backslash, this
	implementation does as well.
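
	A minimal C sketch of replacement-text expansion under the
	historic rule described above: the backslash in front of any
	character other than n is discarded and the character copied.
	expand_repl() is a hypothetical name; & inserts the matched text,
	back-references \1-\9 are left out to keep it short (the function
	returns -1 for them), and no bounds checking is done on out.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: expand the replacement part of s/RE/repl/ for
 * one match.  Returns 0 on success, -1 for a back-reference \1-\9.
 */
int
expand_repl(const char *repl, const char *match, char *out)
{
	assert(repl != NULL && match != NULL && out != NULL);
	for (; *repl != '\0'; repl++) {
		if (*repl == '&') {		/* whole matched text */
			strcpy(out, match);
			out += strlen(match);
		} else if (*repl == '\\' && repl[1] != '\0') {
			repl++;
			if (isdigit((unsigned char)*repl) && *repl != '0')
				return -1;	/* back-reference: not handled */
			/* \n is a newline; otherwise drop the backslash */
			*out++ = (*repl == 'n') ? '\n' : *repl;
		} else
			*out++ = *repl;
	}
	*out = '\0';
	return 0;
}
```

	With this rule the replacement "\a" expands to "a", so the example
	above would print "ayz" rather than "\ayz".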

19.	POSIX specifies that an address can be "empty".  This implies that
	constructs like ,d or 1,d and ,5d are allowed.  This is not true
	for historic implementations of sed.  This implementation follows
	historic practice.

20.	The b, t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors.  This implementation ignores trailing whitespace.

TK	I think that line 11347 points out that the synopsis shows
TK	which are valid.
DDS	I am talking about _trailing_ white space.  In our implementation
DDS	and historic implementations the label can contain _significant_
DDS	white space at its end.  This is obscure and not explained in
DDS	POSIX.
TK	I think we should delete trailing white space for the above
TK	reason.
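
	A sketch in C of label parsing that skips leading blanks and also
	drops trailing white space, so that "x" and "x " name the same
	label.  parse_label() is a hypothetical helper and assumes the
	label argument has already been cut at the end of its line.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: extract the label argument of b, t or : into
 * out (strlen(arg) + 1 bytes), without leading or trailing white space.
 */
void
parse_label(const char *arg, char *out)
{
	size_t len;

	assert(arg != NULL && out != NULL);
	while (isblank((unsigned char)*arg))
		arg++;
	len = strlen(arg);
	while (len > 0 && isspace((unsigned char)arg[len - 1]))
		len--;
	memcpy(out, arg, len);
	out[len] = '\0';
}
```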

21.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.

22.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.
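
	A C sketch of decoding one string of y/string1/string2/ with \n
	support.  y_decode() is hypothetical; rejecting any escape other
	than \n, \\ and the escaped delimiter is an assumption of this
	sketch, and the check that both strings decode to the same length
	is not shown.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: decode s up to the first unescaped delim into
 * out (NUL-terminated).  Returns the decoded length and sets *rest past
 * the delimiter, or returns -1 on a bad escape or a missing delimiter.
 */
long
y_decode(const char *s, char delim, char *out, const char **rest)
{
	char *o = out;

	assert(s != NULL && out != NULL && rest != NULL && delim != '\0');
	for (; *s != '\0' && *s != delim; s++) {
		if (*s != '\\') {
			*o++ = *s;
			continue;
		}
		s++;				/* the escaped character */
		if (*s == 'n')
			*o++ = '\n';
		else if (*s == '\\' || *s == delim)
			*o++ = *s;
		else
			return -1;		/* unknown escape or trailing \ */
	}
	if (*s != delim)
		return -1;			/* unterminated string */
	*o = '\0';
	*rest = s + 1;
	return (long)(o - out);
}
```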

23.	POSIX does not specify if the "Nth occurrence" of an RE in a
	substitute command is an overlapping or a non-overlapping one,
	i.e. what is the result of s/a*/A/2 on the pattern space
	"aaaaa aaaaa".  Historical implementations either drop core or
	match only non-overlapping occurrences.  This implementation
	follows historic practice.
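
	A C sketch of the non-overlapping interpretation using the POSIX
	<regex.h> interface.  nth_match() is hypothetical.  Each search
	resumes where the previous match ended; an empty match exactly at
	the end of the previous match is skipped, which is an assumption
	borrowed from common practice rather than something stated above.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: find the nth (n >= 1) non-overlapping match of
 * re in s.  Stores its offsets in *so and *eo and returns 0, or returns
 * -1 if there are fewer than n matches.
 */
int
nth_match(const regex_t *re, const char *s, int n,
    regoff_t *so, regoff_t *eo)
{
	size_t pos = 0, len;
	long prev_end = -1;	/* end offset of the last counted match */
	regmatch_t m;

	assert(re != NULL && s != NULL && so != NULL && eo != NULL && n > 0);
	len = strlen(s);
	while (pos <= len &&
	    regexec(re, s + pos, 1, &m, pos > 0 ? REG_NOTBOL : 0) == 0) {
		size_t start = pos + (size_t)m.rm_so;
		size_t end = pos + (size_t)m.rm_eo;

		if (start == end && (long)start == prev_end) {
			pos = start + 1;	/* skip empty match at old end */
			continue;
		}
		if (--n == 0) {
			*so = (regoff_t)start;
			*eo = (regoff_t)end;
			return 0;
		}
		prev_end = (long)end;
		pos = end > start ? end : end + 1;	/* always advance */
	}
	return -1;
}
```

	Under these assumptions the second match of a* in "aaaaa aaaaa" is
	the second run of a's (offsets 6 to 11), so s/a*/A/2 would give
	"aaaaa A".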

24.	Historic implementations of sed ignore the regular expression
	delimiter characters within character classes.  This is not
	specified in POSIX.  This implementation follows historic practice.
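
	A C sketch of finding the end of an RE when delimiter characters
	inside bracket expressions are ignored, so that s/[/]/x/ parses.
	re_end() is hypothetical; it treats a ] right after [ or [^ as a
	literal and does not handle [:class:], [=equiv=] or [.coll.] forms.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer to the delimiter that ends the
 * RE starting at p, or NULL if there is none.
 */
const char *
re_end(const char *p, char delim)
{
	assert(p != NULL);
	for (; *p != '\0'; p++) {
		if (*p == '\\' && p[1] != '\0') {
			p++;			/* skip the escaped character */
		} else if (*p == '[') {
			p++;
			if (*p == '^')
				p++;
			if (*p == ']')
				p++;		/* literal ] at class start */
			while (*p != '\0' && *p != ']')
				p++;		/* delim is ordinary in here */
			if (*p == '\0')
				return NULL;	/* unterminated class */
		} else if (*p == delim)
			return p;
	}
	return NULL;
}
```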