#	@(#)POSIX	5.2 (Berkeley) 08/24/92

		Comments on the IEEE P1003.2 Draft 12

		     Part 2: Shell and Utilities
		  Section 4.55: sed - Stream editor

		 Diomidis Spinellis <dds@doc.ic.ac.uk>

In the following paragraphs, `wrong' means `inconsistent with historic
practice'.  Many of the comments refer to undocumented inconsistencies
between the historical versions of sed and the POSIX standard.  All the
comments are notes taken while implementing a POSIX-compatible version
of sed, and should not be interpreted as official opinions or criticism
of the POSIX committee.  Some are insignificant, pedantic, or even
wrong.

 1.	For the text argument of the a command it is not specified
	whether lines are stripped of their leading blanks.  Historical
	practice, followed in this implementation, is to strip the
	blanks, i.e.:

	#!/bin/sed -f
	a\
		foo\
		bar

	produces:

	foo
	bar

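	A minimal sketch of the case in item 1, runnable from a shell.
	The input line x and the variable name out are assumptions made
	for the example.  A sed that follows historical practice should
	print foo and bar without the leading tab; other implementations
	may keep it.

```shell
# The two text lines of the a command are indented with a tab in the script.
out=$(printf 'x\n' | sed 'a\
	foo\
	bar')
printf '%s\n' "$out"
```
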
 2.	Historical versions of sed required that the w flag be the
	last flag to an s command, as it takes an additional argument.
	This is not specified in the standard.

 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in the standard.
	This implementation permits whitespace but does not require
	it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	the standard.  This implementation permits whitespace but does
	not require it.

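	Items 2 to 4 can be illustrated with a short shell sketch.  The
	temporary file from mktemp and the variable names are assumptions
	made for the example.  The g flag is placed before w because the
	rest of the line after w is taken as the file name, and a single
	space separates w from that name.

```shell
tmp=$(mktemp)
# g precedes w; the rest of the line after "w " is the file name.
printf 'aaa\nbbb\n' | sed -n "s/a/X/gw $tmp"
saved=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$saved"
```

	Only the line on which the substitution succeeded is written to
	the file, so saved should hold XXX.
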
 5.	The specification of the a command is wrong.  With the current
	specification both of these scripts should produce the same
	output:

	#!/bin/sed -f
	d
	a\
	hello

	#!/bin/sed -f
	a\
	hello
	d

TK -- Diomidis, the current implementation looks wrong on this case.

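	A minimal shell sketch of the two scripts in item 5.  The one-line
	input x and the variable names first and second are assumptions
	made for the example.  On a sed where d starts the next cycle at
	once, the first script never reaches the a command, so the two
	need not produce the same output in practice.

```shell
# d ends the cycle at once, so the a command after it is never reached.
first=$(printf 'x\n' | sed 'd
a\
hello')
# Here a queues its text before d deletes the pattern space.
second=$(printf 'x\n' | sed 'a\
hello
d')
printf 'first=[%s] second=[%s]\n' "$first" "$second"
```

	On such a sed, first is empty and second holds hello.
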
 6.	The specification of the c command in conjunction with the
	specification of the default operation (D2 11293-11299) is
	wrong.  The default operation specifies that a newline is
	printed after the pattern space.  This is not the case when
	the pattern space has been deleted by a c command.

TK Diomidis, the spec seems right to me -- the language in 11293
TK talks about copying the pattern space to stdout -- if the pattern space
TK is deleted, it can't be copied.

 7.	The rule for the l command differs from historic practice.
	Table 2-15 includes the various ANSI C escape sequences,
	including \\ for backslash.  Some historical versions of
	sed displayed two-digit octal numbers.  The POSIX
	specification is a cleanup, and this implementation follows
	it.

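	A small sketch of the l command on a line that contains a tab and
	a backslash.  The input is made up for the example.  A sed that
	follows the POSIX rule described in item 7 is expected to show
	both characters as C-style escapes and to mark the end of the
	line with $.

```shell
# Input line is: a, TAB, b, backslash.
out=$(printf 'a\tb\\\n' | sed -n 'l')
printf '%s\n' "$out"
```
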
 8.	The specification for ! does not state that, when ! is
	followed by a single command, that command must not contain
	an address specification, whereas a command list can contain
	address specifications.

TK I think this is wrong: the script:
TK
TK	3!p
TK
TK works fine.  Am I misunderstanding your point?

 9.	The standard does not specify what happens with consecutive
	! commands (e.g. /foo/!!!p).  Historic implementations
	allow any number of !'s without changing behaviour.  (It
	seems logical that each one should reverse the default
	behaviour.)  This implementation follows historic practice.

10.	Historic versions of sed permitted commands to be separated
	by semi-colons, e.g. sed -ne '1p;2p;3q' prints the first
	two lines of a file and quits at the third.  This is not
	specified by POSIX.  Note that the ; command separator is not
	allowed for the commands a, c, i, w, r, :, b, t, # and at the
	end of a w flag in the s command.  This implementation follows
	historic practice.

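	A sketch of the semi-colon form from item 10, with a four-line
	input invented for the example.  Because -n suppresses the default
	output and q does not print under -n, only the lines selected by
	the two p commands appear.

```shell
# Three commands on one line, separated by semi-colons.
out=$(printf 'a\nb\nc\nd\n' | sed -n '1p;2p;3q')
printf '%s\n' "$out"
```
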
11.	The standard does not specify that if EOF is reached during
	the execution of the n command the program terminates, e.g.

	sed -e '
	n
	i\
	hello
	' </dev/null

	will not produce any output.  This implementation follows
	historic practice.

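	The example in item 11 can be run from a shell as follows.  The
	second command, with a one-line input, is an addition made for
	illustration: n finds no next line there, so the i command after
	it should never run.  The variable names are assumptions.

```shell
# No input at all: the script never runs, so nothing is printed.
empty=$(sed -e 'n
i\
hello' </dev/null)
# One input line: n hits EOF and the following i command is not executed.
one=$(printf 'one\n' | sed -e 'n
i\
hello')
printf 'empty=[%s] one=[%s]\n' "$empty" "$one"
```
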
12.	The standard does not specify that the q command causes all
	lines that have been appended to be output and that the pattern
	space is printed before exiting.  This implementation follows
	historic practice.

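	A sketch of item 12: text queued by a is flushed, and the pattern
	space is printed, when q ends the run.  The input and the variable
	name are invented for the example.

```shell
# a queues its text; q then prints the pattern space, flushes the queue, and exits.
out=$(printf 'one\ntwo\n' | sed 'a\
appended
q')
printf '%s\n' "$out"
```

	The second input line is never read, so out should hold one
	followed by appended.
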
13.	Historic implementations ignore comments in the text of the i
	and a commands.  This implementation follows historic practice.

14.	Historic implementations do not consider the last line of a
	file to match $ if an empty file follows, e.g.

	sed -n -e '$p' /usr/dict/words /dev/null

	will not print anything.  This is not mentioned in the POSIX
	specification and is almost certainly a bug.  This implementation
	follows the POSIX specification.

TK	Diomidis, I think we need to fix this, can you do it?
DDS	We follow POSIX.  You don't mean to do it buggy?
TK	I see... (I didn't understand that problem until now.)  I think
TK	that we *should* print out the last line of the dictionary, in
TK	the above example, but I can see how it would be hard.  What do
TK	you think?

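	Item 14 can be checked with a sketch like the following.  A
	temporary file stands in for /usr/dict/words, which may not exist
	on a given system; the file contents and variable names are
	assumptions.  A sed that follows POSIX here should print the last
	line of the non-empty file.

```shell
tmp=$(mktemp)
printf 'first\nlast\n' >"$tmp"
# The trailing /dev/null is the empty file that follows the real input.
out=$(sed -n -e '$p' "$tmp" /dev/null)
rm -f "$tmp"
printf '%s\n' "$out"
```
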
15.	Historical implementations do not output the change text
	of a c command in the case of an address range whose second
	line number is less than the first (e.g. 3,1).  The POSIX
	standard requires that the text be output.  Since the historic
	behavior doesn't seem to have any particular purpose, this
	implementation follows the POSIX behavior.

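	A sketch of the 3,1 case from item 15, with a four-line input and
	change text invented for the example.  A sed that follows the
	POSIX behavior should emit the change text once, in place of
	line 3; a sed with the historical behavior would not emit it.

```shell
# The range 3,1 selects only line 3, because its end address is already past.
out=$(printf '1\n2\n3\n4\n' | sed '3,1c\
changed')
printf '%s\n' "$out"
```
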
16.	Historical implementations output the c text on EVERY line not
	included in the two-address range in the case of a negation '!'.

TK	Diomidis, this seems reasonable, I don't see where the standard
TK	conflicts with this.

17.	The standard does not specify that the p flag of the s command will
	write the pattern space plus a newline on the standard output.

TK	I think this is covered by the general language around 11293
TK	that says that the pattern space is always followed by a newline
TK	when output.

18.	The standard does not specify whether address ranges are
	checked and reset if a command is not executed due to a
	jump.  The following program can behave in two different
	ways depending on whether the range operator is reset at
	line 6 or not.  This is important in the case of pattern
	matches.

	sed -n -e '
	4,8b
	s/^/XXX/p
	1,6 {
		p
	}'

TK	I don't understand this -- can you explain further?
DDS	The 1,6 operator will not be executed on line 6 (due to the 4,8b
DDS	line) and thus it will not clear.  In this case you can check for
DDS	line > 6 in apply, but what if the 1,6 was /BEGIN/,/END/?
TK	OK, I understand, now.  Well, I think I do, anyhow.  It seems to
TK	me that applies() will never see the 1,6 line under any circumstances
TK	(even if it was /BEGIN/,/END/) for lines 4 through 8.
TK	A nastier example, as you point out, is:
TK		2,4b
TK		/one/,/three/c\
TK			append some text
TK
TK	The BSD sed appends the text after the "branch" no longer applies,
TK	i.e. with the input: one\ntwo\nthree\nfour\nfive\nsix it displays
TK	two\nthree\nfour\nappend some text BUT THEN IT STOPS!
TK	Our sed, of course, simply never outputs "append some text".  It
TK	seems to me that our current approach is "right", because it would
TK	be possible to have:
TK		1,4b
TK		/one/,/five/c\
TK			message
TK
TK	where you only want to see "message" if the patterns "one" ... "five"
TK	occur, but not in lines 1 to 4.  What do you think?

18.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well.  This implementation follows
	historical practice.

19.	POSIX does not specify whether more than one numeric flag is
	allowed on the s command.

TK	What's historic practice?  Currently we don't report an error or
TK	do all of the flags.

20.	The standard does not specify whether a script is mandatory.
	Historic sed implementations behave differently with ls | sed
	(no output) and ls | sed - e'' (behaves like cat).

TK	I don't understand what 'sed - e' does (it should be illegal,
TK	right?)  It seems to me that a script should be mandatory,
TK	and sed should fail with an error if not given one.

21.	The requirement to open all wfiles from the beginning makes sed
	behave nonintuitively when the w commands are preceded by addresses
	or are within conditional blocks.  This implementation follows
	historic practice, by default, and provides a flag for more
	reasonable behavior.

TK	I'll put it on my TODO list... ;-}

22.	The rule specified in lines 11412-11413 of the standard does
	not seem consistent with existing practice.  Historic sed
	implementations I tested copied the rfile on standard output
	every time the r command was executed and not before reading
	a line of input.  The wording should be changed to be
	consistent with the 'a' command i.e.

TK	Something got dropped, here... Can you explain further what
TK	historic versions did, what they should do, what we do?

23.	The standard does not specify how escape sequences other
	than \n and \D (where D is the delimiter character) are to
	be treated.  A strict interpretation would be that they
	should be treated literally.  In the sed implementations I
	have tried the \ is simply ignored.

TK	I don't understand what you're saying, here.  Can you explain?

24.	The standard specifies that an address can be "empty".  This
	implies that constructs like ,d or 1,d and ,5d are allowed.
	This is not true for historic implementations of sed.  This
	implementation follows historic practice.

25.	The b, t and : commands ignore leading white space, but not
	trailing white space.  This is not specified in the standard.

TK	I think that line 11347 points out that the synopsis shows
TK	which are valid.

	Although the standard specifies that reading from files that
	do not exist from within the script must not terminate the
	script, it does not specify what happens if a write command
	fails.  Historic practice is to fail immediately if the file
	cannot be opened or written.  This implementation follows that
	practice.

26.	Historic practice is that the \n construct can be used for
	either string1 or string2 of the y command.  This is not
	specified by the standard.  This implementation follows
	historic practice.

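	A sketch of \n used in string2 of the y command, as item 26
	describes.  The input and the variable name are invented for the
	example.

```shell
# Translate every space in the line to a newline.
out=$(printf 'a b c\n' | sed 'y/ /\n/')
printf '%s\n' "$out"
```
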
29.	The standard does not specify if the "nth occurrence" of a
	regular expression in a substitute command is an overlapping
	or a non-overlapping one, e.g. what is the result of s/a*/A/2
	on the pattern "aaaaa aaaaa".  Historical practice is to drop
	core or do non-overlapping expressions.  This implementation
	follows historic practice.

30.	Historic implementations of sed ignore the regular expression
	delimiter characters within character classes.  This is not
	specified in the standard.  This implementation follows historic
	practice.