#	@(#)POSIX	5.1 (Berkeley) 08/20/92

	Comments on the IEEE P1003.2 Draft 11.2 September 1991

		   Part 2: Shell and Utilities
		Section 4.55: sed - Stream editor

In the following paragraphs, `wrong' means `inconsistent with historic
practice'.  Many of the comments refer to undocumented inconsistencies
between the historical versions of sed and the POSIX standard.  All the
comments are notes taken while implementing a POSIX-compatible version
of sed, and should not be interpreted as official opinions or criticism
of the POSIX committee.  Many are insignificant, pedantic and even
wrong.
		Diomidis Spinellis <dds@doc.ic.ac.uk>

[Some are significant and right, too.  -- Keith Bostic]

1. For the text argument of the a command it is not specified whether
   leading blanks are stripped from each line.  There are some hints in D2
   11335-11337 and in D2 11512-11514, but nothing concrete.  Historical
   practice is to strip the blanks, i.e.:

	#!/bin/sed -f
	a\
		foo\
		bar

   produces:

	foo
	bar

2. In the s command we assume that the w file is the last flag.  This is
   historical practice, but not specified in the standard.

3. In the s command the standard does not specify that a space must follow
   w.  Nor does it specify that any number of spaces after the w flag
   are allowed and removed.
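
   Comments 2 and 3 can be illustrated with a small sketch.  The file
   name changed.txt and the variable names are made up for the example,
   and it assumes a sed that follows the historical practice described
   above (w last, blanks after w skipped).

```shell
#!/bin/sh
# Sketch only: changed.txt and the variable names are illustrative.
tmpdir=$(mktemp -d)

# The g flag comes first; w is last because the rest of the line
# (after any blanks) is taken as the file name.
stdout_text=$(printf 'banana\ncherry\n' |
	sed -n "s/an/AN/gw   $tmpdir/changed.txt")
wfile_text=$(cat "$tmpdir/changed.txt")

echo "stdout: [$stdout_text]"
echo "wfile:  [$wfile_text]"

rm -rf "$tmpdir"
```

   With such a sed nothing reaches standard output (because of -n), and
   the w file holds only the line on which a substitution was made.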

4. The specification of the a command is wrong.  With the current
   specification both of these scripts should produce the same output:

	#!/bin/sed -f
	d
	a\
	hello

	#!/bin/sed -f
	a\
	hello
	d

5. The specification of the c command in conjunction with the specification
   of the default operation (D2 11293-11299) is wrong.  The default operation
   specifies that a newline is printed after the pattern space.  This is not
   the case when the pattern space has been deleted by a c command.

6. The rule for the l command differs from historic practice.  Table 2-15
   includes the various escape sequences including \\.  Is this what the
   standard intends?  Furthermore some versions of sed print two-digit octal
   numbers.  Why does the standard require a three-digit octal number?
   Normally the pattern space does not end with a newline.  Will an implicit
   \n be printed?  Finally, the standard does not specify that a newline must
   follow the '$' sign (it seems logical to me).
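
   A minimal sketch of the l output in question, assuming a sed that
   follows the draft's rules (escape sequences from the table, three-digit
   octal for other non-printable bytes, '$' at the end of the line).  The
   variable name is illustrative only.

```shell
#!/bin/sh
# One input line holding a tab, a backslash and the byte 001.
l_output=$(printf 'a\tb\\c\001\n' | sed -n l)

# Print what l produced so it can be compared across implementations.
printf '%s\n' "$l_output"
```

   A sed that behaves as the draft describes would show the tab as \t,
   the backslash as \\, the control byte as \001, and a trailing $.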

7. The specification for ! does not specify that, for a single command, the
   command must not contain an address specification, whereas the command
   list can contain address specifications.

8. The standard does not specify what happens with consecutive ! commands
   (e.g. /foo/!!!p).  Current implementations allow any number of !'s without
   changing behaviour.  It seems logical that each one should reverse the
   default behaviour.

9. The ; command separator is not allowed for the commands a c i w r : b t
   # and at the end of a w flag in the s command.
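
   The w case can be sketched as follows.  This assumes a sed in which
   the w file name extends to the end of the line; the names used are
   made up for the example.

```shell
#!/bin/sh
# Sketch only: "out" and the variable names are illustrative.
tmpdir=$(mktemp -d)

# The intent might be "write the line, then print it", but ";p" is
# swallowed into the file name instead of being parsed as a command.
stdout_text=$(printf 'x\n' | sed -n "w $tmpdir/out;p")
created=$(ls "$tmpdir")

echo "stdout: [$stdout_text]"
echo "files:  [$created]"

rm -rf "$tmpdir"
```

   Under that assumption nothing is printed, and the only file created
   is the one whose name ends in ;p.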

10. The standard does not specify that if an end of file occurs on the
    execution of the n command the program terminates, e.g.:

	sed -e '
	n
	i\
	hello
	' </dev/null

    will not produce any output.

11. The standard does not specify that the q command causes all lines that
    have been appended to be output and that the pattern space is printed
    before exiting.
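
    A short sketch of the q behaviour described in comment 11; the input
    lines and the variable name are invented for the example.

```shell
#!/bin/sh
# On line 2, queue some appended text and then quit.
q_output=$(printf 'one\ntwo\nthree\n' | sed '2{
a\
appended
q
}')

printf '%s\n' "$q_output"
```

    With the historical behaviour the output is "one", "two" and then
    "appended": the pattern space and the pending appended text are both
    flushed before sed exits, and "three" is never read.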

12. Historic implementations ignore comments in the text of the i and a
    commands.

13. The historic implementation does not consider the last line of a file
    to match $ if a null file follows:

	sed -n -e '$p' /usr/dict/words /dev/null

    will not print anything.

14. Historical implementations do not output the change text of a c command
    in the case of an address range whose second line number is less than
    the first (e.g. 3,1).  The standard seems to imply otherwise.

15. Historical implementations output the c text on EVERY line not included
    in the two address range in the case of a negation '!'.

16. The standard does not specify that the p flag of the s command will
    write the pattern space plus a newline on the standard output.
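
    The p flag behaviour in comment 16 can be seen with a sketch like
    this one (input text and variable names are illustrative):

```shell
#!/bin/sh
# With -n, only lines where the substitution succeeded are written.
p_quiet=$(printf 'foo\nbar\n' | sed -n 's/foo/baz/p')

# Without -n, a substituted line appears twice: once from the p flag
# and once from the default output at the end of the cycle.
p_default=$(printf 'foo\n' | sed 's/foo/baz/p')

printf 'with -n:\n%s\n' "$p_quiet"
printf 'without -n:\n%s\n' "$p_default"
```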

17. The standard does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump.  The following
    program can behave in two different ways depending on whether the range
    operator is reset at line 6 or not.  This is important in the case of
    pattern matches.

	sed -n -e '
	4,8b
	s/^/XXX/p
	1,6 {
		p
	}'

18. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well.

19. POSIX does not specify whether more than one numeric flag is
    allowed on the s command.

20. Existing versions of sed have the undocumented feature of allowing
    a semicolon to delimit commands.  It is not specified in the standard.
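
    A sketch of the semicolon feature from comment 20, assuming a sed
    that has it; the variable names are illustrative.

```shell
#!/bin/sh
# Two commands separated by a semicolon ...
semi_output=$(printf 'a\nb\n' | sed -n 's/a/x/;p')

# ... and the same two commands separated by a newline.
nl_output=$(printf 'a\nb\n' | sed -n 's/a/x/
p')

printf 'semicolon:\n%s\n' "$semi_output"
printf 'newline:\n%s\n' "$nl_output"
```

    On implementations with the feature both forms give the same output.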

21. The standard does not specify whether a script is mandatory.  The
    sed implementations I tested behave differently with ls | sed (no
    output) and ls | sed -e '' (behaves like cat).

22. The requirement to open all wfiles from the beginning makes sed behave
    nonintuitively when the w commands are preceded by addresses or are
    within conditional blocks.
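
    The effect in comment 22 can be sketched as below, assuming a sed
    that creates every wfile before processing begins.  The file name
    never.txt and the variable names are invented for the example.

```shell
#!/bin/sh
# Sketch only: never.txt and the variable names are illustrative.
tmpdir=$(mktemp -d)

# The address never matches, so the w command is never executed.
printf 'x\n' | sed -n "/no-such-text/w $tmpdir/never.txt"

if [ -e "$tmpdir/never.txt" ]; then
	wfile_state="created, $(wc -c < "$tmpdir/never.txt" | tr -d ' ') bytes"
else
	wfile_state="missing"
fi
echo "never.txt: $wfile_state"

rm -rf "$tmpdir"
```

    Under that assumption the file exists but is empty, even though no
    line was ever written to it.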

23. The rule specified in lines 11412-11413 of the standard does not
    seem consistent with existing practice.  The sed implementations I
    tested copied the rfile to standard output every time the r command was
    executed and not before reading a line of input.  The wording should be
    changed to be consistent with the 'a' command i.e.

24. The standard does not specify how escape sequences other than \n
    and \D (where D is the delimiter character) are to be treated.  A
    strict interpretation would be that they should be treated literally.
    In the sed implementations I have tried the \ is simply ignored.

25. The standard specifies in line 11304 that an address can be empty.
    This is wrong since it implies that constructs like ,d or 1,d or ,5d
    are allowed.  The sed implementations I tested do not allow them.

26. The b t and : commands ignore leading white space, but not trailing
    white space.  This is not specified in the standard.

27. Although the standard specifies that reading from files that do not
    exist from within the script must not terminate the script, it does not
    specify what happens if a write command fails.
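
    The read half of comment 27 can be sketched like this; the file
    names present.txt and absent.txt and the variable names are made up
    for the example.

```shell
#!/bin/sh
# Sketch only: file and variable names are illustrative.
tmpdir=$(mktemp -d)
printf 'extra\n' > "$tmpdir/present.txt"

# Line 1 reads a file that exists, line 2 one that does not.
r_output=$(printf 'one\ntwo\n' |
	sed -e "1r $tmpdir/present.txt" -e "2r $tmpdir/absent.txt")
r_status=$?

printf '%s\n' "$r_output"
echo "exit status: $r_status"

rm -rf "$tmpdir"
```

    The missing rfile is silently skipped and sed still exits
    successfully; the write side has no equally clear rule.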

28. In the sed implementation I tested, the \n construct for newlines
    works on both strings of a y command.  This is not specified in the
    standard.
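
    A sketch of both directions, assuming a sed that accepts \n in
    either string of y as described above; the variable names are
    illustrative.

```shell
#!/bin/sh
# \n in the second string: turn a blank into a newline.
to_newline=$(printf 'a b\n' | sed 'y/ /\n/')

# \n in the first string: join two lines with N, then turn the
# embedded newline into a blank.
from_newline=$(printf 'a\nb\n' | sed -e N -e 'y/\n/ /')

printf 'blank to newline:\n%s\n' "$to_newline"
printf 'newline to blank:\n%s\n' "$from_newline"
```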

29. The standard does not specify if the "nth occurrence" of a regular
    expression in a substitute command is an overlapping or a
    non-overlapping one.  I.e., what is the result of s/a*/A/2 on the
    pattern "aaaaa aaaaa"?  (It crashes the implementation of sed I
    tested.)

30. Existing implementations of sed ignore the regular expression
    delimiter characters within character classes.  This is not specified
    in the standard.