.\" Copyright (c) 1991 Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"     @(#)sort.1	5.1 (Berkeley) 06/01/93
.\"
.Dd
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl cmubdfinr
.Op Fl t Ar char
.Op Fl T Ar char
.Oo
.Fl k Ar field1 Ns Op , Ns Ar field2
.Oc
.Ar ...
.Op Fl o Ar output
.Op Ar file
.Ar ...
.Sh DESCRIPTION
The
.Nm sort
utility
sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed
lexicographically.
By default, if keys are not given,
.Nm sort
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width indent
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm sort
produces the appropriate error messages and exits with code 1;
otherwise,
.Nm sort
returns 0.
With
.Fl c ,
.Nm sort
produces no other output.
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to
be used instead of the standard output.
This file
can be the same as one of the input files.
.It Fl u
Unique: suppress all but one in each set of lines
having equal keys.
If used with the
.Fl c
option,
check that there are no lines with duplicate keys.
.El
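.Pp
As an illustration of the options above, the following sketch exercises
.Fl c , u , m ,
and
.Fl o
on small sample inputs.
It assumes a POSIX-conforming
.Nm sort
run in the C locale; the directory, file names and data are hypothetical.
.Bd -literal
```shell
export LC_ALL=C
work=$(mktemp -d)

# An unsorted file with one duplicated line.
echo 'pear
apple
pear' > "$work/fruit"

# -c only checks: exit status 1 signals disorder, and nothing is sorted.
check_status=0
sort -c "$work/fruit" 2>/dev/null || check_status=$?

# -u keeps one line from each set of lines with equal keys.
unique=$(sort -u "$work/fruit")

# -m merges inputs that are already sorted.
echo 'a
c' > "$work/left"
echo 'b
d' > "$work/right"
merged=$(sort -m "$work/left" "$work/right")

# -o may name one of the input files.
sort -o "$work/fruit" "$work/fruit"
in_place=$(cat "$work/fruit")

rm -rf "$work"
```
.Ed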
.Pp
The following options override the default ordering rules.
When ordering options appear independently of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width indent
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Consider each lowercase character that has an uppercase
equivalent to be the same as that equivalent for purposes of
comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional
blank space, optional minus sign, and zero or more
digits (including decimal point),
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies
the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
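.Pp
The effect of the ordering options can be seen on a few sample lines.
This is a minimal sketch, assuming a POSIX-conforming
.Nm sort
run in the C locale; the variable names and data are made up for illustration.
.Bd -literal
```shell
export LC_ALL=C

numbers='10
9
100'
# The default comparison is lexicographic, so "10" and "100" precede "9".
lexical=$(echo "$numbers" | sort)
# -n compares by arithmetic value; -r reverses the sense of the comparison.
numeric=$(echo "$numbers" | sort -n)
descending=$(echo "$numbers" | sort -n -r)

words='b
A
C'
# In the C locale uppercase letters sort before lowercase ones ...
plain=$(echo "$words" | sort)
# ... while -f compares "b" as if it were "B".
folded=$(echo "$words" | sort -f)
```
.Ed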
.Pp
The treatment of field separators can be altered using the
following options:
.Bl -tag -width indent
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be
attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option
has no effect unless key fields are specified.
.It Fl t Ar char
Use
.Ar char
as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified,
blank space characters are used as default field
separators.
.It Fl T Ar char
Use
.Ar char
as the record separator character.
This should be used with discretion;
.Fl T Ar <alphanumeric>
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1 Ns Op , Ns Ar field2
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
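.Pp
A short sketch of
.Fl t
combined with
.Fl k
follows.
It assumes a POSIX-conforming
.Nm sort
run in the C locale; the colon-separated records are hypothetical sample data.
.Bd -literal
```shell
export LC_ALL=C

accounts='carol:x:1002
alice:x:30
bob:x:500'

# Split on ":" and sort on the third field only, by arithmetic value.
by_id=$(echo "$accounts" | sort -t : -k 3,3n)

# Each ":" is significant, so "::" delimits an empty second field,
# which compares lower than any non-empty one.
records='k:b:1
k::2
k:a:3'
by_second=$(echo "$records" | sort -t : -k 2,2)
```
.Ed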
.Pp
The following operands are available:
.Bl -tag -width indent
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no file
operands are specified, or if
a file operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is
defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k Ar field1 Ns Op , Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
optionally followed by one or more of the options
.Fl b , d , f , i , n ,
or
.Fl r .
A
.Ar field1
position specified by
.Em m.n
.Em (m,n > 0)
is interpreted as the
.Em n Ns th
character in the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field.
If the
.Fl b
option is in effect,
.Em n
is counted from the first
non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first
non-blank character in the
.Em m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as
the
.Em n Ns th
character (including separators) of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolescent option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w+1.0 .
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
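.Pp
The character offsets and the
.Fl b
modifier described above can be sketched as follows.
The example assumes a POSIX-conforming
.Nm sort
run in the C locale, and the sample lines are hypothetical.
.Bd -literal
```shell
export LC_ALL=C

# The key is characters 2 through 3 of the first field: "10", "05", "07".
# By the rule above, -k 1.2,1.3 corresponds to the obsolescent +0.1 -0.3.
codes='a10
b05
c07'
by_digits=$(echo "$codes" | sort -k 1.2,1.3)

# Without b, the blanks before a field belong to it, so the second fields
# compare as "   b" and " a"; with b the comparison is "b" against "a".
padded='x   b
y a'
with_blanks=$(echo "$padded" | sort -k 2)
without_blanks=$(echo "$padded" | sort -k 2b)
```
.Ed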
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/sort.*
Default temporary directories.
.It Ar output Ns #PID
Temporary name for
.Ar output
if
.Ar output
already exists.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1
.Sh RETURN VALUES
The
.Nm sort
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
The input was out of order (or contained duplicate keys) with the
.Fl c
option.
.It 2
An error occurred.
.El
.Sh BUGS
Lines longer than 65522 characters are discarded and processing continues.
To sort files larger than 60Mb, use
.Nm sort
.Fl H ;
files larger than 704Mb must be sorted in smaller pieces, then merged.
To protect data,
.Nm sort
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails in protected directories.
.Sh HISTORY
A
.Nm sort
command appeared in
.At v6 .
.Sh NOTES
The current
.Nm sort
command uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions, which used quick
and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm sort
.Fl k1f
is equivalent to
.Nm sort
.Fl f
and may take twice as long.