.\"	$NetBSD: sort.1,v 1.26 2008/05/02 18:11:06 martin Exp $
.\"
.\" Copyright (c) 2000-2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Ben Harris and Jaromir Dolecek.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)sort.1	8.1 (Berkeley) 6/6/93
.\"
.Dd January 13, 2001
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl bcdfHimnrSsu
.Oo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Oc
.Op Fl o Ar output
.Op Fl R Ar char
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR ,
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
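.Pp
As a brief sketch of how these options might be used, assuming
hypothetical input files named
.Pa names.txt ,
.Pa a.sorted ,
and
.Pa b.sorted :
.Bd -literal -offset indent
# Exit status is 0 if names.txt is already sorted, 1 if it is not.
sort -c names.txt
# Sort names.txt in place, keeping one line per distinct key.
sort -u -o names.txt names.txt
# Merge two files that are each already sorted.
sort -m a.sorted b.sorted
.Ed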
.Pp
The following options override the default ordering rules.
When ordering options appear independently of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Treat lowercase characters that have uppercase equivalents
as equal to those equivalents for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl H
Use a merge sort instead of a radix sort.
This option should be used for files larger than 60 MB.
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignores leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Ar \*[Lt]alphanumeric\*[Gt]
usually produces undesirable results.
The default record separator is newline.
.It Xo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Xc
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k
.Ar field1 Ns Op \&, Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No \*[Gt] 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
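.Pp
The key syntax described above might be used as in the following sketch.
It assumes the conventional colon-separated layout of
.Pa /etc/passwd
(numeric user ID in the third field) and a hypothetical file
.Pa log.txt
whose lines begin with a date of the form YYYY-MM-DD:
.Bd -literal -offset indent
# Numeric sort on the third field only (the user ID).
sort -t : -k 3,3n /etc/passwd
# Sort on characters 6 through 7 of the first field (the month).
sort -k 1.6,1.7 log.txt
.Ed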
.Sh RETURN VALUES
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
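.Pp
A shell script might act on these values as in the following sketch,
in which
.Pa data.txt
is a hypothetical file name:
.Bd -literal -offset indent
sort -c data.txt
case $? in
0) echo "data.txt is sorted" ;;
1) echo "data.txt is not sorted" ;;
*) echo "sort reported an error" ;;
esac
.Ed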
.Sh ENVIRONMENT
If the following environment variable exists, it is used by
.Nm .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
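.Pp
For example, either of the following might be used to place temporary
files in
.Pa /var/tmp ;
the directory and the file names
.Pa big.txt
and
.Pa big.sorted
are illustrative only:
.Bd -literal -offset indent
# Set TMPDIR for this one invocation.
TMPDIR=/var/tmp sort -o big.sorted big.txt
# Or name the directory explicitly with -T.
sort -T /var/tmp -o big.sorted big.txt
.Ed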
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
To sort files larger than 60 MB, use
.Nm
.Fl H ;
files larger than 704 MB must be sorted in smaller pieces, then merged.
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) and no restrictions on the bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
Input files should be text files.
If a file does not end with a record separator (typically newline), the
.Nm
utility silently supplies one.
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (unlike previous versions, which used quick
and merge sorts and did not).
Thus performance depends highly on an efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
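.Pp
As a sketch of the advice above, a key bounded by
.Ar field2
covers only the named field, whereas an unbounded key runs to the end
of each line;
.Pa data.txt
is a hypothetical file name:
.Bd -literal -offset indent
# Key is the second field only, with leading blanks skipped.
sort -b -k 2,2 data.txt
# Key runs from the second field to the end of each line.
sort -k 2 data.txt
.Ed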
465