.\"	$NetBSD: sort.1,v 1.38 2017/07/03 21:34:21 wiz Exp $
.\"
.\" Copyright (c) 2000-2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Ben Harris and Jaromir Dolecek.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)sort.1	8.1 (Berkeley) 6/6/93
.\"
.Dd June 1, 2016
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm
.Op Fl bdfHilmnrSsu
.Oo
.Fl k
.Ar kstart Ns Op Li \&, Ns Ar kend
.Oc
.Op Fl o Ar output
.Op Fl R Ar char
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Nm
.Fl C Ns | Ns Fl c
.Op Fl bdfilnru
.Oo
.Fl k
.Ar kstart Ns Op Li \&, Ns Ar kend
.Op Fl t Ar char
.Oc
.Op Fl R Ar char
.Op Ar file
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl C
Identical to
.Fl c ,
but without the error messages in the case of unsorted input.
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
See also
.Fl u .
.It Fl H
Ignored for compatibility with earlier versions of
.Nm .
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort, which keeps records with equal keys in their original order.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR ,
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
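.Pp
As a hedged illustration of the options above, the following
.Xr sh 1
sketch merges two pre-sorted files with
.Fl m ,
drops the duplicate line with
.Fl u ,
and then sorts a file in place with
.Fl o .
The scratch directory, the file names, and the shell variable names are
assumptions made for this example only, and
.Li LC_ALL=C
is set merely to pin byte ordering on systems where
.Nm
is locale-aware.
.Bd -literal -offset indent
```shell
# Scratch directory and file names are illustrative only.
dir=$(mktemp -d)
{ echo apple; echo cherry; } > "$dir/a.txt"
{ echo banana; echo cherry; } > "$dir/b.txt"

# -m merges inputs that are already sorted; -u keeps one line per key.
merged=$(LC_ALL=C sort -m -u "$dir/a.txt" "$dir/b.txt")
echo "$merged"

# -o may name one of the input files as the output file.
{ echo pear; echo fig; } > "$dir/c.txt"
LC_ALL=C sort -o "$dir/c.txt" "$dir/c.txt"
in_place=$(cat "$dir/c.txt")
echo "$in_place"

rm -rf "$dir"
```
.Ed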
.Pp
The following options,
which should be given before any
.Fl k
options, override the default ordering rules.
When ordering options appear independent of,
and before, key field specifications,
the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl l
Sort by the string length of the field, not by the field itself.
.It Fl n
An initial numeric string, consisting of optional blank space, an optional
plus or minus sign, and zero or more digits (optionally including a decimal
point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
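.Pp
The effect of the ordering options can be sketched as follows.
This is a minimal example rather than a complete demonstration; the
shell variable names are hypothetical, and
.Li LC_ALL=C
is an assumption used to fix byte ordering on locale-aware implementations.
.Bd -literal -offset indent
```shell
# Default comparison is lexicographic: "10" and "100" precede "9".
lexical=$({ echo 10; echo 9; echo 100; } | LC_ALL=C sort)
# -n orders the leading numeric strings by arithmetic value.
numeric=$({ echo 10; echo 9; echo 100; } | LC_ALL=C sort -n)
# -r reverses the sense of the comparison.
reversed=$({ echo 10; echo 9; echo 100; } | LC_ALL=C sort -n -r)
# -f folds case, so with -u only one of "a"/"A" and one of "b"/"B" remain.
folded_count=$({ echo b; echo A; echo a; echo B; } |
    LC_ALL=C sort -f -u | wc -l | tr -d ' ')
echo "$lexical"; echo "$numeric"; echo "$reversed"; echo "$folded_count"
```
.Ed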
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignores leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl k Ar kstart Ns Op Li \&, Ns Ar kend
Designates the starting position,
.Ar kstart ,
and optional ending position,
.Ar kend ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Aq Ar alphanumeric
usually produces undesirable results.
If
.Ar char
is not a single character, then it
specifies the value of the desired record
separator as an integer specified in any
of the normal NNN, 0ooo, or 0xXXX ways,
or as an octal value preceded by \e.
Caution: do not attempt to specify Ctrl-A
as
.Dq -R 1 ;
because
.Ql 1
is a single character, it is taken literally as the separator,
which is not what was intended.
The default record separator is newline.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.El
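.Pp
The
.Fl t
rule that every occurrence of the separator is significant can be
illustrated with a short sketch.
The helper function
.Li input
and the variable names are hypothetical, and
.Li LC_ALL=C
is an assumption used to fix byte ordering.
.Bd -literal -offset indent
```shell
# Every ":" delimits a field, so "a::3" has an empty second field.
input() { echo b:x:2; echo a::3; echo c:y:1; }

# Sort on the third field, compared numerically.
by_third=$(input | LC_ALL=C sort -t : -k 3,3n)
# Sort on the second field only; the empty key sorts first.
by_second=$(input | LC_ALL=C sort -t : -k 2,2)
echo "$by_third"
echo "$by_second"
```
.Ed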
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k
.Ar kstart Ns Op \&, Ns Ar kend
argument.
A missing
.Ar kend
argument defaults to the end of a line.
.Pp
The arguments
.Ar kstart
and
.Ar kend
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm l , n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar kstart
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No > 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar kstart
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
.Pp
A
.Ar kend
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
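.Pp
The
.Ar m Ns Li \&. Ns Ar n
key positions described above can be sketched as follows.
The sample records and variable names are hypothetical; a comma is used
with
.Fl t
so that blank handling does not affect the character offsets, and
.Li LC_ALL=C
is an assumption used to fix byte ordering.
.Bd -literal -offset indent
```shell
# Key = characters 2 through 3 of field 1 ("03", "01", "02").
by_digits=$({ echo x03,a; echo y01,b; echo z02,c; } |
    LC_ALL=C sort -t , -k 1.2,1.3)
# With kend omitted, the key runs from field 2 to the end of the line.
by_rest=$({ echo a,2,z; echo b,1,y; } | LC_ALL=C sort -t , -k 2)
echo "$by_digits"
echo "$by_rest"
```
.Ed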
.Pp
.Nm
compares records by comparing the key fields selected by
.Fl k
arguments,
from first given to last,
until discovering a difference.
If there are no
.Fl k
arguments, the whole record is treated as a single key.
After exhausting the
.Fl k
arguments, if no difference has been found,
then the result depends upon the
.Fl u
and
.Fl S
option settings.
With
.Fl u
the records are considered identical, and one is suppressed.
Otherwise, with
.Fl s
set (the default) the records are left in their original order,
or with
.Fl S
(POSIX mode) the whole record is considered as a tie breaker.
.\"
.\" If you fail to understand why it doesn't matter which order
.\" the records are output when they are wholly identical, there
.\" is nothing that this man page can say that will help!
.\"
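.Pp
A minimal sketch of comparison across several keys, under stated
assumptions: the helper function
.Li records
and the variable names are hypothetical, and
.Li LC_ALL=C
is used only to fix byte ordering.
The first key decides the order where it differs; the second key is
consulted only for records whose first keys are equal.
.Bd -literal -offset indent
```shell
records() { echo b,2; echo a,10; echo a,9; echo b,1; }

# First key: field 1, lexicographic.  Second key: field 2, numeric and
# reversed; the n and r modifiers apply to that key only.
ordered=$(records | LC_ALL=C sort -t , -k 1,1 -k 2,2nr)
echo "$ordered"

# With -u and a single key, records with equal keys count as identical,
# so one line per distinct first field remains.
unique_count=$(records | LC_ALL=C sort -t , -u -k 1,1 | wc -l | tr -d ' ')
echo "$unique_count"
```
.Ed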
.Sh ENVIRONMENT
If the following environment variable exists, it is used by
.Nm .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
(or
.Fl C )
option.
.It 2
An error occurred.
.El
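.Pp
The exit values for
.Fl c
can be checked from a shell as in the following sketch.
The variable names are hypothetical, diagnostics are discarded so that
only the status is observed, and
.Li LC_ALL=C
is an assumption used to fix byte ordering.
.Bd -literal -offset indent
```shell
# Status 0: the input is already sorted.
sorted_status=0
{ echo a; echo b; echo c; } | LC_ALL=C sort -c 2>/dev/null ||
    sorted_status=$?

# Status 1: disorder is detected.
unsorted_status=0
{ echo b; echo a; } | LC_ALL=C sort -c 2>/dev/null ||
    unsorted_status=$?

# With -u, -c also rejects lines with duplicate keys.
duplicate_status=0
{ echo a; echo a; } | LC_ALL=C sort -c -u 2>/dev/null ||
    duplicate_status=$?

echo "$sorted_status $unsorted_status $duplicate_status"
```
.Ed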
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
POSIX requires that the locale's thousands separator be ignored in numbers.
It may be faster to sort very large files in pieces and then explicitly
merge them.
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
Input files should be text files.
If a file does not end with a record separator (which is typically newline), the
.Nm
utility silently supplies one.
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions, which used
quick and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar kend
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
526