.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"     @(#)sort.1	8.1 (Berkeley) 06/06/93
.\"
.Dd June 6, 1993
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl cmubdfinr
.Op Fl t Ar char
.Op Fl T Ar char
.Oo
.Cm Fl k Ar field1[,field2]
.Oc
.Ar ...
.Op Fl o Ar output
.Op Ar file
.Ar ...
.Sh DESCRIPTION
The
.Nm sort
utility
sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed
lexicographically. By default, if keys are not given,
.Nm sort
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width indent
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm sort
produces the appropriate error messages and exits with code 1;
otherwise,
.Nm sort
returns 0.
.Nm Sort
.Fl c
produces no output.
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to
be used instead of the standard output.
This file
can be the same as one of the input files.
.It Fl u
Unique: suppress all but one in each set of lines
having equal keys.
If used with the
.Fl c
option,
check that there are no lines with duplicate keys.
.El
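.Pp
For example, assuming hypothetical files
.Pa a.sorted
and
.Pa b.sorted
that are already sorted, the options above might be combined as follows.
This is a sketch, not an exhaustive list:
.Bd -literal -offset indent
# exit status 0 if sorted, 1 if not; nothing goes to standard output
sort -c a.sorted || echo "a.sorted is out of order"

# merge two pre-sorted files, keeping one line per set of equal keys
sort -m -u -o merged a.sorted b.sorted

# sort a file in place; the output file may also be an input file
sort -o a.sorted a.sorted
.Ed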
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width indent
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Considers all lowercase characters that have uppercase
equivalents to be the same for purposes of
comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional
blank space, optional minus sign, and zero or more
digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies
the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
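.Pp
As a brief illustration of the ordering options, the following sketch
feeds three lines to
.Nm sort
from the shell; the input values are made up for the example.
.Bd -literal -offset indent
# default (lexicographic) order: 10, 100, 9
(echo 10; echo 9; echo 100) | sort

# arithmetic order with -n: 9, 10, 100
(echo 10; echo 9; echo 100) | sort -n

# reversed arithmetic order with -n and -r: 100, 10, 9
(echo 10; echo 9; echo 100) | sort -nr
.Ed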
.Pp
The treatment of field separators can be altered using the
options:
.Bl -tag -width indent
.It Fl b
Ignores leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be
attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option
has no effect unless key fields are specified.
.It Fl t Ar char
.Ar Char
is used as the field separator character. The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified,
blank space characters are used as default field
separators.
.It Fl T Ar char
.Ar Char
is used as the record separator character.
This should be used with discretion;
.Fl T Ar <alphanumeric>
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1[,field2]
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
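.Pp
For instance, a colon-separated file such as
.Pa /etc/passwd
could be sorted on its third field, compared numerically, roughly like this.
The choice of file and field is only an illustration:
.Bd -literal -offset indent
# -t : makes the colon the field separator;
# -k 3,3n limits the key to field 3 and compares it numerically
sort -t : -k 3,3n /etc/passwd
.Ed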
.Pp
The following operands are available:
.Bl -tag -width indent
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no file
operands are specified, or if
a file operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is
defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
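.Pp
A short sketch of this rule, using made-up input: in the two lines below,
the second field of the first line begins with two blanks and that of the
second line begins with one.
The results given in the comments assume that a blank collates before
letters, as it does in ASCII.
.Bd -literal -offset indent
# keys are compared with their leading blanks: "x  b" sorts first
(echo "x  b"; echo "y a") | sort -k 2

# with the b modifier the blanks are skipped: "y a" sorts first
(echo "x  b"; echo "y a") | sort -k 2b
.Ed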
.Pp
Fields are specified
by the
.Fl k Ar field1[,field2]
argument. A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
optionally followed by one or more of the options
.Fl b , d , f , i ,
.Fl n , r .
A
.Ar field1
position specified by
.Em m.n
.Em (m,n > 0)
is interpreted as the
.Em n Ns th
character in the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field.
If the
.Fl b
option is in effect,
.Em n
is counted from the first
non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first
non-blank character in the
.Em m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as
the
.Em n Ns th
character (including separators) of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolescent option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w\&.0 .
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
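.Pp
As a worked example of the correspondence above, take
.Em v.x
= 2.1 and
.Em w.y
= 3.2.
The two commands below should then select the same key, from the first
character of the second field through the second character of the third
field; the file name
.Pa data
is hypothetical.
.Bd -literal -offset indent
# current form
sort -k 2.1,3.2 data

# obsolescent form: +(2-1).(1-1) -(3-1).2
sort +1.0 -2.2 data
.Ed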
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/sort.*
Default temporary directories.
.It Pa Ar output Ns #PID
Temporary name for
.Ar output
if
.Ar output
already exists.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr uniq 1 ,
.Xr join 1
.Sh RETURN VALUES
.Nm Sort
exits with one of the following values:
.Bl -tag -width flag -compact
.It Pa 0:
normal behavior.
.It Pa 1:
on disorder (or non-uniqueness) with the
.Fl c
option.
.It Pa 2:
an error occurred.
.El
.Sh BUGS
Lines longer than 65522 characters are discarded and processing continues.
To sort files larger than 60Mb, use
.Nm sort
.Fl H ;
files larger than 704Mb must be sorted in smaller pieces, then merged.
To protect data,
.Nm sort
.Fl o
calls link and unlink, and thus fails in protected directories.
.Sh HISTORY
A
.Nm sort
command appeared in
.At v6 .
.Sh NOTES
The current sort command uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions, which used
quick and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm sort
.Fl k1f
is equivalent to
.Nm sort
.Fl f
and may take twice as long.