1*62247Sbostic.\" Copyright (c) 1991, 1993 2*62247Sbostic.\" The Regents of the University of California. All rights reserved. 360910Sbostic.\" 460910Sbostic.\" This code is derived from software contributed to Berkeley by 560910Sbostic.\" the Institute of Electrical and Electronics Engineers, Inc. 660910Sbostic.\" 760910Sbostic.\" %sccs.include.redist.roff% 860910Sbostic.\" 9*62247Sbostic.\" @(#)sort.1 8.1 (Berkeley) 06/06/93 1060910Sbostic.\" 1160910Sbostic.Dd 1260910Sbostic.Dt SORT 1 1360910Sbostic.Os 1460910Sbostic.Sh NAME 1560910Sbostic.Nm sort 1660910Sbostic.Nd sort or merge text files 1760910Sbostic.Sh SYNOPSIS 1860910Sbostic.Nm sort 1960910Sbostic.Op Fl cmubdfinr 2060910Sbostic.Op Fl t Ar char 2160910Sbostic.Op Fl T Ar char 2260910Sbostic.Oo 2360910Sbostic.Cm Fl k Ar field1[,field2] 2460910Sbostic.Oc 2560910Sbostic.Ar ... 2660910Sbostic.Op Fl o Ar output 2760910Sbostic.Op Ar file 2860910Sbostic.Ar ... 2960910Sbostic.Sh DESCRIPTION 3060910SbosticThe 3160910Sbostic.Nm sort 3260910Sbosticutility 3360910Sbosticsorts text files by lines. 3460910SbosticComparisons are based on one or more sort keys extracted 3560910Sbosticfrom each line of input, and are performed 3660910Sbosticlexicographically. By default, if keys are not given, 3760910Sbostic.Nm sort 3860910Sbosticregards each input line as a single field. 3960910Sbostic.Pp 4060910SbosticThe following options are available: 4160910Sbostic.Bl -tag -width indent 4260910Sbostic.It Fl c 4360910SbosticCheck that the single input file is sorted. 4460910SbosticIf the file is not sorted, 4560910Sbostic.Nm sort 4660910Sbosticproduces the appropriate error messages and exits with code 1; 4760910Sbosticotherwise, 4860910Sbostic.Nm sort 4960910Sbosticreturns 0. 5060910Sbostic.Nm Sort 5160910Sbostic.Fl c 5260910Sbosticproduces no output. 5360910Sbostic.It Fl m 5460910SbosticMerge only; the input files are assumed to be pre-sorted. 
.It Fl o Ar output
Write the output to
.Ar output
instead of the standard output.
This file
can be the same as one of the input files.
.It Fl u
Unique: suppress all but one in each set of lines
having equal keys.
If used with the
.Fl c
option,
check that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
When ordering options appear independently of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width indent
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Consider each lowercase character that has an uppercase
equivalent to be the same as that uppercase character for purposes of
comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional
blank space, an optional minus sign, and zero or more
digits (including a decimal point),
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies
the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
.Pp
The treatment of field separators can be altered using the
following options:
.Bl -tag -width indent
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be
attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option
has no effect unless key fields are specified.
.It Fl t Ar char
.Ar Char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified,
blank space characters are used as default field
separators.
.It Fl T Ar char
.Ar Char
is used as the record separator character.
This should be used with discretion;
.Fl T Ar <alphanumeric>
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1[,field2]
Designate the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
.Pp
The following operands are available:
.Bl -tag -width indent
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no file
operands are specified, or if
a file operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is
defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces marks the end of a field,
but it is not discarded: all blank spaces in the sequence are considered
part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k Ar field1[,field2]
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
followed by one or more of the options
.Fl b , d , f , i ,
.Fl n , r .
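.Pp
The field-splitting rules described above can be modelled in a few lines of
Python.
This is an illustrative sketch, not the
.Nm sort
source.
The function name
.Fn split_fields
and the
.Va BLANKS
constant are hypothetical.
The sketch assumes that blank space means space and tab, and that the
trailing newline has already been removed from the line.
.Bd -literal -offset indent
```python
# Hypothetical model of how a line is split into fields.
# Assumption: "blank space" means space and tab only.
BLANKS = " " + chr(9)


def split_fields(line, sep=None):
    """Return the fields of line (without its trailing newline).

    sep=None models the default rule: a run of blanks ends the previous
    field, and the whole run belongs to the field that follows it.
    sep=c models -t c: each occurrence of c ends a field and is not part
    of any field, so two adjacent separators give an empty field.
    """
    if sep is not None:
        return line.split(sep)
    fields = []
    start = i = 0
    while i < len(line):
        # Take the blanks that lead this field, then its non-blank text.
        while i < len(line) and line[i] in BLANKS:
            i += 1
        while i < len(line) and line[i] not in BLANKS:
            i += 1
        fields.append(line[start:i])
        start = i
    return fields
```
.Ed
.Pp
Under this model, blank spaces at the start of a line stay in the first
field.
With a
.Fl t
separator of
.Ql \&: ,
the line
.Ql a::b
yields three fields, and the second of them is empty.
.Pp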
A
.Ar field1
position specified by
.Em m.n
.Em (m,n > 0)
is interpreted as the
.Em n Ns th
character in the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field.
If the
.Fl b
option is in effect,
.Em n
is counted from the first
non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first
non-blank character in the
.Em m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as
the
.Em n Ns th
character (including separators) of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolescent option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w\&.0 .
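.Pp
The position rules above can be modelled as follows.
This Python sketch is illustrative only and is not taken from the
.Nm sort
source.
The names
.Fn field_spans ,
.Fn skip_blanks
and
.Fn extract_key
are hypothetical.
The sketch makes several simplifying assumptions:
blank space means space and tab, the ordering options are ignored, and a
.Ar field2
beyond the last field means the end of the line.
.Bd -literal -offset indent
```python
# Hypothetical model of -k field1[,field2] key extraction.
# Assumptions: blank space means space and tab; ordering options
# (d, f, i, n, r) are ignored; positions are 1-based as in this page.
BLANKS = " " + chr(9)


def field_spans(line, sep=None):
    """Return (start, end) offsets of each field of line."""
    spans = []
    start = 0
    if sep is not None:
        for i, c in enumerate(line):
            if c == sep:
                spans.append((start, i))
                start = i + 1
        spans.append((start, len(line)))
        return spans
    i = 0
    while i < len(line):
        # Leading blanks belong to the field they precede.
        while i < len(line) and line[i] in BLANKS:
            i += 1
        while i < len(line) and line[i] not in BLANKS:
            i += 1
        spans.append((start, i))
        start = i
    return spans


def skip_blanks(line, pos, end):
    """Advance pos past blanks, as the b modifier does."""
    while pos < end and line[pos] in BLANKS:
        pos += 1
    return pos


def extract_key(line, m1, n1=1, b1=False, m2=None, n2=None, b2=False,
                sep=None):
    """Return the text selected by -k m1.n1[b][,m2[.n2][b]]."""
    spans = field_spans(line, sep)
    if m1 > len(spans):
        return ""                       # key starts past the last field
    fstart, fend = spans[m1 - 1]
    if b1:
        fstart = skip_blanks(line, fstart, fend)
    begin = min(fstart + n1 - 1, fend)  # n1-th character of field m1
    if m2 is None or m2 == 0 or m2 > len(spans):
        end = len(line)                 # no field2, or m = 0: end of line
    else:
        gstart, gend = spans[m2 - 1]
        if b2:
            gstart = skip_blanks(line, gstart, gend)
        # A missing .n means the last character of field m2; otherwise the
        # n2-th character, counting any leading separators.
        end = gend if n2 is None else min(gstart + n2, gend)
    return line[begin:max(begin, end)]
```
.Ed
.Pp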
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/sort.*
Default temporary files.
.It Pa Ar output Ns #PID
Temporary name for
.Ar output
if
.Ar output
already exists.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1
.Sh RETURN VALUES
.Nm Sort
exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
Disorder (or non-uniqueness) was found with the
.Fl c
option.
.It 2
An error occurred.
.El
.Sh BUGS
Lines longer than 65522 characters are discarded and processing continues.
To sort files larger than 60Mb, use
.Nm sort
.Fl H ;
files larger than 704Mb must be sorted in smaller pieces, then merged.
To protect data,
.Nm sort
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails in protected directories.
.Sh HISTORY
A
.Nm sort
command appeared in
.At v6 .
.Sh NOTES
The current
.Nm sort
command uses lexicographic radix sorting, which requires
that sort keys be kept in memory (previous versions, which used quicksort
and merge sort, did not).
Thus performance depends heavily on an efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
For example,
.Nm sort
.Fl k1f
is equivalent to
.Nm sort
.Fl f
but may take twice as long.
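.Pp
The following sketch illustrates the general technique named above.
It is not this implementation, whose details this page does not give.
It is a most-significant-byte-first radix sort of byte-string keys,
written in Python, and the name
.Fn radix_sort
is hypothetical.
Like the radix approach described above, it needs every key in memory at
once.
.Bd -literal -offset indent
```python
# Hypothetical sketch of MSD (most-significant-byte-first) radix sorting.
# Keys are bytes objects; the result is in lexicographic byte order.
def radix_sort(keys, depth=0):
    """Return keys sorted by recursively bucketing on the byte at depth."""
    if len(keys) <= 1:
        return list(keys)
    # Keys in this call share their first depth bytes, so a key that ends
    # at this depth is a prefix of the others and sorts first.
    result = [k for k in keys if len(k) == depth]
    buckets = [[] for _ in range(256)]
    for k in keys:
        if len(k) > depth:
            buckets[k[depth]].append(k)
    for bucket in buckets:
        if bucket:
            result.extend(radix_sort(bucket, depth + 1))
    return result
```
.Ed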