.\" Copyright (c) 1991 Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" %sccs.include.redist.roff%
.\"
.\" @(#)sort.1 5.1 (Berkeley) 06/01/93
.\"
.Dd
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl cmubdfinr
.Op Fl t Ar char
.Op Fl T Ar char
.Oo
.Cm Fl k Ar field1[,field2]
.Oc
.Ar ...
.Op Fl o Ar output
.Op Ar file
.Ar ...
.Sh DESCRIPTION
The
.Nm sort
utility
sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed
lexicographically. By default, if keys are not given,
.Nm sort
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width indent
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm sort
produces the appropriate error messages and exits with code 1;
otherwise,
.Nm sort
returns 0.
.Nm Sort
.Fl c
produces no output.
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to
be used instead of the standard output.
This file
can be the same as one of the input files.
.It Fl u
Unique: suppress all but one in each set of lines
having equal keys.
If used with the
.Fl c
option,
check that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width indent
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Considers all lowercase characters that have uppercase
equivalents to be the same for purposes of
comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional
blank space, optional minus sign, and zero or more
digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies
the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
.Pp
The treatment of field separators can be altered using the
options:
.Bl -tag -width indent
.It Fl b
Ignores leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be
attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option
has no effect unless key fields are specified.
.It Fl t Ar char
.Ar Char
is used as the field separator character. The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified,
blank space characters are used as default field
separators.
.It Fl T Ar char
.Ar Char
is used as the record separator character.
This should be used with discretion;
.Fl T Ar <alphanumeric>
usually produces undesirable results.
The default line separator is newline.
.It Fl k Ar field1[,field2]
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
.Pp
The following operands are available:
.Bl -tag -width indent
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no file
operands are specified, or if
a file operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is
defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k Ar field1[,field2]
argument. A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
followed by one or more of the options
.Fl b , d , f , i ,
.Fl n , r .
A
.Ar field1
position specified by
.Em m.n
.Em (m,n > 0)
is interpreted as the
.Em n Ns th
character in the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field.
If the
.Fl b
option is in effect,
.Em n
is counted from the first
non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first
non-blank character in the
.Em m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as
the
.Em n Ns th
character (including separators) of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolescent option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w+1.0 .
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/sort.*
Default temporary directories.
.It Pa Ar output Ns #PID
Temporary name for
.Ar output
if
.Ar output
already exists.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr uniq 1 ,
.Xr join 1
.Sh RETURN VALUES
Sort exits with one of the following values:
.Bl -tag -width flag -compact
.It Pa 0:
normal behavior.
.It Pa 1:
on disorder (or non-uniqueness) with the
.Fl c
option
.It Pa 2:
an error occurred.
.El
.Sh BUGS
Lines longer than 65522 characters are discarded and processing continues.
To sort files larger than 60Mb, use
.Nm sort
.Fl H ;
files larger than 704Mb must be sorted in smaller pieces, then merged.
To protect data,
.Nm sort
.Fl o
calls link and unlink, and thus fails in protected directories.
.Sh HISTORY
A
.Nm sort
command appeared in
.At v6 .
.Sh NOTES
The current sort command uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions, which used quick
and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm sort
.Fl k1f
is equivalent to
.Nm sort
.Fl f
and may take twice as long.
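.Sh EXAMPLES
The following is a minimal sketch of the key and separator options
described above; it is an illustration, not part of the original page.
The file names and sample data are hypothetical, and the sketch assumes a
.Nm sort
that accepts the
.Fl k
syntax and a system that provides
.Xr mktemp 1 .
The last two commands set
.Ev LC_ALL
to C so that blank characters take part in the comparison,
which is also an assumption about the environment.
.Bd -literal
```shell
# Hypothetical sample data: name:department:salary.
tmp=$(mktemp -d)
cat > "$tmp/staff.txt" <<'EOF'
carol:ops:4100
alice:dev:5200
bob:dev:980
dave:ops:4100
EOF

# Numeric sort on field 3 only, with ':' as the field separator.
by_salary=$(sort -t : -k 3,3n "$tmp/staff.txt")

# Sort on field 2, breaking ties on field 3 in reverse numeric order.
by_dept=$(sort -t : -k 2,2 -k 3,3nr "$tmp/staff.txt")

# Keep one line for each distinct value of field 2.
depts=$(sort -t : -u -k 2,2 "$tmp/staff.txt" | cut -d : -f 2)

# Check field 1 order without producing output; status 1 signals disorder.
check_status=0
sort -c -t : -k 1,1 "$tmp/staff.txt" 2>/dev/null || check_status=$?

# With default separators, leading blanks belong to the field that follows,
# so the b flag changes which character starts the key.
cat > "$tmp/blanks.txt" <<'EOF'
a   z
b y
EOF
with_blanks=$(LC_ALL=C sort -k 2,2 "$tmp/blanks.txt")
without_blanks=$(LC_ALL=C sort -k 2b,2 "$tmp/blanks.txt")

echo "$by_salary"
echo "$by_dept"
echo "$depts"
echo "$check_status"
echo "$with_blanks"
echo "$without_blanks"
rm -r "$tmp"
```
.Ed
.Pp
Ending each key at a field, as in
.Fl k Ar 3,3n ,
keeps the key short, in line with the advice in NOTES on choosing sort keys.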