.\" $NetBSD: sort.1,v 1.26 2008/05/02 18:11:06 martin Exp $
.\"
.\" Copyright (c) 2000-2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Ben Harris and Jaromir Dolecek.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)sort.1 8.1 (Berkeley) 6/6/93
.\"
.Dd January 13, 2001
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl bcdfHimnrSsu
.Oo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Oc
.Op Fl o Ar output
.Op Fl R Ar char
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl H
Use a merge sort instead of a radix sort.
This option should be used for files larger than 60MB.
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Ar \*[Lt]alphanumeric\*[Gt]
usually produces undesirable results.
The default record separator is newline.
.It Xo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Xc
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k
.Ar field1 Ns Op \&, Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No \*[Gt] 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
.Sh RETURN VALUES
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
.Sh ENVIRONMENT
If the following environment variable exists, it is used by
.Nm .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
To sort files larger than 60MB, use
.Nm
.Fl H ;
files larger than 704MB must be sorted in smaller pieces, then merged.
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
Input files should be text files.
If a file does not end with a record separator (which is typically newline), the
.Nm
utility silently supplies one.
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions, which used quick
and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
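.Sh EXAMPLES
The following invocations are illustrative sketches of the options
described in this page, not an exhaustive reference.
The input data and the file names
.Pa a.sorted ,
.Pa b.sorted ,
and
.Pa merged
are hypothetical.
.Pp
Sort colon-separated records numerically on the second field only,
restricting the key with
.Ar field2
as recommended in
.Sx NOTES :
.Bd -literal -offset indent
sort -t : -k 2,2n <<EOF
carol:30
alice:4
bob:12
EOF
.Ed
.Pp
Given the definition of the
.Fl n
option, the lines should come out in the order
.Li alice:4 ,
.Li bob:12 ,
.Li carol:30 .
.Pp
Check that a file is already sorted, without producing output;
the exit status is 1 on disorder:
.Bd -literal -offset indent
sort -c a.sorted
.Ed
.Pp
Merge two files that are assumed to be pre-sorted into a single
output file:
.Bd -literal -offset indent
sort -m -o merged a.sorted b.sorted
.Ed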