.\" $NetBSD: sort.1,v 1.38 2017/07/03 21:34:21 wiz Exp $
.\"
.\" Copyright (c) 2000-2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Ben Harris and Jaromir Dolecek.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)sort.1 8.1 (Berkeley) 6/6/93
.\"
.Dd June 1, 2016
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm
.Op Fl bdfHilmnrSsu
.Oo
.Fl k
.Ar kstart Ns Op Li \&, Ns Ar kend
.Oc
.Op Fl o Ar output
.Op Fl R Ar char
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Nm
.Fl C Ns | Ns Fl c
.Op Fl bdfilnru
.Oo
.Fl k
.Ar kstart Ns Op Li \&, Ns Ar kend
.Op Fl t Ar char
.Oc
.Op Fl R Ar char
.Op Ar file
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl C
Identical to
.Fl c
without the error messages in the case of unsorted input.
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
See also
.Fl u .
.It Fl H
Ignored for compatibility with earlier versions of
.Nm .
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort, which keeps records with equal keys in their original order.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
.Pp
The following options,
which should be given before any
.Fl k
options, override the default ordering rules.
When ordering options appear independent of,
and before, key field specifications,
the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Considers all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl l
Sort by the string length of the field, not by the field itself.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
plus or minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignores leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl k Ar kstart Ns Op Li \&, Ns Ar kend
Designates the starting position,
.Ar kstart ,
and optional ending position,
.Ar kend ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Aq Ar alphanumeric
usually produces undesirable results.
If
.Ar char
is not a single character, then it
specifies the value of the desired record
separator as an integer specified in any
of the normal NNN, 0ooo, or 0xXXX ways,
or as an octal value preceded by \e.
Caution: do not attempt to specify Ctrl-A
as
.Dq -R 1 ,
which will not do what was intended,
because a single-character argument is used as the separator itself.
The default record separator is newline.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.El
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified by the
.Fl k
.Ar kstart Ns Op \&, Ns Ar kend
argument.
A missing
.Ar kend
argument defaults to the end of a line.
.Pp
The arguments
.Ar kstart
and
.Ar kend
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm l , n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar kstart
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No > 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar kstart
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
.Pp
A
.Ar kend
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
.Pp
.Nm
compares records by comparing the key fields selected by
.Fl k
arguments,
from first given to last,
until discovering a difference.
If there are no
.Fl k
arguments, the whole record is treated as a single key.
After exhausting the
.Fl k
arguments, if no difference has been found,
then the result depends upon the
.Fl u
and
.Fl S
option settings.
With
.Fl u
the records are considered identical, and one is suppressed.
Otherwise, with
.Fl s
set (the default), the records are left in their original order;
with
.Fl S
(POSIX mode), the whole record is used as a tie breaker.
.\"
.\" If you fail to understand why it doesn't matter which order
.\" the records are output when they are wholly identical, there
.\" is nothing that this man page can say that will help!
.\"
.Sh ENVIRONMENT
If the following environment variable exists, it is used by
.Nm .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
(or
.Fl C )
option.
.It 2
An error occurred.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
POSIX requires that the locale's thousands separator be ignored in numbers.
It may be faster to sort very large files in pieces and then explicitly
merge them.
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
Input files should be text files.
If a file does not end with a record separator (typically a newline), the
.Nm
utility silently supplies one.
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (unlike previous versions, which used quick
and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar kend
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
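.Sh EXAMPLES
The field and key rules described in this page can be illustrated with a short shell session.
This is a minimal sketch rather than anything taken from the
.Nm
sources: the colon-separated sample records and the variable names are hypothetical,
and only the
.Fl c , k , n ,
and
.Fl t
options documented above are used.

```shell
# Hypothetical sample records: name:department:salary
data='carol:ops:300
alice:dev:1200
bob:dev:90'

# Numeric sort on the third field only; kend (3,3) stops the key at that field.
by_salary=$(echo "$data" | sort -t : -k 3,3n)

# Department first, then name; the second key is consulted only on ties.
by_dept=$(echo "$data" | sort -t : -k 2,2 -k 1,1)

# -c produces no output; the exit status reports whether the input is ordered.
if echo "$by_salary" | sort -c -t : -k 3,3n; then
    sorted=yes
else
    sorted=no
fi

echo "$by_salary"
echo "$by_dept"
echo "$sorted"
```

Each key here is restricted with a
.Ar kend
argument; without it, a key would extend to the end of the line, as described for a missing
.Ar kend .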