.\" $OpenBSD: awk.1,v 1.8 2000/11/10 05:10:21 aaron Exp $
.\" EX/EE is a Bd
.Dd June 29, 1996
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl F Ar fs
.Op Fl v Ar var=value
.Op Fl safe
.Op Fl mr Ar n
.Op Fl mf Ar n
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm nawk
.Ar ...
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq Pa \-
means the standard input.
Any
.Ar file
of the form
.Ar var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.Fl v
followed by
.Ar var=value
is an assignment to be done before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
The
.Fl F Ar fs
option defines the input field separator to be the regular expression
.Ar fs .
The
.Fl safe
option disables file output
.Po
.Ic print Ic > ,
.Ic print Ic >> ,
.Pc
process creation
.Po
.Ar cmd Ic \&| getline ,
.Ic print \&| , system
.Pc
and access to the environment
.Pq Va ENVIRON .
This
is a first (and not very reliable) approximation to a
.Dq safe
version of
.Nm awk .
.Pp
An input line is normally made up of fields separated by whitespace,
or by regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
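.Pp
As an illustrative sketch only (the input strings and the variable name
.Va tag
are made up for this example), the
.Fl F
and
.Fl v
options and the field variables might be exercised from a shell like this:
.Bd -literal -offset indent
echo "alpha:beta:gamma" | awk -F: '{ print $2 }'
echo "alpha beta" | awk -v tag=note '{ print tag, NF, $0 }'
.Ed
The first command should print
.Ql beta ,
the second colon-separated field.
The second should print
.Ql note 2 alpha beta :
the value assigned with
.Fl v ,
the number of whitespace-separated fields, and the whole line.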
.Pp
To compensate for inadequate implementation of storage management,
the
.Fl mr
option can be used to set the maximum size of the input record,
and the
.Fl mf
option to set the maximum number of fields.
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bd -unfilled -offset indent
.Ic if ( Xo
.Ar expression ) statement \&
.Op Ic else Ar statement
.Xc
.Ic while ( Ar expression ) statement
.Ic for ( Xo
.Ar expression ; expression ; expression ) statement
.Xc
.Ic for ( Xo
.Ar var Ic in Ar array ) statement
.Xc
.Ic do Ar statement Ic while ( Ar expression )
.Ic break
.Ic continue
.Ic { Oo Ar statement ... Oc Ic \& }
.Ar expression Xo
.No "# commonly" \&
.Ar var Ic = Ar expression
.Xc
.Ic print Xo
.Op Ar expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic printf Ar format Xo
.Op Ar ... , expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic return Op Ar expression
.Ic next Xo
.No "# skip remaining patterns on this input line"
.Xc
.Ic nextfile Xo
.No "# skip rest of this file, open next, start at top"
.Xc
.Ic delete Ar array Ns Xo
.Ic \&[ Ns Ar expression Ns Ic \&]
.No \& "# delete an array element"
.Xc
.Ic delete Ar array Xo
.No "# delete all elements of array"
.Xc
.Ic exit Xo
.Op Ar expression
.No \& "# exit immediately; status is" Ar expression
.Xc
.Ed
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
(exponentiation), and concatenation (indicated by whitespace).
The operators
.Ic ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Ic > Ns Ar file
or
.Ic >> Ns Ar file
is present or on a pipe if
.Ic \&| Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 3 ) .
The built-in function
.Fn close expr
closes the file or pipe
.Fa expr .
The built-in function
.Fn fflush expr
flushes any buffered output for the file or pipe
.Fa expr .
.Pp
The mathematical functions
.Fn exp ,
.Fn log ,
.Fn sqrt ,
.Fn sin ,
.Fn cos ,
and
.Fn atan2
are built in.
Other built-in functions:
.Pp
.Bl -tag -width Fn
.It Fn length
the length of its argument
taken as a string,
or of
.Va $0
if no argument.
.It Fn rand
random number on [0,1)
.It Fn srand
sets seed for
.Fn rand
and returns the previous seed.
.It Fn int
truncates to an integer value.
.It Fn substr s m n
the
.Fa n Ns No -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
.It Fn index s t
the position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn match s r
the position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Fn split s a fs
splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sub r t s
substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
.It Fn gsub r t s
same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn sprintf fmt expr ...
the string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 3
format
.Fa fmt .
.It Fn system cmd
executes
.Fa cmd
and returns its exit status.
.It Fn tolower str
returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Pp
The
.Sq function
.Ic getline
sets
.Va $0
to the next input record from the current input file;
.Ic getline < Ar file
sets
.Va $0
to the next record from
.Ar file .
.Ic getline Va x
sets variable
.Va x
instead.
Finally,
.Ar cmd Ic \&| getline
pipes the output of
.Ar cmd
into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar cmd .
In all cases,
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / Ns Ar re Ns Ic /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bd -unfilled -offset indent
.Ar expression matchop regular-expression
.Ar expression relop expression
.Ar expression Ic in Ar array-name
.Ic \&( Ns Xo
.Ar expr , expr , \&... Ns Ic \&) in
.Ar \& array-name
.Xc
.Ed
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
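.Pp
As a rough sketch of the pattern forms above (the sample input and the
counter
.Va n
are invented for illustration), a range pattern and a
.Ic BEGIN
and
.Ic END
pair combined with a match expression might be run from a shell as follows:
.Bd -literal -offset indent
awk '/start/, /stop/' <<EOF
before
start
inside
stop
after
EOF

awk 'BEGIN { print "begin" }
$2 ~ /^b/ { n++ }
END { print n, "of", NR }' <<EOF
1 bar
2 foo
3 baz
EOF
.Ed
The first command should print the three lines
.Ql start ,
.Ql inside
and
.Ql stop ,
since the range includes both of its end points.
The second should print
.Ql begin
before any input is read and
.Ql 2 of 3
after the last line.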
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width Va -compact
.It Va CONVFMT
conversion format used when converting numbers
(default
.Qq Li %.6g )
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va FNR
ordinal number of the current record in the current file
.It Va FILENAME
the name of the current input file
.It Va RS
input record separator (default newline)
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va OFMT
output format for numbers (default
.Qq Li %.6g )
.It Va SUBSEP
separates multiple subscripts (default 034)
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va ENVIRON
array of environment variables; subscripts are names.
.El
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Sh EXAMPLES
.Dl length($0) > 72
Print lines longer than 72 characters.
.Pp
.Dl { print $2, $1 }
Print first two fields in opposite order.
.Pp
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
Same, with input fields separated by comma and/or blanks and tabs.
.Pp
.Bd -literal -offset indent
    { s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
Add up first column, print sum and average.
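.Pp
A minimal sketch of the excess-parameter idiom for local variables
described in
.Sx DESCRIPTION ;
the function name
.Fn fact
and the variables
.Va i
and
.Va r
are hypothetical, chosen only for this example:
.Bd -literal -offset indent
awk 'function fact(n,    i, r) {
        r = 1                    # i and r are extra parameters, so local
        for (i = 2; i <= n; i++) r *= i
        return r
}
BEGIN { print fact(5), (i == "" && r == "") }'
.Ed
This should print
.Ql 120 1 :
the factorial of 5, followed by 1 because the global
.Va i
and
.Va r
are still uninitialized after the call.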
.Pp
.Dl /start/, /stop/
Print all lines between start/stop pairs.
.Pp
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr sed 1
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh HISTORY
AT&T
.Nm
by B. W. Kernighan was updated for
.Bx 4.4
and again in 1996.
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
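.Pp
To illustrate the conversion idiom from the first paragraph of this
section, here is a small sketch (the input values are arbitrary) comparing
the same two fields first as numbers and then as strings:
.Bd -literal -offset indent
echo "10 9" | awk '{ print ($1 < $2), (($1 "") < ($2 "")) }'
.Ed
This should print
.Ql 0 1 :
as numbers 10 is not less than 9, but once the null string is
concatenated to each field the comparison is made character by character
and
.Ql 10
sorts before
.Ql 9 .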