.\" $OpenBSD: awk.1,v 1.15 2003/11/24 10:58:08 jmc Exp $
.\" EX/EE is a Bd
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd June 29, 1996
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl F Ar fs
.Op Fl v Ar var=value
.Op Fl safe
.Op Fl mr Ar n
.Op Fl mf Ar n
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm nawk
.Ar ...
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq Pa \-
means the standard input.
Any
.Ar file
of the form
.Ar var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.Fl v
followed by
.Ar var=value
is an assignment to be done before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
The
.Fl F Ar fs
option defines the input field separator to be the regular expression
.Ar fs .
The
.Fl safe
option disables file output
.Po
.Ic print Ic > ,
.Ic print Ic >> ,
.Pc
process creation
.Po
.Ar cmd Ic \&| getline ,
.Ic print \&| , system
.Pc
and access to the environment
.Pq Va ENVIRON .
This
is a first (and not very reliable) approximation to a
.Dq safe
version of
.Nm awk .
.Pp
An input line is normally made up of fields separated by whitespace,
or by regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
To compensate for inadequate implementation of storage management,
the
.Fl mr
option can be used to set the maximum size of the input record,
and the
.Fl mf
option to set the maximum number of fields.
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Bd -unfilled -offset indent
.Ic if ( Xo
.Ar expression ) statement \&
.Op Ic else Ar statement
.Xc
.Ic while ( Ar expression ) statement
.Ic for ( Xo
.Ar expression ; expression ; expression ) statement
.Xc
.Ic for ( Xo
.Ar var Ic in Ar array ) statement
.Xc
.Ic do Ar statement Ic while ( Ar expression )
.Ic break
.Ic continue
.Ic { Oo Ar statement ... Oc Ic \& }
.Ar expression Xo
.No "# commonly" \&
.Ar var Ic = Ar expression
.Xc
.Ic print Xo
.Op Ar expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic printf Ar format Xo
.Op Ar ... , expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic return Op Ar expression
.Ic next Xo
.No "# skip remaining patterns on this input line"
.Xc
.Ic nextfile Xo
.No "# skip rest of this file, open next, start at top"
.Xc
.Ic delete Ar array Ns Xo
.Ic \&[ Ns Ar expression Ns Ic \&]
.No \& "# delete an array element"
.Xc
.Ic delete Ar array Xo
.No "# delete all elements of array"
.Xc
.Ic exit Xo
.Op Ar expression
.No \& "# exit immediately; status is" Ar expression
.Xc
.Ed
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
(exponentiation), and concatenation (indicated by whitespace).
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Ic > Ns Ar file
or
.Ic >> Ns Ar file
is present or on a pipe if
.Ic \&| Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 3 ) .
The built-in function
.Fn close expr
closes the file or pipe
.Fa expr .
The built-in function
.Fn fflush expr
flushes any buffered output for the file or pipe
.Fa expr .
.Pp
The mathematical functions
.Fn exp ,
.Fn log ,
.Fn sqrt ,
.Fn sin ,
.Fn cos ,
and
.Fn atan2
are built in.
Other built-in functions:
.Bl -tag -width Fn
.It Fn length
the length of its argument
taken as a string,
or of
.Va $0
if no argument.
.It Fn rand
random number on (0,1)
.It Fn srand
sets seed for
.Fn rand
and returns the previous seed.
.It Fn int
truncates to an integer value.
.It Fn substr s m n
the
.Fa n Ns No -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
.It Fn index s t
the position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn match s r
the position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Fn split s a fs
splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sub r t s
substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
.It Fn gsub r t s
same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn sprintf fmt expr ...
the string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 3
format
.Fa fmt .
.It Fn system cmd
executes
.Fa cmd
and returns its exit status.
.It Fn tolower str
returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Pp
The
.Sq function
.Ic getline
sets
.Va $0
to the next input record from the current input file;
.Ic getline < Ar file
sets
.Va $0
to the next record from
.Ar file .
.Ic getline Va x
sets variable
.Va x
instead.
Finally,
.Ar cmd Ic \&| getline
pipes the output of
.Ar cmd
into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar cmd .
In all cases,
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / Ns Ar re Ns Ic /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bd -unfilled -offset indent
.Ar expression matchop regular-expression
.Ar expression relop expression
.Ar expression Ic in Ar array-name
.Ic \&( Ns Xo
.Ar expr , expr , \&... Ns Ic \&) in
.Ar \& array-name
.Xc
.Ed
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width Va -compact
.It Va CONVFMT
conversion format used when converting numbers
(default
.Qq Li %.6g )
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va FNR
ordinal number of the current record in the current file
.It Va FILENAME
the name of the current input file
.It Va RS
input record separator (default newline)
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va OFMT
output format for numbers (default
.Qq Li %.6g )
.It Va SUBSEP
separates multiple subscripts (default 034)
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va ENVIRON
array of environment variables; subscripts are names.
.El
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Sh EXAMPLES
.Dl length($0) > 72
Print lines longer than 72 characters.
.Pp
.Dl { print $2, $1 }
Print first two fields in opposite order.
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
Same, with input fields separated by comma and/or blanks and tabs.
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
Add up first column, print sum and average.
.Pp
.Dl /start/, /stop/
Print all lines between start/stop pairs.
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr sed 1
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
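.Pp
As a minimal sketch of the two points above, the following program
forces numeric and string comparison and obtains a local variable
by declaring an excess parameter.
The function name
.Fn max
and all variable names are hypothetical, chosen only for illustration.
.Bd -literal -offset indent
function max(a, b,    m) {	# m is an excess parameter, hence local
	m = a
	if (b > m) m = b
	return m
}
BEGIN {
	x = "10"; y = "9"	# string constants
	if (x < y) print "compared as strings"
	if (x+0 > y+0) print "compared as numbers"
	n = 2; k = 10		# numeric constants
	if ((n "") > (k "")) print "forced to strings"
	m = "global"
	print max(3, 7), m	# the global m is left untouched
}
.Ed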