.\" $NetBSD: awk.1,v 1.4 2015/04/06 14:36:41 wiz Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd April 6, 2015
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm
.Op Fl F Ar fs
.Op Fl v Ar var=value
.Op Fl safe
.Op Fl d Ns Op Ar N
.Op Ar prog | Fl f Ar filename
.Ar
.Nm
.Fl version
.Sh DESCRIPTION
.Nm
is the Bell Labs implementation of the AWK programming language as
described in
.Em The AWK Programming Language
by
A. V. Aho, B. W. Kernighan, and P. J. Weinberger.
.Pp
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar filename .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Ar -
means the standard input.
Any
.Ar file
of the form
.Ar var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width indent
.It Fl d Ns Op Ar N
Set the debug level to the specified number
.Ar N .
If the number is omitted, the debug level is set to 1.
.It Fl f Ar filename
Read the AWK program source from the specified file
.Ar filename ,
instead of the first command line argument.
Multiple
.Fl f
options may be specified.
.It Fl F Ar fs
Set the input field separator
.Va FS
to the regular expression
.Ar fs .
.It Fl mr Ar NNN , Fl mf Ar NNN
Obsolete options that are no longer needed.
They set a limit on the maximum record size or number of fields.
.It Fl safe
Potentially unsafe functions such as
.Fn system
make the program abort (with a warning message).
.It Fl v Ar var Ns = Ns Ar value
Assign the value
.Ar value
to the variable
.Va var
before
.Ar prog
is executed.
Any number of
.Fl v
options may be present.
.It Fl version
Print the
.Nm
version on standard output and exit.
.El
.Pp
An input line is normally made up of fields separated by white space,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 ,
.Va $2 ,
\&..., while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
A pattern-action statement has the form
.Lp
.Dl pattern \&{ action \&}
.Lp
A missing \&{ action \&}
means print the line;
a missing pattern always matches.
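.Pp
As a minimal sketch of these two defaults (the sample input and the
regular expression are illustrative assumptions), the first command below
supplies a pattern with no action, so the matching line is printed, and
the second supplies an action with no pattern, so it runs for every line:
.Bd -literal -offset indent
echo 'alpha beta' | awk '/alpha/'
echo 'alpha beta' | awk '{ print $2 }'
.Ed
.Pp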
Pattern-action statements are separated by newlines or semicolons.
.Pp
An action is a sequence of statements.
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Va $0 .
String constants are quoted
.Em \&"\ \&" ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators listed in
.Sx Operators .
Variables may be scalars, array elements
(denoted
.Va x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Va [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
.Ss Operators
.Nm
operators, in order of decreasing precedence, are:
.Pp
.Bl -tag -width ident -compact
.It Ic (...)
Grouping
.It Ic $
Field reference
.It Ic ++ --
Increment and decrement; each can be used as either a prefix or a postfix
operator.
.It Ic ^
Exponentiation (the
.Ic **
form is also supported, and
.Ic **=
for the assignment operator).
.It + - \&!
Unary plus, unary minus and logical negation.
.It * / %
Multiplication, division and modulus.
.It + -
Addition and subtraction.
.It Ar space
String concatenation.
.It Ic \*[Lt] \*[Gt]
.It Ic \*[Le] \*[Ge]
.It Ic != ==
Regular relational operators
.It Ic ~ !~
Regular expression match and not match
.It Ic in
Array membership
.It Ic "\*[Am]\*[Am]"
Logical AND
.It Ic "||"
Logical OR
.It Ic ?:
C conditional expression.
This is used as
.Ar expr1 Ic \&? Ar expr2 Ic \&: Ar expr3 No .
If
.Ar expr1
is true, the result value is
.Ar expr2 ,
otherwise it is
.Ar expr3 .
Only one of
.Ar expr2
and
.Ar expr3
is evaluated.
.It Ic = += -=
.It Ic *= /= %= ^=
Assignment and operator-assignment
.El
.Ss Control Statements
The control statements are as follows:
.Pp
.Bl -hang -offset indent -width indent -compact
.It Ic if \&( Ar expression Ic \&) Ar statement Bq Ic else Ar statement
.It Ic while \&( Ar expression Ic \&) Ar statement
.It Ic for \&( Ar expression Ic \&; Ar expression Ic \&; \
Ar expression Ic \&) Ar statement
.It Ic for \&( Va var Ic in Ar array Ic \&) Ar statement
.It Ic do Ar statement Ic while \&( Ar expression Ic \&)
.It Ic break
.It Ic continue
.It Ic delete Va array Bq Ar expression
.It Ic delete Va array
.It Ic exit Bq Ar expression
.It Ar expression
.It Ic return Bq Ar expression
.It Ic \&{ Ar [ statement ... ] Ic \&}
.El
.Ss I/O Statements
The input/output statements are as follows:
.Pp
.Bl -tag -width indent
.It Fn close expr
Closes the file or pipe
.Ar expr .
Returns zero on success; otherwise nonzero.
.It Fn fflush expr
Flushes any buffered output for the file or pipe
.Ar expr .
Returns zero on success; otherwise nonzero.
.It Ic getline Bq Va var
Set
.Va var
(or
.Va $0
if
.Va var
is not specified)
to the next input record from the current input file.
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.It Ic getline Bo Va var Bc Ic \*[Lt] Ar file
Set
.Va var
(or
.Va $0
if
.Va var
is not specified)
to the next input record from the specified file
.Ar file .
.It Ar expr Ic \&| getline
Runs
.Ar expr
as a command and pipes its output into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar expr .
.It Ic next
Skip remaining patterns on this input line.
.It Ic nextfile
Skip the rest of this file, open the next one, and start at the top.
.It Ic print Bo Ar expr-list Bc Bq Ic \*[Gt] Ar file
The
.Ic print
statement prints its arguments on the standard output (or to a file
if
.Ic \*[Gt] Ar file
or to a pipe if
.Ic \&| Ar expr
is present),
separated by the current output field separator
.Va OFS ,
and terminated by the
output record separator
.Va ORS .
Both
.Ar file
and
.Ar expr
may be literal names or parenthesized expressions; identical string values in
different statements denote the same open file.
.It Ic printf Ar format Bo Ic \&, Ar expr-list Bc Bq Ic \*[Gt] Ar file
Format and print its expression list according to
.Ar format .
See
.Xr printf 3
for the list of supported formats and their meanings.
.El
.Ss Mathematical and Numeric Functions
AWK has the following built-in mathematical and numeric functions:
.Pp
.Bl -tag -width indent
.It Fn atan2 x y
Returns the arctangent of
.Ar x Ic / Ar y
in radians.
See also
.Xr atan2 3 .
.It Fn cos expr
Computes the cosine of
.Ar expr ,
measured in radians.
See also
.Xr cos 3 .
.It Fn exp expr
Computes the exponential value of the given argument
.Ar expr .
See also
.Xr exp 3 .
.It Fn int expr
Truncates
.Ar expr
to an integer.
.It Fn log expr
Computes the value of the natural logarithm of argument
.Ar expr .
See also
.Xr log 3 .
.It Fn rand
Returns a random number between 0 and 1.
.It Fn sin expr
Computes the sine of
.Ar expr ,
measured in radians.
See also
.Xr sin 3 .
.It Fn sqrt expr
Computes the non-negative square root of
.Ar expr .
See also
.Xr sqrt 3 .
.It Fn srand [expr]
Sets the seed for the random number generator used by
.Fn rand
and returns the previous seed.
.El
.Ss String Functions
AWK has the following built-in string functions:
.Pp
.Bl -tag -width indent
.It Fn gensub r s h [t]
Search the target string
.Ar t
for matches of the regular expression
.Ar r .
If
.Ar h
is a string beginning with
.Ic g
or
.Ic G ,
then replace all matches of
.Ar r
with
.Ar s .
Otherwise,
.Ar h
is a number indicating which match of
.Ar r
to replace.
If no
.Ar t
is supplied,
.Va $0
is used instead.
.\"Within the replacement text
.\".Ar s ,
.\"the sequence
.\".Ar \en ,
.\"where
.\".Ar n
.\"is a digit from 1 to 9, may be used to indicate just the text that
.\"matched the
.\".Ar n Ap th
.\"parenthesized subexpression.
.\"The sequence
.\".Ic \e0
.\"represents the entire text, as does the character
.\".Ic & .
Unlike
.Fn sub
and
.Fn gsub ,
the modified string is returned as the result of the function,
and the original target is
.Em not
changed.
Note that the
.Ar \en
sequences within the replacement string
.Ar s
supported by GNU
.Nm
are
.Em not
supported at this moment.
.It Fn gsub r s "[t]"
Same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn index s t
Returns the position in
.Ar s
where the string
.Ar t
occurs, or 0 if it does not.
.It Fn length "[string]"
Returns the length of its argument
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
Returns the position in
.Ar s
where the regular expression
.Ar r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Fn split s a "[fs]"
Splits the string
.Ar s
into array elements
.Va a[1] ,
.Va a[2] ,
\&...,
.Va a[n] ,
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr "..."
Returns the string resulting from formatting
.Ar expr
according to the
.Xr printf 3
format
.Ar fmt .
.It Fn sub r s "[t]"
Substitutes
.Ar s
for the first occurrence of the regular expression
.Ar r
in the target string
.Ar t .
If
.Ar t
is not given,
.Va $0
is used.
.It Fn substr s m [n]
Returns the substring of
.Ar s
that starts at position
.Ar m ,
counted from 1, and is at most
.Ar n
characters long.
If
.Ar n
is omitted, the rest of
.Ar s
is returned.
.It Fn tolower str
Returns a copy of
.Ar str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Ar str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Time Functions
This
.Nm
provides the following two functions for obtaining time
stamps and formatting them:
.Bl -tag -width indent
.It Fn systime
Returns the value of time in seconds since the start of the
.Tn Unix
Epoch (midnight, January 1, 1970, Coordinated Universal Time).
See also
.Xr time 3 .
.It Fn strftime "[format [, timestamp]]"
Formats the time
.Ar timestamp
according to the string
.Ar format .
.Ar timestamp
should be in the same form as the value returned by
.Fn systime .
If
.Ar timestamp
is missing, the current time is used.
If
.Ar format
is missing, a default format equivalent to the output of
.Xr date 1
is used.
See the specification of ANSI C
.Xr strftime 3
for the format conversions which are supported.
.El
.Ss Other Built-in Functions
.Bl -tag -width indent
.It Fn system cmd
Executes
.Ar cmd
and returns its exit status.
.El
.Ss Patterns
Patterns are arbitrary Boolean combinations
(with
.Ic "! || \*[Am]\*[Am]" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bl -tag -offset indent -width indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It ( Ar expr , expr,\&... Ic ") in" Ar array-name
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C,
and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
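.Pp
The pattern forms above can be combined in one program.
The following is a sketch only; the sample input lines and the variable
.Va big
are hypothetical and chosen for illustration.
It uses
.Ic BEGIN
and
.Ic END ,
a range pattern, and a relational expression:
.Bd -literal -offset indent
{ echo 'skip'; echo 'start 5'; echo 'mid 20'
  echo 'stop 1'; echo 'after 30'; } |
awk 'BEGIN { print "begin" }
/start/, /stop/ { print NR ": " $0 }
$2 > 10 { big++ }
END { print "big:", big + 0 }'
.Ed
.Pp
With this input the range pattern should select records 2 through 4,
and the relational expression should count the two records whose second
field exceeds 10.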
.Ss Built-in Variables
Variable names with special meanings:
.Bl -hang -width FILENAMES
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va CONVFMT
conversion format used when converting numbers
(default
.Qq %.6g )
.It Va ENVIRON
array of environment variables; subscripts are names.
.It Va FILENAME
the name of the current input file
.It Va FNR
ordinal number of the current record in the current file
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va OFMT
output format for numbers (default
.Qq "%.6g" )
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va RS
input record separator (default newline)
.It Va RSTART
Position of the first character matched by
.Fn match ;
0 if no match.
.It Va RLENGTH
Length of the string matched by
.Fn match ;
\-1 if no match.
.It Va SUBSEP
separates multiple subscripts (default 034)
.El
.Ss Functions
Functions may be defined (at the position of a pattern-action statement) thus:
.Bd -filled -offset indent
.Ic function foo(a, b, c) { ...; return x }
.Ed
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Sh EXAMPLES
.Bl -tag -width indent -compact
.It Ic length($0) \*[Gt] 72
Print lines longer than 72 characters.
.Pp
.It Ic \&{ print $2, $1 \&}
Print the first two fields in opposite order.
.Pp
.It Ic BEGIN { FS = \&",[ \et]*|[ \et]+\&" }
.It Ic "\ \ \ \ \ \ {" print \&$2, \&$1 }
Same, with input fields separated by comma and/or blanks and tabs.
.Pp
.It Ic "\ \ \ \ {" s += $1 }
.It Ic END { print \&"sum is\&", s, \&" average is\ \&",\ s/NR\ }
Add up first column, print sum and average.
.Pp
.It Ic /start/, /stop/
Print all lines between start/stop pairs.
.Pp
.It Ic BEGIN { # Simulate echo(1)
.It Ic "\ \ \ \ " for (i = 1; i \*[Lt] ARGC;\ i++)\ printf\ \&"%s\ \&",\ ARGV[i]
.It Ic "\ \ \ \ " printf \&"\en\&"
.It Ic "\ \ \ \ " exit }
.El
.Sh SEE ALSO
.Xr egrep 1 ,
.Xr lex 1 ,
.Xr sed 1 ,
.Xr atan2 3 ,
.Xr cos 3 ,
.Xr exp 3 ,
.Xr log 3 ,
.Xr sin 3 ,
.Xr sqrt 3 ,
.Xr strftime 3 ,
.Xr time 3
.Pp
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.Em The AWK Programming Language ,
Addison-Wesley, 1988.
ISBN 0-201-07981-X
.Pp
.Em AWK Language Programming ,
Edition 1.0, published by the Free Software Foundation, 1995
.Sh HISTORY
.Nm nawk
has been the default system
.Nm
since
.Nx 2.0 ,
replacing the previously used GNU
.Nm .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&"\&" to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
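.Pp
Both points can be seen in a short sketch.
The function name
.Fn bump
and all variable names here are hypothetical.
Concatenating the empty string forces a string comparison, adding 0
forces a numeric one, and the extra parameter
.Va tmp
stays local to the function while the global
.Va tmp
is left untouched:
.Bd -literal -offset indent
awk 'function bump(n,    tmp) { tmp = n + 1; return tmp }
BEGIN {
    a = "10"; b = "9"
    print ((a "") < (b ""))    # compared as strings
    print ((a + 0) < (b + 0))  # compared as numbers
    tmp = "global"
    print bump(1), tmp
}'
.Ed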