.\" $NetBSD: awk.1,v 1.35 2024/09/20 07:49:31 rin Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd July 5, 2022
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm
.Op Fl F Ar fs
.Op Fl v Ar var\| Ns Cm \&= Ns Ar value
.Op Fl safe
.Op Fl d Ns Op Ar N
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm
.Fl version
.Sh DESCRIPTION
.Nm
is the Bell Labs implementation of the AWK programming language as
described in
.Em The AWK Programming Language
by
A.\^V.\~Aho, B.\^W.\~Kernighan, and P.\^J.\~Weinberger.
.Pp
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Ar -
means the standard input.
Any
.Ar file
of the form
.Ar var\| Ns Cm \&= Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.Fl v
followed by
.Ar var\| Ns Cm \&= Ns Ar value
is an assignment to be done before
.Ar prog
is executed; any number of
.Fl v
options may be present.
The
.Fl F Ar fs
option defines the input field separator to be the regular expression
.Ar fs .
.Pp
The options are as follows:
.Bl -tag -width Fl
.It Fl d Ns Op Ar N
Set the debug level to
.Ar N .
If the number is omitted, the debug level is set to 1.
.It Fl f Ar filename
Read the AWK program source from the file
.Ar filename
instead of from the first command line argument.
Multiple
.Fl f
options may be specified.
.It Fl F Ar fs
Set the input field separator
.Va FS
to the regular expression
.Ar fs .
.It Fl mr Ar NNN , Fl mf Ar NNN
Obsolete options that are no longer needed.
They formerly set limits on the maximum record size and
the maximum number of fields.
.It Fl safe
Potentially unsafe functions such as
.Fn system
make the program abort (with a warning message).
.It Fl v Ar var\| Ns Cm \&= Ns Ar value
Assign the value
.Ar value
to the variable
.Ar var
before
.Ar prog
is executed.
Any number of
.Fl v
options may be present.
.It Fl version
Print the
.Nm
version on standard output and exit.
.El
.Pp
An input line is normally made up of fields separated by white space,
or by the regular expression to which the built-in variable
.Va FS
is set.
If
.Va FS
is null, the input line is split into one field per character.
The fields are denoted
.Li $ Ns Va 1 ,
.Li $ Ns Va 2 ,
\&..., while
.Li $ Ns Va 0
refers to the entire line.
Setting any other field causes the re-evaluation of
.Li $ Ns Va 0 .
Assigning to
.Li $ Ns Va 0
resets the values of all other fields and the
.Va NF
built-in variable.
.Pp
A pattern-action statement has the form
.Lp
.D1 Ar pattern Li \&{ Ar action Li \&}
.Lp
A missing
.Li \&{ Ar action Li \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
An action is a sequence of statements.
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Li $ Ns Va 0 .
String constants are quoted
.Li \(dq\(dq ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the
.Sx Operators
(see the next subsection).
Variables may be scalars, array elements
(denoted
.Va x\| Ns Li [ Ns Va i\^ Ns Li ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [ Ns Ar i Ns Li \&, Ns Ar j Ns Li \&, Ns Ar k Ns Li ]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
.Ss Operators
.Nm
operators, in order of decreasing precedence, are:
.Pp
.Bl -tag -width Ic -compact
.It Ic \&( Ns No ... Ns Ic \&)
Grouping
.It Ic $
Field reference
.It Ic ++ --
Increment and decrement; either prefix or postfix.
.It Ic ^
Exponentiation (the
.Ic **\^
form is also supported, and
.Ic **\^=
for the assignment operator).
.It + \- \&!
Unary plus, unary minus and logical negation.
.It * / %
Multiplication, division and modulus.
.It + \-
Addition and subtraction.
.It Em space
String concatenation.
.It Ic \*[Lt] \*[Gt]
.It Ic <= >=
.It Ic != ==
Relational operators
.It Ic ~ !~
Regular expression match and non-match
.It Ic in
Array membership
.It Ic "\*[Am]\*[Am]"
Logical AND
.It Ic "||"
Logical OR
.It Ic ?:
C conditional expression.
This is used as
.Ar expr1 Ic \&? Ar expr2 Ic \&: Ar expr3 .
If
.Ar expr1
is true, the result value is
.Ar expr2 ,
otherwise it is
.Ar expr3 .
Only one of
.Ar expr2
and
.Ar expr3
is evaluated.
.It Ic = += -=
.It Ic *= /= %= ^=
Assignment and operator-assignment
.El
.Ss Control Statements
The control statements are as follows:
.Bl -tag -width Fn
.It Ic if \&( Ns Ar expression\^ Ns Ic \&) Ar statement Bq Ic else Ar statement
.It Ic while \&( Ns Ar expression\^ Ns Ic \&) Ar statement
.It Ic for \&( Ns Ar expression\^ Ns Ic \&; \
 Ar expression\^ Ns Ic \&; \
 Ar expression\^ Ns Ic \&) Ar statement
.It Ic for \&( Ns Ar var Ic in Ar array\^ Ns Ic \&) Ar statement
.It Ic do Ar statement Ic while \&( Ns Ar expression\^ Ns Ic \&)
.It Ic break
.It Ic continue
.It Ic \&{ Oo Ar statement ... Oc Ic \&}
.It Ar expression
Commonly
.Ar var Ic = Ar expression
.It Ic return Op Ar expression
.It Ic next
Skip remaining patterns on this input line
.It Ic nextfile
Skip rest of this file, open next, start at top
.It Ic delete Ar array\| Ns Cm \&[ Ns Ar expression\^ Ns Cm \&]
Delete an array element
.It Ic delete Ar array
Delete all elements of an array
.It Ic exit Op Ar expression
Exit with status
.Ar expression .
If
.Ic exit
is executed outside an
.Ic END
action, the
.Ic END
actions are still performed before
.Nm
exits.
.El
.Ss I/O Statements
The input/output statements are as follows:
.Bl -tag -width Fn
.It Fn close expr
Closes the file or pipe
.Ar expr .
Returns zero on success; otherwise nonzero.
.It Fn fflush expr
Flushes any buffered output for the file or pipe
.Ar expr .
Returns zero on success; otherwise nonzero.
.It Ic getline Op Ar var
Set
.Ar var
(or
.Li $ Ns Va 0
if
.Ar var
is not specified)
to the next input record from the current input file.
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.It Ic getline Oo Ar var Oc Ic < Ar file
Set
.Ar var
(or
.Li $ Ns Va 0
if
.Ar var
is not specified)
to the next input record from the specified file
.Ar file .
.It Ar expr Ic \&| getline
Runs the command
.Ar expr
and pipes its output into
.Ic getline ;
each call of
.Ic getline
reads the next line of output from
.Ar expr .
.It Ic print Oo Ar expr-list Oc Op Ar redirection
Print arguments separated by the current output field separator
.Va OFS ,
and terminated by the
output record separator
.Va ORS .
.It Ic printf Ar format\| Ns Oo Ic \&, Ar expr-list Oc Op Ar redirection
Format and print its expression list according to
.Ar format .
See
.Xr printf 3
for the list of supported formats and their meanings.
.El
.Pp
Both
.Ic print
and
.Ic printf
statements write to standard output by default.
The output is written to the file or pipe specified by
.Ar redirection
if one is supplied, as follows:
.Ic \&> Ar file , ""
.Ic \&>> Ar file , No or
.Ic \&| Ar expr .
Both
.Ar file
and
.Ar expr
may be literal names or parenthesized expressions; identical string values in
different statements denote the same open file.
For that purpose the file names
.Pa /dev/stdin ,
.Pa /dev/stdout ,
and
.Pa /dev/stderr
refer to the program's
.Va stdin ,
.Va stdout ,
and
.Va stderr
respectively (and are unrelated to the
.Xr fd 4
devices of the same names).
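.Pp
The following sketch is given only as an illustration.
It runs a command using the
.Ic \&| getline
form, closes the pipe so that the same command could be run again,
and writes a message to standard error.
The command string and the variable names
.Va cmd ,
.Va line ,
and
.Va n
are arbitrary:
.Bd -literal -offset indent
awk 'BEGIN {
    cmd = "echo hello world"
    # read each line the command writes
    while ((cmd | getline line) > 0)
        n++
    close(cmd)      # allows cmd to be run again later
    print "read", n, "line:", line
    print "done" > "/dev/stderr"
}'
.Ed
.Pp
This prints
.Dq read 1 line: hello world
on standard output and
.Dq done
on standard error.
When
.Ic getline
returns 0, it leaves
.Va line
unchanged, so after the loop
.Va line
still holds the last record read.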
.Ss Mathematical and Numeric Functions
AWK has the following mathematical and numerical functions built-in:
.Bl -tag -width Fn
.It Fn atan2 x y
Returns the arctangent of
.Ar x\| Ns Li / Ns Ar y
in radians.
See also
.Xr atan2 3 .
.It Fn cos expr
Computes the cosine of
.Ar expr ,
measured in radians.
See also
.Xr cos 3 .
.It Fn exp expr
Computes the exponential value of the given argument
.Ar expr .
See also
.Xr exp 3 .
.It Fn int expr
Truncates
.Ar expr
toward zero to an integer.
.It Fn log expr
Computes the natural logarithm of the argument
.Ar expr .
See also
.Xr log 3 .
.It Fn rand
Returns a random number greater than or equal to 0 and less than 1.
.It Fn sin expr
Computes the sine of
.Ar expr ,
measured in radians.
See also
.Xr sin 3 .
.It Fn sqrt expr
Computes the non-negative square root of
.Ar expr .
See also
.Xr sqrt 3 .
.It Fn srand [expr]
Sets the seed for the random number generator
.Pq Fn rand
and returns the previous seed.
If
.Ar expr
is omitted, the time of day is used.
.El
.Ss String Functions
AWK has the following string functions built-in:
.Bl -tag -width Fn
.It Xo Fo gensub
.Fa r s h\|
.Oo Fa t
.Oc Fc Xc
Search the target string
.Ar t
for matches of the regular expression
.Ar r .
If
.Ar h
is a string beginning with
.Ql g
or
.Ql G ,
then replace all matches of
.Ar r
with
.Ar s .
Otherwise,
.Ar h
is a number indicating which match of
.Ar r
to replace.
If no
.Ar t
is supplied,
.Li $ Ns Va 0
is used instead.
.\"Within the replacement text
.\".Ar s ,
.\"the sequence
.\".Sq Li \e Ns Ar n ,
.\"where
.\".Ar n
.\"is a digit from 1 to 9, may be used to indicate just the text that
.\"matched the
.\".Ar n Ap th
.\"parenthesized subexpression.
.\"The sequence
.\".Ic \e0
.\"represents the entire text, as does the character
.\".Ic & .
Unlike
.Fn sub
and
.Fn gsub ,
the modified string is returned as the result of the function,
and the original target is
.Em not
changed.
Note that the
.Sq Li \e Ns Ar n
sequences (backreferences) within the replacement string
.Ar s
that are supported by GNU
.Nm
are
.Em not
currently supported.
.It Xo Fo gsub
.Fa r s\|
.Oo Fa t
.Oc Fc Xc
Same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn index s t
The position in
.Ar s
where the string
.Ar t
first occurs, or 0 if it does not occur.
.\" .Fn cannot be told to omit parens, so piece this together manually
.\" to mark empty parens optional too
.It Xo Ic length\^ Ns Oo \&( Ns
.Oo Ns
.Fa string
.Oc Ns \&)
.Oc Xc
The length of its argument
taken as a string,
or of
.Li $ Ns Va 0
if no argument is given.
.It Fn match s r
The position in
.Ar s
where the regular expression
.Ar r
first matches, or 0 if it does not match.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Xo Fo split
.Fa s a\|
.Oo Fa fs
.Oc Fc Xc
Splits the string
.Ar s
into array elements
.Ar a Ns Li [1] ,
.Ar a Ns Li [2] ,
\&...,
.Ar a Ns Li \&[ Ns Ar n Ns Li \&] ,
and returns
.Ar n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr "..."
Returns the string resulting from formatting
.Ar expr
according to the
.Xr printf 3
format
.Ar fmt .
.It Xo Fo sub
.Fa r s\|
.Oo Fa t
.Oc Fc Xc
Substitutes
.Ar s
for the first occurrence of the regular expression
.Ar r
in the target string
.Ar t .
If
.Ar t
is not given,
.Li $ Ns Va 0
is used.
.It Xo Fo substr
.Fa s m\|
.Oo Fa n
.Oc Fc Xc
Returns the at most
.Ar n\^ Ns No -character
substring of
.Ar s
starting at position
.Ar m ,
counted from 1.
If
.Ar n
is omitted, the rest of
.Ar s
is returned.
.It Fn tolower str
Returns a copy of
.Ar str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Ar str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Time Functions
This
.Nm
provides the following two functions for obtaining and
formatting time stamps:
.Bl -tag -width Fn
.It Fn systime
Returns the current time as the number of seconds since the
Unix Epoch (midnight, January 1, 1970, Coordinated Universal Time).
See also
.Xr time 3 .
.\"It Fn strftime "[format [, timestamp]]"
.It Xo Fo strftime
.Oo Fa format\|
.Oo Fa timestamp\|
.Oc Oc Fc Xc
Formats the time
.Ar timestamp
according to the string
.Ar format .
.Ar timestamp
should be in the same form as the value returned by
.Fn systime .
If
.Ar timestamp
is missing, the current time is used.
If
.Ar format
is missing, a default format equivalent to the output of
.Xr date 1
is used.
See the ANSI C specification of
.Xr strftime 3
for the supported format conversions.
.El
.Ss Other Built-in Functions
.Bl -tag -width Fn
.It Fn system cmd
Executes
.Ar cmd
and returns its exit status.
.El
.Ss Patterns
Patterns are arbitrary Boolean combinations
(with
.Ic "! || \*[Am]\*[Am]" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / Ns Ar re Ns Ic /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from a line matching the first pattern
through the next line matching the second, inclusive.
.Pp
A relational expression is one of the following:
.Bl -tag -offset indent -width Fn -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Ic \&( Ns Ar expr Ns Ic \&, Ar expr Ns Ic \&, Ar ... Ic \&) in Ar array-name
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C,
and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
If an awk program consists of only actions with the pattern
.Ic BEGIN ,
and the
.Ic BEGIN
action contains no
.Ic getline
statement, awk exits without reading its input when the last
statement in the last
.Ic BEGIN
action is executed.
If an awk program consists of only actions with the pattern
.Ic END
or only actions with the patterns
.Ic BEGIN
and
.Ic END ,
the input is read before the statements in the
.Ic END
actions are executed.
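.Pp
The following sketch is given only as an illustration.
It combines a range pattern with
.Ic BEGIN
and
.Ic END .
Each range starts on an input value that leaves remainder 1 when divided by 4
and ends on the next value that leaves remainder 2.
The
.Xr seq 1
utility is assumed to be available and is used only to generate sample input:
.Bd -literal -offset indent
seq 1 10 | awk '
# runs once, before any input is read
BEGIN                       { printf "in range:" }
# range pattern: starts on 1, 5, 9 and ends on 2, 6, 10
$1 % 4 == 1, $1 % 4 == 2    { printf " %s", $1 }
# runs once, after the last input line
END                         { print "" }'
.Ed
.Pp
The output is
.Dq in range: 1 2 5 6 9 10 ;
once a range has ended, the next line matching the first pattern starts a new one.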
.Ss Built-in Variables
Variable names with special meanings:
.Bl -hang -width Va
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va CONVFMT
conversion format used when converting numbers to strings
(default
.Li \(dq%.6g\(dq )
.It Va ENVIRON
array of environment variables; subscripts are names
.It Va FILENAME
the name of the current input file
.It Va FNR
ordinal number of the current record in the current file
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record, counted across all input files
.It Va OFMT
output format for numbers (default
.Li \(dq%.6g\(dq )
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va RS
input record separator (default newline)
.It Va RSTART
position of the first character matched by
.Fn match ;
0 if no match.
.It Va RLENGTH
length of the string matched by
.Fn match ;
\-1 if no match.
.It Va SUBSEP
separates multiple subscripts (default
.Li \(dq\e034\(dq )
.El
.Ss Functions
Functions may be defined (at the position of a pattern-action statement) thus:
.Bd -literal -offset indent
function foo(a, b, c) { ...; return x }
.Ed
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus, local variables may be created by providing excess parameters in
the function definition.
.Sh EXAMPLES
Print lines longer than 72 characters.
.Fn length
defaults to
.Li $ Ns Va 0
and the empty parentheses can also be omitted in this case:
.Pp
.Dl length > 72
.Pp
Print the first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by a comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up the first column, print the sum and the average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, "average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs, inclusive:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN {
    for (i = 1; i < ARGC; ++i)
        printf("%s%s", ARGV[i], i==ARGC-1?"\en":" ")
}
.Ed
.Pp
Another way to do the same that demonstrates field assignment and
.Li $ Ns Va 0
re-evaluation:
.Pp
.Dl BEGIN { for (i = 1; i < ARGC; ++i) $i = ARGV[i]; print }
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr egrep 1 ,
.Xr lex 1 ,
.Xr sed 1 ,
.Xr atan2 3 ,
.Xr cos 3 ,
.Xr exp 3 ,
.Xr log 3 ,
.Xr sin 3 ,
.Xr sqrt 3 ,
.Xr strftime 3 ,
.Xr time 3
.Pp
A.\^V.\~Aho, B.\^W.\~Kernighan, P.\^J.\~Weinberger,
.Em The AWK Programming Language ,
Addison-Wesley, 1988.
ISBN 0-201-07981-X
.Pp
.Em AWK Language Programming ,
Edition 1.0, published by the Free Software Foundation, 1995.
.Sh HISTORY
.Nm nawk
has been the default system
.Nm
since
.Nx 2.0 ,
replacing the previously used GNU
.Nm .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number, add 0 to it;
to force it to be treated as a string, concatenate
\&"\&" to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
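.Pp
A common convention eases this problem.
It is sketched here as an illustration and is not a requirement of the language.
Local variables are declared as extra parameters, separated from the real ones by
additional white space, and are omitted in calls.
In the hypothetical function
.Fn join
below,
.Va i
and
.Va s
are locals, so the global
.Va i
is not modified:
.Bd -literal -offset indent
awk '
# a, n and sep are arguments; i and s are locals
function join(a, n, sep,    i, s) {
    s = a[1]
    for (i = 2; i <= n; i++)
        s = s sep a[i]
    return s
}
BEGIN {
    n = split("x y z", parts, " ")
    i = "unchanged"
    print join(parts, n, "-"), i    # prints: x-y-z unchanged
}'
.Ed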
.Pp
Only eight-bit character sets are handled correctly.