.\" $OpenBSD: awk.1,v 1.45 2019/05/26 01:16:09 naddy Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: May 26 2019 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
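.Pp
As a minimal sketch of the record and field rules above
(the two-line address records are an assumed input, not part of
.Nm ) ,
the following program treats each blank-line-separated block as one record
and prints its first two fields:
.Bd -literal -offset indent
BEGIN { RS = "" }       # null RS: blank lines separate records
{ print NR ": " $1 " -> " $2 }
.Ed
.Pp
Given the lines
.Dq Ann ,
.Dq 555-0100 ,
a blank line,
.Dq Bob
and
.Dq 555-0199 ,
it prints
.Dq 1: Ann -> 555-0100
and
.Dq 2: Bob -> 555-0199 .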
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
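.Pp
As a hedged illustration of the parameter rules above,
the hypothetical function
.Fn max
below declares
.Va i
and
.Va m
as excess parameters so that they are local,
while the array
.Va arr
is passed by reference:
.Bd -literal -offset indent
# max() is a hypothetical helper, not a built-in function.
function max(arr, n,    i, m) {
    m = arr[1]
    for (i = 2; i <= n; i++)
        if (arr[i] > m)
            m = arr[i]
    return m
}
{ v[NR] = $1 + 0 }      # store the first field as a number
END { i = "untouched"; print max(v, NR), i }
.Ed
.Pp
The global
.Va i
keeps its value across the call because the function's
.Va i
is a separate, local variable.
The extra spaces in the parameter list are only a convention
for separating real parameters from locals.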
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the text of
.Fa s
matched by the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and x y
Performs a bitwise AND on integer arguments x and y.
.It Fn or x y
Performs a bitwise OR on integer arguments x and y.
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments x and y.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except
.Nm
does not support {n,m} pattern matching.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
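.Pp
The conversion idioms mentioned above can be sketched as follows
(a minimal example with made-up values):
adding 0 forces a numeric comparison,
while concatenating
.Li \&""
forces a string comparison.
.Bd -literal -offset indent
BEGIN {
    a = "10"; b = "9"
    print ((a + 0) > (b + 0))   # numeric: 10 > 9, prints 1
    print ((a "") > (b ""))     # string: "10" sorts before "9", prints 0
}
.Ed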