.\" $OpenBSD: awk.1,v 1.46 2020/01/22 03:47:38 deraadt Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: January 22 2020 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
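.Pp
As a small sketch of the record and field rules above
(the sample input is invented for illustration),
the first command below splits fields on colons and prints
.Dq root 0 ,
and the second sets
.Va RS
to the null string so that each blank-line-separated block is one record
whose lines are its fields:
.Bd -literal -offset indent
echo 'root:x:0' | awk -F: '{ print $1, $3 }'
awk 'BEGIN { RS = "" } { print NR ": " $1 "/" $2 }' <<EOF
alice
admin

bob
staff
EOF
.Ed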
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit processing, and perform
.Ic END
processing; status is
.Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special pattern
.Ic BEGIN
may be used to capture control before the first input line is read.
The special pattern
.Ic END
may be used to capture control after processing is finished.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format used when converting numbers to strings
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
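.Pp
The parameter rules above might be sketched as follows;
the function name
.Fn sum
and its arguments are hypothetical and chosen only for illustration.
The extra parameters
.Fa i
and
.Fa total
are never passed by the caller, so they act as local variables
and the global
.Va total
is left untouched:
.Bd -literal -offset indent
awk 'function sum(arr, n,    i, total) {
        for (i = 1; i <= n; i++) total += arr[i]
        return total
}
BEGIN { total = 100; a[1] = 2; a[2] = 3
        print sum(a, 2), total }'
.Ed
.Pp
Under these assumptions the command prints
.Dq 5 100 .
Separating the intended locals from the real parameters with extra spaces
is only a common convention, not something the language requires.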
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the portion of
.Fa s
that matched the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and x y
Performs a bitwise AND on integer arguments x and y.
.It Fn or x y
Performs a bitwise OR on integer arguments x and y.
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments x and y.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
Note, however, that the expression given to the
.Ic exit
statement can change the exit status.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except
.Nm
does not support {n,m} pattern matching.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number, add 0 to it;
to force it to be treated as a string, concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
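.Pp
As an illustration of the conversion idioms described in this section,
the following sketch compares the same two values first as numbers
and then as strings;
the variable names and sample values are arbitrary:
.Bd -literal -offset indent
awk 'BEGIN { a = "10"; b = "9"
        if (a + 0 > b + 0) print "numeric: 10 > 9"
        if ((a "") < (b "")) print "string: 10 < 9" }'
.Ed
.Pp
Both lines are printed:
adding 0 forces a numeric comparison,
while concatenating the null string forces a character-by-character
string comparison.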