.\" $OpenBSD: awk.1,v 1.33 2009/02/08 17:15:09 jmc Exp $
.\" EX/EE is a Bd
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: February 8 2009 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Oo Fl v Ar var Ns =
.Ns Ar value Oc
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm nawk
.Ar ...
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print No \&| ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Bd -unfilled -offset indent
.Ic if ( Xo
.Ar expression ) statement \&
.Op Ic else Ar statement
.Xc
.Ic while ( Ar expression ) statement
.Ic for ( Xo
.Ar expression ; expression ; expression ) statement
.Xc
.Ic for ( Xo
.Ar var Ic in Ar array ) statement
.Xc
.Ic do Ar statement Ic while ( Ar expression )
.Ic break
.Ic continue
.Ic { Oo Ar statement ... Oc Ic \& }
.Ar expression Xo
.No "# commonly" \&
.Ar var Ic = Ar expression
.Xc
.Ic print Xo
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.Ic printf Ar format Xo
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.Ic return Op Ar expression
.Ic next Xo
.No "# skip remaining patterns on this input line"
.Xc
.Ic nextfile Xo
.No "# skip rest of this file, open next, start at top"
.Xc
.Ic delete Ar array Ns Xo
.Ic \&[ Ns Ar expression Ns Ic \&]
.No \& "# delete an array element"
.Xc
.Ic delete Ar array Xo
.No "# delete all elements of array"
.Xc
.Ic exit Xo
.Op Ar expression
.No \& "# exit immediately; status is" Ar expression
.Xc
.Ed
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
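.Pp
As a minimal sketch of the multiple-subscript rule described above
(an illustration only; the array name
.Va cell
and the helper array
.Va part
are hypothetical),
the following program stores one element under a two-part subscript
and recovers the parts by splitting the key on
.Va SUBSEP .
With the default
.Va SUBSEP
it should print
.Dq 2 1 2 x :
.Bd -literal -offset indent
BEGIN {
        cell[1, 2] = "x"        # stored under the key 1 SUBSEP 2
        for (k in cell) {
                # split the combined key back into its constituents
                n = split(k, part, SUBSEP)
                print n, part[1], part[2], cell[k]
        }
}
.Ed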
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ns Ar file
or
.Pf >> Ns Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ns Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bd -unfilled -offset indent
.Ar expression matchop regular-expression
.Ar expression relop expression
.Ar expression Ic in Ar array-name
.Ic \&( Ns Xo
.Ar expr , expr , \&... Ns Ic \&) in
.Ar \& array-name
.Xc
.Ed
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the text in
.Fa s
that matched the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and x y
Performs a bitwise AND on integer arguments x and y.
.It Fn or x y
Performs a bitwise OR on integer arguments x and y.
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments x and y.
.It Fn lshift x n
Returns x shifted by n bits to the left.
.It Fn rshift x n
Returns x shifted by n bits to the right.
.El
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Pp
"Awk \(em A Pattern Scanning and Processing Language",
.Pa /usr/share/doc/usd/16.awk/ .
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Pp
.Nm
does not support {n,m} pattern matching.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
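.Pp
The conversion idioms described at the start of this section can be
sketched as follows.
This is an illustrative example only; the variable names
.Va a ,
.Va b ,
and
.Va n
are arbitrary.
All three messages should be printed, because two string values
compare as strings until 0 is added to them:
.Bd -literal -offset indent
BEGIN {
        a = "10"; b = "9"
        # both operands are strings, so this is a string comparison
        if (a < b)
                print "as strings, 10 sorts before 9"
        # adding 0 forces a numeric comparison
        if (a + 0 > b + 0)
                print "as numbers, 10 is greater than 9"
        # concatenating "" forces a number to be treated as a string
        n = 10
        if ((n "") < b)
                print "10 as a string still sorts before 9"
}
.Ed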