.\"	$OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: September 14 2015 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
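.Pp
As a minimal sketch of this mode, assume a hypothetical input in which
each record is a block of several lines and blocks are separated by
blank lines.
Setting
.Va RS
to the null string and
.Va FS
to a newline then makes each line of a block one field:
.Bd -literal -offset indent
# Assumed input: blocks of lines separated by blank lines.
BEGIN { RS = ""; FS = "\en" }
      { print $1 " (" NF " lines)" }
.Ed
.Pp
For a two-line block whose first line is Alice, this should print
.Dq Alice (2 lines) .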
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
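.Pp
The difference between the default and a single-blank separator can be
sketched as follows, assuming a hypothetical input line
.Sq a\ \ b
with two consecutive blanks between the letters.
The first command should report two fields and the second three,
because each blank then delimits a (possibly empty) field:
.Bd -literal -offset indent
# Illustrative shell session; the input line is made up.
$ printf 'a  b\en' | awk '{ print NF }'
$ printf 'a  b\en' | awk -F '[ ]' '{ print NF }'
.Ed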
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
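.Pp
For illustration, the following sketch stores a value under a
two-part subscript, tests for it with the parenthesized
.Ic in
form, and recovers the parts with
.Fn split .
The array name
.Va count
and the subscript values are hypothetical:
.Bd -literal -offset indent
BEGIN {
        count["a", 1] = 5       # key is "a" SUBSEP 1
        if (("a", 1) in count)
                print "found"
        for (k in count) {
                split(k, part, SUBSEP)  # recover the constituents
                print part[1], part[2], count[k]
        }
}
.Ed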
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default octal 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
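.Pp
A minimal sketch of this idiom follows; the function
.Fn sum
and its variable names are hypothetical.
The extra parameters
.Va i
and
.Va total
serve as locals, so the global
.Va i
is left untouched and the program should print
.Dq 12 99 :
.Bd -literal -offset indent
# i and total are extra parameters used as locals;
# callers pass only the array and its size.
function sum(arr, n,    i, total) {
        for (i = 1; i <= n; i++)
                total += arr[i]
        return total
}
BEGIN {
        n = split("3 4 5", v)
        i = 99                  # global i, not changed by sum()
        print sum(v, n), i
}
.Ed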
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the text in
.Fa s
that was matched by the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
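.Pp
As a short sketch of
.Fn gsub
and
.Fn match
working together (the sample string is made up), the following
brackets every occurrence of
.Dq foo
and then locates the first word starting with
.Sq b :
.Bd -literal -offset indent
BEGIN {
        s = "foo bar foo"       # hypothetical sample text
        n = gsub(/foo/, "[&]", s)       # & stands for the matched text
        print n, s
        if (match(s, /b[a-z]+/))
                print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}
.Ed
.Pp
This should print
.Dq 2 [foo] bar [foo]
followed by
.Dq 7 3 bar .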
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
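.Pp
A typical reading loop can be sketched as follows; the command string
is a hypothetical stand-in for any command.
Comparing the result with 0 ends the loop on both end of file and
error, and
.Fn close
is called with the same string that opened the pipe:
.Bd -literal -offset indent
BEGIN {
        cmd = "echo one; echo two"      # hypothetical command
        while ((cmd | getline line) > 0)
                n++
        close(cmd)
        print n, "lines; last was", line
}
.Ed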
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and x y
Performs a bitwise AND on integer arguments x and y.
.It Fn or x y
Performs a bitwise OR on integer arguments x and y.
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments x and y.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except
.Nm
does not support {n,m} pattern matching.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
794