.\"	$OpenBSD: awk.1,v 1.46 2020/01/22 03:47:38 deraadt Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: January 22 2020 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
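.Pp
As a rough illustration of the operand assignment described above,
the following sketch shows that the assignment has not yet happened
while the
.Ic BEGIN
action runs, but is in effect once the standard input is read.
The shell function
.Li tag_demo
and the variable
.Li tag
are hypothetical names chosen for this example.
.Bd -literal -offset indent
```shell
# tag_demo: hypothetical helper, not part of awk.
# The operand tag=first is processed only when awk reaches it
# in the operand list, that is, after BEGIN has already run.
tag_demo() {
    echo "line" | awk 'BEGIN { print "[" tag "]" } { print tag, $0 }' tag=first -
}
tag_demo
```
.Ed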
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
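.Pp
A minimal sketch of multi-line records, assuming input in which
each record is a name on one line and a number on the next,
with blank lines between records.
The function name
.Li records_demo
and the sample data are made up for the example.
.Bd -literal -offset indent
```shell
# records_demo: hypothetical helper, not part of awk.
# With RS set to the null string, blank lines separate records
# and each newline inside a record also separates fields.
records_demo() {
    awk 'BEGIN { RS = "" } { print NR ": " $1, $2 }' <<EOF
alice
30

bob
25
EOF
}
records_demo
```
.Ed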
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
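.Pp
The field separator rules above can be sketched as follows.
The function name
.Li fs_demo
is hypothetical.
The second command uses a bracket expression holding a single blank,
so two adjacent blanks delimit an empty field.
.Bd -literal -offset indent
```shell
# fs_demo: hypothetical helper, not part of awk.
fs_demo() {
    # Colon as separator: the second field is "b".
    echo "a:b:c" | awk -F: '{ print $2 }'
    # Single blank as separator: "a", "" and "b" make three fields.
    echo "a  b" | awk -F'[ ]' '{ print NF }'
}
fs_demo
```
.Ed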
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit processing, and perform
.Ic END
processing; status is
.Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
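.Pp
As an illustration of string subscripts and of multiple subscripts
joined by
.Va SUBSEP ,
the sketch below counts words and then takes a two-part subscript
apart again.
The function name
.Li array_demo ,
the array names and the sample words are assumptions made for the example;
the traversal order of
.Ic for Ar ( var Ic in Ar array )
is not specified, so the first result is passed through
.Xr sort 1 .
.Bd -literal -offset indent
```shell
# array_demo: hypothetical helper, not part of awk.
array_demo() {
    # Associative array indexed by the first field of each line.
    awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' <<EOF | sort
apple
pear
apple
EOF
    # The subscript ["x", "y"] is stored as "x" SUBSEP "y".
    awk 'BEGIN { a["x", "y"] = 1
                 for (k in a) { split(k, p, SUBSEP); print p[1], p[2] } }'
}
array_demo
```
.Ed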
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special pattern
.Ic BEGIN
may be used to capture control before the first input line is read.
The special pattern
.Ic END
may be used to capture control after processing is finished.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
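.Pp
The parameter rules above can be sketched as follows.
The names
.Li func_demo ,
.Li sum3 ,
.Li fill
and
.Li box
are hypothetical.
The extra parameter
.Li total
acts as a local variable, so the global of the same name is left alone,
while the array argument is modified in place.
.Bd -literal -offset indent
```shell
# func_demo: hypothetical helper, not part of awk.
func_demo() {
    awk 'function sum3(a, b, c,    total) { total = a + b + c; return total }
         function fill(arr) { arr["k"] = "set by fill" }
         BEGIN {
             total = "global"
             print sum3(1, 2, 3)   # local total does not leak out
             print total           # still the global value
             fill(box)             # arrays are passed by reference
             print box["k"]
         }'
}
func_demo
```
.Ed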
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the text in
.Fa s
that matched the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
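.Pp
A short sketch exercising several of the string functions listed above.
The function name
.Li string_demo ,
the variable names and the sample strings are made up for the example.
.Bd -literal -offset indent
```shell
# string_demo: hypothetical helper, not part of awk.
string_demo() {
    awk 'BEGIN {
        s = "hello world"
        n = gsub(/o/, "0", s);  print n, s       # replacements, new string
        print index("foobar", "bar")             # 1-based position
        print match("foobar", /o+/), RSTART, RLENGTH
        print substr("hello", 2, 3)              # 3 characters from position 2
        n = split("a:b:c", parts, ":");  print n, parts[3]
        print toupper("awk")
    }'
}
string_demo
```
.Ed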
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
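.Pp
The piped form of
.Ic getline
described above might be used as in the following sketch.
The function name
.Li getline_demo ,
the variable names and the command string are assumptions made for
the example.
The same string that opened the pipe is passed to
.Fn close .
.Bd -literal -offset indent
```shell
# getline_demo: hypothetical helper, not part of awk.
getline_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        # getline returns 1 per record read, 0 at end of file.
        while ((cmd | getline line) > 0)
            print "got", line
        close(cmd)
    }'
}
getline_demo
```
.Ed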
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and x y
Performs a bitwise AND on integer arguments x and y.
.It Fn or x y
Performs a bitwise OR on integer arguments x and y.
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments x and y.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
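.Pp
A minimal sketch of how an
.Ic exit
expression sets the exit status while
.Ic END
processing is still performed.
The function name
.Li exit_demo
and the status value 3 are arbitrary choices for the example.
.Bd -literal -offset indent
```shell
# exit_demo: hypothetical helper, not part of awk.
exit_demo() {
    echo x | awk '{ exit 3 } END { print "END still runs" }'
    echo "status: $?"
}
exit_demo
```
.Ed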
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except
.Nm
does not support {n,m} pattern matching.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
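.Pp
The effect of forcing a numeric interpretation can be sketched as follows.
The function name
.Li convert_demo
and the variable names are hypothetical.
The two values are assigned from string constants, so the first
comparison is done on strings and the second, after adding 0, on numbers.
.Bd -literal -offset indent
```shell
# convert_demo: hypothetical helper, not part of awk.
convert_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"
        as_strings = (a < b)          # "10" sorts before "9" as a string
        as_numbers = (a + 0 < b + 0)  # 10 is not less than 9
        print as_strings, as_numbers
    }'
}
convert_demo
```
.Ed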
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
798