1.\"	$OpenBSD: awk.1,v 1.33 2009/02/08 17:15:09 jmc Exp $
2.\" EX/EE is a Bd
3.\"
4.\" Copyright (C) Lucent Technologies 1997
5.\" All Rights Reserved
6.\"
7.\" Permission to use, copy, modify, and distribute this software and
8.\" its documentation for any purpose and without fee is hereby
9.\" granted, provided that the above copyright notice appear in all
10.\" copies and that both that the copyright notice and this
11.\" permission notice and warranty disclaimer appear in supporting
12.\" documentation, and that the name Lucent Technologies or any of
13.\" its entities not be used in advertising or publicity pertaining
14.\" to distribution of the software without specific, written prior
15.\" permission.
16.\"
17.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
18.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
19.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
20.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
21.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
22.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
23.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
24.\" THIS SOFTWARE.
25.\"
26.Dd $Mdocdate: February 8 2009 $
27.Dt AWK 1
28.Os
29.Sh NAME
30.Nm awk
31.Nd pattern-directed scanning and processing language
32.Sh SYNOPSIS
33.Nm awk
34.Op Fl safe
35.Op Fl V
36.Op Fl d Ns Op Ar n
37.Op Fl F Ar fs
38.Oo Fl v Ar var Ns =
39.Ns Ar value Oc
40.Op Ar prog | Fl f Ar progfile
41.Ar
42.Nm nawk
43.Ar ...
44.Sh DESCRIPTION
45.Nm
46scans each input
47.Ar file
48for lines that match any of a set of patterns specified literally in
49.Ar prog
50or in one or more files specified as
51.Fl f Ar progfile .
52With each pattern there can be an associated action that will be performed
53when a line of a
54.Ar file
55matches the pattern.
56Each line is matched against the
57pattern portion of every pattern-action statement;
58the associated action is performed for each matched pattern.
59The file name
60.Sq -
61means the standard input.
62Any
63.Ar file
64of the form
65.Ar var Ns = Ns Ar value
66is treated as an assignment, not a filename,
67and is executed at the time it would have been opened if it were a filename.
68.Pp
69The options are as follows:
70.Bl -tag -width "-safe "
71.It Fl d Ns Op Ar n
72Debug mode.
73Set debug level to
74.Ar n ,
75or 1 if
76.Ar n
77is not specified.
78A value greater than 1 causes
79.Nm
80to dump core on fatal errors.
81.It Fl F Ar fs
82Define the input field separator to be the regular expression
83.Ar fs .
84.It Fl f Ar progfile
85Read program code from the specified file
86.Ar progfile
87instead of from the command line.
88.It Fl safe
89Disable file output
90.Pf ( Ic print No > ,
91.Ic print No >> ) ,
92process creation
93.Po
94.Ar cmd | Ic getline ,
95.Ic print No \&| ,
96.Ic system
97.Pc
98and access to the environment
99.Pf ( Va ENVIRON ;
100see the section on variables below).
101This is a first
102.Pq and not very reliable
103approximation to a
104.Dq safe
105version of
106.Nm .
107.It Fl V
108Print the version number of
109.Nm
110to standard output and exit.
111.It Fl v Ar var Ns = Ns Ar value
112Assign
113.Ar value
114to variable
115.Ar var
116before
117.Ar prog
118is executed;
119any number of
120.Fl v
121options may be present.
122.El
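.Pp
As a hedged illustration of the difference between
.Fl v
and an assignment operand, the sketch below sets
.Va a
with
.Fl v
and
.Va b
with an operand placed before the file name
.Sq - .
The variable names and the input are made up for the example.
.Va b
should still be empty while the
.Ic BEGIN
action runs, because an operand assignment is only executed when its turn
in the operand list comes up.
.Bd -literal -offset indent
echo data | awk -v a=1 'BEGIN { print "BEGIN: a=" a, "b=" b }
    { print "main: a=" a, "b=" b }' b=2 -
.Ed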
123.Pp
124The input is normally made up of input lines
125.Pq records
126separated by newlines, or by the value of
127.Va RS .
128If
129.Va RS
130is null, then any number of blank lines are used as the record separator,
131and newlines are used as field separators
132(in addition to the value of
133.Va FS ) .
134This is convenient when working with multi-line records.
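.Pp
A minimal sketch of this mode, assuming a made-up address list in which
blank lines separate the records:
.Bd -literal -offset indent
awk 'BEGIN { RS = "" } { print NR ": " $1 " has " NF " fields" }' <<EOF
Alice
555-1234

Bob
555-9876
555-0000
EOF
.Ed
.Pp
Each block of lines is read as one record and each line within it becomes
a field, so this should report two records with 2 and 3 fields.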
135.Pp
136An input line is normally made up of fields separated by whitespace,
137or by the regular expression
138.Va FS .
139The fields are denoted
140.Va $1 , $2 , ... ,
141while
142.Va $0
143refers to the entire line.
144If
145.Va FS
146is null, the input line is split into one field per character.
147.Pp
148Normally, any number of blanks separate fields.
149In order to set the field separator to a single blank, use the
150.Fl F
151option with a value of
152.Sq [\ \&] .
153If a field separator of
154.Sq t
155is specified,
156.Nm
157treats it as if
158.Sq \et
159had been specified and uses
160.Aq TAB
161as the field separator.
162In order to use a literal
163.Sq t
164as the field separator, use the
165.Fl F
166option with a value of
167.Sq [t] .
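.Pp
The effect of a single-blank separator can be checked with a short sketch
(the input string is arbitrary):
.Bd -literal -offset indent
echo 'a  b' | awk '{ print NF }'            # default splitting: 2
echo 'a  b' | awk -F '[ ]' '{ print NF }'   # each blank separates: 3
.Ed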
168.Pp
169A pattern-action statement has the form
170.Pp
171.D1 Ar pattern Ic \&{ Ar action Ic \&}
172.Pp
173A missing
174.Ic \&{ Ar action Ic \&}
175means print the line;
176a missing pattern always matches.
177Pattern-action statements are separated by newlines or semicolons.
178.Pp
179Newlines are permitted after a terminating statement or following a comma
180.Pq Sq ,\& ,
181an open brace
182.Pq Sq { ,
183a logical AND
184.Pq Sq && ,
185a logical OR
186.Pq Sq || ,
187after the
188.Sq do
189or
190.Sq else
191keywords,
192or after the closing parenthesis of an
193.Sq if ,
194.Sq for ,
195or
196.Sq while
197statement.
198Additionally, a backslash
199.Pq Sq \e
200can be used to escape a newline between tokens.
201.Pp
202An action is a sequence of statements.
203A statement can be one of the following:
204.Bd -unfilled -offset indent
205.Ic if ( Xo
206.Ar expression ) statement \&
207.Op Ic else Ar statement
208.Xc
209.Ic while ( Ar expression ) statement
210.Ic for ( Xo
211.Ar expression ; expression ; expression ) statement
212.Xc
213.Ic for ( Xo
214.Ar var Ic in Ar array ) statement
215.Xc
216.Ic do Ar statement Ic while ( Ar expression )
217.Ic break
218.Ic continue
219.Ic { Oo Ar statement ... Oc Ic \& }
220.Ar expression Xo
221.No "# commonly" \&
222.Ar var Ic = Ar expression
223.Xc
224.Ic print Xo
225.Op Ar expression-list
226.Op > Ns Ar expression
227.Xc
228.Ic printf Ar format Xo
229.Op Ar ... , expression-list
230.Op > Ns Ar expression
231.Xc
232.Ic return Op Ar expression
233.Ic next Xo
234.No "# skip remaining patterns on this input line"
235.Xc
236.Ic nextfile Xo
237.No "# skip rest of this file, open next, start at top"
238.Xc
239.Ic delete Ar array Ns Xo
240.Ic \&[ Ns Ar expression Ns Ic \&]
241.No \& "# delete an array element"
242.Xc
243.Ic delete Ar array Xo
244.No "# delete all elements of array"
245.Xc
246.Ic exit Xo
247.Op Ar expression
248.No \& "# exit immediately; status is" Ar expression
249.Xc
250.Ed
251.Pp
252Statements are terminated by
253semicolons, newlines or right braces.
254An empty
255.Ar expression-list
256stands for
257.Ar $0 .
258String constants are quoted
259.Li \&"" ,
260with the usual C escapes recognized within
261(see
262.Xr printf 1
263for a complete list of these).
264Expressions take on string or numeric values as appropriate,
265and are built using the operators
266.Ic + \- * / % ^
267.Pq exponentiation ,
268and concatenation
269.Pq indicated by whitespace .
270The operators
271.Ic \&! ++ \-\- += \-= *= /= %= ^=
272.Ic > >= < <= == != ?:
273are also available in expressions.
274Variables may be scalars, array elements
275(denoted
276.Li x[i] )
277or fields.
278Variables are initialized to the null string.
279Array subscripts may be any string,
280not necessarily numeric;
281this allows for a form of associative memory.
282Multiple subscripts such as
283.Li [i,j,k]
284are permitted; the constituents are concatenated,
285separated by the value of
286.Va SUBSEP
287.Pq see the section on variables below .
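.Pp
The following sketch illustrates multiple subscripts.
The array name
.Va grid
and its contents are hypothetical.
It stores one element under the subscript pair 1,2, recovers the two
constituents by splitting the combined subscript on
.Va SUBSEP ,
and then tests membership with the parenthesized
.Ic in
form:
.Bd -literal -offset indent
awk 'BEGIN {
    grid[1,2] = "x"
    for (k in grid) {
        split(k, part, SUBSEP)      # undo the concatenation
        print part[1], part[2], grid[k]
    }
    if ((1,2) in grid)
        print "found"
}'
.Ed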
288.Pp
289The
290.Ic print
291statement prints its arguments on the standard output
292(or on a file if
293.Pf > Ns Ar file
294or
295.Pf >> Ns Ar file
296is present or on a pipe if
297.Pf |\ \& Ar cmd
298is present), separated by the current output field separator,
299and terminated by the output record separator.
300.Ar file
301and
302.Ar cmd
303may be literal names or parenthesized expressions;
304identical string values in different statements denote
305the same open file.
306The
307.Ic printf
308statement formats its expression list according to the format
309(see
310.Xr printf 1 ) .
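.Pp
As a sketch of output to a pipe, assuming
.Xr sort 1
is available: both
.Ic print
statements below name the same command string, so they feed a single
.Xr sort 1
process, and closing that string lets the sorted lines appear before
the final message.
.Bd -literal -offset indent
awk 'BEGIN {
    print "pear" | "sort"
    print "apple" | "sort"      # same string, same open pipe
    close("sort")               # wait for sort to finish
    print "done"
}'
.Ed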
311.Pp
312Patterns are arbitrary Boolean combinations
313(with
314.Ic "\&! || &&" )
315of regular expressions and
316relational expressions.
317.Nm
318supports extended regular expressions
319.Pq EREs .
320See
321.Xr re_format 7
322for more information on regular expressions.
323Isolated regular expressions
324in a pattern apply to the entire line.
325Regular expressions may also occur in
326relational expressions, using the operators
327.Ic ~
328and
329.Ic !~ .
330.Pf / Ns Ar re Ns /
331is a constant regular expression;
332any string (constant or variable) may be used
333as a regular expression, except in the position of an isolated regular expression
334in a pattern.
335.Pp
336A pattern may consist of two patterns separated by a comma;
337in this case, the action is performed for all lines
338from an occurrence of the first pattern
339through an occurrence of the second.
340.Pp
341A relational expression is one of the following:
342.Bd -unfilled -offset indent
343.Ar expression matchop regular-expression
344.Ar expression relop expression
345.Ar expression Ic in Ar array-name
346.Ic \&( Ns Xo
347.Ar expr , expr , \&... Ns Ic \&) in
348.Ar \& array-name
349.Xc
350.Ed
351.Pp
352where a
353.Ar relop
354is any of the six relational operators in C, and a
355.Ar matchop
356is either
357.Ic ~
358(matches)
359or
360.Ic !~
361(does not match).
362A conditional is an arithmetic expression,
363a relational expression,
364or a Boolean combination
365of these.
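.Pp
A short sketch of a pattern that combines a relational expression with a
negated match, using invented input: it should select the lines whose
second field exceeds 10 and whose first field does not begin with
.Sq # .
.Bd -literal -offset indent
awk '$2 > 10 && $1 !~ /^#/ { print $1 }' <<EOF
alpha 5
beta 20
#gamma 30
delta 11
EOF
.Ed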
366.Pp
367The special patterns
368.Ic BEGIN
369and
370.Ic END
371may be used to capture control before the first input line is read
372and after the last.
373.Ic BEGIN
374and
375.Ic END
376do not combine with other patterns.
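.Pp
For instance, a minimal record counter (the variable
.Va n
is arbitrary) can print a header before any input is read and a total
after the last record:
.Bd -literal -offset indent
awk 'BEGIN { print "start" } { n++ } END { print n, "records" }' <<EOF
a
b
EOF
.Ed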
377.Pp
378Variable names with special meanings:
379.Pp
380.Bl -tag -width "FILENAME " -compact
381.It Va ARGC
382Argument count, assignable.
383.It Va ARGV
384Argument array, assignable;
385non-null members are taken as filenames.
386.It Va CONVFMT
387Conversion format when converting numbers
388(default
389.Qq Li %.6g ) .
390.It Va ENVIRON
391Array of environment variables; subscripts are names.
392.It Va FILENAME
393The name of the current input file.
394.It Va FNR
395Ordinal number of the current record in the current file.
396.It Va FS
397Regular expression used to separate fields; also settable
398by option
399.Fl F Ar fs .
400.It Va NF
401Number of fields in the current record.
402.Va $NF
403can be used to obtain the value of the last field in the current record.
404.It Va NR
405Ordinal number of the current record.
406.It Va OFMT
407Output format for numbers (default
408.Qq Li %.6g ) .
409.It Va OFS
410Output field separator (default blank).
411.It Va ORS
412Output record separator (default newline).
413.It Va RLENGTH
414The length of the string matched by the
415.Fn match
416function.
417.It Va RS
418Input record separator (default newline).
419.It Va RSTART
420The starting position of the string matched by the
421.Fn match
422function.
423.It Va SUBSEP
424Separates multiple subscripts (default 034).
425.El
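.Pp
A small sketch, with made-up input, of how several of these variables
interact: it sets
.Va OFS
and prints the record number, the field count and the last field of
each record.
.Bd -literal -offset indent
awk 'BEGIN { OFS = "-" } { print NR, NF, $NF }' <<EOF
a b c
d e
EOF
.Ed
.Pp
This should print
.Sq 1-3-c
and
.Sq 2-2-e .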
426.Sh FUNCTIONS
427The awk language has a variety of built-in functions:
428arithmetic, string, input/output, general, and bit-operation.
429.Pp
430Functions may be defined (at the position of a pattern-action statement)
as follows:
432.Pp
433.Dl function foo(a, b, c) { ...; return x }
434.Pp
435Parameters are passed by value if scalar, and by reference if array name;
436functions may be called recursively.
437Parameters are local to the function; all other variables are global.
438Thus local variables may be created by providing excess parameters in
439the function definition.
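.Pp
The sketch below illustrates the excess-parameter idiom.
The function
.Fn max
and all variable names are hypothetical.
The array is passed by reference, and
.Va i
and
.Va m
are declared as extra parameters so that the global
.Va i
is left alone:
.Bd -literal -offset indent
awk '
function max(arr, n,    i, m) {     # i and m act as locals
    m = arr[1]
    for (i = 2; i <= n; i++)
        if (arr[i] > m)
            m = arr[i]
    return m
}
BEGIN {
    n = split("3 9 4", v, " ")
    i = "untouched"
    print max(v, n), i
}'
.Ed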
440.Ss Arithmetic Functions
441.Bl -tag -width "atan2(y, x)"
442.It Fn atan2 y x
443Return the arctangent of
444.Fa y Ns / Ns Fa x
445in radians.
446.It Fn cos x
447Return the cosine of
448.Fa x ,
449where
450.Fa x
451is in radians.
452.It Fn exp x
453Return the exponential of
454.Fa x .
455.It Fn int x
456Return
457.Fa x
458truncated to an integer value.
459.It Fn log x
460Return the natural logarithm of
461.Fa x .
462.It Fn rand
463Return a random number,
464.Fa n ,
465such that
466.Sm off
467.Pf 0 \*(Le Fa n No \*(Lt 1 .
468.Sm on
469.It Fn sin x
470Return the sine of
471.Fa x ,
472where
473.Fa x
474is in radians.
475.It Fn sqrt x
476Return the square root of
477.Fa x .
478.It Fn srand expr
479Sets seed for
480.Fn rand
481to
482.Fa expr
483and returns the previous seed.
484If
485.Fa expr
486is omitted, the time of day is used instead.
487.El
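.Pp
A brief sketch of some of these functions.
The seed value and the six-sided die are arbitrary choices for the
example, and the actual roll depends on the implementation's generator,
so only its range is checked.
.Bd -literal -offset indent
awk 'BEGIN {
    print int(3.7), int(-3.7)       # truncation toward zero
    print atan2(0, -1)              # pi, formatted using OFMT
    srand(42)
    roll = int(rand() * 6) + 1      # integer from 1 to 6
    print (roll >= 1 && roll <= 6) ? "roll ok" : "roll out of range"
}'
.Ed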
488.Ss String Functions
489.Bl -tag -width "split(s, a, fs)"
490.It Fn gsub r t s
491The same as
492.Fn sub
493except that all occurrences of the regular expression are replaced.
494.Fn gsub
495returns the number of replacements.
496.It Fn index s t
497The position in
498.Fa s
499where the string
500.Fa t
501occurs, or 0 if it does not.
502.It Fn length s
503The length of
504.Fa s
505taken as a string,
506or of
507.Va $0
508if no argument is given.
509.It Fn match s r
510The position in
511.Fa s
512where the regular expression
513.Fa r
514occurs, or 0 if it does not.
515The variable
516.Va RSTART
517is set to the starting position of the matched string
518.Pq which is the same as the returned value
519or zero if no match is found.
520The variable
521.Va RLENGTH
522is set to the length of the matched string,
523or \-1 if no match is found.
524.It Fn split s a fs
525Splits the string
526.Fa s
527into array elements
528.Va a[1] , a[2] , ... , a[n]
529and returns
530.Va n .
531The separation is done with the regular expression
532.Ar fs
533or with the field separator
534.Va FS
535if
536.Ar fs
537is not given.
538An empty string as field separator splits the string
539into one array element per character.
540.It Fn sprintf fmt expr ...
541The string resulting from formatting
542.Fa expr , ...
543according to the
544.Xr printf 1
545format
546.Fa fmt .
547.It Fn sub r t s
548Substitutes
549.Fa t
550for the first occurrence of the regular expression
551.Fa r
552in the string
553.Fa s .
554If
555.Fa s
556is not given,
557.Va $0
558is used.
559An ampersand
560.Pq Sq &
561in
562.Fa t
563is replaced in string
564.Fa s
with the text matched by regular expression
566.Fa r .
567A literal ampersand can be specified by preceding it with two backslashes
568.Pq Sq \e\e .
569A literal backslash can be specified by preceding it with another backslash
570.Pq Sq \e\e .
571.Fn sub
572returns the number of replacements.
573.It Fn substr s m n
574Return at most the
575.Fa n Ns -character
576substring of
577.Fa s
578that begins at position
579.Fa m
580counted from 1.
581If
582.Fa n
583is omitted, or if
584.Fa n
585specifies more characters than are left in the string,
586the length of the substring is limited by the length of
587.Fa s .
588.It Fn tolower str
589Returns a copy of
590.Fa str
591with all upper-case characters translated to their
592corresponding lower-case equivalents.
593.It Fn toupper str
594Returns a copy of
595.Fa str
596with all lower-case characters translated to their
597corresponding upper-case equivalents.
598.El
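.Pp
The following sketch, using an arbitrary test string, exercises several
of the string functions together.
.Fn gsub
wraps every
.Sq o
in brackets by way of
.Sq & ,
.Fn match
sets
.Va RSTART
and
.Va RLENGTH ,
and
.Fn substr
extracts the matched text.
.Bd -literal -offset indent
awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)
    print n, s
    if (match(s, /w.*d/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    print index(s, "ll"), toupper(substr(s, 1, 4))
}'
.Ed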
599.Ss Input/Output and General Functions
600.Bl -tag -width "getline [var] < file"
601.It Fn close expr
602Closes the file or pipe
603.Fa expr .
604.Fa expr
605should match the string that was used to open the file or pipe.
606.It Ar cmd | Ic getline Op Va var
607Read a record of input from a stream piped from the output of
608.Ar cmd .
609If
610.Va var
611is omitted, the variables
612.Va $0
613and
614.Va NF
615are set.
616Otherwise
617.Va var
618is set.
619If the stream is not open, it is opened.
620As long as the stream remains open, subsequent calls
621will read subsequent records from the stream.
622The stream remains open until explicitly closed with a call to
623.Fn close .
624.Ic getline
625returns 1 for a successful input, 0 for end of file, and \-1 for an error.
626.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
628.Fa expr ,
629or all open files or pipes if
630.Fa expr
631is omitted.
632.Fa expr
633should match the string that was used to open the file or pipe.
634.It Ic getline
635Sets
636.Va $0
637to the next input record from the current input file.
638This form of
639.Ic getline
640sets the variables
641.Va NF ,
642.Va NR ,
643and
644.Va FNR .
645.Ic getline
646returns 1 for a successful input, 0 for end of file, and \-1 for an error.
647.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
652This form of
653.Ic getline
654sets the variables
655.Va NR
656and
657.Va FNR .
658.Ic getline
659returns 1 for a successful input, 0 for end of file, and \-1 for an error.
660.It Xo
661.Ic getline Op Va var
662.Pf \ \&< Ar file
663.Xc
Reads the next record from
.Ar file .
668If
669.Va var
670is omitted, the variables
671.Va $0
672and
673.Va NF
674are set.
675Otherwise
676.Va var
677is set.
678If
679.Ar file
680is not open, it is opened.
681As long as the stream remains open, subsequent calls will read subsequent
682records from
683.Ar file .
684.Ar file
685remains open until explicitly closed with a call to
686.Fn close .
687.It Fn system cmd
688Executes
689.Fa cmd
690and returns its exit status.
691.El
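.Pp
A sketch of reading from a command with
.Ic getline .
The command string is an arbitrary example.
The loop compares the return value against 0 so that both end of file
and an error terminate it, and the same string is then passed to
.Fn close :
.Bd -literal -offset indent
awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)
        n++
    close(cmd)
    print n, line
}'
.Ed
.Pp
This should print
.Sq 2 two ,
since
.Va line
keeps the last record that was read successfully.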
692.Ss Bit-Operation Functions
693.Bl -tag -width "lshift(a, b)"
694.It Fn compl x
695Returns the bitwise complement of integer argument x.
696.It Fn and x y
697Performs a bitwise AND on integer arguments x and y.
698.It Fn or x y
699Performs a bitwise OR on integer arguments x and y.
700.It Fn xor x y
701Performs a bitwise Exclusive-OR on integer arguments x and y.
702.It Fn lshift x n
703Returns x shifted by n bits to the left.
704.It Fn rshift x n
Returns x shifted by n bits to the right.
706.El
707.Sh EXAMPLES
708Print lines longer than 72 characters:
709.Pp
710.Dl length($0) > 72
711.Pp
712Print first two fields in opposite order:
713.Pp
714.Dl { print $2, $1 }
715.Pp
716Same, with input fields separated by comma and/or blanks and tabs:
717.Bd -literal -offset indent
718BEGIN { FS = ",[ \et]*|[ \et]+" }
719      { print $2, $1 }
720.Ed
721.Pp
722Add up first column, print sum and average:
723.Bd -literal -offset indent
724{ s += $1 }
725END { print "sum is", s, " average is", s/NR }
726.Ed
727.Pp
728Print all lines between start/stop pairs:
729.Pp
730.Dl /start/, /stop/
731.Pp
732Simulate echo(1):
733.Bd -literal -offset indent
734BEGIN { # Simulate echo(1)
735        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
736        printf "\en"
737        exit }
738.Ed
739.Pp
740Print an error message to standard error:
741.Bd -literal -offset indent
742{ print "error!" > "/dev/stderr" }
743.Ed
744.Sh SEE ALSO
745.Xr lex 1 ,
746.Xr printf 1 ,
747.Xr sed 1 ,
748.Xr re_format 7 ,
749.Xr script 7
750.Pp
751"Awk \(em A Pattern Scanning and Processing Language",
752.Pa /usr/share/doc/usd/16.awk/ .
753.Rs
754.%A A. V. Aho
755.%A B. W. Kernighan
756.%A P. J. Weinberger
757.%T The AWK Programming Language
758.%I Addison-Wesley
759.%D 1988
760.%O ISBN 0-201-07981-X
761.Re
762.Sh STANDARDS
763The
764.Nm
765utility is compliant with the
766.St -p1003.1-2008
767specification.
768.Pp
769The flags
770.Op Fl \&dV
771and
772.Op Fl safe ,
773as well as the commands
774.Cm fflush , compl , and , or ,
775.Cm xor , lshift , rshift ,
776are extensions to that specification.
777.Pp
778.Nm
779does not support {n,m} pattern matching.
780.Sh HISTORY
781An
782.Nm
783utility appeared in
784.At v7 .
785.Sh BUGS
786There are no explicit conversions between numbers and strings.
787To force an expression to be treated as a number add 0 to it;
788to force it to be treated as a string concatenate
789.Li \&""
790to it.
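.Pp
A minimal sketch of both idioms, with arbitrary values: string constants
compare as strings until 0 is added to them, and numbers compare
numerically until the null string is concatenated to them.
.Bd -literal -offset indent
awk 'BEGIN {
    s = "10"; t = "9"
    print (s < t), (s + 0 < t + 0)      # 1 0
    n = 10; m = 9
    print (n < m), (n "" < m "")        # 0 1
}'
.Ed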
791.Pp
792The scope rules for variables in functions are a botch;
793the syntax is worse.
794