.\"	$OpenBSD: awk.1,v 1.8 2000/11/10 05:10:21 aaron Exp $
.\" EX/EE is a Bd
.Dd June 29, 1996
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl F Ar fs
.Op Fl v Ar var=value
.Op Fl safe
.Op Fl mr Ar n
.Op Fl mf Ar n
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm nawk
.Ar ...
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq Pa \-
means the standard input.
Any
.Ar file
of the form
.Ar var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.Fl v
followed by
.Ar var=value
is an assignment to be done before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
The
.Fl F Ar fs
option defines the input field separator to be the regular expression
.Ar fs .
The
.Fl safe
option disables file output
.Po
.Ic print Ic > ,
.Ic print Ic >>
.Pc ,
process creation
.Po
.Ar cmd Ic \&| getline ,
.Ic print \&| , system
.Pc ,
and access to the environment
.Pq Va ENVIRON .
This
is a first (and not very reliable) approximation to a
.Dq safe
version of
.Nm awk .
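.Pp
As a rough illustration of how these options combine, the following
command line sets the field separator with
.Fl F
and passes a threshold with
.Fl v .
The variable name
.Va min
and the choice of
.Pa /etc/passwd
as input are assumptions made only for this sketch.
.Bd -literal -offset indent
awk -F: -v min=1000 '$3 >= min { print $1 }' /etc/passwd
.Ed
.Pp
Under those assumptions it prints the first field of every line whose
third field is at least 1000.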
.Pp
An input line is normally made up of fields separated by whitespace,
or by regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
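.Pp
A minimal sketch of field access, assuming whitespace-separated input:
the program below prints the number of fields, the first field and the
last field of every line.
It relies on the
.Va NF
variable, so that
.Va $NF
names the last field.
.Bd -literal -offset indent
{ print NF, $1, $NF }
.Ed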
.Pp
To compensate for inadequate implementation of storage management,
the
.Fl mr
option can be used to set the maximum size of the input record,
and the
.Fl mf
option to set the maximum number of fields.
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
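.Pp
The two defaults can be seen side by side in a small sketch.
The pattern
.Li /error/
and the counter
.Va n
are hypothetical names chosen for illustration.
.Bd -literal -offset indent
/error/             # no action: matching lines are printed
{ n++ }             # no pattern: runs for every line
END { print n + 0 } # total number of lines read
.Ed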
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bd -unfilled -offset indent
.Ic if ( Xo
.Ar expression ) statement \&
.Op Ic else Ar statement
.Xc
.Ic while ( Ar expression ) statement
.Ic for ( Xo
.Ar expression ; expression ; expression ) statement
.Xc
.Ic for ( Xo
.Ar var Ic in Ar array ) statement
.Xc
.Ic do Ar statement Ic while ( Ar expression )
.Ic break
.Ic continue
.Ic { Oo Ar statement ... Oc Ic \& }
.Ar expression Xo
.No "# commonly" \&
.Ar var Ic = Ar expression
.Xc
.Ic print Xo
.Op Ar expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic printf Ar format Xo
.Op Ar ... , expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic return Op Ar expression
.Ic next Xo
.No "# skip remaining patterns on this input line"
.Xc
.Ic nextfile Xo
.No "# skip rest of this file, open next, start at top"
.Xc
.Ic delete Ar array Ns Xo
.Ic \&[ Ns Ar expression Ns Ic \&]
.No \& "# delete an array element"
.Xc
.Ic delete Ar array Xo
.No "# delete all elements of array"
.Xc
.Ic exit Xo
.Op Ar expression
.No \& "# exit immediately; status is" Ar expression
.Xc
.Ed
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
(exponentiation), and concatenation (indicated by whitespace).
The operators
.Ic ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
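.Pp
As a sketch of associative arrays, the following program counts how
often each first field occurs.
The array names
.Va count
and
.Va last
are hypothetical.
.Bd -literal -offset indent
{ count[$1]++ }                 # subscripts are strings
{ last[$1, $2] = NR }           # the key is $1 SUBSEP $2
END {
        for (k in count)        # iteration order is unspecified
                print k, count[k]
}
.Ed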
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Ic > Ns Ar file
or
.Ic >> Ns Ar file
is present or on a pipe if
.Ic \&| Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 3 ) .
The built-in function
.Fn close expr
closes the file or pipe
.Fa expr .
The built-in function
.Fn fflush expr
flushes any buffered output for the file or pipe
.Fa expr .
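.Pp
A small sketch of output to a pipe, assuming that
.Xr sort 1
is available:
every record is written to a single
.Li sort -u
process, and the same command string is later handed to
.Fn close
to identify that pipe.
.Bd -literal -offset indent
{ print $1 | "sort -u" }        # all records feed one sort process
END { close("sort -u") }        # same string, so the same pipe
.Ed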
.Pp
The mathematical functions
.Fn exp ,
.Fn log ,
.Fn sqrt ,
.Fn sin ,
.Fn cos ,
and
.Fn atan2
are built in.
Other built-in functions:
.Pp
.Bl -tag -width Fn
.It Fn length
the length of its argument
taken as a string,
or of
.Va $0
if no argument.
.It Fn rand
random number on [0,1)
.It Fn srand
sets seed for
.Fn rand
and returns the previous seed.
.It Fn int
truncates to an integer value.
.It Fn substr s m n
the
.Fa n Ns No -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
.It Fn index s t
the position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn match s r
the position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Fn split s a fs
splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sub r t s
substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
.It Fn gsub r t s
same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn sprintf fmt expr ...
the string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 3
format
.Fa fmt .
.It Fn system cmd
executes
.Fa cmd
and returns its exit status.
.It Fn tolower str
returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
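.Pp
The string functions above can be exercised together in a short sketch.
The sample string and the names
.Va s ,
.Va n
and
.Va part
are chosen only for illustration; the comments show what each
.Ic print
writes.
.Bd -literal -offset indent
BEGIN {
        s = "hello, world"
        print length(s)                 # 12
        print substr(s, 1, 5)           # hello
        print index(s, "world")         # 8
        if (match(s, /w[a-z]+/))
                print RSTART, RLENGTH   # 8 5
        n = split("a:b:c", part, ":")   # n is 3
        print n, part[2]                # 3 b
        gsub(/o/, "0", s)               # s becomes "hell0, w0rld"
        print toupper(s)                # HELL0, W0RLD
}
.Ed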
.Pp
The
.Sq function
.Ic getline
sets
.Va $0
to the next input record from the current input file;
.Ic getline < Ar file
sets
.Va $0
to the next record from
.Ar file .
.Ic getline Va x
sets variable
.Va x
instead.
Finally,
.Ar cmd Ic \&| getline
pipes the output of
.Ar cmd
into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar cmd .
In all cases,
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
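.Pp
Because
.Ic getline
returns \-1 on error, a read loop is safer when it tests for a result
greater than zero; a bare truth test would never terminate on an
unreadable file.
A minimal sketch, in which the file name and the variables
.Va file ,
.Va line ,
.Va n
and
.Va now
are assumptions made for illustration:
.Bd -literal -offset indent
BEGIN {
        file = "/etc/passwd"            # hypothetical input file
        while ((getline line < file) > 0)
                n++
        close(file)
        print n + 0, "lines in", file
        "date" | getline now            # first line printed by date(1)
        close("date")
        print now
}
.Ed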
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / Ns Ar re Ns Ic /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
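.Pp
The following sketch contrasts a string used as a regular expression
with a constant one.
The names
.Va re ,
.Va nums
and
.Va other
are hypothetical.
.Bd -literal -offset indent
BEGIN { re = "^[0-9]+$" }       # a string used as a regular expression
$1 ~ re { nums++ }              # first field is all digits
$1 !~ /^#/ { other++ }          # first field does not start with "#"
END { print nums + 0, other + 0 }
.Ed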
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
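.Pp
The two halves of a range need not be regular expressions.
As a small sketch, the following program, which has no action and so
prints each selected line, selects input lines 2 through 4:
.Bd -literal -offset indent
NR == 2, NR == 4
.Ed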
.Pp
A relational expression is one of the following:
.Bd -unfilled -offset indent
.Ar expression matchop regular-expression
.Ar expression relop expression
.Ar expression Ic in Ar array-name
.Ic \&( Ns Xo
.Ar expr , expr , \&... Ns Ic \&) in
.Ar \& array-name
.Xc
.Ed
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
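.Pp
One common use of the
.Ic in
operator is to act only on the first record carrying a given key.
In this sketch the array name
.Va seen
is hypothetical:
.Bd -literal -offset indent
!($1 in seen) { seen[$1] = 1; print }
.Ed
.Pp
Each line is printed only if its first field has not appeared on an
earlier line.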
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
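.Pp
A minimal sketch of a program with a header and a trailer, assuming
colon-separated input in which the first and third fields are of
interest:
.Bd -literal -offset indent
BEGIN { FS = ":"; print "name", "uid" }
      { print $1, $3 }
END   { print NR, "records read" }
.Ed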
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width Va -compact
.It Va CONVFMT
conversion format used when converting numbers
(default
.Qq Li %.6g )
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va FNR
ordinal number of the current record in the current file
.It Va FILENAME
the name of the current input file
.It Va RS
input record separator (default newline)
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va OFMT
output format for numbers (default
.Qq Li %.6g )
.It Va SUBSEP
separates multiple subscripts (default octal 034)
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va ENVIRON
array of environment variables; subscripts are names.
.El
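.Pp
As a sketch of how several of these variables interact when more than
one input file is given, the program below prints a heading at the
start of each file, then the overall and per-file record numbers.
The separators chosen here are arbitrary.
.Bd -literal -offset indent
BEGIN { FS = ":"; OFS = " | " }
FNR == 1 { print "file: " FILENAME }
{ print NR, FNR, $1 }
.Ed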
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
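.Pp
A short sketch of both conventions, using the hypothetical functions
.Fn max
and
.Fn sum :
the array
.Va arr
is passed by reference, while the extra parameters
.Va i
and
.Va total
serve as local variables.
.Bd -literal -offset indent
function max(a, b) { return a > b ? a : b }
function sum(arr, n,    i, total) {     # i and total are locals
        for (i = 1; i <= n; i++)
                total += arr[i]
        return total
}
BEGIN {
        n = split("3 1 4", v, " ")
        print sum(v, n), max(2, 7)      # 8 7
}
.Ed
.Pp
The extra whitespace before the local names in the parameter list is
only a visual convention.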
.Sh EXAMPLES
.Dl length($0) > 72
Print lines longer than 72 characters.
.Pp
.Dl { print $2, $1 }
Print first two fields in opposite order.
.Pp
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
Same, with input fields separated by comma and/or blanks and tabs.
.Pp
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
Add up first column, print sum and average.
.Pp
.Dl /start/, /stop/
Print all lines between start/stop pairs.
.Pp
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
500.Ed
501.Sh SEE ALSO
502.Xr lex 1 ,
503.Xr sed 1
504.Rs
505.%A A. V. Aho
506.%A B. W. Kernighan
507.%A P. J. Weinberger
508.%T The AWK Programming Language
509.%I Addison-Wesley
510.%D 1988
511.%O ISBN 0-201-07981-X
512.Re
513.Sh HISTORY
514AT&T
515.Nm
516by B. W. Kernighan was updated for
517.Bx 4.4
518and again in 1996.
519.Sh BUGS
520There are no explicit conversions between numbers and strings.
521To force an expression to be treated as a number add 0 to it;
522to force it to be treated as a string concatenate
523.Li \&""
524to it.
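.Pp
A minimal sketch of both idioms, with hypothetical variable names:
.Bd -literal -offset indent
BEGIN {
        x = "10"; y = "9"
        if (x < y)                      # compared as strings
                print "string comparison"
        if (x + 0 > y + 0)              # compared as numbers
                print "numeric comparison"
        n = 42; s = n ""                # s is the string "42"
        print length(s)                 # 2
}
.Ed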
525.Pp
526The scope rules for variables in functions are a botch;
527the syntax is worse.
528