1.\"	$OpenBSD: awk.1,v 1.15 2003/11/24 10:58:08 jmc Exp $
2.\" EX/EE is a Bd
3.\"
4.\" Copyright (C) Lucent Technologies 1997
5.\" All Rights Reserved
6.\"
7.\" Permission to use, copy, modify, and distribute this software and
8.\" its documentation for any purpose and without fee is hereby
9.\" granted, provided that the above copyright notice appear in all
10.\" copies and that both that the copyright notice and this
11.\" permission notice and warranty disclaimer appear in supporting
12.\" documentation, and that the name Lucent Technologies or any of
13.\" its entities not be used in advertising or publicity pertaining
14.\" to distribution of the software without specific, written prior
15.\" permission.
16.\"
17.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
18.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
19.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
20.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
21.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
22.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
23.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
24.\" THIS SOFTWARE.
25.\"
.Dd June 29, 1996
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl F Ar fs
.Op Fl v Ar var=value
.Op Fl safe
.Op Fl mr Ar n
.Op Fl mf Ar n
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm nawk
.Ar ...
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq Pa \-
means the standard input.
Any
.Ar file
of the form
.Ar var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.Fl v
followed by
.Ar var=value
is an assignment to be done before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
The
.Fl F Ar fs
option defines the input field separator to be the regular expression
.Ar fs .
The
.Fl safe
option disables file output
.Po
.Ic print Ic > ,
.Ic print Ic >>
.Pc ,
process creation
.Po
.Ar cmd Ic \&| getline ,
.Ic print \&| , system
.Pc
and access to the environment
.Pq Va ENVIRON .
This
is a first (and not very reliable) approximation to a
.Dq safe
version of
.Nm awk .
.Pp
An input line is normally made up of fields separated by whitespace,
or by regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
To compensate for inadequate implementation of storage management,
the
.Fl mr
option can be used to set the maximum size of the input record,
and the
.Fl mf
option to set the maximum number of fields.
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Bd -unfilled -offset indent
.Ic if ( Xo
.Ar expression ) statement \&
.Op Ic else Ar statement
.Xc
.Ic while ( Ar expression ) statement
.Ic for ( Xo
.Ar expression ; expression ; expression ) statement
.Xc
.Ic for ( Xo
.Ar var Ic in Ar array ) statement
.Xc
.Ic do Ar statement Ic while ( Ar expression )
.Ic break
.Ic continue
.Ic { Oo Ar statement ... Oc Ic \& }
.Ar expression Xo
.No "# commonly" \&
.Ar var Ic = Ar expression
.Xc
.Ic print Xo
.Op Ar expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic printf Ar format Xo
.Op Ar ... , expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic return Op Ar expression
.Ic next Xo
.No "# skip remaining patterns on this input line"
.Xc
.Ic nextfile Xo
.No "# skip rest of this file, open next, start at top"
.Xc
.Ic delete Ar array Ns Xo
.Ic \&[ Ns Ar expression Ns Ic \&]
.No \& "# delete an array element"
.Xc
.Ic delete Ar array Xo
.No "# delete all elements of array"
.Xc
.Ic exit Xo
.Op Ar expression
.No \& "# exit immediately; status is" Ar expression
.Xc
.Ed
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
(exponentiation), and concatenation (indicated by whitespace).
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
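.Pp
As an illustrative sketch (the input layout and the array names
.Va count
and
.Va part
are assumptions made for this example),
the following program tallies pairs of values from the first two fields
and recovers the two parts of each subscript by splitting on
.Va SUBSEP :
.Bd -literal -offset indent
# Count occurrences of each ($1, $2) pair.
{ count[$1, $2]++ }
END {
        for (key in count) {
                # key is $1 SUBSEP $2; split it back apart
                split(key, part, SUBSEP)
                print part[1], part[2], count[key]
        }
}
.Ed
.Pp
The order in which the subscripts are visited is unspecified.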
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Ic > Ns Ar file
or
.Ic >> Ns Ar file
is present or on a pipe if
.Ic \&| Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 3 ) .
The built-in function
.Fn close expr
closes the file or pipe
.Fa expr .
The built-in function
.Fn fflush expr
flushes any buffered output for the file or pipe
.Fa expr .
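.Pp
A minimal sketch of output redirection, assuming a hypothetical output file
.Pa names.txt
and that
.Xr sort 1
is available; neither name comes from the text above:
.Bd -literal -offset indent
# Write field 1 to a file and field 2 through a pipe.
{
        print $1 > "names.txt"
        print $2 | "sort"
}
END {
        # The same strings identify the open file and pipe.
        close("names.txt")
        close("sort")
}
.Ed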
.Pp
The mathematical functions
.Fn exp ,
.Fn log ,
.Fn sqrt ,
.Fn sin ,
.Fn cos ,
and
.Fn atan2
are built in.
Other built-in functions:
.Bl -tag -width Fn
.It Fn length
the length of its argument
taken as a string,
or of
.Va $0
if no argument.
.It Fn rand
random number n such that 0 <= n < 1
.It Fn srand
sets seed for
.Fn rand
and returns the previous seed.
.It Fn int
truncates to an integer value.
.It Fn substr s m n
the
.Fa n Ns No -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
.It Fn index s t
the position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn match s r
the position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Fn split s a fs
splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sub r t s
substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
.It Fn gsub r t s
same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn sprintf fmt expr ...
the string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 3
format
.Fa fmt .
.It Fn system cmd
executes
.Fa cmd
and returns its exit status.
.It Fn tolower str
returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
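.Pp
The string functions above can be combined as in the following sketch;
the sample string and the variable names
.Va s , n , c , parts ,
and
.Va t
are chosen only for illustration:
.Bd -literal -offset indent
BEGIN {
        s = "alpha,beta,gamma"
        n = split(s, parts, ",")        # n = 3, parts[2] = "beta"
        print n, parts[2]
        print substr(s, 7, 4)           # beta
        print index(s, "gamma")         # 12
        if (match(s, /b[a-z]+/))        # RSTART = 7, RLENGTH = 4
                print RSTART, RLENGTH
        t = s                           # work on a copy of s
        c = gsub(/a/, "A", t)           # c = 5
        print c, t                      # 5 AlphA,betA,gAmmA
        print toupper(parts[1])         # ALPHA
}
.Ed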
.Pp
The
.Sq function
.Ic getline
sets
.Va $0
to the next input record from the current input file;
.Ic getline < Ar file
sets
.Va $0
to the next record from
.Ar file .
.Ic getline Va x
sets variable
.Va x
instead.
Finally,
.Ar cmd Ic \&| getline
pipes the output of
.Ar cmd
into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar cmd .
In all cases,
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
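.Pp
As a hedged sketch of these forms, the following program reads a file
and the output of a command, combining the variable form of
.Ic getline
with each redirection;
the file name
.Pa /etc/passwd
and the command
.Xr date 1
are assumptions chosen for illustration:
.Bd -literal -offset indent
BEGIN {
        file = "/etc/passwd"
        # Test for > 0 so that an error return also ends the loop.
        while ((getline line < file) > 0)
                n++
        close(file)
        print n, "lines in", file
        # Read one line of command output into the variable now.
        "date" | getline now
        close("date")
        print "run at", now
}
.Ed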
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / Ns Ar re Ns Ic /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bd -unfilled -offset indent
.Ar expression matchop regular-expression
.Ar expression relop expression
.Ar expression Ic in Ar array-name
.Ic \&( Ns Xo
.Ar expr , expr , \&... Ns Ic \&) in
.Ar \& array-name
.Xc
.Ed
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width Va -compact
.It Va CONVFMT
conversion format used when converting numbers
(default
.Qq Li %.6g )
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va FNR
ordinal number of the current record in the current file
.It Va FILENAME
the name of the current input file
.It Va RS
input record separator (default newline)
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va OFMT
output format for numbers (default
.Qq Li %.6g )
.It Va SUBSEP
separates multiple subscripts (default 034)
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va ENVIRON
array of environment variables; subscripts are names.
.El
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
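.Pp
For example, the following sketch defines a recursive function and a
function with local variables;
the names
.Fn fact
and
.Fn max
are hypothetical, and the extra spaces before
.Va i
and
.Va m
are only a common convention for marking locals:
.Bd -literal -offset indent
# Recursion: the scalar n is passed by value.
function fact(n) { return n <= 1 ? 1 : n * fact(n \- 1) }
# arr is passed by reference; i and m are locals.
function max(arr, n,    i, m) {
        m = arr[1]
        for (i = 2; i <= n; i++)
                if (arr[i] > m)
                        m = arr[i]
        return m
}
BEGIN {
        n = split("3 9 4", v, " ")
        print fact(5), max(v, n)        # 120 9
}
.Ed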
.Sh EXAMPLES
.Dl length($0) > 72
Print lines longer than 72 characters.
.Pp
.Dl { print $2, $1 }
Print first two fields in opposite order.
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
Same, with input fields separated by comma and/or blanks and tabs.
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
Add up first column, print sum and average.
.Pp
.Dl /start/, /stop/
Print all lines between start/stop pairs.
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr sed 1
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
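.Pp
A short sketch of both idioms, using made-up values:
.Bd -literal -offset indent
BEGIN {
        a = "10"; b = "9"
        print (a < b)           # 1: string comparison
        print (a + 0 < b + 0)   # 0: numeric comparison
        x = 10; y = 9
        print (x "" < y "")     # 1: string comparison again
}
.Ed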
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
545the syntax is worse.
546