.\"	$NetBSD: awk.1,v 1.4 2015/04/06 14:36:41 wiz Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd April 6, 2015
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm
.Op Fl F Ar fs
.Op Fl v Ar var=value
.Op Fl safe
.Op Fl d Ns Op Ar N
.Op Ar prog | Fl f Ar filename
.Ar
.Nm
.Fl version
.Sh DESCRIPTION
.Nm
is the Bell Labs implementation of the AWK programming language as
described in
.Em The AWK Programming Language
by
A. V. Aho, B. W. Kernighan, and P. J. Weinberger.
.Pp
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar filename .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Ar -
means the standard input.
Any
.Ar file
of the form
.Ar var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
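.Pp
As a minimal sketch of such operand assignments (the file names
.Pa a.txt
and
.Pa b.txt
and the variable
.Va tag
are hypothetical), the following command prefixes each line with a tag
whose value changes between the two files:
.Bd -literal -offset indent
awk '{ print tag ": " $0 }' tag=first a.txt tag=second b.txt
.Ed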
.Pp
The options are as follows:
.Bl -tag -width indent
.It Fl d Ns Op Ar N
Set the debug level to the specified number
.Ar N .
If the number is omitted, the debug level is set to 1.
.It Fl f Ar filename
Read the AWK program source from the specified file
.Ar filename ,
instead of the first command line argument.
Multiple
.Fl f
options may be specified.
.It Fl F Ar fs
Set the input field separator
.Va FS
to the regular expression
.Ar fs .
.It Fl mr Ar NNN , Fl mf Ar NNN
Obsolete options that are no longer needed.
They formerly set a limit on the maximum record size or
number of fields.
.It Fl safe
Potentially unsafe functions such as
.Fn system
make the program abort (with a warning message).
.It Fl v Ar var Ns = Ns Ar value
Assign the value
.Ar value
to the variable
.Va var
before
.Ar prog
is executed.
Any number of
.Fl v
options may be present.
.It Fl version
Print the
.Nm
version on standard output and exit.
.El
.Pp
An input line is normally made up of fields separated by white space,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 ,
.Va $2 ,
\&..., while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
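.Pp
For illustration, assuming colon-separated input (the sample record is
made up), a sketch that selects fields with
.Fl F
might read:
.Bd -literal -offset indent
printf 'root:x:0:0\en' | awk -F: '{ print $1, $3, NF }'
.Ed
.Pp
Here
.Va $1
is
.Qq root ,
.Va $3
is
.Qq 0 ,
and
.Va NF
is 4.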
.Pp
A pattern-action statement has the form
.Lp
.Dl pattern \&{ action \&}
.Lp
A missing \&{ action \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
An action is a sequence of statements.
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Va $0 .
String constants are quoted
.Em \&"\ \&" ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators described in
.Sx Operators .
Variables may be scalars, array elements
(denoted
.Va x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Va [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
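.Pp
The following sketch shows how a multiple subscript is stored as a single
string key and can be taken apart again with
.Fn split ;
the array names
.Va count
and
.Va part
are illustrative only:
.Bd -literal -offset indent
awk 'BEGIN {
	count["x", "y"] = 3
	for (k in count) {
		split(k, part, SUBSEP)
		print part[1], part[2], count[k]
	}
}'
.Ed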
.Ss Operators
.Nm
operators, in order of decreasing precedence, are:
.Pp
.Bl -tag -width ident -compact
.It Ic (...)
Grouping.
.It Ic $
Field reference.
.It Ic ++ --
Increment and decrement; each can be used either as postfix or prefix.
.It Ic ^
Exponentiation (the
.Ic **
form is also supported, and
.Ic **=
for the assignment operator).
.It Ic + - \&!
Unary plus, unary minus and logical negation.
.It Ic * / %
Multiplication, division and modulus.
.It Ic + -
Addition and subtraction.
.It Ar space
String concatenation.
.It Ic \*[Lt] \*[Gt]
.It Ic \*[Le] \*[Ge]
.It Ic != ==
Relational operators.
.It Ic ~ !~
Regular expression match and non-match.
.It Ic in
Array membership.
.It Ic "\*[Am]\*[Am]"
Logical AND.
.It Ic "||"
Logical OR.
.It Ic ?:
C conditional expression.
This is used as
.Ar expr1 Ic \&? Ar expr2 Ic \&: Ar expr3 No .
If
.Ar expr1
is true, the result value is
.Ar expr2 ,
otherwise it is
.Ar expr3 .
Only one of
.Ar expr2
and
.Ar expr3
is evaluated.
.It Ic = += -=
.It Ic *= /= %= ^=
Assignment and operator-assignment.
.El
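.Pp
Two consequences of this ordering can be checked with a short sketch:
exponentiation binds more tightly than unary minus, and addition binds
more tightly than concatenation.
.Bd -literal -offset indent
awk 'BEGIN { print -2 ^ 2; print 1 " " 2 + 3 }'
.Ed
.Pp
The first statement prints \-4, and the second prints
.Qq 1 5 .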
.Ss Control Statements
The control statements are as follows:
.Pp
.Bl -hang -offset indent -width indent -compact
.It Ic if \&( Ar expression Ic \&) Ar statement Bq Ic else Ar statement
.It Ic while \&( Ar expression Ic \&) Ar statement
.It Ic for \&( Ar expression Ic \&; Ar expression Ic \&; \
Ar expression Ic \&) Ar statement
.It Ic for \&( Va var Ic in Ar array Ic \&) Ar statement
.It Ic do Ar statement Ic while \&( Ar expression Ic \&)
.It Ic break
.It Ic continue
.It Ic delete Va array Bq Ar expression
.It Ic delete Va array
.It Ic exit Bq Ar expression
.It Ar expression
.It Ic return Bq Ar expression
.It Ic \&{ Ar [ statement ... ] Ic \&}
.El
.Ss I/O Statements
The input/output statements are as follows:
.Pp
.Bl -tag -width indent
.It Fn close expr
Closes the file or pipe
.Ar expr .
Returns zero on success; otherwise nonzero.
.It Fn fflush expr
Flushes any buffered output for the file or pipe
.Ar expr .
Returns zero on success; otherwise nonzero.
.It Ic getline Bq Va var
Set
.Va var
(or
.Va $0
if
.Va var
is not specified)
to the next input record from the current input file.
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.It Ic getline Bo Va var Bc Ic \*[Lt] Ar file
Set
.Va var
(or
.Va $0
if
.Va var
is not specified)
to the next input record from the specified file
.Ar file .
.It Ar expr Ic \&| getline
Pipes the output of
.Ar expr
into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar expr .
.It Ic next
Skip remaining patterns on this input line.
.It Ic nextfile
Skip the rest of this file, open the next one, and start at the top.
.It Ic print Bo Ar expr-list Bc Bq Ic \*[Gt] Ar file
The
.Ic print
statement prints its arguments on the standard output (or to a file
if
.Ic \*[Gt] Ar file
or to a pipe if
.Ic | Ar expr
is present),
separated by the current output field separator
.Va OFS ,
and terminated by the
output record separator
.Va ORS .
Both
.Ar file
and
.Ar expr
may be literal names or parenthesized expressions; identical string values in
different statements denote the same open file.
.It Ic printf Ar format Bo Ic \&, Ar expr-list Bc Bq Ic \*[Gt] Ar file
Format and print its expression list according to
.Ar format .
See
.Xr printf 3
for the list of supported formats and their meanings.
.El
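.Pp
A minimal sketch of reading from a pipe with
.Ic getline
and then closing it follows; the command string and the variable names
.Va cmd ,
.Va line
and
.Va n
are illustrative only.
The loop relies on
.Ic getline
returning 1 per record and 0 at end of file:
.Bd -literal -offset indent
awk 'BEGIN {
	cmd = "echo hello"
	while ((cmd | getline line) > 0)
		n++
	close(cmd)
	print n, line
}'
.Ed
.Pp
Storing the command in a variable ensures that the same string is passed
to
.Fn close ,
since identical string values denote the same open pipe.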
.Ss Mathematical and Numeric Functions
AWK has the following mathematical and numerical functions built-in:
.Pp
.Bl -tag -width indent
.It Fn atan2 x y
Returns the arctangent of
.Ar x Ic / Ar y
in radians.
See also
.Xr atan2 3 .
.It Fn cos expr
Computes the cosine of
.Ar expr ,
measured in radians.
See also
.Xr cos 3 .
.It Fn exp expr
Computes the exponential value of the given argument
.Ar expr .
See also
.Xr exp 3 .
.It Fn int expr
Truncates
.Ar expr
to an integer.
.It Fn log expr
Computes the value of the natural logarithm of argument
.Ar expr .
See also
.Xr log 3 .
.It Fn rand
Returns a random number between 0 and 1.
.It Fn sin expr
Computes the sine of
.Ar expr ,
measured in radians.
See also
.Xr sin 3 .
.It Fn sqrt expr
Computes the non-negative square root of
.Ar expr .
See also
.Xr sqrt 3 .
.It Fn srand [expr]
Sets the seed for the random number generator
.Pq Fn rand
and returns the previous seed.
.El
.Ss String Functions
AWK has the following string functions built-in:
.Pp
.Bl -tag -width indent
.It Fn gensub r s h [t]
Search the target string
.Ar t
for matches of the regular expression
.Ar r .
If
.Ar h
is a string beginning with
.Ic g
or
.Ic G ,
then replace all matches of
.Ar r
with
.Ar s .
Otherwise,
.Ar h
is a number indicating which match of
.Ar r
to replace.
If no
.Ar t
is supplied,
.Va $0
is used instead.
.\"Within the replacement text
.\".Ar s ,
.\"the sequence
.\".Ar \en ,
.\"where
.\".Ar n
.\"is a digit from 1 to 9, may be used to indicate just the text that
.\"matched the
.\".Ar n Ap th
.\"parenthesized subexpression.
.\"The sequence
.\".Ic \e0
.\"represents the entire text, as does the character
.\".Ic & .
Unlike
.Fn sub
and
.Fn gsub ,
the modified string is returned as the result of the function,
and the original target is
.Em not
changed.
Note that the
.Ar \en
sequences within the replacement string
.Ar s
supported by GNU
.Nm
are
.Em not
supported at this moment.
.It Fn gsub r s "[t]"
Same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn index s t
Returns the position in
.Ar s
where the string
.Ar t
occurs, or 0 if it does not.
.It Fn length "[string]"
Returns the length of its argument
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
Returns the position in
.Ar s
where the regular expression
.Ar r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Fn split s a "[fs]"
Splits the string
.Ar s
into array elements
.Va a[1] ,
.Va a[2] ,
\&...,
.Va a[n] ,
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr "..."
Returns the string resulting from formatting
.Ar expr
according to the
.Xr printf 3
format
.Ar fmt .
.It Fn sub r s "[t]"
Substitutes
.Ar s
for the first occurrence of the regular expression
.Ar r
in the target string
.Ar t .
If
.Ar t
is not given,
.Va $0
is used.
.It Fn substr s m [n]
Returns the substring of
.Ar s ,
at most
.Ar n
characters long,
starting at position
.Ar m ,
counted from 1.
If
.Ar n
is omitted, the rest of
.Ar s
is returned.
.It Fn tolower str
Returns a copy of
.Ar str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Ar str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
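.Pp
The sketch below exercises several of these functions on a sample string;
the variable names
.Va s
and
.Va n
are illustrative only.
It uses only functions that modify or inspect
.Va s
in the way described above:
.Bd -literal -offset indent
awk 'BEGIN {
	s = "hello world"
	n = gsub(/o/, "0", s)	# s becomes "hell0 w0rld", n is 2
	print n, s
	print index(s, "w"), substr(s, 1, 5), toupper(substr(s, 7))
	if (match(s, /w[0-9a-z]+/))
		print RSTART, RLENGTH
}'
.Ed
.Pp
The three output lines are
.Qq 2 hell0 w0rld ,
.Qq 7 hell0 W0RLD
and
.Qq 7 5 .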
.Ss Time Functions
This
.Nm
provides the following two functions for obtaining time
stamps and formatting them:
.Bl -tag -width indent
.It Fn systime
Returns the time in seconds since the start of the
.Tn Unix
Epoch (midnight, January 1, 1970, Coordinated Universal Time).
See also
.Xr time 3 .
.It Fn strftime "[format [, timestamp]]"
Formats the time
.Ar timestamp
according to the string
.Ar format .
.Ar timestamp
should be in the same form as the value returned by
.Fn systime .
If
.Ar timestamp
is missing, the current time is used.
If
.Ar format
is missing, a default format equivalent to the output of
.Xr date 1
is used.
See the specification of ANSI C
.Xr strftime 3
for the format conversions which are supported.
.El
.Ss Other built-in functions
.Bl -tag -width indent
.It Fn system cmd
Executes
.Ar cmd
and returns its exit status.
.El
.Ss Patterns
Patterns are arbitrary Boolean combinations
(with
.Ic "! || \*[Am]\*[Am]" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bl -tag -offset indent -width indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It ( Ar expr , expr,\&... Ic ") in" Ar array-name
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C,
and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
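.Pp
As a sketch combining these pattern forms (the input data and the counter
.Va n
are made up for illustration), the following counts the lines whose first
field exceeds 3 and whose second field begins with
.Qq p :
.Bd -literal -offset indent
printf '1 apple\en5 pear\en9 plum\en' |
awk 'BEGIN { print "start" }
$1 > 3 && $2 ~ /^p/ { n++ }
END { print n, "matched" }'
.Ed
.Pp
It prints
.Qq start
before any input is read and
.Qq 2 matched
after the last line.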
.Ss Built-in Variables
Variable names with special meanings:
.Bl -hang -width FILENAMES
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va CONVFMT
conversion format used when converting numbers
(default
.Qq %.6g )
.It Va ENVIRON
array of environment variables; subscripts are names
.It Va FILENAME
the name of the current input file
.It Va FNR
ordinal number of the current record in the current file
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va OFMT
output format for numbers (default
.Qq %.6g )
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va RS
input record separator (default newline)
.It Va RSTART
position of the first character matched by
.Fn match ;
0 if no match
.It Va RLENGTH
length of the string matched by
.Fn match ;
\-1 if no match
.It Va SUBSEP
separates multiple subscripts (default 034)
.El
.Ss Functions
Functions may be defined (at the position of a pattern-action statement) thus:
.Bd -filled -offset indent
.Ic function foo(a, b, c) { ...; return x }
.Ed
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
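.Pp
A sketch of these rules follows; the function
.Fn fill
and the names
.Va sq
and
.Va cnt
are hypothetical.
The array is modified through the parameter
.Va arr ,
while the excess parameter
.Va i
keeps the loop counter local, so the global
.Va i
is left untouched:
.Bd -literal -offset indent
awk 'function fill(arr, n,    i) {	# i is an excess (local) parameter
	for (i = 1; i <= n; i++)
		arr[i] = i * i
	return n
}
BEGIN {
	i = 42
	cnt = fill(sq, 3)
	print cnt, sq[3], i
}'
.Ed
.Pp
This prints
.Qq 3 9 42 .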
.Sh EXAMPLES
.Bl -tag -width indent -compact
.It Ic length($0) \*[Gt] 72
Print lines longer than 72 characters.
.Pp
.It Ic \&{ print $2, $1 \&}
Print the first two fields in opposite order.
.Pp
.It Ic BEGIN { FS =  \&",[ \et]*|[ \et]+\&" }
.It Ic "\ \ \ \ \ \ {" print \&$2, \&$1 }
Same, with input fields separated by comma and/or blanks and tabs.
.Pp
.It Ic "\ \ \ \ {" s += $1 }
.It Ic END { print \&"sum is\&", s, \&" average is\ \&",\ s/NR\ }
Add up the first column, print the sum and average.
.Pp
.It Ic /start/, /stop/
Print all lines between start/stop pairs.
.Pp
.It Ic BEGIN { # Simulate echo(1)
.It Ic "\ \ \ \ " for (i = 1; i \*[Lt] ARGC;\ i++)\ printf\ \&"%s\ \&",\ ARGV[i]
.It Ic "\ \ \ \ " printf \&"\en\&"
.It Ic "\ \ \ \ " exit }
.El
.Sh SEE ALSO
.Xr egrep 1 ,
.Xr lex 1 ,
.Xr sed 1 ,
.Xr atan2 3 ,
.Xr cos 3 ,
.Xr exp 3 ,
.Xr log 3 ,
.Xr sin 3 ,
.Xr sqrt 3 ,
.Xr strftime 3 ,
.Xr time 3
.Pp
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.Em The AWK Programming Language ,
Addison-Wesley, 1988.
ISBN 0-201-07981-X
.Pp
.Em AWK Language Programming ,
Edition 1.0, published by the Free Software Foundation, 1995.
.Sh HISTORY
.Nm nawk
has been the default system
.Nm
since
.Nx 2.0 ,
replacing the previously used GNU
.Nm .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&"\&" to it.
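.Pp
The effect of both idioms can be seen in a short sketch that compares the
same values first as strings and then as numbers; the variable names are
illustrative only:
.Bd -literal -offset indent
awk 'BEGIN { a = "10"; b = "9"; x = 10; y = 9
	print (a < b), (a + 0 < b + 0), (x < y), (x "" < y "") }'
.Ed
.Pp
This prints
.Qq 1 0 0 1 :
as strings,
.Qq 10
sorts before
.Qq 9 ,
while as numbers 10 is not less than 9.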
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
733