.\"	$NetBSD: awk.1,v 1.35 2024/09/20 07:49:31 rin Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd July 5, 2022
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm
.Op Fl F Ar fs
.Op Fl v Ar var\| Ns Cm \&= Ns Ar value
.Op Fl safe
.Op Fl d Ns Op Ar N
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm
.Fl version
.Sh DESCRIPTION
.Nm
is the Bell Labs implementation of the AWK programming language as
described in
.Em The AWK Programming Language
by
A.\^V.\~Aho, B.\^W.\~Kernighan, and P.\^J.\~Weinberger.
.Pp
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Ar -
means the standard input.
Any
.Ar file
of the form
.Ar var\| Ns Cm \&= Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.Fl v
followed by
.Ar var\| Ns Cm \&= Ns Ar value
is an assignment to be done before
.Ar prog
is executed; any number of
.Fl v
options may be present.
The
.Fl F Ar fs
option defines the input field separator to be the regular expression
.Ar fs .
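.Pp
The difference in timing between the two kinds of assignment can be seen in a short shell sketch.
The temporary file and the variable names pre and tag are made up for this illustration.

```shell
# Hypothetical two-line sample file, used only for this illustration.
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"

# -v assigns before BEGIN runs.  An operand of the form var=value takes
# effect only when awk reaches it, here before each pass over the file.
out=$(awk -v pre=set 'BEGIN { print "BEGIN:", pre, "[" tag "]" }
                      { print tag ":" $0 }' tag=first "$tmp" tag=second "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

The BEGIN line shows an empty tag because the operand assignments have not been processed yet.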
.Pp
The options are as follows:
.Bl -tag -width Fl
.It Fl d Ns Op Ar N
Set the debug level to the specified number
.Ar N .
If the number is omitted, the debug level is set to 1.
.It Fl f Ar filename
Read the AWK program source from the specified file
.Ar filename ,
instead of the first command line argument.
Multiple
.Fl f
options may be specified.
.It Fl F Ar fs
Set the input field separator
.Va FS
to the regular expression
.Ar fs .
.It Fl mr Ar NNN , Fl mf Ar NNN
Obsolete options that are no longer needed.
They set limits on the maximum record size and the maximum number of fields.
.It Fl safe
Potentially unsafe functions such as
.Fn system
make the program abort (with a warning message).
.It Fl v Ar var\| Ns Cm \&= Ns Ar value
Assign the value
.Ar value
to the variable
.Ar var
before
.Ar prog
is executed.
Any number of
.Fl v
options may be present.
.It Fl version
Print the
.Nm
version on standard output and exit.
.El
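.Pp
A minimal sketch of combining two of the options above: the program is read from a file given with the f option and the separator is set with the F option.
The program file is a temporary file created only for this example.

```shell
# Hypothetical program file holding a one-rule AWK program.
prog=$(mktemp)
printf '{ print $2, NF }\n' > "$prog"

# -F takes a regular expression: here either ":" or ";" separates fields.
out=$(printf 'a:b;c\n' | awk -F '[:;]' -f "$prog")
printf '%s\n' "$out"
rm -f "$prog"
```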
.Pp
An input line is normally made up of fields separated by white space,
or by the regular expression the built-in variable
.Va FS
is set to.
If
.Va FS
is null, the input line is split into one field per character.
The fields are denoted
.Li $ Ns Va 1 ,
.Li $ Ns Va 2 ,
\&..., while
.Li $ Ns Va 0
refers to the entire line.
Setting any other field causes the re-evaluation of
.Li $ Ns Va 0 .
Assigning to
.Li $ Ns Va 0
resets the values of all other fields and the
.Va NF
built-in variable.
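.Pp
The two directions of re-evaluation can be sketched as follows.
The input line is arbitrary sample data.

```shell
out=$(printf 'a b c\n' | awk '{
    $2 = "X"        # assigning a field rebuilds $0, joined with OFS
    print $0
    $0 = "p q"      # assigning $0 splits the record again and resets NF
    print NF, $2
}')
printf '%s\n' "$out"
```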
.Pp
A pattern-action statement has the form
.Lp
.D1 Ar pattern Li \&{ Ar action Li \&}
.Lp
A missing
.Li \&{ Ar action Li \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
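.Pp
Both defaults appear in the following sketch, which uses invented sample input: the first rule has no action, the second has no pattern.

```shell
out=$(printf 'apple\nbanana\ncherry\n' | awk '
/an/                        # pattern only: matching lines are printed
{ n++ }                     # action only: runs for every line
END { print n, "lines" }
')
printf '%s\n' "$out"
```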
.Pp
An action is a sequence of statements.
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Li $ Ns Va 0 .
String constants are quoted
.Li \(dq\(dq ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the
.Sx Operators
(see next subsection).
Variables may be scalars, array elements
(denoted
.Va x\| Ns Li [ Ns Va i\^ Ns Li ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [ Ns Ar i Ns Li \&, Ns Ar j Ns Li \&, Ns Ar k Ns Li ]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
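.Pp
A small sketch of how a multiple subscript is stored, and of the initial value of a variable.
The names cell, key, part and unset are chosen only for the example.

```shell
out=$(awk 'BEGIN {
    cell["row", "col"] = 42              # stored under "row" SUBSEP "col"
    key = "row" SUBSEP "col"
    found = (key in cell)
    print found, cell[key]
    split(key, part, SUBSEP)             # recover the two constituents
    print part[1], part[2]
    print "[" unset "]", unset + 0       # null string, or 0 when used as a number
}')
printf '%s\n' "$out"
```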
.Ss Operators
.Nm
operators, in order of decreasing precedence, are:
.Pp
.Bl -tag -width Ic -compact
.It Ic \&( Ns No ... Ns Ic \&)
Grouping
.It Ic $
Field reference
.It Ic ++ --
Increment and decrement, can be used either as postfix or prefix.
.It Ic ^
Exponentiation (the
.Ic **\^
form is also supported, and
.Ic **\^=
for the assignment operator).
.It + \- \&!
Unary plus, unary minus and logical negation.
.It * / %
Multiplication, division and modulus.
.It + \-
Addition and subtraction.
.It Em space
String concatenation.
.It Ic \*[Lt] \*[Gt]
.It Ic <= >=
.It Ic != ==
Regular relational operators
.It Ic ~ !~
Regular expression match and not match
.It Ic in
Array membership
.It Ic "\*[Am]\*[Am]"
Logical AND
.It Ic "||"
Logical OR
.It Ic ?:
C conditional expression.
This is used as
.Ar expr1 Ic \&? Ar expr2 Ic \&: Ar expr3 .
If
.Ar expr1
is true, the result value is
.Ar expr2 ,
otherwise it is
.Ar expr3 .
Only one of
.Ar expr2
and
.Ar expr3
is evaluated.
.It Ic = += -=
.It Ic *= /= %= ^=
Assignment and Operator-Assignment
.El
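.Pp
A few consequences of the precedence order, as a sketch.
That exponentiation groups from right to left is not stated in the list above; it is the usual AWK behaviour and is assumed here.

```shell
out=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2            # groups right to left: 2 ^ (3 ^ 2)
    print -2 ^ 2               # ^ binds tighter than unary minus: -(2 ^ 2)
    print 1 + 2 " " 3 * 4      # arithmetic is done before concatenation
    r = (10 % 4 == 2) ? "yes" : "no"
    print r
}')
printf '%s\n' "$out"
```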
.Ss Control Statements
The control statements are as follows:
.Bl -tag -width Fn
.It Ic if \&( Ns Ar expression\^ Ns Ic \&) Ar statement Bq Ic else Ar statement
.It Ic while \&( Ns Ar expression\^ Ns Ic \&) Ar statement
.It Ic for \&( Ns Ar expression\^ Ns Ic \&; \
 Ar expression\^ Ns Ic \&; \
 Ar expression\^ Ns Ic \&) Ar statement
.It Ic for \&( Ns Ar var Ic in Ar array\^ Ns Ic \&) Ar statement
.It Ic do Ar statement Ic while \&( Ns Ar expression\^ Ns Ic \&)
.It Ic break
.It Ic continue
.It Ic \&{ Oo Ar statement ... Oc Ic \&}
.It Ar expression
Commonly
.Ar var Ic = Ar expression
.It Ic return Op Ar expression
.It Ic next
Skip remaining patterns on this input line
.It Ic nextfile
Skip rest of this file, open next, start at top
.It Ic delete Ar array\| Ns Cm \&[ Ns Ar expression\^ Ns Cm \&]
Delete an array element
.It Ic delete Ar array
Delete all elements of an array
.It Ic exit Op Ar expression
Exit immediately; status is
.Ar expression
.El
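.Pp
Several of these statements can be combined in one sketch.
The input data and the names seen, total and status are invented for the example; the elements of the array are only counted because the order of a for-in loop is not something the text above defines.

```shell
status=0
out=$(printf '3\n# skip me\n4\n5\n' | awk '
/^#/ { next }                        # skip the remaining rules for this line
{ seen[$1] = 1; total += $1 }
END {
    delete seen[4]                   # remove a single element
    n = 0
    for (k in seen) n++              # count what is left
    i = 0
    do { i++ } while (i < total)
    print n, total, i
    exit 3                           # becomes the exit status of awk
}') || status=$?
printf '%s (exit status %s)\n' "$out" "$status"
```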
.Ss I/O Statements
The input/output statements are as follows:
.Bl -tag -width Fn
.It Fn close expr
Closes the file or pipe
.Ar expr .
Returns zero on success; otherwise nonzero.
.It Fn fflush expr
Flushes any buffered output for the file or pipe
.Ar expr .
Returns zero on success; otherwise nonzero.
.It Ic getline Op Ar var
Set
.Ar var
(or
.Li $ Ns Va 0
if
.Ar var
is not specified)
to the next input record from the current input file.
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.It Ic getline Oo Ar var Oc Ic < Ar file
Set
.Ar var
(or
.Li $ Ns Va 0
if
.Ar var
is not specified)
to the next input record from the specified file
.Ar file .
.It Ar expr Ic \&| getline
Pipes the output of
.Ar expr
into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar expr .
.It Ic print Oo Ar expr-list Oc Op Ar redirection
Print arguments separated by the current output field separator
.Va OFS ,
and terminated by the
output record separator
.Va ORS .
.It Ic printf Ar format\| Ns Oo Ic \&, Ar expr-list Oc Op Ar redirection
Format and print its expression list according to
.Ar format .
See
.Xr printf 3
for the list of supported formats and their meaning.
.El
.Pp
Both
.Ic print
and
.Ic printf
statements write to standard output by default.
The output is written to the file or pipe specified by
.Ar redirection
if one is supplied, as follows:
.Ic \&> Ar file , ""
.Ic \&>> Ar file , No or
.Ic \&| Ar expr .
Both
.Ar file
and
.Ar expr
may be literal names or parenthesized expressions; identical string values in
different statements denote the same open file.
For that purpose the file names
.Pa /dev/stdin ,
.Pa /dev/stdout ,
and
.Pa /dev/stderr
refer to the program's
.Va stdin ,
.Va stdout ,
and
.Va stderr
respectively (and are unrelated to the
.Xr fd 4
devices of the same names).
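.Pp
The following sketch exercises redirection, both redirected forms of getline, and close.
The temporary file and the command string are assumptions made for the example.

```shell
tmp=$(mktemp)
out=$(awk -v f="$tmp" 'BEGIN {
    print "first"  > f           # opens f for writing
    print "second" > f           # same string value, so the same open file
    close(f)                     # close it before reading it back
    while ((getline line < f) > 0)
        n++
    print n, line
    cmd = "echo from-pipe"
    cmd | getline piped          # one line of command output per call
    close(cmd)
    print piped
}')
# Text sent to /dev/stderr arrives on standard error, not standard output.
err=$(awk 'BEGIN { print "warning" > "/dev/stderr" }' 2>&1 >/dev/null)
printf '%s\n%s\n' "$out" "$err"
rm -f "$tmp"
```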
.Ss Mathematical and Numeric Functions
AWK has the following mathematical and numerical functions built-in:
.Bl -tag -width Fn
.It Fn atan2 x y
Returns the arctangent of
.Ar x\| Ns Li / Ns Ar y
in radians.
See also
.Xr atan2 3 .
.It Fn cos expr
Computes the cosine of
.Ar expr ,
measured in radians.
See also
.Xr cos 3 .
.It Fn exp expr
Computes the exponential value of the given argument
.Ar expr .
See also
.Xr exp 3 .
.It Fn int expr
Truncates
.Ar expr
to integer.
.It Fn log expr
Computes the value of the natural logarithm of argument
.Ar expr .
See also
.Xr log 3 .
.It Fn rand
Returns a random number between 0 and 1.
.It Fn sin expr
Computes the sine of
.Ar expr ,
measured in radians.
See also
.Xr sin 3 .
.It Fn sqrt expr
Computes the non-negative square root of
.Ar expr .
See also
.Xr sqrt 3 .
.It Fn srand [expr]
Sets the seed for the random number generator
.Pq Fn rand
and returns the previous seed.
.El
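.Pp
A short sketch of some of these functions.
The seed value 42 is arbitrary; the example relies only on the same seed giving the same sequence.

```shell
out=$(awk 'BEGIN {
    print int(3.9), int(-3.9)            # truncation, not rounding
    printf "%.4f\n", atan2(1, 1) * 4     # four times the arctangent of 1/1
    print exp(0), sqrt(16)
    srand(42); a = rand()
    srand(42); b = rand()                # same seed, same first value
    same = (a == b)
    inrange = (a >= 0 && a <= 1)
    print same, inrange
}')
printf '%s\n' "$out"
```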
.Ss String Functions
AWK has the following string functions built-in:
.Bl -tag -width Fn
.It Xo Fo gensub
.Fa r s h\|
.Oo Fa t
.Oc Fc Xc
Search the target string
.Ar t
for matches of the regular expression
.Ar r .
If
.Ar h
is a string beginning with
.Ql g
or
.Ql G ,
then replace all matches of
.Ar r
with
.Ar s .
Otherwise,
.Ar h
is a number indicating which match of
.Ar r
to replace.
If no
.Ar t
is supplied,
.Li $ Ns Va 0
is used instead.
.\"Within the replacement text
.\".Ar s ,
.\"the sequence
.\".Sq Li \e Ns Ar n ,
.\"where
.\".Ar n
.\"is a digit from 1 to 9, may be used to indicate just the text that
.\"matched the
.\".Ar n Ap th
.\"parenthesized subexpression.
.\"The sequence
.\".Ic \e0
.\"represents the entire text, as does the character
.\".Ic & .
Unlike
.Fn sub
and
.Fn gsub ,
the modified string is returned as the result of the function,
and the original target is
.Em not
changed.
Note that the
.Sq Li \e Ns Ar n
sequences (backreferences) within the replacement string
.Ar s
supported by GNU
.Nm
are
.Em not
currently supported.
.It Xo Fo gsub
.Fa r s\|
.Oo Fa t
.Oc Fc Xc
Same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn index s t
The position in
.Ar s
where the string
.Ar t
occurs, or 0 if it does not.
.\" .Fn cannot be told to omit parens, so piece this together manually
.\" to mark empty parens optional too
.It Xo Ic length\^ Ns Oo \&( Ns
.Oo Ns
.Fa string
.Oc Ns \&)
.Oc Xc
The length of its argument
taken as a string,
or of
.Li $ Ns Va 0
if no argument.
.It Fn match s r
The position in
.Ar s
where the regular expression
.Ar r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Xo Fo split
.Fa s a\|
.Oo Fa fs
.Oc Fc Xc
Splits the string
.Ar s
into array elements
.Ar a Ns Li [1] ,
.Ar a Ns Li [2] ,
\&...,
.Ar a Ns Li \&[ Ns Ar n Ns Li \&] ,
and returns
.Ar n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr "..."
Returns the string resulting from formatting
.Ar expr
according to the
.Xr printf 3
format
.Ar fmt .
.It Xo Fo sub
.Fa r s\|
.Oo Fa t
.Oc Fc Xc
Substitutes
.Ar s
for the first occurrence of the regular expression
.Ar r
in the target string
.Ar t .
If
.Ar t
is not given,
.Li $ Ns Va 0
is used.
.It Xo Fo substr
.Fa s m\|
.Oo Fa n
.Oc Fc Xc
Returns the at most
.Ar n\^ Ns No -character
substring of
.Ar s
starting at position
.Ar m ,
counted from 1.
If
.Ar n
is omitted, the rest of
.Ar s
is returned.
.It Fn tolower str
Returns a copy of
.Ar str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Ar str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
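.Pp
A sketch that exercises most of the portable string functions on made-up strings.
The use of an ampersand in the replacement text to stand for the matched text is standard AWK behaviour that the descriptions above do not mention; it is an assumption of this example.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "0", s)                # changes s in place, returns the count
    print n, s
    t = "hello world"
    sub(/l+/, "[&]", t)                  # first match only; & is the matched text
    print t
    print index("banana", "nan"), substr("banana", 3, 2), substr("banana", 5)
    if (match("key=value", /=[a-z]+/))
        print RSTART, RLENGTH
    c = split("a:b:c", parts, ":")
    print c, parts[3], toupper(parts[1]), length("abc")
    print sprintf("%05.1f|%-3s|", 3.14159, "x")
}')
printf '%s\n' "$out"
```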
.Ss Time Functions
This
.Nm
provides the following two functions for obtaining time
stamps and formatting them:
.Bl -tag -width Fn
.It Fn systime
Returns the value of time in seconds since the start of the
Unix Epoch (midnight, January 1, 1970, Coordinated Universal Time).
See also
.Xr time 3 .
.\"It Fn strftime "[format [, timestamp]]"
.It Xo Fo strftime
.Oo Fa format\|
.Oo Fa timestamp\|
.Oc Oc Fc Xc
Formats the time
.Ar timestamp
according to the string
.Ar format .
.Ar timestamp
should be in the same form as the value returned by
.Fn systime .
If
.Ar timestamp
is missing, the current time is used.
If
.Ar format
is missing, a default format equivalent to the output of
.Xr date 1
is used.
See the specification of ANSI C
.Xr strftime 3
for the format conversions which are supported.
.El
.Ss Other Built-in Functions
.Bl -tag -width Fn
.It Fn system cmd
Executes
.Ar cmd
and returns its exit status.
.El
.Ss Patterns
Patterns are arbitrary Boolean combinations
(with
.Ic "! || \*[Am]\*[Am]" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / Ns Ar re Ns Ic /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bl -tag -offset indent -width Fn -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Ic \&( Ns Ar expr Ns Ic \&, Ar expr Ns Ic \&, Ar ... Ic \&) in Ar array-name
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C,
and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
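.Pp
The pattern forms described in this subsection can be sketched together as follows.
The input lines and the counter names other and inrange are invented for the example.

```shell
out=$(printf 'alice 30\nbob 17\ncarol 45\nstart\nx\nstop\ny\n' | awk '
$1 ~ /^[ab]/ && $2 >= 18                { print "adult:", $1 }
!/^[a-z]+ [0-9]+$/ && !/start|stop/     { other++ }
/start/, /stop/                         { inrange++ }
END { print other, inrange }
')
printf '%s\n' "$out"
```

The first rule combines a match and a relational expression, the second negates two isolated regular expressions, and the third is a range that covers the start line, the stop line and everything between them.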
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
If an awk program consists of only actions with the pattern
.Ic BEGIN ,
and the
.Ic BEGIN
action contains no
.Ic getline
statement, awk exits without reading its input when the last
statement in the last
.Ic BEGIN
action is executed.
If an awk program consists of only actions with the pattern
.Ic END
or only actions with the patterns
.Ic BEGIN
and
.Ic END ,
the input is read before the statements in the
.Ic END
actions are executed.
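.Pp
The two cases can be compared in a sketch that feeds the same two sample lines to both programs.

```shell
# Only BEGIN: the program finishes without consuming the input.
out1=$(printf 'x\ny\n' | awk 'BEGIN { print "no input read" }')

# BEGIN and END: the input is read in full before END runs, so NR is 2.
out2=$(printf 'x\ny\n' | awk 'BEGIN { print "start" } END { print NR, "records" }')
printf '%s\n%s\n' "$out1" "$out2"
```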
.Ss Built-in Variables
Variable names with special meanings:
.Bl -hang -width Va
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va CONVFMT
conversion format used when converting numbers
(default
.Li \(dq%.6g\(dq )
.It Va ENVIRON
array of environment variables; subscripts are names.
.It Va FILENAME
the name of the current input file
.It Va FNR
ordinal number of the current record in the current file
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va OFMT
output format for numbers (default
.Li \(dq%.6g\(dq )
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va RS
input record separator (default newline)
.It Va RSTART
position of the first character matched by
.Fn match ;
0 if no match.
.It Va RLENGTH
length of the string matched by
.Fn match ;
\-1 if no match.
.It Va SUBSEP
separates multiple subscripts (default
.Li 034 )
.El
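.Pp
A sketch showing how the record counters and the output separators behave across two input files.
Both files are temporary files created only for the example.

```shell
f1=$(mktemp); f2=$(mktemp)
printf 'a b\nc d e\n' > "$f1"
printf 'f\n' > "$f2"

# NR keeps counting across files, FNR restarts with each file.
# Assigning $1 to itself rebuilds $0 so that OFS shows up in it.
out=$(awk 'BEGIN { OFS = "-"; ORS = ";\n" }
           { $1 = $1; print NR, FNR, NF, $0 }' "$f1" "$f2")
printf '%s\n' "$out"
rm -f "$f1" "$f2"
```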
.Ss Functions
Functions may be defined (at the position of a pattern-action statement) thus:
.Bd -literal -offset indent
function foo(a, b, c) { ...; return x }
.Ed
.Pp
Parameters are passed by value if scalar and by reference if an array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
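.Pp
These rules can be seen at work in a sketch with two hypothetical functions, total and fact.
The extra spaces in the parameter list of total are only a common convention for marking the parameters used as locals.

```shell
out=$(awk '
# i and sum are excess parameters, used only as local variables.
function total(arr, n,    i, sum) {
    for (i = 1; i <= n; i++) sum += arr[i]
    arr[1] = 0          # array passed by reference: visible to the caller
    n = 99              # scalar passed by value: not visible to the caller
    return sum
}
function fact(k) { return k <= 1 ? 1 : k * fact(k - 1) }    # recursion
BEGIN {
    n = split("10 20 30", v, " ")
    i = "untouched"     # the global i is not affected by the local i
    t = total(v, n)
    print t, v[1], n, i
    print fact(5)
}')
printf '%s\n' "$out"
```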
.Sh EXAMPLES
Print lines longer than 72 characters.
.Fn length
defaults to
.Li $ Ns Va 0
and the empty parens can also be omitted in this case:
.Pp
.Dl length > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, "average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN  {
        for (i = 1; i < ARGC; ++i)
                printf("%s%s", ARGV[i], i==ARGC-1?"\en":" ")
}
.Ed
.Pp
Another way to do the same that demonstrates field assignment and
.Li $ Ns Va 0
re-evaluation:
.Pp
.Dl BEGIN { for (i = 1; i < ARGC; ++i) $i = ARGV[i]; print }
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr egrep 1 ,
.Xr lex 1 ,
.Xr sed 1 ,
.Xr atan2 3 ,
.Xr cos 3 ,
.Xr exp 3 ,
.Xr log 3 ,
.Xr sin 3 ,
.Xr sqrt 3 ,
.Xr strftime 3 ,
.Xr time 3
.Pp
A.\^V.\~Aho, B.\^W.\~Kernighan, P.\^J.\~Weinberger,
.Em The AWK Programming Language ,
Addison-Wesley, 1988.
ISBN 0-201-07981-X
.Pp
.Em AWK Language Programming ,
Edition 1.0, published by the Free Software Foundation, 1995
.Sh HISTORY
.Nm nawk
has been the default system
.Nm
since
.Nx 2.0 ,
replacing the previously used GNU
.Nm .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number, add 0 to it;
to force it to be treated as a string, concatenate
\&"\&" to it.
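.Pp
The effect of both idioms is easiest to see in comparisons, as in this sketch.
The variable names are arbitrary.

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"
    r1 = (a < b)              # two strings: "10" sorts before "9"
    r2 = (a + 0 < b + 0)      # forced to numbers: 10 < 9 is false
    x = 10; y = 9
    r3 = (x "" < y "")        # forced to strings: compared as text again
    print r1, r2, r3
}')
printf '%s\n' "$out"
```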
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
.Pp
Only eight-bit character sets are handled correctly.
847