Lines Matching +refs:po +refs:team +refs:name +refs:to +refs:code

21 @c applies to and all the info about who's publishing this edition
75 @set FN file name
85 2. I have done A LOT of work to make this look good. There are `@page' commands
87 with anything, it's your responsibility not to break the layout.
106 @c unwise to comment it out when running a master in case there are
121 Permission is granted to copy, distribute and/or modify this document
134 ``You have freedom to copy and modify this GNU Manual, like GNU
141 @c considerable paper. Remember to turn it back on *before*
146 @c if they want to waste paper.
171 liabilities with respect to the programs or applications.
206 @c Thanks to Bob Chassell for directions on doing dedications.
240 @c Licensing nodes are appendices, they're not central to AWK.
242 This file documents @command{awk}, a program that you can use to select
255 * Getting Started:: A basic introduction to using
256 @command{awk}. How to run an @command{awk}
260 * Reading Files:: How to read files and manipulate fields.
261 * Printing:: How to print using @command{awk}. Describes
262 the @code{print} and @code{printf}
271 * Internationalization:: Getting @command{gawk} to speak your
273 * Advanced Features:: Stuff for advanced users, specific to
275 * Invoking Gawk:: How to run @command{gawk}.
285 * Basic Concepts:: A very quick introduction to programming
288 * Copying:: Your right to copy and distribute
296 * Names:: What name to use to find @command{awk}.
302 * How To Contribute:: Helping to save the world.
304 * Running gawk:: How to run @command{gawk} programs;
314 * Comments:: Adding documentation to @command{gawk}
327 * When:: When to use @command{gawk} and when to use
329 * Regexp Usage:: How to Use Regular Expressions.
330 * Escape Sequences:: How to write nonprinting characters.
333 * GNU Regexp Operators:: Operators specific to GNU software.
334 * Case-sensitivity:: How to do case-insensitive matching.
339 * Fields:: An introduction to fields.
342 * Field Separators:: The field separator and how to change it.
345 * Command Line Field Separator:: Setting @code{FS} from the command line.
350 control using the @code{getline} function.
351 * Plain Getline:: Using @code{getline} with no arguments.
352 * Getline/Variable:: Using @code{getline} into a variable.
353 * Getline/File:: Using @code{getline} from a file.
354 * Getline/Variable/File:: Using @code{getline} into a variable from a
356 * Getline/Pipe:: Using @code{getline} from a pipe.
357 * Getline/Variable/Pipe:: Using @code{getline} into a variable from a
359 * Getline/Coprocess:: Using @code{getline} from a coprocess.
360 * Getline/Variable/Coprocess:: Using @code{getline} into a variable from a
362 * Getline Notes:: Important things to know about
363 @code{getline}.
364 * Getline Summary:: Summary of @code{getline} Variants.
365 * Print:: The @code{print} statement.
366 * Print Examples:: Simple examples of @code{print} statements.
367 * Output Separators:: The output separators and how to change
370 @code{print}.
371 * Printf:: The @code{printf} statement.
372 * Basic Printf:: Syntax of the @code{printf} statement.
376 * Redirection:: How to redirect output to multiple files
378 * Special Files:: File name interpretation in @command{gawk}.
379 @command{gawk} allows access to inherited
384 * Special Caveats:: Things to watch out for.
390 * Using Constant Regexps:: When and how to use a regexp constant.
391 * Variables:: Variables give names to values for later
397 * Conversion:: The conversion of strings to numbers and
424 * Using BEGIN/END:: How and why to use BEGIN/END rules.
428 * Using Shell Variables:: How to use shell variables with
444 * Continue Statement:: Skip to the end of the innermost enclosing
450 * User-modified:: Built-in variables that you change to
454 * ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}.
455 * Array Intro:: Introduction to Arrays
456 * Reference to Elements:: How to examine one element of an array.
457 * Assigning Elements:: How to change an element of an array.
459 * Scanning an Array:: A variation of the @code{for} statement. It
462 * Delete:: The @code{delete} statement removes an
464 * Numeric Array Subscripts:: How to use numbers as subscripts in
473 * Calling Built-in:: How to call built-in functions.
475 @code{int}, @code{sin} and @code{rand}.
477 @code{split}, @code{match} and
478 @code{sprintf}.
479 * Gory Details:: More than you want to know about @samp{\}
480 and @samp{&} with @code{sub}, @code{gsub},
481 and @code{gensub}.
487 * Definition Syntax:: How to write definitions and what they
491 * Function Caveats:: Things to watch out for.
495 * Explaining gettext:: How GNU @code{gettext} works.
499 * Printf Ordering:: Rearranging @code{printf} arguments.
510 * Command Line:: How to run @command{awk}.
518 * Library Names:: How to best name private global variables
521 * Nextfile Function:: Two implementations of a @code{nextfile}
525 * Round Function:: A function for rounding if @code{sprintf}
530 * Join Function:: A function to join an array into a string.
531 * Gettimeofday Function:: A function to get formatted times.
544 * Running Examples:: How to run these examples.
556 * Translate Program:: A program similar to the @command{tr}
559 * Word Sorting:: A program to produce a word usage count.
576 * Contributors:: The major contributors to @command{gawk}.
578 * Getting:: How to get the distribution.
579 * Extracting:: How to extract the distribution.
585 * Configuration Philosophy:: How it's all supposed to work.
601 * VMS Compilation:: How to compile @command{gawk} under VMS.
602 * VMS Installation Details:: How to install @command{gawk} under VMS.
603 * VMS Running:: How to run @command{gawk} under VMS.
614 * Compatibility Mode:: How to disable certain @command{gawk}
617 * Adding Code:: Adding code to the main body of
619 * New Ports:: Porting @command{gawk} to a new operating
621 * Dynamic Extensions:: Adding new built-in functions to
627 * Internal File Ops:: The code for internal file operations.
628 * Using Internal File Ops:: How to use an external extension.
632 * Basic Data Typing:: A very quick intro to data types.
633 * Floating Point Issues:: Stuff to know about floating-point numbers.
660 Unix computer sitting in the corner. No one knew how to use it,
663 I was @code{root} and the one-and-only user.
664 That day, I began the transition from statistician to Unix programmer.
666 On one of many trips to the library or bookstore in search of
671 data manipulations to few lines of code. I was excited to try my
677 I learned that this was typical; the old version refused to step
678 aside or relinquish its name. If a system had a new @command{awk}, it was
680 The best way to get a new @command{awk} was to @command{ftp} the source code for
681 @command{gawk} from @code{prep.ai.mit.edu}. @command{gawk} was a version of
686 it's no longer difficult to find a new @command{awk}. @command{gawk} ships with
687 Linux, and you can download binaries or source code for almost
691 plugged into a network. So, oblivious to the existence of @command{gawk}
695 but it was too late to stop, so I eventually posted
696 to a @code{comp.sources} newsgroup.
702 that I could update @command{mawk} to support language extensions added
710 expertise and time to the Free Software Foundation.
714 will appeal to a wide audience.
715 It is a definitive reference to the AWK language as defined by the
725 interface to network protocols via special @file{/inet} files.
728 typically much smaller and faster to develop than
730 Consequently, there is often a payoff to prototype an
731 algorithm or design in AWK to get it running quickly and expose
754 over the @file{awkprof.out} profile pinpointed the problem to
755 a single line of code. @command{pgawk} is a welcome addition to
760 AWK or want to learn how, then read this book.
776 You might want to extract certain lines and discard the rest.
777 Or you may need to make changes wherever certain patterns appear,
783 that makes it easy to handle simple data-reformatting jobs.
799 Using @command{awk} allows you to:
815 Experiment with algorithms that you can adapt later to other computer
824 provides facilities that make it easy to:
841 Unix-based systems. If you are using some other operating system, you still need to
852 such as Crays. @command{gawk} has also been ported to Mac OS X,
859 * Names:: What name to use to find @command{awk}.
865 * How To Contribute:: Helping to save the world.
875 @multitable {2 parts} {1 part @code{egrep}} {1 part @code{snobol}}
876 @item @tab 1 part @code{egrep} @tab 1 part @code{snobol}
877 @item @tab 2 parts @code{ed} @tab 3 parts C
881 Blend all parts well using @code{lex} and @code{yacc}.
884 After eight years, add another part @code{egrep} and two
892 The name @command{awk} comes from the initials of its designers: Alfred V.@:
912 contributed parts of the code as well. In 1988 and 1989, David Trueman, with
921 features to do this for @command{gawk}. At that time, he also
925 His code finally became part of the main @command{gawk} distribution
929 for a complete list of those who made important contributions to @command{gawk}.
938 is often referred to as ``new @command{awk}'' (@command{nawk}).
954 All in all, this makes it difficult for you to know which version of
956 I can give here is to check your local documentation. Look for @command{awk},
963 Throughout this @value{DOCUMENT}, whenever we refer to a language feature
965 we simply use the term @command{awk}. When referring to a feature that is
966 specific to the GNU implementation, we use the term @command{gawk}.
972 The term @command{awk} refers to a particular program as well as to the language you
973 use to tell this program what to do. When we need to be careful, we call
977 both the @command{awk} language and how to run the @command{awk} utility.
978 The term @dfn{@command{awk} program} refers to a program written by you in
987 attempts to describe important differences between @command{gawk}
996 If you are a novice, feel free to skip over details that seem too complex.
1006 to be of interest on first reading.
1015 to @command{awk}, there is a lot of information here that even the @command{awk}
1023 provides the essentials you need to know to begin using @command{awk}.
1032 as the @code{getline} command.
1037 @code{print} and @code{printf}.
1044 describes how to write patterns for matching records, actions for
1055 @command{gawk} provide, as well as how to define
1065 are the abilities to have two-way communications with another process,
1070 describes how to run @command{gawk}, the meaning of its
1077 Reading them allows you to see @command{awk}
1082 first release to present. It also describes how @command{gawk}
1086 describes how to get @command{gawk}, how to compile it
1087 under Unix, and how to compile and use it on different
1088 non-Unix systems. It also describes how to report bugs
1089 in @command{gawk} and where to get three other freely
1093 describes how to disable @command{gawk}'s extensions, as
1094 well as how to contribute new code to @command{gawk},
1095 how to write extension libraries, and some possible
1112 present the licenses that cover the @command{gawk} source code
1121 A single Texinfo source file is used to produce both the printed and online
1146 In the text, command names appear in @code{this font}, while code segments
1148 emphasized @emph{like this}, and if a point needs to be made
1152 @value{FN}s are indicated like this: @file{/path/to/ourfile}.
1197 to the production and distribution of freely distributable software.
1207 Foundation to create a complete, freely distributable, POSIX-compliant
1209 The FSF uses the ``GNU General Public License'' (GPL) to ensure that
1211 source code is always available to the end user. A
1218 The GPL applies to the C language source code for @command{gawk}.
1252 information in it is free to anyone. The machine-readable
1253 source code for the @value{DOCUMENT} comes with @command{gawk}; anyone
1254 may take this @value{DOCUMENT} to a copying machine and make as many
1255 copies as they like. (Take a moment to check the Free Documentation
1259 easier to read and use. Furthermore,
1260 the proceeds from sales of this book go back to the FSF
1261 to help fund development of more free software.
1274 the @cite{The GAWK Manual} to be released was Edition 0.11 Beta in
1336 @cite{@value{TITLE}} will undoubtedly continue to evolve.
1341 problem reports electronically, or write to me in care of the publisher.
1344 @unnumberedsec How to Contribute
1352 @command{gawk} extension that you would like to
1355 @command{gawk} distribution down to manageable size.
1363 Many people need to be thanked for their assistance in producing this
1368 issues relevant both to @command{awk} implementation and to this manual, that
1373 I would like to acknowledge Richard M.@: Stallman, for his vision of a
1379 versions of this book, up to and including this edition.
1406 convincing me @emph{not} to title this @value{DOCUMENT}
1414 I would like to thank Marshall and Elaine Hartholz of Seattle and
1416 time in their homes, which allowed me to make significant progress on
1422 system, not once, but twice, which allowed me to do a lot of work while
1462 @command{gawk} ``crack portability team.'' Without their hard work and
1464 has been and continues to be a pleasure working with this team of fine
1468 David and I would like to thank Brian Kernighan of Bell Laboratories for
1485 I would like to thank my parents for their love, and for the grace with
1487 Finally, I also must acknowledge my gratitude to G-d, for the many opportunities
1488 He has sent my way, as well as for the gifts He has given me with which to
1556 The basic function of @command{awk} is to search files for lines (or other
1568 the data you want to work with and then what to do when you find it.
1569 Most other languages are @dfn{procedural}; you have to describe, in great
1570 detail, every step the program is to take. When working with procedural
1572 harder to clearly describe the data your program will process.
1573 For this reason, @command{awk} programs are often refreshingly easy to
1579 tells @command{awk} what to do. The program consists of a series of
1583 pattern to search for and one action to perform
1587 action is enclosed in curly braces to separate it from the pattern.
1598 * Running gawk:: How to run @command{gawk} programs; includes
1609 * When:: When to use @command{gawk} and when to use
1614 @section How to Run @command{awk} Programs
1617 There are several ways to run an @command{awk} program. If the program is
1618 short, it is easiest to include it in the command that runs @command{awk},
1626 When the program is long, it is usually more convenient to put it in a file
1644 * Comments:: Adding documentation to @command{gawk}
1653 programs the moment you want to use them. Then you can write the
1664 @cindex single quote (@code{'})
1665 @cindex @code{'} (single quote)
1667 to start @command{awk} and use the @var{program} to process records in the
1670 characters. The quotes also cause the shell to treat all of @var{program} as
1671 a single argument for @command{awk}, and allow @var{program} to be more
1679 reliable because there are no other files to misplace.
1719 @command{awk} applies the @var{program} to the @dfn{standard input},
1729 (from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}),
1730 to keep you from worrying about the complexities of computer programming
1731 (@code{BEGIN} is a feature we haven't discussed yet):
1739 @cindex double quote (@code{"})
1740 @cindex @code{"} (double quote)
1741 @cindex @code{\} (backslash)
1742 @cindex backslash (@code{\})
1747 quotes around the program text, double quotes are needed here in order to
1752 keyboard to its standard output (why this works is explained shortly).
1758 to come to the aid of their country.
1759 @print{} to come to the aid of their country.
1774 more convenient to put the program into a separate file. In order to tell
1775 @command{awk} to use that file for its program, you type:
1781 @cindex @code{-f} option
1784 The @option{-f} instructs the @command{awk} utility to get the @command{awk} program
1817 @cindex single quote (@code{'})
1819 @cindex @code{'} (single quote)
1820 If you want to identify your @command{awk} program files clearly as such,
1821 you can add the extension @file{.awk} to the @value{FN}. This doesn't
1828 @cindex @code{#} (number sign), @code{#!} (executable scripts)
1829 @cindex number sign (@code{#}), @code{#!} (executable scripts)
1831 @cindex @code{#} (number sign), @code{#!} (executable scripts), portability issues with
1832 @cindex number sign (@code{#}), @code{#!} (executable scripts), portability issues with
1834 Once you have learned @command{awk}, you may want to write self-contained
1840 For example, you could update the file @file{advice} to look like this:
1851 at the shell and the system arranges to run @command{awk}@footnote{The
1853 to run and an optional initial command-line argument to pass to that
1857 argument list contains either options to @command{awk}, or @value{DF}s,
1869 path variable (typically @code{$PATH}). If not, you may need
1870 to type @samp{./advice} at the shell.)
1872 Self-contained @command{awk} scripts are useful when you want to write a
1873 program that users can invoke without their having to know that the program is
1878 @cindex portability, @code{#!} (executable scripts)
1880 Some systems limit the length of the interpreter name to 32 characters.
1884 line after the path to @command{awk}. It does not work. The operating system
1885 treats the rest of the line as a single argument and passes it to @command{awk}.
1886 Doing this leads to confusing behavior---most likely a usage diagnostic
1889 @cindex @code{ARGC}/@code{ARGV} variables, portability and
1890 @cindex portability, @code{ARGV} variable
1892 the value of @code{ARGV[0]}
1896 of @command{awk} (such as @file{/bin/awk}), and some put the name
1897 of your script (@samp{advice}). Don't rely on the value of @code{ARGV[0]}
1898 to provide your script name.
1902 @cindex @code{#} (number sign), commenting
1903 @cindex number sign (@code{#}), commenting
1911 typically hard to understand without them.
1914 character (@samp{#}) and continues to the end of the line.
1915 The @samp{#} does not have to be the first character on the line. The
1927 comment is to help you or another person understand the program
1931 @cindex single quote (@code{'}), vs. apostrophe
1932 @cindex @code{'} (single quote), vs. apostrophe
1935 you can enclose small to medium programs in single quotes, in order to keep
1961 @cindex @code{\} (backslash)
1962 @cindex backslash (@code{\})
1971 For short to medium length @command{awk} programs, it is most convenient
1972 to enter the program on the @command{awk} command line.
1983 Once you are working with the shell, it is helpful to have a basic
1984 knowledge of shell quoting rules. The following rules apply only to
1997 character on to the command.
2000 @cindex @code{\} (backslash)
2001 @cindex backslash (@code{\})
2002 @cindex single quote (@code{'})
2003 @cindex @code{'} (single quote)
2006 to the command.
2007 It is @emph{impossible} to embed a single quote inside single-quoted text.
2008 Refer back to
2013 @cindex double quote (@code{"})
2014 @cindex @code{"} (double quote)
2022 a backslash within double-quoted text if they are to be passed on literally
2023 to the program. (The leading backslash is stripped first.)
2036 @cindex single quote (@code{'}), with double quotes
2037 @cindex @code{'} (single quote), with double quotes
2043 For example, to specify that the field separator @code{FS} should
2044 be set to the null string, use:
2059 In the second case, @command{awk} will attempt to use the text of the program
2060 as the value of @code{FS}, and the first @value{FN} as the text of the program!
2065 Mixing single and double quotes is difficult. You have to resort
2066 to shell quoting tricks, like this:
2077 This can be ``simplified'' to:
2087 Another option is to use double quotes, escaping the embedded, @command{awk}-level
2102 program, it is probably best to move it into a separate file, where
2110 @cindex @code{BBS-list} file
2116 each line is considered to be one @dfn{record}.
2118 In the @value{DF} @file{BBS-list}, each record contains the name of a computer
2119 bulletin board, its phone number, the board's baud rate(s), and a code for
2125 @c 2e: Update the baud rates to reflect today's faster modems
2147 @cindex @code{inventory-shipped} file
2182 this by using the command @kbd{M-x write-region} to copy text from the Info
2217 is the pattern to search for. This type of pattern is called a
2220 The pattern is allowed to match parts of words.
2240 action is to print all lines that match the pattern.
2243 Thus, we could leave out the action (the @code{print} statement and the curly
2246 omitting the @code{print} statement but retaining the curly braces makes an
2251 collection of useful, short programs to get you started. Some of these
2254 read the rest of the @value{DOCUMENT} to become an @command{awk} expert!)
2259 one way to do things in @command{awk}. At some point, you may want
2260 to look back at these examples and see if
2261 you can come up with different ways to do the same things shown here:
2291 The input is processed by the @command{expand} utility to change tabs
2301 This is an easy way to delete blank lines from a file (or rather, to
2302 create a new file similar to the old file but from which the blank lines
2306 Print seven random numbers from 0 to 100, inclusive:
2417 programs do. This example shows how @command{awk} can be used to
2429 @cindex backslash (@code{\}), continuing lines and, in @command{csh}
2430 @cindex @code{\} (backslash), continuing lines and, in @command{csh}
2433 @footnote{In the C shell (@command{csh}), you need to type
2459 the number of links to the file, and the third field identifies the owner of
2464 contains the name of the file.@footnote{On some
2465 very old systems, you may need to use @samp{ls -lg} to get this output.}
2473 performed. This adds the fifth field (the file's size) to the variable
2474 @code{sum}. As a result, when @command{awk} has finished reading all the
2475 input lines, @code{sum} is the total of the sizes of the files whose
2477 are automatically initialized to zero.)
2480 @code{END} rule executes and prints the value of @code{sum}.
2481 In this example, the value of @code{sum} is 80600.
2484 (@pxref{Action Overview}). Before you can move on to more
2485 advanced @command{awk} programming, you have to know how @command{awk} interprets
2487 @code{print} statements, you can produce some very useful and
2513 statement.@footnote{The @samp{?} and @samp{:} referred to here is the
2520 @cindex @code{\} (backslash), continuing lines and
2521 @cindex backslash (@code{\}), continuing lines and
2522 If you would like to split a single statement into two lines at a point
2525 the final character on the line in order to be recognized as a continuation
2545 may use backslash continuation. For example, they may not allow you to
2547 portability of your @command{awk} programs, it is best not to split your
2553 @cindex backslash (@code{\}), continuing lines and, in @command{csh}
2554 @cindex @code{\} (backslash), continuing lines and, in @command{csh}
2573 prompts, analogous to the standard shell's @samp{$} and @samp{>}.
2575 Compare the previous example to how it is done with a POSIX-compliant shell:
2585 @command{awk} is a line-oriented language. Each rule's action has to
2590 @cindex backslash (@code{\}), continuing lines and, comments and
2591 @cindex @code{\} (backslash), continuing lines and, comments and
2593 Another thing to keep in mind is that backslash continuation and
2610 @code{BEGIN} is noted as a syntax error.
2613 @cindex @code{;} (semicolon)
2614 @cindex semicolon (@code{;})
2615 When @command{awk} statements within one rule are short, you might want to put
2618 This also applies to the rules themselves.
2637 @dfn{built-in}, variables that your programs can use to get information
2639 as well as to control how @command{awk} processes your data.
2652 @section When to Use @command{awk}
2665 be in other languages. This makes @command{awk} programs easy to compose and
2686 of source code than the equivalent @command{awk} programs, but they are
2687 easier to maintain and usually run more efficiently.
2700 @cindex forward slash (@code{/})
2701 @cindex @code{/} (forward slash)
2704 belongs to that set.
2708 Therefore, the pattern @code{/foo/} matches any input record containing
2719 * Regexp Usage:: How to Use Regular Expressions.
2720 * Escape Sequences:: How to write nonprinting characters.
2723 * GNU Regexp Operators:: Operators specific to GNU software.
2724 * Case-sensitivity:: How to do case-insensitive matching.
2731 @section How to Use Regular Expressions
2737 to match some part of the text in order to succeed.) For example, the
2751 @c @cindex operators, @code{~}
2753 @cindex @code{~} (tilde), @code{~} operator
2754 @cindex tilde (@code{~}), @code{~} operator
2755 @cindex @code{!} (exclamation point), @code{!~} operator
2756 @cindex exclamation point (@code{!}), @code{!~} operator
2757 @c @cindex operators, @code{!~}
2758 @cindex @code{if} statement
2759 @cindex @code{while} statement
2760 @cindex @code{do}-@code{while} statement
2761 @c @cindex statements, @code{if}
2762 @c @cindex statements, @code{while}
2763 @c @cindex statements, @code{do}
2765 expressions allow you to specify the string to match against; it need
2768 using these operators can be used as patterns, or in @code{if},
2769 @code{while}, @code{for}, and @code{do} statements.
2820 When a regexp is enclosed in slashes, such as @code{/foo/}, we call it
2821 a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and
2822 @code{"foo"} is a string constant.
2828 @cindex backslash (@code{\}), in escape sequences
2829 @cindex @code{\} (backslash), in escape sequences
2831 (@code{"foo"}) or regexp constants (@code{/foo/}).
2834 One use of an escape sequence is to include a double-quote character in
2836 must use @samp{\"} to represent an actual double-quote character as a
2840 $ awk 'BEGIN @{ print "He said \"hi!\" to her." @}'
2841 @print{} He said "hi!" to her.
2845 included normally; you must write @samp{\\} to put one backslash in the
2847 @samp{"} and @samp{\} must be written @code{"\"\\"}.
2850 such as TAB or newline. While there is nothing to stop you from entering most
2857 sequences apply to both string constants and regexp constants:
2859 @table @code
2864 @cindex @code{\} (backslash), @code{\a} escape sequence
2865 @cindex backslash (@code{\}), @code{\a} escape sequence
2867 The ``alert'' character, @kbd{@value{CTL}-g}, ASCII code 7 (BEL).
2870 @cindex @code{\} (backslash), @code{\b} escape sequence
2871 @cindex backslash (@code{\}), @code{\b} escape sequence
2873 Backspace, @kbd{@value{CTL}-h}, ASCII code 8 (BS).
2875 @cindex @code{\} (backslash), @code{\f} escape sequence
2876 @cindex backslash (@code{\}), @code{\f} escape sequence
2878 Formfeed, @kbd{@value{CTL}-l}, ASCII code 12 (FF).
2880 @cindex @code{\} (backslash), @code{\n} escape sequence
2881 @cindex backslash (@code{\}), @code{\n} escape sequence
2883 Newline, @kbd{@value{CTL}-j}, ASCII code 10 (LF).
2885 @cindex @code{\} (backslash), @code{\r} escape sequence
2886 @cindex backslash (@code{\}), @code{\r} escape sequence
2888 Carriage return, @kbd{@value{CTL}-m}, ASCII code 13 (CR).
2890 @cindex @code{\} (backslash), @code{\t} escape sequence
2891 @cindex backslash (@code{\}), @code{\t} escape sequence
2893 Horizontal TAB, @kbd{@value{CTL}-i}, ASCII code 9 (HT).
2896 @cindex @code{\} (backslash), @code{\v} escape sequence
2897 @cindex backslash (@code{\}), @code{\v} escape sequence
2899 Vertical tab, @kbd{@value{CTL}-k}, ASCII code 11 (VT).
2901 @cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence
2902 @cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence
2904 The octal value @var{nnn}, where @var{nnn} stands for 1 to 3 digits
2905 between @samp{0} and @samp{7}. For example, the code for the ASCII ESC
2910 @cindex @code{\} (backslash), @code{\x} escape sequence
2911 @cindex backslash (@code{\}), @code{\x} escape sequence
2921 @cindex @code{\} (backslash), @code{\/} escape sequence
2922 @cindex backslash (@code{\}), @code{\/} escape sequence
2925 This expression is used when you want to write a regexp
2927 slashes, you need to escape the slash that is part of the pattern,
2928 in order to tell @command{awk} to keep processing the rest of the regexp.
2930 @cindex @code{\} (backslash), @code{\"} escape sequence
2931 @cindex backslash (@code{\}), @code{\"} escape sequence
2934 This expression is used when you want to write a string
2936 double quotes, you need to escape the quote that is part of the string,
2937 in order to tell @command{awk} to keep processing the rest of the string.
2948 normally be a regexp operator. For example, @code{/a\+b/} matches the three
2951 @cindex backslash (@code{\}), in escape sequences
2952 @cindex @code{\} (backslash), in escape sequences
2972 A backslash before any other character means to treat that character
2980 @cindex backslash (@code{\}), in escape sequences, POSIX and
2981 @cindex @code{\} (backslash), in escape sequences, POSIX and
2993 For example, @code{"a\qc"} is the same as @code{"aqc"}.
2994 (Because this is such an easy bug both to introduce and to miss,
2996 Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars
3005 In such implementations, typing @code{"a\qc"} is the same as typing
3006 @code{"a\\qc"}.
3014 escape to represent a regexp metacharacter.
3027 @code{/a\52b/} is equivalent to @code{/a\*b/}.
3035 called @dfn{regular expression operators} or @dfn{metacharacters}, to
3050 @table @code
3051 @cindex backslash (@code{\})
3052 @cindex @code{\} (backslash)
3054 This is used to suppress the special meaning of a character when
3060 @cindex @code{^} (caret)
3061 @cindex caret (@code{^})
3065 to identify chapter beginnings in Texinfo source files.
3066 The @samp{^} is known as an @dfn{anchor}, because it anchors the pattern to
3069 It is important to realize that @samp{^} does not match the beginning of
3077 @cindex @code{$} (dollar sign)
3078 @cindex dollar sign (@code{$})
3080 This is similar to @samp{^}, but it matches only at the end of a string.
3090 @cindex @code{.} (period)
3091 @cindex period (@code{.})
3101 @cindex POSIX @command{awk}, period (@code{.}), using
3104 character, which is a character with all bits equal to zero.
3106 may not be able to match the @sc{nul} character.
3108 @cindex @code{[]} (square brackets)
3109 @cindex square brackets (@code{[]})
3115 you may see a character list referred to as either a
3132 @cindex @code{|} (vertical bar)
3133 @cindex vertical bar (@code{|})
3135 This is the @dfn{alternation operator} and it is used to specify
3143 The alternation applies to the largest possible regexps on either side.
3145 @cindex @code{()} (parentheses)
3146 @cindex parentheses @code{()}
3149 arithmetic. They can be used to concatenate regular expressions
3151 @samp{@@(samp|code)\@{[^@}]+\@}} matches both @samp{@@code@{foo@}} and
3156 @cindex @code{*} (asterisk), @code{*} operator, as regexp operator
3157 @cindex asterisk (@code{*}), @code{*} operator, as regexp operator
3160 repeated as many times as necessary to find a match. For example, @samp{ph*}
3161 applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
3166 (Use parentheses if you want to repeat a larger expression.) It finds
3174 @cindex @code{+} (plus sign)
3175 @cindex plus sign (@code{+})
3177 This symbol is similar to @samp{*}, except that the preceding expression must be
3188 @cindex @code{?} (question mark)
3189 @cindex question mark (@code{?})
3191 This symbol is similar to @samp{*}, except that the preceding expression can be
3203 repeated @var{n} to @var{m} times.
3207 @table @code
3220 They were added as part of the POSIX standard to make @command{awk}
3231 it is good practice to always escape them with a backslash. Then the
3232 regexp constants are valid and work the way you want them to, using
3269 locale, @samp{[a-dx-z]} is equivalent to @samp{[abcdxyz]}. Many locales
3271 @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]}; instead it
3272 might be equivalent to @samp{[aBbCcDdxXyYz]}, for example. To obtain
3274 locale by setting the @env{LC_ALL} environment variable to the value
3277 @cindex @code{\} (backslash), in character lists
3278 @cindex backslash (@code{\}), in character lists
3279 @cindex @code{^} (caret), in character lists
3280 @cindex caret (@code{^}), in character lists
3281 @cindex @code{-} (hyphen), in character lists
3282 @cindex hyphen (@code{-}), in character lists
3310 actual characters can vary from country to country and/or
3311 from character set to character set. For example, the notion of what
3320 @c leave it here in case we need to go back, but make sure the text
3324 @table @code
3355 Space characters (such as space, TAB, and formfeed, to name a few).
3365 @multitable {@code{[:xdigit:]}} {Characters that are both printable and visible. (A space is}
3366 @item @code{[:alnum:]} @tab Alphanumeric characters.
3367 @item @code{[:alpha:]} @tab Alphabetic characters.
3368 @item @code{[:blank:]} @tab Space and TAB characters.
3369 @item @code{[:cntrl:]} @tab Control characters.
3370 @item @code{[:digit:]} @tab Numeric characters.
3371 @item @code{[:graph:]} @tab Characters that are both printable and visible.
3373 @item @code{[:lower:]} @tab Lowercase alphabetic characters.
3374 @item @code{[:print:]} @tab Printable characters (characters that are not control characters).
3375 @item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits,
3377 @item @code{[:space:]} @tab Space characters (such as space, TAB, and formfeed, to name a few).
3378 @item @code{[:upper:]} @tab Uppercase alphabetic characters.
3379 @item @code{[:xdigit:]} @tab Characters that are hexadecimal digits.
3382 For example, before the POSIX standard, you had to write @code{/[A-Za-z0-9]/}
3383 to match alphanumeric characters. If your
3388 @code{/[[:alnum:]]/} to match the alphabetic
3395 These apply to non-ASCII character sets, which can have single symbols
3408 then @code{[[.ch.]]} is a regexp that matches this collating element, whereas
3409 @code{[ch]} is a regexp that matches either @samp{c} or @samp{h}.
3414 characters that are equal. The name is enclosed between
3416 For example, the name @samp{e} might be used to represent all of
3417 ``e,'' ``@`e,'' and ``@'e.'' In this case, @code{[[=e=]]} is a regexp
3446 @value{SECTION} and are specific to @command{gawk};
3452 @table @code
3453 @c @cindex operators, @code{\w} (@command{gawk})
3454 @cindex backslash (@code{\}), @code{\w} operator (@command{gawk})
3455 @cindex @code{\} (backslash), @code{\w} operator (@command{gawk})
3459 @w{@code{[[:alnum:]_]}}.
3461 @c @cindex operators, @code{\W} (@command{gawk})
3462 @cindex backslash (@code{\}), @code{\W} operator (@command{gawk})
3463 @cindex @code{\} (backslash), @code{\W} operator (@command{gawk})
3467 @w{@code{[^[:alnum:]_]}}.
3469 @c @cindex operators, @code{\<} (@command{gawk})
3470 @cindex backslash (@code{\}), @code{\<} operator (@command{gawk})
3471 @cindex @code{\} (backslash), @code{\<} operator (@command{gawk})
3474 For example, @code{/\<away/} matches @samp{away} but not
3477 @c @cindex operators, @code{\>} (@command{gawk})
3478 @cindex backslash (@code{\}), @code{\>} operator (@command{gawk})
3479 @cindex @code{\} (backslash), @code{\>} operator (@command{gawk})
3482 For example, @code{/stow\>/} matches @samp{stow} but not @samp{stowaway}.
3484 @c @cindex operators, @code{\y} (@command{gawk})
3485 @cindex backslash (@code{\}), @code{\y} operator (@command{gawk})
3486 @cindex @code{\} (backslash), @code{\y} operator (@command{gawk})
3494 @c @cindex operators, @code{\B} (@command{gawk})
3495 @cindex backslash (@code{\}), @code{\B} operator (@command{gawk})
3496 @cindex @code{\} (backslash), @code{\B} operator (@command{gawk})
3500 @code{/\Brat\B/} matches @samp{crate} but it does not match @samp{dirty rat}.
3510 string to match as the buffer.
3513 @table @code
3515 @c @cindex operators, @code{\`} (@command{gawk})
3516 @cindex backslash (@code{\}), @code{\`} operator (@command{gawk})
3517 @cindex @code{\} (backslash), @code{\`} operator (@command{gawk})
3521 @c @cindex operators, @code{\'} (@command{gawk})
3522 @cindex backslash (@code{\}), @code{\'} operator (@command{gawk})
3523 @cindex @code{\} (backslash), @code{\'} operator (@command{gawk})
3529 @cindex @code{^} (caret)
3530 @cindex caret (@code{^})
3531 @cindex @code{?} (question mark)
3532 @cindex question mark (@code{?})
3544 An alternative method would have been to require two backslashes in the
3546 method of using @samp{\y} for the GNU @samp{\b} appears to be the
3572 @item @code{--posix}
3577 @item @code{--traditional}
3580 are the POSIX character classes (@code{[[:alnum:]]}, etc.).
3584 @item @code{--re-interval}
3605 The simplest way to do a case-independent match is to use a character
3607 you need to use it often, and it can make the regular expressions harder
3608 to read. There are two alternatives that you might prefer.
3610 One way to perform a case-insensitive match at a particular point in the
3611 program is to convert the data to a single case, using the
3612 @code{tolower} or @code{toupper} built-in string functions (which we
3622 converts the first field to lowercase before matching against it.
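The case-conversion approach described above can be sketched as follows. This is a minimal illustration only: the function name `match_foo` and the sample input are hypothetical and do not come from the manual.

```shell
# Hypothetical helper: print records whose first field is "foo",
# regardless of case, by lowercasing the field before matching.
match_foo() {
    awk 'tolower($1) ~ /^foo$/ { print }'
}

printf 'FOO one\nbar two\nFoo three\n' | match_foo
```

Because only a copy of the field is converted, the record is printed with its original capitalization.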
3628 @cindex @code{~} (tilde), @code{~} operator
3629 @cindex tilde (@code{~}), @code{~} operator
3630 @cindex @code{!} (exclamation point), @code{!~} operator
3631 @cindex exclamation point (@code{!}), @code{!~} operator
3632 @cindex @code{IGNORECASE} variable
3633 @c @cindex variables, @code{IGNORECASE}
3634 Another method, specific to @command{gawk}, is to set the variable
3635 @code{IGNORECASE} to a nonzero value (@pxref{Built-in Variables}).
3636 When @code{IGNORECASE} is not zero, @emph{all} regexp and string
3638 @code{IGNORECASE} dynamically controls the case-sensitivity of the
3640 @code{IGNORECASE} (like most variables) is initialized to zero:
3650 In general, you cannot use @code{IGNORECASE} to make certain rules
3653 to set @code{IGNORECASE} just for the pattern of
3660 To do this, use either character lists or @code{tolower}. However, one
3661 thing you can do with @code{IGNORECASE} only is dynamically turn
3664 @code{IGNORECASE} can be set on the command line or in a @code{BEGIN} rule
3667 Setting @code{IGNORECASE} from the command line is a way to make
3668 a program case-insensitive without having to edit it.
3670 Prior to @command{gawk} 3.0, the value of @code{IGNORECASE}
3674 operations are also affected by @code{IGNORECASE}.
3685 The value of @code{IGNORECASE} has no effect if @command{gawk} is in
3702 This example uses the @code{sub} function (which we haven't discussed yet;
3704 to make a change to the input record. Here, the regexp @code{/a+/}
3720 text matching and substitutions with the @code{match}, @code{sub}, @code{gsub},
3721 and @code{gensub} functions, it is very important.
3737 @cindex @code{~} (tilde), @code{~} operator
3738 @cindex tilde (@code{~}), @code{~} operator
3739 @cindex @code{!} (exclamation point), @code{!~} operator
3740 @cindex exclamation point (@code{!}), @code{!~} operator
3741 @c @cindex operators, @code{~}
3742 @c @cindex operators, @code{!~}
3745 be any expression. The expression is evaluated and converted to a string
3756 This sets @code{digits_regexp} to a regexp that describes one or more digits,
3764 If you are going to use a string constant, you have to understand that
3766 @command{awk} reads your program, and the second time when it goes to
3769 @code{digits_regexp}, shown previously), not just string constants.
3772 @cindex @code{\} (backslash), regexp constants
3773 @cindex backslash (@code{\}), regexp constants
3774 @cindex @code{"} (double quote), regexp constants
3775 @cindex double quote (@code{"}), regexp constants
3777 scanned twice? The answer has to do with escape sequences, and particularly
3779 string, you have to type two backslashes.
3781 For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
3783 you have to type @code{"\\*"}. The first backslash escapes the
3790 Given that you can use both regexp and string constants to describe
3796 String constants are more complicated to write and
3797 more difficult to read. Using regexp constants makes your programs
3802 It is more efficient to use regexp constants. @command{awk} can note
3814 @subheading Advanced Notes: Using @code{\n} in Character Lists of Dynamic Regexps
3819 character to be used inside a character list for a dynamic regexp:
3840 @command{gawk} does not have this problem, and it isn't likely to
3849 Modern systems support the notion of @dfn{locales}: a way to tell
3856 The following example uses the @code{sub} function, which
3859 Here, the intent is to remove trailing uppercase characters:
3868 should not normally match @samp{[A-Z]*}. This result is due to the
3870 There are two fixes. The first is to use the POSIX character
3872 The second is to change the locale setting in the environment,
3881 The setting @samp{C} forces @command{gawk} to behave in the traditional
3883 You may wish to put these statements into your shell startup file,
3886 Similar considerations apply to other ranges. For example,
3895 to make several function calls, @emph{per input character} to find the record
3904 @cindex @code{FILENAME} variable
3909 in order, processing all the data from one before going on to the next.
3910 The name of the current input file can be found in the built-in variable
3911 @code{FILENAME}
3920 This makes it more convenient for programs to work on the parts of a record.
3922 @cindex @code{getline} command
3923 On rare occasions, you may need to use the @code{getline} command.
3924 The @code{getline} command is valuable, both because it
3926 used with it do not have to be named on the @command{awk} command line
3931 * Fields:: An introduction to fields.
3934 * Field Separators:: The field separator and how to change it.
3938 using the @code{getline} function.
3948 @cindex @code{NR} variable
3949 @cindex @code{FNR} variable
3956 built-in variable called @code{FNR}. It is reset to zero when a new
3957 file is started. Another built-in variable, @code{NR}, is the total
3959 but is never automatically reset to zero.
3967 assigning the character to the built-in variable @code{RS}.
3970 @cindex @code{RS} variable
3972 the value of @code{RS} can be changed in the @command{awk} program
3976 which indicate a string constant. Often the right time to do this is
3979 To do this, use the special @code{BEGIN} pattern
3983 @cindex @code{BEGIN} pattern
3990 changes the value of @code{RS} to @code{"/"}, before reading any input.
3994 record. Because each @code{print} statement adds a newline at the end of
3996 with each slash changed to a newline. Here are the results of running
4050 Another way to change the record separator is on the command line,
4059 This sets @code{RS} to @samp{/} before processing @file{BBS-list}.
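As a rough sketch of the two ways of setting the record separator mentioned above (in a `BEGIN` rule and on the command line), assuming a made-up input string rather than the `BBS-list` file, and a hypothetical function name `split_on_slash`:

```shell
# Hypothetical helper: treat "/" as the record separator and
# print each record on its own line.
split_on_slash() {
    awk 'BEGIN { RS = "/" } { print }'
}

printf 'a/b/c' | split_on_slash

# The same separator supplied as a command-line variable assignment;
# the "-" operand names standard input.
printf 'a/b/c' | awk '{ print }' RS=/ -
```

The input deliberately has no trailing newline, so the last record is just `c` rather than `c` followed by a newline.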
4071 variable @code{NF} is the number of fields in the current record.
4075 even if the last character in the file is not the character in @code{RS}.
4080 The empty string @code{""} (a string without any characters)
4082 as the value of @code{RS}. It means that records are separated
4086 If you change the value of @code{RS} in the middle of an @command{awk} run,
4087 the new value is used to delimit subsequent records, but the record
4091 @cindex @code{RT} variable
4099 sets the variable @code{RT} to the text in the input that matched
4100 @code{RS}.
4102 the value of @code{RS} is not limited to a one-character
4108 actually at work in the usual case, where @code{RS} contains just a
4112 The newline, because it matches @code{RS}, is not part of either record.
4114 When @code{RS} is a single character, @code{RT}
4115 contains the same single character. However, when @code{RS} is a
4116 regular expression, @code{RT} contains
4120 It sets @code{RS} equal to a regular expression that
4136 value of @code{RT} is a newline, and the @code{print} statement
4139 of @code{RS} as a regexp and @code{RT}.
4141 If you set @code{RS} to a regular expression that allows optional
4143 to implementation constraints, that @command{gawk} may match the leading
4146 @command{gawk} attempts to avoid this problem, but currently, there's
4149 @cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables
4150 The use of @code{RS} as a regular expression and the @code{RT}
4155 @code{RS} is used to determine the end of the record.
4158 @subheading Advanced Notes: @code{RS = "\0"} Is Not Portable
4162 There are times when you might want to treat an entire @value{DF} as a
4163 single record. The only way to make this happen is to give @code{RS}
4165 to do in a general way, such that a program always works for arbitrary
4170 consists of a character with all bits equal to zero, is a good
4171 value to use for @code{RS} in this case:
4181 to other @command{awk} implementations.
4192 The best way to treat a whole file as a single record is to
4220 The purpose of fields is to make it more convenient for you to refer to
4221 these pieces of the record. You don't have to use them---you can
4225 @cindex @code{$} field operator
4226 @cindex field operator @code{$}
4227 @cindex @code{$} (dollar sign), @code{$} field operator
4228 @cindex dollar sign (@code{$}), @code{$} field operator
4232 to refer to a field in an @command{awk} program,
4233 followed by the number of the field you want. Thus, @code{$1}
4234 refers to the first field, @code{$2} to the second, and so on.
4235 (Unlike the Unix shells, the field numbers are not limited to single digits.
4236 @code{$127} is the one hundred twenty-seventh field in the record.)
4244 Here the first field, or @code{$1}, is @samp{This}, the second field, or
4245 @code{$2}, is @samp{seems}, and so on. Note that the last field,
4246 @code{$7}, is @samp{example.}. Because there is no space between the
4250 @cindex @code{NF} variable
4252 @code{NF} is a built-in variable whose value is the number of fields
4254 of @code{NF} each time it reads a record. No matter how many fields
4255 there are, the last field in a record can be represented by @code{$NF}.
4256 So, @code{$NF} is the same as @code{$7}, which is @samp{example.}.
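The relationship between `NF` and `$NF` can be checked with a short sketch like the one below. The sample sentence is the one quoted in the text; the function name `last_field` is hypothetical.

```shell
# Hypothetical helper: print the field count and the last field.
last_field() {
    awk '{ print NF, $NF }'
}

printf 'This seems like a pretty nice example.\n' | last_field
```

The trailing period stays attached to the last field because no whitespace separates it from the word.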
4257 If you try to reference a field beyond the last
4258 one (such as @code{$8} when the record has only seven fields), you get
4261 The use of @code{$0}, which looks like a reference to the ``zero-th'' field, is
4279 it tests whether a string (here, the field @code{$1}) matches a given regular
4300 The number of a field does not need to be a constant. Any expression in
4301 the @command{awk} language can be used after a @samp{$} to refer to a
4303 value is a string, rather than a number, it is converted to a number.
4311 Recall that @code{NR} is the number of records read so far: one in the
4323 its value as the number of the field to print. The @samp{*} sign
4324 represents multiplication, so the expression @samp{2*2} evaluates to four.
4334 Thus, @samp{$(2-2)} has the same value as @code{$0}. Negative field
4335 numbers are not allowed; trying to reference one usually terminates
4343 variable @code{NF} (also @pxref{Built-in Variables}). The expression
4344 @code{$NF} is not a special feature---it is the direct consequence of
4345 evaluating @code{NF} and using its value as a field number.
4369 @code{nboxes}.
4371 field three, @code{$3}, as the original value of field three minus ten:
4377 For this to work, the text in field @code{$3} must make sense
4378 as a number; the string of characters must be converted to a number
4379 for the computer to do arithmetic on it. The number resulting
4380 from the subtraction is converted back to a string of characters that
4385 text of the input record is recalculated to contain the new field where
4386 the old one was. In other words, @code{$0} changes to reflect the altered
4399 It is also possible to assign contents to fields that are out
4414 We've just created @code{$6}, whose value is the sum of fields
4415 @code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign
4416 represents addition. For the file @file{inventory-shipped}, @code{$6}
4420 input record, which is the value of @code{$0}. Thus, if you do @samp{print $0}
4425 @cindex @code{OFS} variable
4426 @cindex output field separator, See @code{OFS} variable
4427 @cindex field separators, See Also @code{OFS}
4429 @code{NF} (the number of fields; @pxref{Fields}).
4430 For example, the value of @code{NF} is set to the number of the highest
4432 The exact format of @code{$0} is also affected by a feature that has not been discussed yet:
4433 the @dfn{output field separator}, @code{OFS},
4434 used to separate the fields (@pxref{Output Separators}).
4437 does @emph{not} change the value of either @code{$0} or @code{NF}.
4449 should print @samp{everything is normal}, because @code{NF+1} is certain
4450 to be out of range. (@xref{If Statement},
4451 for more information about @command{awk}'s @code{if-else} statements.
4455 It is important to note that making an assignment to an existing field
4457 value of @code{$0} but does not change the value of @code{NF},
4458 even when you assign the empty string to a field. For example:
4480 The intervening field, @code{$5}, is created with an empty value
4482 and @code{NF} is updated with the value six.
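A small sketch of assigning past the last field, assuming a four-field input line invented for this illustration (the function name `extend_record` is likewise hypothetical):

```shell
# Hypothetical helper: assign to field six of a four-field record,
# then show the new field count and the rebuilt record.
extend_record() {
    awk '{ $6 = "new"; print NF; print }'
}

printf 'a b c d\n' | extend_record
```

In the rebuilt record the empty fifth field shows up as two adjacent output separators between `d` and `new`.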
4485 @cindex dark corner, @code{NF} variable, decrementing
4486 @cindex @code{NF} variable, decrementing
4487 Decrementing @code{NF} throws away the values of the fields
4488 after the new value of @code{NF} and recomputes @code{$0}.
4500 @cindex portability, @code{NF} variable, decrementing
4502 rebuild @code{$0} when @code{NF} is decremented. Caveat emptor.
4504 Finally, there are times when it is convenient to force
4505 @command{awk} to rebuild the entire record, using the current
4506 value of the fields and @code{OFS}. To do this, use the
4510 $1 = $1 # force record to be reconstituted
4516 to add a comment, as we've shown here.
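The effect of the seemingly pointless assignment can be seen in a sketch such as this one, where `OFS` is set to a hyphen purely for illustration and `rebuild_record` is a hypothetical name:

```shell
# Hypothetical helper: force $0 to be rebuilt from its fields,
# joined by the current value of OFS.
rebuild_record() {
    awk 'BEGIN { OFS = "-" } { $1 = $1; print }'
}

printf 'a   b c\n' | rebuild_record
```

Without the `$1 = $1` assignment, the record would be printed exactly as read, runs of spaces included.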
4518 There is a flip side to the relationship between @code{$0} and
4519 the fields. Any assignment to @code{$0} causes the record to be
4520 reparsed into fields using the @emph{current} value of @code{FS}.
4521 This also applies to any built-in function that updates @code{$0},
4522 such as @code{sub} and @code{gsub}
4532 * Command Line Field Separator:: Setting @code{FS} from the command line.
4536 @cindex @code{FS} variable
4547 In the examples that follow, we use the bullet symbol (@bullet{}) to
4560 @cindex troubleshooting, @command{awk} uses @code{FS} not @code{IFS}
4561 The field separator is represented by the built-in variable @code{FS}.
4563 name @code{IFS} that is used by the POSIX-compliant shells (such as
4566 @cindex @code{FS} variable, changing value of
4567 The value of @code{FS} can be changed in the @command{awk} program with the
4569 Often the right time to do this is at the beginning of execution
4572 @code{BEGIN} pattern
4574 For example, here we set the value of @code{FS} to the string
4575 @code{","}:
4581 @cindex @code{BEGIN} pattern
4598 person's name in the example we just used might have a title or
4608 If you were expecting the program to print the
4609 address, you would be surprised. The moral is to choose your data layout and
4610 separator characters carefully to prevent such problems.
4611 (If the data is not in a form that is easy to process, perhaps you
4618 delimit an empty field. The default value of the field separator @code{FS}
4619 is a string containing a single space, @w{@code{" "}}. If @command{awk}
4623 @code{FS} is a special case---it is taken to specify the default manner
4626 If @code{FS} is any other single character, such as @code{","}, then
4634 @subsection Using Regular Expressions to Separate Fields
4642 value of @code{FS}.
4643 More generally, the value of @code{FS} may be a string containing any
4662 single spaces to separate fields the way single commas are used.
4663 @code{FS} can be set to @w{@code{"[@ ]"}} (left bracket, space, right
4670 For both values of @code{FS}, fields are separated by @dfn{runs}
4672 and/or newlines. However, when the value of @code{FS} is @w{@code{" "}},
4699 play whenever @code{$0} is recomputed. For instance, study this pipeline:
4708 The first @code{print} statement prints the record as it was read,
4709 with leading whitespace intact. The assignment to @code{$2} rebuilds
4710 @code{$0} by concatenating @code{$1} through @code{$NF} together,
4711 separated by the value of @code{OFS}. Because the leading whitespace
4712 was ignored when finding @code{$1}, it is not part of the new @code{$0}.
4713 Finally, the last @code{print} statement prints the new @code{$0}.
4723 There are times when you may want to examine each character
4725 simply assigning the null string (@code{""}) to @code{FS}. In this case,
4740 @cindex dark corner, @code{FS} as null string
4742 Traditionally, the behavior of @code{FS} equal to @code{""} was not defined.
4748 if @code{FS} is the null string, then @command{gawk} also
4752 @subsection Setting @code{FS} from the Command Line
4753 @cindex @code{-F} option
4758 @cindex command line, @code{FS} on, setting
4759 @cindex @code{FS} variable, setting from command line
4761 @code{FS} can be set on the command line. Use the @option{-F} option to
4769 sets @code{FS} to the @samp{,} character. Notice that the option uses
4774 the @option{-F} and @option{-f} options have nothing to do with each other.
4775 You can use both options at the same time to set the @code{FS} variable
4778 The value used for the argument to @option{-F} is processed in exactly the
4779 same way as assignments to the built-in variable @code{FS}.
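A minimal sketch of setting the field separator with `-F`, using comma-separated sample data made up for this example and a hypothetical function name `second_field`:

```shell
# Hypothetical helper: split records on commas and print field two.
second_field() {
    awk -F, '{ print $2 }'
}

printf 'alpha,beta,gamma\n' | second_field
```

The same result should follow from `BEGIN { FS = "," }` inside the program, since the option argument is treated like an assignment to `FS`.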
4781 appropriately. For example, to use a @samp{\} as the field separator
4782 on the command line, you would have to type:
4790 @cindex @code{\} (backslash), as field separators
4791 @cindex backslash (@code{\}), as field separators
4795 a single @samp{\} to use for the field separator.
4800 if the argument to @option{-F} is @samp{t}, then @code{FS} is set to
4803 figures that you really want your fields to be separated with tabs and
4805 if you really do want to separate your fields with @samp{t}s.
4808 that contains the pattern @code{/300/} and the action @samp{print $1}:
4814 Let's also set @code{FS} to be the @samp{-} character and run the
4819 @c tweaked to make the tex output look better in @smallbook
4843 The @samp{-} as part of the system's name was used as the field
4845 originally intended. This demonstrates why you have to be careful in
4854 by colons. The first field is the user's logon name and the second is
4873 It is important to remember that when you assign a string constant
4874 as the value of @code{FS}, it undergoes normal @command{awk} string
4876 the assignment @samp{FS = "\.."} assigns the character string @code{".."}
4877 to @code{FS} (the backslash is stripped). This creates a regexp meaning
4879 If instead you want fields to be separated by a literal period followed
4883 of @code{FS} (@samp{==} means ``is equal to''):
4885 @table @code
4895 to be escaped.
4908 @subheading Advanced Notes: Changing @code{FS} Does Not Affect the Fields
4912 According to the POSIX standard, @command{awk} is supposed to behave
4914 In particular, this means that if you change the value of @code{FS}
4916 should reflect the old value of @code{FS}, not the new one.
4924 using the @emph{current} value of @code{FS}!
4927 to diagnose. The following example illustrates the difference
4953 @subheading Advanced Notes: @code{FS} and @code{IGNORECASE}
4955 The @code{IGNORECASE} variable
4957 affects field splitting @emph{only} when the value of @code{FS} is a regexp.
4958 It has no effect when @code{FS} is a single character, even if
4959 that character is a letter. Thus, in the following code:
4969 The output is @samp{aCa}. If you really want to split fields on an
4971 do it for you. E.g., @samp{FS = "[c]"}. In this case, @code{IGNORECASE}
4983 you might want to skip it on the first reading.
4988 If you are a novice @command{awk} user, you might want to skip it on
5003 spaces}. Clearly, @command{awk}'s normal field splitting based on @code{FS}
5005 can use a series of @code{substr} calls on @code{$0}
5012 @cindex @code{FIELDWIDTHS} variable
5014 assigning a string containing space-separated numbers to the built-in
5015 variable @code{FIELDWIDTHS}. Each number specifies the width of the field,
5016 @emph{including} columns between fields. If you want to ignore the columns
5019 It is a fatal error to supply a field width that is not a positive number.
5021 to illustrate the use of @code{FIELDWIDTHS}:
5038 The following program takes the above input, converts the idle time to
5080 cards. These cards are then processed to count the votes for any particular
5081 candidate or on any particular issue. Because a voter may choose not to
5083 program for processing such data could use the @code{FIELDWIDTHS} feature
5084 to simplify reading the data. (Of course, getting @command{gawk} to run on
5092 Assigning a value to @code{FS} causes @command{gawk} to use
5093 @code{FS} for field splitting again. Use @samp{FS = FS} to make this happen,
5094 without having to know the current value of @code{FS}.
5095 In order to tell which kind of field splitting is in effect,
5096 use @code{PROCINFO["FS"]}
5098 The value is @code{"FS"} if regular field splitting is being used,
5099 or it is @code{"FIELDWIDTHS"} if fixed-width field splitting is being used:
5109 that needs to temporarily change @code{FS} or @code{FIELDWIDTHS},
5126 records. The first step in doing this is to choose your data format.
5129 One technique is to use an unusual character or string to separate
5131 @samp{\f} in @command{awk}, as in C) to separate them, making each record
5132 a page of the file. To do this, just set the variable @code{RS} to
5133 @code{"\f"} (a string containing the formfeed character). Any
5137 @cindex @code{RS} variable, multiline records and
5138 Another technique is to have blank lines separate records. By a special
5139 dispensation, an empty string as the value of @code{RS} indicates that
5140 records are separated by one or more blank lines. When @code{RS} is set
5141 to the empty string, each record always ends at the first blank line
5151 string @code{"\n\n+"} to @code{RS}. This regexp matches the newline
5169 Now that the input is separated into records, the second step is to
5170 separate the fields in the record. One way to do this is to divide each
5172 as the result of a special feature. When @code{RS} is set to the empty
5173 string, @emph{and} @code{FS} is set to a single character,
5175 This is in addition to whatever field separations result from
5176 @code{FS}.@footnote{When @code{FS} is the null string (@code{""})
5177 or a regexp, this special feature of @code{RS} does not apply.
5178 It does apply to the default field separator of a single space:
5181 The original motivation for this special exception was probably to provide
5182 useful behavior in the default case (i.e., @code{FS} is equal
5183 to @w{@code{" "}}). This feature can be a problem if you really don't
5184 want the newline character to separate fields, because there is no way to
5185 prevent it. However, you can work around this by using the @code{split}
5186 function to break up the record manually
5189 the special feature in a different way, by making @code{FS} into a
5194 Another way to separate fields is to
5195 put each field on a separate line: to do this, just set the
5196 variable @code{FS} to the string @code{"\n"}. (This single
5214 A simple program to process this file is as follows:
5253 @code{RS}.
5254 (@samp{==} means ``is equal to.'')
5257 @code{RS}:
5260 @table @code
5272 always serves as a field separator, in addition to whatever value
5273 @code{FS} may have. Leading and trailing newlines in a file are ignored.
5282 @cindex @code{RT} variable
5283 In all cases, @command{gawk} sets @code{RT} to the input text that matched the
5284 value specified by @code{RS}.
5290 @section Explicit Input with @code{getline}
5293 @cindex @code{getline} command, explicit input with
5299 special built-in command called @code{getline} that
5300 can be used to read input under your explicit control.
5302 The @code{getline} command is used in several different ways and should
5304 The examples that follow the explanation of the @code{getline} command
5306 and study the @code{getline} command @emph{after} you have reviewed the
5309 @cindex @code{ERRNO} variable
5310 @cindex differences in @command{awk} and @command{gawk}, @code{getline} command
5311 @cindex @code{getline} command, return values
5312 The @code{getline} command returns one if it finds a record and zero if
5314 a record, such as a file that cannot be opened, then @code{getline}
5316 @code{ERRNO} to a string describing the error that occurred.
5322 * Plain Getline:: Using @code{getline} with no arguments.
5323 * Getline/Variable:: Using @code{getline} into a variable.
5324 * Getline/File:: Using @code{getline} from a file.
5325 * Getline/Variable/File:: Using @code{getline} into a variable from a
5327 * Getline/Pipe:: Using @code{getline} from a pipe.
5328 * Getline/Variable/Pipe:: Using @code{getline} into a variable from a
5330 * Getline/Coprocess:: Using @code{getline} from a coprocess.
5331 * Getline/Variable/Coprocess:: Using @code{getline} into a variable from a
5333 * Getline Notes:: Important things to know about @code{getline}.
5334 * Getline Summary:: Summary of @code{getline} Variants.
5338 @subsection Using @code{getline} with No Arguments
5340 The @code{getline} command can be used without arguments to read input
5343 finished processing the current record, but want to do some special
5382 This form of the @code{getline} command sets @code{NF},
5383 @code{NR}, @code{FNR}, and the value of @code{$0}.
5385 @strong{Note:} The new value of @code{$0} is used to test
5387 of @code{$0} that triggered the rule that executed @code{getline}
5389 By contrast, the @code{next} statement reads a new record
5394 @subsection Using @code{getline} into a Variable
5396 @cindex variables, @code{getline} command into, using
5398 You can use @samp{getline @var{var}} to read the next record from
5402 and you want to read it without triggering
5403 any rules. This form of @code{getline} allows you to read that line
5438 The @code{getline} command used in this way sets only the variables
5439 @code{NR} and @code{FNR} (and of course, @var{var}). The record is not
5440 split into fields, so the values of the fields (including @code{$0}) and
5441 the value of @code{NF} do not change.
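The behavior described above (only `NR`, `FNR`, and the variable change) can be observed with a sketch like the following. The input lines, the variable name `nextline`, and the function name `peek_next` are all hypothetical.

```shell
# Hypothetical helper: on the first record, read the following line
# into a variable, then show that $0 and NF still describe the first
# record while NR has advanced.
peek_next() {
    awk 'NR == 1 { getline nextline; print $0 "|" nextline "|" NF "|" NR }'
}

printf 'one two\nthree\n' | peek_next
```

Here `$0` and `NF` still reflect the two-field first line, while `NR` has moved on to 2. The second line produces no output of its own: `getline` has already consumed it, so no rule runs for it.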
5444 @subsection Using @code{getline} from a File
5448 @cindex @code{<} (left angle bracket), @code{<} operator (I/O)
5449 @cindex left angle bracket (@code{<}), @code{<} operator (I/O)
5451 Use @samp{getline < @var{file}} to read the next record from @var{file}.
5454 because it directs input to come from a different place.
5457 encounters a first field with a value equal to 10 in the current input
5470 Because the main input stream is not used, the values of @code{NR} and
5471 @code{FNR} are not changed. However, the record it reads is split into fields in
5472 the normal manner, so the values of @code{$0} and the other fields are
5473 changed, resulting in a new value of @code{NF}.
5475 @cindex POSIX @command{awk}, @code{<} operator and
5476 @c Thanks to Paul Eggert for initial wording here
5477 According to POSIX, @samp{getline < @var{expression}} is ambiguous if
5482 to be portable to other @command{awk} implementations.
5485 @subsection Using @code{getline} into a Variable from a File
5487 @cindex variables, @code{getline} command into, using
5489 Use @samp{getline @var{var} < @var{file}} to read input
5492 is a string-valued expression that specifies the file from which to read.
5494 In this version of @code{getline}, none of the built-in variables are
5497 For example, the following program copies all the input files to the
5513 Note here how the name of the extra input file is not built into
5517 @cindex @code{close} function
5518 The @code{close} function is called to ensure that if two identical
5531 @subsection Using @code{getline} from a Pipe
5533 @cindex @code{|} (vertical bar), @code{|} operator (I/O)
5534 @cindex vertical bar (@code{|}), @code{|} operator (I/O)
5538 The output of a command can also be piped into @code{getline}, using
5541 is piped into @command{awk} to be used as input. This form of @code{getline}
5543 For example, the following program copies its input to its output, except for
5560 @cindex @code{close} function
5561 The @code{close} function is called to ensure that if two identical
5600 This variation of @code{getline} splits the record into fields, sets the
5601 value of @code{NF}, and recomputes the value of @code{$0}. The values of
5602 @code{NR} and @code{FNR} are not changed.
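As a rough illustration, the sketch below pipes the output of @command{echo} into @code{getline} from a @code{BEGIN} rule. The helper name @code{getline_pipe_demo} and the command string are assumptions chosen for this example.

```shell
# Hypothetical helper: reads one record from a command with `cmd | getline`.
getline_pipe_demo() {
    awk 'BEGIN {
        cmd = "echo one two three"
        # The command output replaces $0 and is split into fields.
        if ((cmd | getline) > 0)
            print "$0=" $0, "NF=" NF, "NR=" NR
        close(cmd)
    }'
}
getline_pipe_demo
```

Because no record has been read from the main input, @code{NR} is still zero after the call, even though @code{$0} and @code{NF} have new values.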
5604 @cindex POSIX @command{awk}, @code{|} I/O operator and
5605 @c Thanks to Paul Eggert for initial wording here
5606 According to POSIX, @samp{@var{expression} | getline} is ambiguous if
5611 to be portable to other @command{awk} implementations.
5614 @subsection Using @code{getline} into a Variable from a Pipe
5616 @cindex variables, @code{getline} command into, using
5619 output of @var{command} is sent through a pipe to
5620 @code{getline} and into the variable @var{var}. For example, the
5622 @code{current_time}, using the @command{date} utility, and then
5633 In this version of @code{getline}, none of the built-in variables are
5637 @c Thanks to Paul Eggert for initial wording here
5638 According to POSIX, @samp{@var{expression} | getline @var{var}} is ambiguous if
5643 program to be portable to other @command{awk} implementations.
5647 @subsection Using @code{getline} from a Coprocess
5648 @cindex coprocesses, @code{getline} from
5650 @cindex @code{getline} command, coprocesses, using from
5651 @cindex @code{|} (vertical bar), @code{|&} operator (I/O)
5652 @cindex vertical bar (@code{|}), @code{|&} operator (I/O)
5656 Input into @code{getline} from a pipe is a one-way operation.
5658 sends data @emph{to} your @command{awk} program.
5660 On occasion, you might want to send data to another program
5665 Typically, you write data to the coprocess first and then
5674 which sends a query to @command{db_server} and then reads the results.
5676 The values of @code{NR} and
5677 @code{FNR} are not changed,
5680 the normal manner, thus changing the values of @code{$0}, of the other fields,
5681 and of @code{NF}.
5684 this is the @value{SECTION} on @code{getline}.
5689 @subsection Using @code{getline} into a Variable from a Coprocess
5691 @cindex variables, @code{getline} command into, using
5694 the coprocess @var{command} is sent through a two-way pipe to @code{getline}
5697 In this version of @code{getline}, none of the built-in variables are
5703 this is the @value{SECTION} on @code{getline}.
5709 @subsection Points to Remember About @code{getline}
5710 Here are some miscellaneous points about @code{getline} that
5715 When @code{getline} changes the value of @code{$0} and @code{NF},
5716 @command{awk} does @emph{not} automatically jump to the start of the
5726 program may have open to just one. In @command{gawk}, there is no such limit.
5730 @cindex side effects, @code{FILENAME} variable
5732 @cindex @code{FILENAME} variable, @code{getline}, setting with
5733 @cindex dark corner, @code{FILENAME} variable
5734 @cindex @code{getline} command, @code{FILENAME} variable and
5735 @cindex @code{BEGIN} pattern, @code{getline} and
5737 An interesting side effect occurs if you use @code{getline} without a
5738 redirection inside a @code{BEGIN} rule. Because an unredirected @code{getline}
5739 reads from the command-line @value{DF}s, the first @code{getline} command
5740 causes @command{awk} to set the value of @code{FILENAME}. Normally,
5741 @code{FILENAME} does not have a value inside @code{BEGIN} rules, because you
5742 have not yet started to process the command-line @value{DF}s.
5748 Using @code{FILENAME} with @code{getline}
5750 is likely to be a source for
5752 current input file. However, by not using a variable, @code{$0}
5753 and @code{NR} are still updated. If you're doing this, it's
5755 trying to accomplish.
5759 @subsection Summary of @code{getline} Variants
5760 @cindex @code{getline} command, variants
5762 The following table summarizes the eight variants of @code{getline},
5765 @multitable {@var{command} @code{|& getline} @var{var}} {1234567890123456789012345678901234567890}
5766 @item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, and @code{NR}
5768 @item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, and @code{NR}
5770 @item @code{getline <} @var{file} @tab Sets @code{$0} and @code{NF}
5772 @item @code{getline @var{var} < @var{file}} @tab Sets @var{var}
5774 @item @var{command} @code{| getline} @tab Sets @code{$0} and @code{NF}
5776 @item @var{command} @code{| getline} @var{var} @tab Sets @var{var}
5778 @item @var{command} @code{|& getline} @tab Sets @code{$0} and @code{NF}.
5781 @item @var{command} @code{|& getline} @var{var} @tab Sets @var{var}.
5794 One of the most common programming actions is to @dfn{print}, or output,
5795 some or all of the input. Use the @code{print} statement
5796 for simple output, and the @code{printf} statement
5798 The @code{print} statement is not limited when
5799 computing @emph{which} values to print. However, with two exceptions,
5800 you cannot specify @emph{how} to print them---how many
5801 columns, whether to use exponential notation or not, and so on.
5804 For printing with specifications, you need the @code{printf} statement
5808 @cindex @code{print} statement
5809 @cindex @code{printf} statement
5811 also covers I/O redirections to files and pipes, introduces
5813 and discusses the @code{close} built-in function.
5816 * Print:: The @code{print} statement.
5817 * Print Examples:: Simple examples of @code{print} statements.
5818 * Output Separators:: The output separators and how to change them.
5819 * OFMT:: Controlling Numeric Output With @code{print}.
5820 * Printf:: The @code{printf} statement.
5821 * Redirection:: How to redirect output to multiple files and
5823 * Special Files:: File name interpretation in @command{gawk}.
5824 @command{gawk} allows access to inherited file
5830 @section The @code{print} Statement
5832 The @code{print} statement is used to produce output with simple, standardized
5833 formatting. Specify only the strings or numbers to print, in a
5847 The items to print can be constant strings or numbers, fields of the
5848 current record (such as @code{$1}), variables, or any @command{awk}
5849 expression. Numeric values are converted to strings and then printed.
5854 The simple statement @samp{print} with no items is equivalent to
5856 line, use @samp{print ""}, where @code{""} is the empty string.
5858 @w{@code{"Don't Panic"}}, as one item. If you forget to use the
5864 @section Examples of @code{print} Statements
5866 Each @code{print} statement makes at least one line of output. However, it
5867 isn't limited to only one line. If an item value is a string that contains a
5869 single @code{print} statement can make any number of lines this way.
5873 (the @samp{\n} is an escape sequence, used to represent the newline
5896 @cindex @code{print} statement, commas, omitting
5898 @cindex troubleshooting, @code{print} statement, omitting commas
5899 A common mistake in using the @code{print} statement is to omit the comma
5902 juxtaposing two string expressions in @command{awk} means to concatenate
5914 @cindex @code{BEGIN} pattern, headings, adding
5917 would make it clearer. Let's add some headings to our table of months
5918 (@code{$1}) and green crates shipped (@code{$2}). We do this using the
5919 @code{BEGIN} pattern
5955 @cindex @code{printf} statement, columns, aligning
5958 complicated when there are many columns to fix. Counting spaces for two
5960 a lot of time. This is why the @code{printf} statement was
5964 @cindex line continuations, in @code{print} statement
5965 @cindex @code{print} statement, line continuations and
5966 @strong{Note:} You can continue either a @code{print} or
5967 @code{printf} statement simply by putting a newline after any comma
5974 @cindex @code{OFS} variable
5975 As mentioned previously, a @code{print} statement contains a list
5977 separated by single spaces. However, this doesn't need to be the case;
5980 built-in variable @code{OFS}. The initial value of this variable
5981 is the string @w{@code{" "}}---that is, a single space.
5983 The output from an entire @code{print} statement is called an
5984 @dfn{output record}. Each @code{print} statement outputs one output
5986 (or @code{ORS}). The initial
5987 value of @code{ORS} is the string @code{"\n"}; i.e., a newline
5988 character. Thus, each @code{print} statement normally makes a separate line.
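The two separators can be seen together in a short sketch. The helper name @code{separators_demo}, the sample input, and the separator values are assumptions picked only to make the effect visible.

```shell
# Hypothetical helper: changes both output separators in a BEGIN rule.
separators_demo() {
    printf 'a b\nc d\n' |
        awk 'BEGIN { OFS = ";"; ORS = "|" } { print $1, $2 }'
}
separators_demo
```

Each comma in the @code{print} statement becomes a semicolon, and each output record ends with a vertical bar instead of a newline, so all the output lands on one line.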
5991 @cindex output record separator, See @code{ORS} variable
5992 @cindex @code{ORS} variable
5993 @cindex @code{BEGIN} pattern, @code{OFS}/@code{ORS} variables, assigning values to
5994 In order to change how output fields and records are separated, assign
5995 new values to the variables @code{OFS} and @code{ORS}. The usual
5996 place to do this is in the @code{BEGIN} rule
6014 program by using a new value of @code{OFS}.
6028 If the value of @code{ORS} does not contain a newline, the program's output
6032 @section Controlling Numeric Output with @code{print}
6036 When the @code{print} statement is used to print numeric values,
6037 @command{awk} internally converts the number to a string of characters
6038 and prints that string. @command{awk} uses the @code{sprintf} function
6039 to do this conversion
6041 For now, it suffices to say that the @code{sprintf}
6042 function accepts a @dfn{format specification} that tells it how to format
6048 @cindex @code{sprintf} function
6049 @cindex @code{OFMT} variable
6051 @cindex output, format specifier, @code{OFMT}
6052 The built-in variable @code{OFMT} contains the default format specification
6053 that @code{print} uses with @code{sprintf} when it wants to convert a
6054 number to a string for printing.
6055 The default value of @code{OFMT} is @code{"%.6g"}.
6056 The way @code{print} prints numbers can be changed
6058 as the value of @code{OFMT}, as shown in the following example:
6068 @cindex dark corner, @code{OFMT} variable
6069 @cindex POSIX @command{awk}, @code{OFMT} variable and
6070 @cindex @code{OFMT} variable, POSIX @command{awk} and
6071 According to the POSIX standard, @command{awk}'s behavior is undefined
6072 if @code{OFMT} contains anything but a floating-point conversion specification.
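A small sketch of @code{OFMT} staying within a floating-point conversion is shown here. The helper name @code{ofmt_demo} and the format @code{"%.2f"} are assumptions for illustration.

```shell
# Hypothetical helper: print numbers with a two-decimal OFMT.
ofmt_demo() {
    awk 'BEGIN { OFMT = "%.2f"; print 17.23456, 17 }'
}
ofmt_demo
```

The non-integral value is rounded to two decimal places; the integral value is printed as an integer.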
6076 @section Using @code{printf} Statements for Fancier Printing
6079 @cindex @code{printf} statement
6083 normally provided by @code{print}, use @code{printf}.
6084 @code{printf} can be used to
6085 specify the width to use for each item, as well as various
6086 formatting choices for numbers (such as what output base to use, whether to
6087 print an exponent, whether to print a sign, and how many digits to print
6089 the @dfn{format string}, that controls how and where to print the other
6093 * Basic Printf:: Syntax of the @code{printf} statement.
6100 @subsection Introduction to the @code{printf} Statement
6102 @cindex @code{printf} statement, syntax of
6103 A simple @code{printf} statement looks like this:
6116 The difference between @code{printf} and @code{print} is the @var{format}
6118 specifies how to output each of the other arguments. It is called the
6121 The format string is very similar to that in the ISO C library function
6122 @code{printf}. Most of @var{format} is text to output verbatim.
6124 Each format specifier says to output the next item in the argument list
6127 The @code{printf} statement does not automatically append a newline
6128 to its output. It outputs only what the format string specifies.
6130 The output separator variables @code{OFS} and @code{ORS} have no effect
6131 on @code{printf} statements. For example:
6148 @cindex @code{printf} statement, format-control characters
6149 @cindex format specifiers, @code{printf} statement
6152 a @dfn{format-control letter}---it tells the @code{printf} statement
6153 how to output one item. The format-control letter specifies what @emph{kind}
6154 of value to print. The rest of the format specifier is made up of
6155 optional @dfn{modifiers} that control @emph{how} to print the value, such as
6158 @table @code
6228 outside the range of the widest C integer type, @command{gawk} switches to the
6236 @subsection Modifiers for @code{printf} Formats
6239 @cindex @code{printf} statement, modifiers
6245 We will use the bullet symbol ``@bullet{}'' in the following examples to
6250 @table @code
6251 @cindex differences in @command{awk} and @command{gawk}, @code{print}/@code{printf} statements
6252 @cindex @code{printf} statement, positional specifiers
6254 @cindex positional specifiers, @code{printf} statement
6257 Normally, format specifications are applied to arguments in the order
6259 specification is applied to a specific argument, instead of what
6271 At first glance, this feature doesn't seem to be of much use.
6275 which describes how and why to use positional specifiers.
6281 says to left-justify
6299 says to always supply a sign for numeric conversions, even if the data
6300 to format is positive. The @samp{+} overrides the space modifier.
6315 This applies even to non-numeric output formats.
6318 value to print.
6323 field to expand to this width. The default way to do this is to
6344 Preceding the @var{width} with a minus sign causes the output to be
6349 specifies the precision to use when printing.
6353 @item @code{%e}, @code{%E}, @code{%f}
6354 Number of digits to the right of the decimal point.
6356 @item @code{%g}, @code{%G}
6359 @item @code{%d}, @code{%i}, @code{%o}, @code{%u}, @code{%x}, @code{%X}
6360 Minimum number of digits to print.
6362 @item @code{%s}
6376 The C library @code{printf}'s dynamic @var{width} and @var{prec}
6377 capability (for example, @code{"%*.*s"}) is supported. Instead of
6389 is exactly equivalent to:
6400 concatenation to build up the format string, like so:
6410 This is not particularly easy to read, but it does work.
6413 @cindex troubleshooting, fatal errors, @code{printf} format strings
6414 @cindex POSIX @command{awk}, @code{printf} format strings and
6415 C programmers may be used to supplying additional
6417 modifiers in @code{printf} format strings. These are not valid in @command{awk}.
6426 @subsection Examples Using @code{printf}
6429 how to use @code{printf} to make an aligned table:
6437 prints the names of the bulletin boards (@code{$1}) in the file
6439 prints the phone numbers (@code{$2}) next on the line. This
6458 In this case, the phone numbers had to be printed as strings because
6463 It wasn't necessary to specify a width for the phone numbers because
6464 they are last on their lines. They don't need to have spaces
6467 The table could be made to look even nicer by adding headings to the
6468 tops of the columns. This is done using the @code{BEGIN} pattern
6479 The above example mixed @code{print} and @code{printf} statements in
6480 the same program. Using just @code{printf} statements can produce the
6505 At this point, it would be a worthwhile exercise to use the
6506 @code{printf} statement to line up the headings and table data for the
6508 on the @code{print} statement
6513 @section Redirecting Output of @code{print} and @code{printf}
6517 So far, the output from @code{print} and @code{printf} has gone
6518 to the standard
6519 output, usually the terminal. Both @code{print} and @code{printf} can
6520 also send their output to other places.
6523 A redirection appears after the @code{print} or @code{printf} statement.
6528 @cindex @code{print} statement, See Also redirection, of output
6529 @cindex @code{printf} statement, See Also redirection, of output
6530 There are four forms of output redirection: output to a file, output
6531 appended to a file, output through a pipe to another command, and output
6532 to a coprocess. They are all shown for the @code{print} statement,
6533 but they work identically for @code{printf}:
6535 @table @code
6536 @cindex @code{>} (right angle bracket), @code{>} operator (I/O)
6537 @cindex right angle bracket (@code{>}), @code{>} operator (I/O)
6542 expression. Its value is changed to a string and then used as a
6546 before the first output is written to it. Subsequent writes to the same
6547 @var{output-file} do not erase @var{output-file}, but append to it.
6550 is how an @command{awk} program can write a list of BBS names to one
6551 file named @file{name-list}, and a list of phone numbers to another file
6556 > print $1 > "name-list" @}' BBS-list
6561 $ cat name-list
6568 Each output file contains one name or number per line.
6570 @cindex @code{>} (right angle bracket), @code{>>} operator (I/O)
6571 @cindex right angle bracket (@code{>}), @code{>>} operator (I/O)
6577 appended to the file.
6580 @cindex @code{|} (vertical bar), @code{|} operator (I/O)
6584 It is also possible to send output to another program through a pipe
6585 instead of into a file. This type of redirection opens a pipe to
6587 to another process created to execute @var{command}.
6590 expression. Its value is converted to a string whose contents give
6591 the shell command to be run. For example, the following produces two
6598 record. It's done to avoid overfull hboxes in TeX. Leave it
6611 The next example uses redirection to mail a message to the mailing
6624 @code{m}. It's then sent down the pipeline to the @command{mail} program.
6625 (The parentheses group the items to concatenate---see
6628 The @code{close} function is called here because it's a good idea to close
6629 the pipe as soon as all the intended output has been sent to it.
6633 This example also illustrates the use of a variable to represent
6634 a @var{file} or @var{command}---it is not necessary to always
6640 @cindex @code{|} (vertical bar), @code{|&} operator (I/O)
6644 This type of redirection prints the items to the input of @var{command}.
6647 can be read with @code{getline}.
6649 but subsidiary to, the @command{awk} program.
6658 asks the system to open a file, pipe, or coprocess only if the particular
6660 to by your program or if it has been closed since it was last written to.
6663 It is a common error to use @samp{>} redirection for the first @code{print}
6664 to a file, and then to use @samp{>>} for subsequent output:
6677 use @samp{>} for all the @code{print} statements, since the output file
6694 program may have open to just one! In @command{gawk}, there is no such limit.
6695 @command{gawk} allows a program to
6703 A particularly powerful way to use redirection is to build command lines
6706 are stored in uppercase, and you wish to rename them to have names in
6716 The @code{tolower} function returns its argument string with all
6717 uppercase characters converted to lowercase
6720 using the @command{mv} utility to rename the files.
6721 It then sends the list to the shell for execution.
6731 internally. These @value{FN}s provide access to standard file descriptors,
6738 * Special Caveats:: Things to watch out for.
6752 already available to them for reading and writing. These are known as
6754 output}. These streams are, by default, connected to your terminal, but
6763 In other implementations of @command{awk}, the only way to write an error
6764 message to standard error in an @command{awk} program is as follows:
6771 This works by opening a pipeline to a shell command that can access the
6775 don't do this. Instead, they send the error messages to the
6785 that happens, writing to the terminal is not correct. In fact, if
6794 has been ported to, not just those that are POSIX-compliant:
6797 @cindex @code{/dev/@dots{}} special files (@command{gawk})
6798 @cindex files, @code{/dev/@dots{}} special files
6799 @c @cindex @code{/dev/stdin} special file
6800 @c @cindex @code{/dev/stdout} special file
6801 @c @cindex @code{/dev/stderr} special file
6802 @c @cindex @code{/dev/fd} special files
6823 The proper way to write an error message in a @command{gawk} program
6824 is to use @file{/dev/stderr}, like this:
6833 It is a common error to omit the quotes, which leads
6834 to confusing results.
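The quoted form can be tried with the sketch below. The helper name @code{warn_demo} is an assumption, and the sketch presumes an @command{awk} that interprets @file{/dev/stderr} itself (as @command{gawk} does) or a system that provides that file.

```shell
# Hypothetical helper: sends a message to standard error, not standard output.
warn_demo() {
    # The file name is a quoted string; unquoted, it would not be a string at all.
    awk 'BEGIN { print "Serious error detected!" > "/dev/stderr" }'
}
warn_demo
```

Running @samp{warn_demo 2>/dev/null} prints nothing, which is one way to confirm that the message goes to standard error only.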
6842 @command{gawk} also provides special @value{FN}s that give access to information
6845 first be closed with the @code{close} function
6849 @c @cindex @code{/dev/pid} special file
6850 @c @cindex @code{/dev/pgrpid} special file
6851 @c @cindex @code{/dev/ppid} special file
6852 @c @cindex @code{/dev/user} special file
6871 @table @code
6873 The return value of the @code{getuid} system call
6877 The return value of the @code{geteuid} system call
6881 The return value of the @code{getgid} system call
6885 The return value of the @code{getegid} system call
6890 the @code{getgroups} system call.
6906 To obtain process-related information, use the @code{PROCINFO} array.
6936 Here is a list of things to bear in mind when using the
6948 @cindex @code{PROCINFO} array
6962 To obtain process-related information, use the @code{PROCINFO} array.
6972 but sometimes it did; thus, it was decided to make @command{gawk}'s
6973 behavior consistent on all systems and to have it always interpret
6977 file descriptor that is @code{dup}'ed from file descriptor 4. Most of
6978 the time this does not matter; however, it is important to @emph{not}
6979 close any of the files related to file descriptors 0, 1, and 2.
6997 @cindex @code{getline} command, coprocesses, using from
6999 If the same @value{FN} or the same shell command is used with @code{getline}
7004 The next time the same file or command is used with @code{getline},
7009 writes to the same file or command are appended to the previous writes.
7012 @cindex @code{close} function
7013 This implies that special steps are necessary in order to read the same
7014 file again from the beginning, or to rerun a shell command (rather than
7015 reading more output from the same command). The @code{close} function
7030 value must @emph{exactly} match the string that was used to open the file or
7045 Once this function call is executed, the next @code{getline} from that
7046 file or command, or the next @code{print} or @code{printf} to that
7048 Because the expression that you use to close a file or pipeline must
7049 exactly match the expression used to open the file or run the command,
7050 it is good practice to use a variable to store the @value{FN} or command.
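A minimal sketch of this practice follows, assuming a trivial command string and a hypothetical helper named @code{reread_demo}. The command is stored once in @code{cmd}, so the strings used to open and close the pipe cannot drift apart.

```shell
# Hypothetical helper: reruns a command by closing its pipe first.
reread_demo() {
    awk 'BEGIN {
        cmd = "echo hello"      # stored once so open and close match exactly
        cmd | getline first
        close(cmd)              # without this, the next getline would hit end of file
        cmd | getline second    # the command is run again from the start
        close(cmd)
        print first, second
    }'
}
reread_demo
```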
7061 This helps avoid hard-to-find typographical errors in your @command{awk}
7068 begin reading it with @code{getline}.
7073 system limit on the number of open files in one process. It is best to
7078 the command reading the pipe normally continues to try to read input
7081 output is redirected to the @command{mail} program, the message is not
7086 This is not the same thing as giving more input to the first run!
7088 For example, suppose a program pipes output to the @command{mail} program.
7089 If it outputs several lines redirected to this pipe without closing
7095 @cindex differences in @command{awk} and @command{gawk}, @code{close} function
7096 @cindex portability, @code{close} function and
7097 If you use more files than the system allows you to have open,
7098 @command{gawk} attempts to multiplex the available open files among
7099 your @value{DF}s. @command{gawk}'s ability to do this depends upon the
7101 therefore both good practice and good portability advice to always
7102 use @code{close} on your files when you are done with them.
7118 Without the call to @code{close} indicated in the comment, @command{awk}
7119 creates child processes to run the commands, until it eventually
7123 return status from @code{getline}), the child process is not
7126 it is referred to as ``reaping.''}
7129 is not closed and released until @code{close} is called or
7132 @code{close} will silently do nothing if given an argument that
7143 @cindex @code{|} (vertical bar), @code{|&} operator (I/O), pipes, closing
7144 When using the @samp{|&} operator to communicate with a coprocess,
7145 it is occasionally useful to be able to close one end of the two-way
7147 This is done by supplying a second argument to @code{close}.
7148 As in any other call to @code{close},
7149 the first argument is the name of the command or special file used
7150 to start the coprocess.
7152 @code{"to"} or @code{"from"}. Case does not matter.
7159 @subheading Advanced Notes: Using @code{close}'s Return Value
7160 @cindex advanced features, @code{close} function
7161 @cindex dark corner, @code{close} function
7162 @cindex @code{close} function, return values
7164 @cindex return values, @code{close} function
7165 @cindex differences in @command{awk} and @command{gawk}, @code{close} function
7166 @cindex Unix @command{awk}, @code{close} function and
7168 In many versions of Unix @command{awk}, the @code{close} function
7169 is actually a statement. It is a syntax error to try to use the return
7170 value from @code{close}:
7179 @command{gawk} treats @code{close} as a function.
7184 @code{ERRNO} to a string describing the problem.
7189 This is a full 16-bit value as returned by the @code{wait}
7191 how to decode this value.}
7192 Otherwise, it is the return value from the system's @code{close} or
7193 @code{fclose} C functions when closing input or output
7198 The POSIX standard is very vague; it says that @code{close}
7208 It allows you to get the output from a command as well as its
7210 @c 8/21/2002, FIXME: Maybe the code and this doc should be adjusted to
7218 was terminated by a signal. Subtract 128 to get the signal number:
7225 print command, "exited with code", exit_val
7229 piping into @code{getline}. For commands piped into
7230 from @code{print} or @code{printf}, the
7231 return value from @code{close} is that of the library's
7232 @code{pclose} function.
7246 and actions. An expression evaluates to a value that you can print, test,
7247 or pass to a function. Additionally, an expression
7248 can assign a new value to a variable or a field by using an assignment operator.
7252 statements contain one or more expressions that specify the data on which to
7259 * Using Constant Regexps:: When and how to use a regexp constant.
7260 * Variables:: Variables give names to values for later use.
7261 * Conversion:: The conversion of strings to numbers and vice
7291 value that isn't going to change. Numeric constants can
7332 eight-bit ASCII characters including ASCII @sc{nul} (character code zero).
7344 programming languages allow you to specify numbers in other bases, often
7351 @samp{a} through @samp{f} are used to represent the rest.
7360 there is a special notation to help signify the base.
7364 @table @code
7382 Being able to use octal and hexadecimal constants in your programs is most
7392 (If you really need to do this, use the @option{--non-decimal-data}
7396 you can use the @code{strtonum} function
7398 to convert the data into a number.
7399 Most of the time, you will want to use octal or hexadecimal constants
7415 Octal and hexadecimal source code constants are a @command{gawk} extension.
7430 numbers to strings:
7442 @cindex @code{~} (tilde), @code{~} operator
7443 @cindex tilde (@code{~}), @code{~} operator
7444 @cindex @code{!} (exclamation point), @code{!~} operator
7445 @cindex exclamation point (@code{!}), @code{!~} operator
7447 slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in
7458 operators, a regexp constant merely stands for the regexp that is to be
7460 However, regexp constants (such as @code{/foo/}) may be used like simple expressions.
7466 This means that the following two code segments:
7497 This code is ``obviously'' testing @code{$1} for a match against the regexp
7498 @code{/foo/}. But in fact, the expression @samp{/foo/ ~ $1} actually means
7500 against the regexp @code{/foo/}. The result is either zero or one,
7503 Because it is unlikely that you would ever really want to make this kind of
7513 assigns either zero or one to the variable @code{matches}, depending
7519 @cindex dark corner, regexp constants, as arguments to user-defined functions
7520 @cindex @code{gensub} function (@command{gawk})
7521 @cindex @code{sub} function
7522 @cindex @code{gsub} function
7524 the @code{gensub}, @code{sub}, and @code{gsub} functions, and as the
7525 second argument of the @code{match} function
7528 the third argument of @code{split} to be a regexp constant, but some
7531 This can lead to confusion when attempting to use regexp constants
7532 as arguments to user-defined functions
7556 In this example, the programmer wants to pass a regexp constant to the
7557 user-defined function @code{mysub}, which in turn passes it on to
7558 either @code{sub} or @code{gsub}. However, what really happens is that
7559 the @code{pat} parameter is either one or zero, depending upon whether
7560 or not @code{$0} matches @code{/hi/}.
7562 a parameter to a user-defined function, since passing a truth value in
7586 Variables let you give names to values and refer to them later. Variables
7587 have already been used in many of the examples. The name of a variable
7589 with a digit. Case is significant in variable names; @code{a} and @code{A}
7592 A variable name is a valid expression by itself; it represents the
7601 A few variables have special built-in meanings, such as @code{FS} (the
7602 field separator), and @code{NF} (the number of fields in the current input
7610 By default, variables are initialized to the empty string, which
7611 is zero if converted to a number. There is no need to
7631 @cindex @code{-v} option, variables, assigning
7644 @code{BEGIN} rules are run. The @option{-v} option and its assignment
7657 prints the value of field number @code{n} for all input records. Before
7658 the first file is read, the command line sets the variable @code{n}
7659 equal to four. This causes the fourth field to be printed in lines from
7661 but before the second file is started, @code{n} is set to two, so that the
7676 the @command{awk} program in the @code{ARGV} array
7686 @cindex converting, strings to numbers
7690 Strings are converted to numbers and numbers are converted to strings, if the context
7692 either @code{foo} or @code{bar} in the expression @samp{foo + bar}
7693 happens to be a string, it is converted to a number before the addition
7695 are converted to strings. Consider the following:
7704 the variables @code{two} and @code{three} are converted to strings and
7705 concatenated together. The resulting string is converted back to the
7706 number 23, to which 4 is then added.
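The conversion chain described here can be reproduced directly. Only the wrapper name @code{concat_add_demo} is an assumption.

```shell
# Hypothetical wrapper: concatenate two numbers, then add to the result.
concat_add_demo() {
    # (two three) concatenates "2" and "3" into "23"; adding 4 converts it back to a number.
    awk 'BEGIN { two = 2; three = 3; print (two three) + 4 }'
}
concat_add_demo
```

The program prints 27, not 9, because the concatenation happens before the addition.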
7708 @cindex null strings, converting numbers to strings
7710 If, for some reason, you need to force a number to be converted to a
7711 string, concatenate the empty string, @code{""}, with that number.
7712 To force a string to be converted to a number, add zero to that string.
7713 A string is converted to a number by interpreting any numeric prefix
7715 @code{"2.5"} converts to 2.5, @code{"1e3"} converts to 1000, and @code{"25fix"}
7717 Strings that can't be interpreted as valid numbers convert to zero.
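These rules can be checked by adding zero to a few strings. The helper name @code{strnum_demo} and the last test string are assumptions for illustration.

```shell
# Hypothetical helper: force string-to-number conversion by adding zero.
strnum_demo() {
    awk 'BEGIN { print "2.5" + 0, "1e3" + 0, "25fix" + 0, "fix" + 0 }'
}
strnum_demo
```

Only the leading numeric prefix of each string contributes to the result, and a string with no numeric prefix converts to zero.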
7719 @cindex @code{CONVFMT} variable
7721 by the @command{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}).
7722 Numbers are converted using the @code{sprintf} function
7723 with @code{CONVFMT} as the format
7727 @code{CONVFMT}'s default value is @code{"%.6g"}, which prints a value with
7728 at most six significant digits. For some applications, you might want to
7729 change it to specify more precision.
7731 17 digits is enough to capture a floating-point number's
7733 most of the time.@footnote{Pathological cases can require up to
7734 752 digits (!), but we doubt that you need to worry about this.}
7736 @cindex dark corner, @code{CONVFMT} variable
7737 Strange results can occur if you set @code{CONVFMT} to a string that doesn't
7738 tell @code{sprintf} how to format floating-point numbers in a useful way.
7740 all numbers to the same constant string.
7742 it to a string is @emph{always} an integer, no matter what the value of
7743 @code{CONVFMT} may be. Given the following code fragment:
7752 @code{b} has the value @code{"12"}, not @code{"12.00"}.
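As a small sketch of the integer exception, the following compares an integral value with a nonintegral one under the same @code{CONVFMT}. The variables @code{c} and @code{d} are additions made for this illustration and are not part of the fragment above.

```shell
awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""     # integral value: converted as an integer
    c = 12.5; d = c ""     # nonintegral value: converted using CONVFMT
    print b
    print d
}'
```

This should print 12 followed by 12.50, showing that only the nonintegral value is affected by @code{CONVFMT}.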
7755 @cindex POSIX @command{awk}, @code{OFMT} variable and
7756 @cindex @code{OFMT} variable
7758 @cindex @command{awk}, new vs. old, @code{OFMT} variable
7759 Prior to the POSIX standard, @command{awk} used the value
7760 of @code{OFMT} for converting numbers to strings. @code{OFMT}
7761 specifies the output format to use when printing numbers with @code{print}.
7762 @code{CONVFMT} was introduced in order to separate the semantics of
7763 conversion from the semantics of printing. Both @code{CONVFMT} and
7764 @code{OFMT} have the same default value: @code{"%.6g"}. In the vast majority
7766 However, these semantics for @code{OFMT} are something to keep in mind if you must
7767 port your new-style program to older implementations of @command{awk}.
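The separation between printing and conversion can be seen by giving the two variables different values. This is only a sketch, assuming a POSIX-conforming @command{awk}; the formats and the variable names @code{x} and @code{s} are chosen purely for illustration.

```shell
awk 'BEGIN {
    OFMT    = "%.2f"     # used by print for numeric values
    CONVFMT = "%.4f"     # used for number-to-string conversion
    x = 3.14159265
    print x              # a number, so it is formatted with OFMT
    s = x ""             # converted with CONVFMT; s is now a string
    print s              # a string is printed as is
}'
```

Under those assumptions the first line of output is 3.14 and the second is 3.1416.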
7771 for more information on the @code{print} statement.
7773 Finally, once again, where you are can matter when it comes to
7778 programs, it affects the decimal point character. The @code{"C"} locale, and most
7784 point when reading the @command{awk} program source code, and for command-line
7786 However, when interpreting input data, for @code{print} and @code{printf} output,
7787 and for number to string conversion, the local decimal point character is used.
7805 the decimal point separator. In the normal @code{"C"} locale, @command{gawk}
7823 precedence rules and work as you would expect them to.
7848 the highest precedence to the lowest:
7850 @table @code
7855 Unary plus; the expression is converted to a number.
7860 Exponentiation; @var{x} raised to the @var{y} power. @samp{2 ^ 3} has
7861 the value eight; the character sequence @samp{**} is equivalent to
7871 numbers, the result is @emph{not} rounded to an integer---@samp{3 / 4} has
7873 to forget that @emph{all} numbers in @command{awk} are floating-point,
7894 When computing the remainder of @code{@var{x} % @var{y}},
7895 the quotient is rounded toward zero to an integer and
7905 @code{@var{x} % @var{y}} is negative if @var{x} is negative. Thus:
7915 @cindex portability, @code{**} operator and
7916 @cindex @code{*} (asterisk), @code{**} operator
7917 @cindex asterisk (@code{*}), @code{**} operator
7935 specific operator to represent it. Instead, concatenation is performed by
7936 writing expressions next to one another, with no operator. For example:
7957 often necessary to ensure that it happens at the right time by using
7958 parentheses to enclose the items to concatenate. For example, the
7959 following code fragment does not concatenate @code{file} and @code{name}
7964 name = "name"
7965 print "something meaningful" > file name
7969 It is necessary to use the following:
7972 print "something meaningful" > (file name)
7992 It is not defined whether the assignment to @code{a} happens
7993 before or after the value of @code{a} is retrieved for producing the
8029 But where did the space disappear to?
8039 This forces @command{awk} to treat the @samp{-} on the @samp{-24} as unary.
8043 @minus{}12 (@code{"@ "} @minus{} 24)
8061 @cindex @code{=} (equals sign), @code{=} operator
8062 @cindex equals sign (@code{=}), @code{=} operator
8064 value into a variable. For example, let's assign the value one to the variable
8065 @code{z}:
8071 After this expression is executed, the variable @code{z} has the value one.
8072 Whatever old value @code{z} had before the assignment is forgotten.
8076 the value @code{"this food is good"} in the variable @code{message}:
8091 except to compute a value. If the value isn't used, there's no reason to
8112 It is important to note that variables do @emph{not} have permanent types.
8114 to hold at the moment. In the following program fragment, the variable
8115 @code{foo} has a numeric value at first, and a string value later on:
8125 When the second assignment gives @code{foo} a string value, the fact that
8129 zero. After executing the following code, the value of @code{foo} is five:
8153 (@code{x}, @code{y}, and @code{z}).
8155 value of @samp{z = 5}, which is five, is stored into @code{y} and then
8156 the value of @samp{y = z = 5}, which is five, is stored into @code{x}.
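A brief sketch of assignments used as expressions follows. The variables @code{v} and @code{w} are hypothetical names introduced here to show that the value of an assignment can take part in a larger expression.

```shell
awk 'BEGIN {
    x = y = z = 5          # groups as x = (y = (z = 5))
    print x, y, z
    v = (w = 2) + 3        # the assignment to w has the value 2
    print v, w
}'
```

This should print @samp{5 5 5} and then @samp{5 2}.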
8159 example, it is valid to write @samp{x != (y = 1)} to set @code{y} to one,
8160 and then test whether @code{x} is not equal to one. But this style tends to make
8161 programs hard to read; such nesting of assignments should be avoided,
8164 @cindex @code{+} (plus sign), @code{+=} operator
8165 @cindex plus sign (@code{+}), @code{+=} operator
8169 to the old value of the variable. Thus, the following assignment adds
8170 five to the value of @code{foo}:
8177 This is equivalent to the following:
8192 # Thanks to Pat Rankin for this example
8207 The indices of @code{bar} are practically guaranteed to be different, because
8208 @code{rand} returns different values each time it is called.
8209 (Arrays and the @code{rand} function haven't been covered yet.
8214 It is up to the implementation as to which expression is evaluated
8224 The value of @code{a[3]} could be either two or four.
8228 to a number.
8231 @table @code
8233 Adds @var{increment} to the value of @var{lvalue}.
8245 Sets @var{lvalue} to its remainder by @var{modulus}.
8251 Raises @var{lvalue} to the power @var{power}.
8256 @cindex @code{-} (hyphen), @code{-=} operator
8257 @cindex hyphen (@code{-}), @code{-=} operator
8258 @cindex @code{*} (asterisk), @code{*=} operator
8259 @cindex asterisk (@code{*}), @code{*=} operator
8260 @cindex @code{/} (forward slash), @code{/=} operator
8261 @cindex forward slash (@code{/}), @code{/=} operator
8262 @cindex @code{%} (percent sign), @code{%=} operator
8263 @cindex percent sign (@code{%}), @code{%=} operator
8264 @cindex @code{^} (caret), @code{^=} operator
8265 @cindex caret (@code{^}), @code{^=} operator
8266 @cindex @code{*} (asterisk), @code{**=} operator
8267 @cindex asterisk (@code{*}), @code{**=} operator
8269 @item @var{lvalue} @code{+=} @var{increment} @tab Adds @var{increment} to the value of @var{lvalue}.
8271 @item @var{lvalue} @code{-=} @var{decrement} @tab Subtracts @var{decrement} from the value of @var{…
8273 @item @var{lvalue} @code{*=} @var{coefficient} @tab Multiplies the value of @var{lvalue} by @var{co…
8275 @item @var{lvalue} @code{/=} @var{divisor} @tab Divides the value of @var{lvalue} by @var{divisor}.
8277 @item @var{lvalue} @code{%=} @var{modulus} @tab Sets @var{lvalue} to its remainder by @var{modulus}.
8281 @item @var{lvalue} @code{^=} @var{power} @tab
8282 @item @var{lvalue} @code{**=} @var{power} @tab Raises @var{lvalue} to the power @var{power}.
8285 @cindex POSIX @command{awk}, @code{**=} operator and
8286 @cindex portability, @code{**=} operator and
8294 @cindex dark corner, regexp constants, @code{/=} operator and
8295 @cindex @code{/} (forward slash), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant
8296 @cindex forward slash (@code{/}), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant
8297 @cindex regexp constants, @code{/=@dots{}/}, @code{/=} operator and
8303 @cindex ambiguity, syntactic: @code{/=} operator vs. @code{/=@dots{}/} regexp constant
8304 @cindex syntactic ambiguity: @code{/=} operator vs. @code{/=@dots{}/} regexp constant
8305 @cindex @code{/=} operator vs. @code{/=@dots{}/} regexp constant
8344 the increment operators add no power to the @command{awk} language; however, they
8348 @cindex @code{+} (plus sign), decrement/increment operators
8349 @cindex plus sign (@code{+}), decrement/increment operators
8351 The operator used for adding one is written @samp{++}. It can be used to increment
8353 To pre-increment a variable @code{v}, write @samp{++v}. This adds
8354 one to the value of @code{v}---that new value is also the value of the
8360 value. Thus, if @code{foo} has the value four, then the expression @samp{foo++}
8361 has the value four, but it changes the value of @code{foo} to five.
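The difference between the two forms can be sketched as follows. The variables @code{a} and @code{b} are illustrative names that capture the value of each increment expression.

```shell
awk 'BEGIN {
    foo = 4
    a = foo++     # a gets the old value (4); foo becomes 5
    b = ++foo     # foo becomes 6; b gets the new value (6)
    print a, b, foo
}'
```

The output should be @samp{4 6 6}.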
8368 not necessarily equal @code{foo}. But the difference is minute as
8369 long as you stick to numbers that are fairly small (less than 10e12).
8371 @cindex @code{$} (dollar sign), incrementing fields and arrays
8372 @cindex dollar sign (@code{$}), incrementing fields and arrays
8374 just like variables. (Use @samp{$(i++)} when you want to do a field reference
8381 the lvalue to pre-decrement or after it to post-decrement.
8384 @table @code
8385 @cindex @code{+} (plus sign), @code{++} operator
8386 @cindex plus sign (@code{+}), @code{++} operator
8395 @cindex @code{-} (hyphen), @code{--} operator
8396 @cindex hyphen (@code{-}), @code{--} operator
8445 In other words, it is up to the particular version of @command{awk}.
8452 @c You'll sleep better at night and be able to look at yourself
8468 constants @code{true} and @code{false}, or perhaps their uppercase
8474 string @code{""}) is false. The following program prints @samp{A strange
8490 the string constant @code{"0"} is actually true, because it is non-null.
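The following sketch tests three constants directly. The message texts are made up for this illustration; each @code{print} runs only if the stated truth value holds.

```shell
awk 'BEGIN {
    if ("0")  print "string constant 0 is true"    # non-null string
    if (!0)   print "number 0 is false"
    if (!"")  print "empty string is false"
}'
```

All three messages should be printed.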
8497 The Hitchhiker's Guide to the Galaxy
8515 upon the value that is assigned to them.
8522 like a number---for example, @code{@w{" +2"}}. This concept is used
8538 Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements,
8539 @code{ENVIRON} elements, and the
8540 elements of an array created by @code{split} that are numeric strings
8548 @c (Although a use may cause the entity to acquire an additional
8555 @code{a} has numeric type, even though it is later used in a string
8567 may be used. This depends upon the attributes of the operands, according to the
8570 @c thanks to Karl Berry, kb@cs.umb.edu, for major help with TeX tables
8581 % \strut -- a way to make each line have the height and depth
8597 % The \omit tells TeX to skip inserting the template for this column on
8599 % to separate the heading row from the rule below it. the depth 2pt --
8602 % This is the horizontal rule below the heading. Since it has nothing to
8603 % do with the columns of the table, we use \noalign to get it in there.
8631 Thus, for example, the string constant @w{@code{" +3.14"}}
8647 @cindex @code{<} (left angle bracket), @code{<} operator
8648 @cindex left angle bracket (@code{<}), @code{<} operator
8649 @cindex @code{<} (left angle bracket), @code{<=} operator
8650 @cindex left angle bracket (@code{<}), @code{<=} operator
8651 @cindex @code{>} (right angle bracket), @code{>=} operator
8652 @cindex right angle bracket (@code{>}), @code{>=} operator
8653 @cindex @code{>} (right angle bracket), @code{>} operator
8654 @cindex right angle bracket (@code{>}), @code{>} operator
8655 @cindex @code{=} (equals sign), @code{==} operator
8656 @cindex equals sign (@code{=}), @code{==} operator
8657 @cindex @code{!} (exclamation point), @code{!=} operator
8658 @cindex exclamation point (@code{!}), @code{!=} operator
8659 @cindex @code{~} (tilde), @code{~} operator
8660 @cindex tilde (@code{~}), @code{~} operator
8661 @cindex @code{!} (exclamation point), @code{!~} operator
8662 @cindex exclamation point (@code{!}), @code{!~} operator
8663 @cindex @code{in} operator
8664 @table @code
8669 True if @var{x} is less than or equal to @var{y}.
8675 True if @var{x} is greater than or equal to @var{y}.
8678 True if @var{x} is equal to @var{y}.
8681 True if @var{x} is not equal to @var{y}.
8695 to strings using the value of @code{CONVFMT}
8700 and so on. Thus, @code{"10"} is less than @code{"9"}. If there are two
8702 the longer one. Thus, @code{"abc"} is less than @code{"abcd"}.
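As a quick check of these rules, the sketch below stores the result of three comparisons in the hypothetical variables @code{s}, @code{n}, and @code{p}. Each result is one if the comparison is true and zero otherwise.

```shell
awk 'BEGIN {
    s = ("10" < "9")        # string constants compare as strings
    n = (10 < 9)            # numeric constants compare as numbers
    p = ("abc" < "abcd")    # the shorter string is a prefix of the longer
    print s, n, p
}'
```

The output should be @samp{1 0 1}: the string comparison and the numeric comparison of the same digits give opposite answers.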
8704 @cindex troubleshooting, @code{==} operator
8705 It is very easy to accidentally mistype the @samp{==} operator and
8707 code, but the program does not do what is intended:
8717 Unless @code{b} happens to be zero or the null string, the @code{if}
8719 so similar, this kind of error is very difficult to spot when
8720 scanning the source code.
8726 @table @code
8759 the result is @samp{false} because both @code{$1} and @code{$2}
8763 to attempt to produce the behavior that is ``least surprising,'' while
8773 has the value one, or is true if the variable @code{x}
8781 has the value one if @code{x} contains @samp{foo}, such as
8782 @code{"Oh, what a fool am I!"}.
8784 @cindex @code{~} (tilde), @code{~} operator
8785 @cindex tilde (@code{~}), @code{~} operator
8786 @cindex @code{!} (exclamation point), @code{!~} operator
8787 @cindex exclamation point (@code{!}), @code{!~} operator
8789 either a regexp constant (@code{/@dots{}/}) or an ordinary
8798 @code{/@var{regexp}/} is an abbreviation for the following comparison expression:
8804 One special place where @code{/foo/} is @emph{not} an abbreviation for
8831 parentheses to control nesting. The truth value of the Boolean expression is
8833 Boolean expressions are also referred to as @dfn{logical expressions}.
8837 expressions can be used. They can be used in @code{if}, @code{while},
8838 @code{do}, and @code{for} statements
8845 you can use one as a pattern to control the execution of rules.
8848 @table @code
8862 ($2 == bar++)}, the variable @code{bar} is not incremented if there is
8890 (The @code{in} operator is described in
8891 @ref{Reference to Elements}.)
8896 @cindex @code{&} (ampersand), @code{&&} operator
8897 @cindex ampersand (@code{&}), @code{&&} operator
8898 @cindex @code{|} (vertical bar), @code{||} operator
8899 @cindex vertical bar (@code{|}), @code{||} operator
8911 @cindex @code{!} (exclamation point), @code{!} operator
8912 @cindex exclamation point (@code{!}), @code{!} operator
8918 is applied to.
8920 variable from false to true and back again. For example, the following
8921 program is one way to print lines in between special bracketing lines:
8930 The variable @code{interested}, as with all @command{awk} variables, starts
8931 out initialized to zero, which is also false. When a line is seen whose
8932 first field is @samp{START}, the value of @code{interested} is toggled
8933 to true, using @samp{!}. The next rule prints lines as long as
8934 @code{interested} is true. When a line is seen whose first field is
8935 @samp{END}, @code{interested} is toggled back to false.
8939 bogus input data, but the point is to illustrate the use of @samp{!},
8943 @cindex @code{next} statement
8944 @strong{Note:} The @code{next} statement is discussed in
8946 @code{next} tells @command{awk} to skip the rest of the rules, get the
8948 The reason it's there is to avoid printing the bracketing
8960 three operands. It allows you to use one expression's value to select
8975 For example, the following expression produces the absolute value of @code{x}:
8985 this conditional expression examines element @code{i} of either array
8986 @code{a} or array @code{b}, and increments @code{i}:
8993 This is guaranteed to increment @code{i} exactly once, because each time
9015 A @dfn{function} is a name for a particular calculation.
9016 This enables you to
9017 ask for it by name at any point in the program. For
9018 example, the function @code{sqrt} computes the square root of a number.
9022 available in every @command{awk} program. The @code{sqrt} function is one
9027 for instructions on how to do this.
9030 The way to use a function is with a @dfn{function call} expression,
9031 which consists of the function name followed immediately by a list of
9035 there are no arguments, just write @samp{()} after the function name.
9046 Do not put any space between the function name and the open-parenthesis!
9047 A user-defined function name looks just like the name of a
9052 it is best not to get into the habit of using space to avoid mistakes
9054 of arguments. For example, the @code{sqrt} function must be called with
9055 a single argument, the number of which to take the square root:
9066 are omitted in calls to user-defined functions, then those arguments are
9067 treated as local variables and initialized to the empty string
9075 values to certain variables or doing I/O.
9100 means to multiply @code{b} and @code{c}, and then add @code{a} to the
9105 parentheses are assumed to be. In
9106 fact, it is wise to always use parentheses whenever there is an unusual
9110 which leads to mistakes.
9121 unary operators are involved, because there is only one way to interpret
9129 to lowest precedence:
9131 @c use @code in the items, looks better in TeX w/o all the quotes
9132 @table @code
9136 @cindex @code{$} (dollar sign), @code{$} field operator
9137 @cindex dollar sign (@code{$}), @code{$} field operator
9141 @cindex @code{+} (plus sign), @code{++} operator
9142 @cindex plus sign (@code{+}), @code{++} operator
9143 @cindex @code{-} (hyphen), @code{--} (decrement/increment) operator
9144 @cindex hyphen (@code{-}), @code{--} (decrement/increment) operators
9148 @cindex @code{^} (caret), @code{^} operator
9149 @cindex caret (@code{^}), @code{^} operator
9150 @cindex @code{*} (asterisk), @code{**} operator
9151 @cindex asterisk (@code{*}), @code{**} operator
9153 Exponentiation. These operators group right-to-left.
9155 @cindex @code{+} (plus sign), @code{+} operator
9156 @cindex plus sign (@code{+}), @code{+} operator
9157 @cindex @code{-} (hyphen), @code{-} operator
9158 @cindex hyphen (@code{-}), @code{-} operator
9159 @cindex @code{!} (exclamation point), @code{!} operator
9160 @cindex exclamation point (@code{!}), @code{!} operator
9164 @cindex @code{*} (asterisk), @code{*} operator, as multiplication operator
9165 @cindex asterisk (@code{*}), @code{*} operator, as multiplication operator
9166 @cindex @code{/} (forward slash), @code{/} operator
9167 @cindex forward slash (@code{/}), @code{/} operator
9168 @cindex @code{%} (percent sign), @code{%} operator
9169 @cindex percent sign (@code{%}), @code{%} operator
9173 @cindex @code{+} (plus sign), @code{+} operator
9174 @cindex plus sign (@code{+}), @code{+} operator
9175 @cindex @code{-} (hyphen), @code{-} operator
9176 @cindex hyphen (@code{-}), @code{-} operator
9181 No special symbol is used to indicate concatenation.
9185 @cindex @code{<} (left angle bracket), @code{<} operator
9186 @cindex left angle bracket (@code{<}), @code{<} operator
9187 @cindex @code{<} (left angle bracket), @code{<=} operator
9188 @cindex left angle bracket (@code{<}), @code{<=} operator
9189 @cindex @code{>} (right angle bracket), @code{>=} operator
9190 @cindex right angle bracket (@code{>}), @code{>=} operator
9191 @cindex @code{>} (right angle bracket), @code{>} operator
9192 @cindex right angle bracket (@code{>}), @code{>} operator
9193 @cindex @code{=} (equals sign), @code{==} operator
9194 @cindex equals sign (@code{=}), @code{==} operator
9195 @cindex @code{!} (exclamation point), @code{!=} operator
9196 @cindex exclamation point (@code{!}), @code{!=} operator
9197 @cindex @code{>} (right angle bracket), @code{>>} operator (I/O)
9198 @cindex right angle bracket (@code{>}), @code{>>} operator (I/O)
9200 @cindex @code{|} (vertical bar), @code{|} operator (I/O)
9201 @cindex vertical bar (@code{|}), @code{|} operator (I/O)
9203 @cindex @code{|} (vertical bar), @code{|&} operator (I/O)
9204 @cindex vertical bar (@code{|}), @code{|&} operator (I/O)
9213 @cindex @code{print} statement, I/O operators in
9214 @cindex @code{printf} statement, I/O operators in
9215 Note that the I/O redirection operators in @code{print} and @code{printf}
9216 statements belong to the statement level, not to expressions. The
9218 another operator. As a result, it does not make sense to use a
9222 The correct way to write this statement is @samp{print foo > (a ? b : c)}.
9224 @cindex @code{~} (tilde), @code{~} operator
9225 @cindex tilde (@code{~}), @code{~} operator
9226 @cindex @code{!} (exclamation point), @code{!~} operator
9227 @cindex exclamation point (@code{!}), @code{!~} operator
9231 @cindex @code{in} operator
9235 @cindex @code{&} (ampersand), @code{&&} operator
9236 @cindex ampersand (@code{&}), @code{&&} operator
9240 @cindex @code{|} (vertical bar), @code{||} operator
9241 @cindex vertical bar (@code{|}), @code{||} operator
9245 @cindex @code{?} (question mark), @code{?:} operator
9246 @cindex question mark (@code{?}), @code{?:} operator
9248 Conditional. This operator groups right-to-left.
9250 @cindex @code{+} (plus sign), @code{+=} operator
9251 @cindex plus sign (@code{+}), @code{+=} operator
9252 @cindex @code{-} (hyphen), @code{-=} operator
9253 @cindex hyphen (@code{-}), @code{-=} operator
9254 @cindex @code{*} (asterisk), @code{*=} operator
9255 @cindex asterisk (@code{*}), @code{*=} operator
9256 @cindex @code{*} (asterisk), @code{**=} operator
9257 @cindex asterisk (@code{*}), @code{**=} operator
9258 @cindex @code{/} (forward slash), @code{/=} operator
9259 @cindex forward slash (@code{/}), @code{/=} operator
9260 @cindex @code{%} (percent sign), @code{%=} operator
9261 @cindex percent sign (@code{%}), @code{%=} operator
9262 @cindex @code{^} (caret), @code{^=} operator
9263 @cindex caret (@code{^}), @code{^=} operator
9266 Assignment. These operators group right to left.
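A few of the precedence and grouping rules in this table can be sketched in one small program. The variable names @code{a} through @code{d} are arbitrary, and the expressions were picked so that a different grouping would give a different result.

```shell
awk 'BEGIN {
    a = 2 ^ 3 ^ 2                 # right-to-left: 2 ^ (3 ^ 2)
    b = 10 - 4 - 3                # left-to-right: (10 - 4) - 3
    c = 1 " " 2 + 3               # addition is done before concatenation
    d = 1 ? "x" : 0 ? "y" : "z"   # right-to-left: 1 ? "x" : (0 ? "y" : "z")
    print a; print b; print c; print d
}'
```

This should print 512, 3, @samp{1 5}, and @samp{x}, one per line.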
9290 up to here has been the foundation
9291 that programs are built on top of. Now it's time to start
9296 * Using Shell Variables:: How to use shell variables with @command{awk}.
9319 @table @code
9338 Special patterns for you to supply startup or cleanup actions for your
9372 input record. If the expression uses fields such as @code{$1}, the
9385 slashes (@code{/@var{regexp}/}), or any expression whose string value
9391 @cindex @code{/} (forward slash), patterns and
9392 @cindex forward slash (@code{/}), patterns and
9393 @cindex @code{~} (tilde), @code{~} operator
9394 @cindex tilde (@code{~}), @code{~} operator
9395 @cindex @code{!} (exclamation point), @code{!~} operator
9396 @cindex exclamation point (@code{!}), @code{!~} operator
9402 (There is no output, because there is no BBS site with the exact name @samp{foo}.)
9417 pattern. The expression @code{/foo/} has the value one if @samp{foo}
9418 appears in the current input record. Thus, as a pattern, @code{/foo/}
9462 @cindex @code{BEGIN} pattern, Boolean patterns and
9463 @cindex @code{END} pattern, Boolean patterns and
9467 patterns. Likewise, the special patterns @code{BEGIN} and @code{END},
9477 @cindex @code{,} (comma), in range patterns
9478 @cindex comma (@code{,}), in range patterns
9480 the form @samp{@var{begpat}, @var{endpat}}. It is used to match ranges of
9498 for the following record. Then the range pattern goes back to checking
9502 @cindex @code{if} statement, actions, changing
9504 off both match the range pattern. If you don't want to operate on
9505 these records, you can write @code{if} statements in the rule's action
9506 to distinguish them from the records you are interested in.
9508 It is possible for a pattern to be turned on and off by the same
9513 A first attempt would be to
9515 @code{next} statement
9517 This causes @command{awk} to skip any further processing of the current
9533 @cindex @code{!} operator
9541 program attempts to combine a range pattern with another, simpler test:
9561 @subsection The @code{BEGIN} and @code{END} Special Patterns
9564 @cindex @code{BEGIN} pattern
9566 @cindex @code{END} pattern
9568 The @code{BEGIN} and @code{END} special patterns are different.
9570 @code{BEGIN} and @code{END} rules must have actions; there is no default
9572 @code{BEGIN} and @code{END} rules are often referred to as
9573 ``@code{BEGIN} and @code{END} blocks'' by long-time @command{awk}
9577 * Using BEGIN/END:: How and why to use BEGIN/END rules.
9584 A @code{BEGIN} rule is executed once only, before the first input record
9585 is read. Likewise, an @code{END} rule is executed once only, after all the
9597 @cindex @code{BEGIN} pattern, operators and
9598 @cindex @code{END} pattern, operators and
9600 that contain the string @samp{foo}. The @code{BEGIN} rule prints a title
9601 for the report. There is no need to use the @code{BEGIN} rule to
9602 initialize the counter @code{n} to zero, since @command{awk} does this
9604 The second rule increments the variable @code{n} every time a
9605 record containing the pattern @samp{foo} is read. The @code{END} rule
9606 prints the value of @code{n} at the end of the run.
9608 The special patterns @code{BEGIN} and @code{END} cannot be used in ranges
9610 An @command{awk} program may have multiple @code{BEGIN} and/or @code{END}
9611 rules. They are executed in the order in which they appear: all the @code{BEGIN}
9612 rules at startup and all the @code{END} rules at termination.
9613 @code{BEGIN} and @code{END} rules may be intermixed with other rules.
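The ordering can be sketched with a program that deliberately interleaves its rules. The two-line input and the counter @code{n} are assumptions made only for this illustration.

```shell
printf 'a\nb\n' | awk '
BEGIN { print "first BEGIN" }
END   { print "first END, NR =", NR }
BEGIN { print "second BEGIN" }
      { n++ }                          # ordinary rule: count the records
END   { print "second END, n =", n }'
```

Both @code{BEGIN} rules should run first, in source order, and then both @code{END} rules, again in source order, each reporting a count of 2.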
9617 required the @code{BEGIN} rule to be placed at the beginning of the
9618 program, the @code{END} rule to be placed at the end, and only allowed one of
9620 This is no longer required, but it is a good idea to follow this template
9623 Multiple @code{BEGIN} and @code{END} rules are useful for writing
9624 library functions, because each library file can have its own @code{BEGIN} and/or
9625 @code{END} rule to do its own initialization and/or cleanup.
9627 controls the order in which their @code{BEGIN} and @code{END} rules are
9628 executed. Therefore, you have to be careful when writing such rules in
9635 If an @command{awk} program has only a @code{BEGIN} rule and no
9636 other rules, then the program exits after the @code{BEGIN} rule is
9637 run.@footnote{The original version of @command{awk} used to keep
9639 @code{END} rule exists, then the input is read, even if there are
9640 no other rules in the program. This is necessary in case the @code{END}
9641 rule checks the @code{FNR} and @code{NR} variables.
9644 @subsubsection Input/Output from @code{BEGIN} and @code{END} Rules
9646 @cindex input/output, from @code{BEGIN} and @code{END}
9647 There are several (sometimes subtle) points to remember when doing I/O
9648 from a @code{BEGIN} or @code{END} rule.
9649 The first has to do with the value of @code{$0} in a @code{BEGIN}
9650 rule. Because @code{BEGIN} rules are executed before any input is read,
9652 executing @code{BEGIN} rules. References to @code{$0} and the fields
9654 to give @code{$0} a real value is to execute a @code{getline} command
9656 Another way is simply to assign a value to @code{$0}.
9658 @cindex differences in @command{awk} and @command{gawk}, @code{BEGIN}/@code{END} patterns
9659 @cindex POSIX @command{awk}, @code{BEGIN}/@code{END} patterns
9660 @cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and
9661 @cindex @code{BEGIN} pattern, @code{print} statement and
9662 @cindex @code{END} pattern, @code{print} statement and
9663 The second point is similar to the first but from the other direction.
9664 Traditionally, due largely to implementation issues, @code{$0} and
9665 @code{NF} were @emph{undefined} inside an @code{END} rule.
9666 The POSIX standard specifies that @code{NF} is available in an @code{END}
9668 Most probably due to an oversight, the standard does not say that @code{$0}
9670 In fact, @command{gawk} does preserve the value of @code{$0} for use in
9671 @code{END} rules. Be aware, however, that Unix @command{awk}, and possibly
9675 inside a @code{BEGIN} or @code{END} rule is the same as always:
9676 @samp{print $0}. If @code{$0} is the null string, then this prints an
9678 @samp{print} in @code{BEGIN} and @code{END} rules, to mean @samp{@w{print ""}},
9679 relying on @code{$0} being null. Although one might generally get away with
9680 this in @code{BEGIN} rules, it is a very bad idea in @code{END} rules,
9684 @cindex @code{next} statement, @code{BEGIN}/@code{END} patterns and
9685 @cindex @code{nextfile} statement, @code{BEGIN}/@code{END} patterns and
9686 @cindex @code{BEGIN} pattern, @code{next}/@code{nextfile} statements and
9687 @cindex @code{END} pattern, @code{next}/@code{nextfile} statements and
9688 Finally, the @code{next} and @code{nextfile} statements are not allowed
9689 in a @code{BEGIN} rule, because the implicit
9691 are not valid in an @code{END} rule, since all the input has been read.
9702 An empty (i.e., nonexistent) pattern is considered to match @emph{every}
9721 For example, it is very common to use a shell variable to
9723 There are two ways to get the value of the shell variable
9727 The most common method is to use shell quoting to substitute
9735 END @{ print nmatches, "found" @}' /path/to/data
9740 that are concatenated together to form the program.
9742 the @code{pattern} variable inside the quotes.
9748 and it's often difficult to correctly
9751 A better method is to use @command{awk}'s variable assignment feature
9753 to assign the shell variable's value to an @command{awk} variable's
9754 value. Then use dynamic regexps to match the pattern
9756 The following shows how to redo the
9763 END @{ print nmatches, "found" @}' /path/to/data
9769 in case there is whitespace in the value of @code{$pattern}.
9770 The @command{awk} variable @code{pat} could be named @code{pattern}
9788 both) may be omitted. The purpose of the @dfn{action} is to tell
9789 @command{awk} what to do once a match for the pattern is found. Thus,
9796 function @var{name}(@var{args}) @{ @dots{} @}
9800 @cindex @code{@{@}} (braces), actions and
9801 @cindex braces (@code{@{@}}), actions and
9804 @cindex @code{;} (semicolon), separating statements in actions
9805 @cindex semicolon (@code{;}), separating statements in actions
9808 thing to do. The statements are separated by newlines or semicolons.
9812 well. An omitted action is equivalent to @samp{@{ print $0 @}}:
9815 /foo/ @{ @} @i{match @code{foo}, do nothing --- empty action}
9816 /foo/ @i{match @code{foo}, print the record --- omitted action}
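The difference between an empty action and an omitted one can be tried from the shell. The two-line input here is made up for the illustration.

```shell
printf 'foo\nbar\n' | awk '/foo/ { }'    # empty action: prints nothing
printf 'foo\nbar\n' | awk '/foo/'        # omitted action: prints the matching record
```

Only the second command should produce output, the single line @samp{foo}.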
9824 Call functions or assign values to variables
9833 (@code{if}, @code{for}, @code{while}, and @code{do}) as well as a few
9838 curly braces. A compound statement is used in order to put several
9839 statements together in the body of an @code{if}, @code{while}, @code{do},
9840 or @code{for} statement.
9843 Use the @code{getline} command
9845 Also supplied in @command{awk} are the @code{next}
9847 and the @code{nextfile} statement
9851 Such as @code{print} and @code{printf}.
9868 @dfn{Control statements}, such as @code{if}, @code{while}, and so on,
9877 @cindex @code{@{@}} (braces), statements, grouping
9878 @cindex braces (@code{@{@}}), statements, grouping
9880 @cindex @code{;} (semicolon), separating statements in actions
9881 @cindex semicolon (@code{;}), separating statements in actions
9882 All the control statements start with special keywords, such as @code{if}
9883 and @code{while}, to distinguish them from simple expressions.
9885 @code{if} statement contains another statement that may or may not be
9902 * Continue Statement:: Skip to the end of the innermost enclosing
9910 @subsection The @code{if}-@code{else} Statement
9912 @cindex @code{if} statement
9913 The @code{if}-@code{else} statement is @command{awk}'s decision-making
9924 The @code{else} part of the statement is
9927 Refer to the following:
9937 if the value of @code{x} is evenly divisible by two), then the first
9938 @code{print} statement is executed; otherwise, the second @code{print}
9940 If the @code{else} keyword appears on the same line as @var{then-body} and
9943 the @code{else}.
9954 because a human reader might fail to see the @code{else} if it is not
9958 @subsection The @code{while} Statement
9959 @cindex @code{while} statement
9961 @cindex loops, See Also @code{while} statement
9965 The @code{while} statement is the simplest looping statement in
9979 The first thing the @code{while} statement does is test the @var{condition}.
10005 The loop works in the following manner: first, the value of @code{i} is set to one.
10006 Then, the @code{while} statement tests whether @code{i} is less than or equal to
10007 three. This is true when @code{i} equals one, so the @code{i}-th
10008 field is printed. Then the @samp{i++} increments the value of @code{i}
10009 and the loop repeats. The loop terminates when @code{i} reaches four.
10015 program is harder to read without it.
10018 @subsection The @code{do}-@code{while} Statement
10019 @cindex @code{do}-@code{while} statement
10021 The @code{do} loop is a variation of the @code{while} looping statement.
10022 The @code{do} loop executes the @var{body} once and then repeats the
10034 @code{while} statement:
10043 is false to begin with.
10044 The following is an example of a @code{do} statement:
10057 realistic example, since in this case an ordinary @code{while} would do
10059 occasionally is there a real use for a @code{do} statement.
10062 @subsection The @code{for} Statement
10063 @cindex @code{for} statement
10065 The @code{for} statement makes it more convenient to count iterations of a
10066 loop. The general form of the @code{for} statement looks like this:
10078 The @code{for} statement starts by executing @var{initialization}.
10081 @var{increment}. Typically, @var{initialization} sets a variable to
10082 either zero or one, @var{increment} adds one to it, and @var{condition}
10096 It isn't possible to
10100 are equal. (But it is possible to initialize additional variables by writing
10101 their assignments as separate statements preceding the @code{for} loop.)
10119 If there is nothing to be done, any of the three expressions in the
10120 parentheses following the @code{for} keyword may be omitted. Thus,
10121 @w{@samp{for (; x > 0;)}} is equivalent to @w{@samp{while (x > 0)}}. If the
10125 In most cases, a @code{for} loop is an abbreviation for a @code{while}
10136 @cindex loops, @code{continue} statements and
10138 The only exception is when the @code{continue} statement
10140 inside the loop. Changing a @code{for} statement to a @code{while}
10141 statement in this way can change the effect of the @code{continue}
10144 The @command{awk} language has a @code{for} statement in addition to a
10145 @code{while} statement because a @code{for} loop is often both less work to
10146 type and more natural to think of. Counting the number of iterations is
10147 very common in loops. It can be easier to think of this counting as part
10148 of looping rather than as something to do inside the loop.
10151 @cindex @code{in} operator
10152 There is an alternate version of the @code{for} loop, for iterating over
10162 for more information on this version of the @code{for} loop.
10166 @subsection The @code{switch} Statement
10167 @cindex @code{switch} statement
10168 @cindex @code{case} keyword
10169 @cindex @code{default} keyword
10173 enable it, use the @option{--enable-switch} option to @command{configure}
10178 The @code{switch} statement allows the evaluation of an expression and
10179 the execution of statements based on a @code{case} match. Case statements
10181 @code{case} is found, the @code{default} section is executed, if supplied. The
10182 general form of the @code{switch} statement looks like this:
10193 The @code{switch} statement works as it does in C. Once a match to a given
10194 case is made, case statement bodies are executed until a @code{break},
10195 @code{continue}, @code{next}, @code{nextfile}, or @code{exit} is encountered,
10196 or the end of the @code{switch} statement itself. For example:
10217 of a matched @code{case} statement, execution falls through to the
10218 next @code{case} until execution halts. In the above example, for
10220 the @code{print} statement is executed and then falls through into the
10221 @code{default} section, executing its @code{print} statement. In turn,
10222 the @minus{}1 case will also be executed since the @code{default} does
10226 @subsection The @code{break} Statement
10227 @cindex @code{break} statement
10230 The @code{break} statement jumps out of the innermost @code{for},
10231 @code{while}, or @code{do} loop that encloses it. The following example
10249 When the remainder is zero in the first @code{if} statement, @command{awk}
10250 immediately @dfn{breaks out} of the containing @code{for} loop. This means
10251 that @command{awk} proceeds immediately to the statement following the loop
10252 and continues processing. (This is very different from the @code{exit}
10256 The following program illustrates how the @var{condition} of a @code{for}
10257 or @code{while} statement could be replaced with a @code{break} inside
10258 an @code{if}:
10277 @c @cindex @code{break}, outside of loops
10280 @cindex POSIX @command{awk}, @code{break} statement and
10281 @cindex dark corner, @code{break} statement
10282 @cindex @command{gawk}, @code{break} statement in
10283 The @code{break} statement has no meaning when
10285 historical implementations of @command{awk} treated the @code{break}
10286 statement outside of a loop as if it were a @code{next} statement
10289 @command{gawk} supports this use of @code{break} only
10294 specifies that @code{break} should only be used inside the body of a
10299 @subsection The @code{continue} Statement
10301 @cindex @code{continue} statement
10302 As with @code{break}, the @code{continue} statement is used only inside
10303 @code{for}, @code{while}, and @code{do} loops. It skips
10305 to begin immediately. Contrast this with @code{break}, which jumps out
10308 The @code{continue} statement in a @code{for} loop directs @command{awk} to
10310 increment-expression of the @code{for} statement. The following program
10325 This program prints all the numbers from 0 to 20---except for 5, for
10326 which the @code{printf} is skipped. Because the increment @samp{x++}
10327 is not skipped, @code{x} does not remain stuck at 5. Contrast the
10328 @code{for} loop from the previous example with the following @code{while} loop:
10344 This program loops forever once @code{x} reaches 5.
10346 @c @cindex @code{continue}, outside of loops
10349 @cindex POSIX @command{awk}, @code{continue} statement and
10350 @cindex dark corner, @code{continue} statement
10351 @cindex @command{gawk}, @code{continue} statement in
10352 The @code{continue} statement has no meaning when used outside the body of
10353 a loop. Historical versions of @command{awk} treated a @code{continue}
10354 statement outside a loop the same way they treated a @code{break}
10355 statement outside a loop: as if it were a @code{next}
10361 @code{break} statement, the POSIX standard specifies that @code{continue}
10366 @subsection The @code{next} Statement
10367 @cindex @code{next} statement
10369 The @code{next} statement forces @command{awk} to immediately stop processing
10370 the current record and go on to the next record. This means that no
10374 Contrast this with the effect of the @code{getline} function
10376 @command{awk} to read the next record immediately, but it does not alter the
10383 think of this loop as a @code{for} statement whose body contains the
10384 rules, then the @code{next} statement is analogous to a @code{continue}
10385 statement. It skips to the end of the body of this implicit loop and
10402 Because of the @code{next} statement,
10404 message is redirected to the standard error output stream, as error
10410 @c @cindex @code{next}, inside a user-defined function
10411 @cindex @code{BEGIN} pattern, @code{next}/@code{nextfile} statements and
10412 @cindex @code{END} pattern, @code{next}/@code{nextfile} statements and
10413 @cindex POSIX @command{awk}, @code{next}/@code{nextfile} statements and
10414 @cindex @code{next} statement, user-defined functions and
10415 @cindex functions, user-defined, @code{next}/@code{nextfile} statements and
10416 According to the POSIX standard, the behavior is undefined if
10417 the @code{next} statement is used in a @code{BEGIN} or @code{END} rule.
10420 some other @command{awk} implementations don't allow the @code{next}
10423 Just as with any other @code{next} statement, a @code{next} statement inside a
10426 If the @code{next} statement causes the end of the input to be reached,
10427 then the code in any @code{END} rules is executed.
10431 @subsection Using @command{gawk}'s @code{nextfile} Statement
10432 @cindex @code{nextfile} statement
10433 @cindex differences in @command{awk} and @command{gawk}, @code{next}/@code{nextfile} statements
10435 @command{gawk} provides the @code{nextfile} statement,
10436 which is similar to the @code{next} statement.
10438 @code{nextfile} statement instructs @command{gawk} to stop processing the
10441 The @code{nextfile} statement is a @command{gawk} extension.
10445 @code{nextfile} is not special.
10447 Upon execution of the @code{nextfile} statement, @code{FILENAME} is
10448 updated to the name of the next @value{DF} listed on the command line,
10449 @code{FNR} is reset to one, @code{ARGIND} is incremented, and processing
10451 (@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.)
10452 If the @code{nextfile} statement causes the end of the input to be reached,
10453 then the code in any @code{END} rules is executed.
10456 The @code{nextfile} statement is useful when there are many @value{DF}s
10457 to process but it isn't necessary to process every record in every file.
10458 Normally, in order to move on to the next @value{DF}, a program
10459 has to continue scanning the unwanted records. The @code{nextfile}
10463 the same as @code{nextfile}, this isn't true. @code{close} is
10465 opened with redirections. It is not related to the main processing that
10466 @command{awk} does with the files listed in @code{ARGV}.
10468 If it's necessary to use an @command{awk} version that doesn't support
10469 @code{nextfile}, see
10471 for a user-defined function that simulates the @code{nextfile}
10474 @cindex functions, user-defined, @code{next}/@code{nextfile} statements and
10475 @cindex @code{nextfile} statement, user-defined functions and
10478 also supports @code{nextfile}. However, it doesn't allow the @code{nextfile}
10481 @command{gawk} does; a @code{nextfile} inside a
10483 first rule in the program, just as any other @code{nextfile} statement.
10485 @cindex @code{next file} statement, in @command{gawk}
10486 @cindex @command{gawk}, @code{next file} statement in
10487 @cindex @code{nextfile} statement, in @command{gawk}
10488 @cindex @command{gawk}, @code{nextfile} statement in
10489 @strong{Caution:} Versions of @command{gawk} prior to 3.0 used two
10490 words (@samp{next file}) for the @code{nextfile} statement.
10492 to one word, because the treatment of @samp{file} was
10493 inconsistent. When it appeared after @code{next}, @samp{file} was a keyword;
10498 @subsection The @code{exit} Statement
10500 @cindex @code{exit} statement
10501 The @code{exit} statement causes @command{awk} to immediately stop
10502 executing the current rule and to stop processing input; any remaining input
10503 is ignored. The @code{exit} statement is written as follows:
10506 exit @r{[}@var{return code}@r{]}
10509 @cindex @code{BEGIN} pattern, @code{exit} statement and
10510 @cindex @code{END} pattern, @code{exit} statement and
10511 When an @code{exit} statement is executed from a @code{BEGIN} rule, the
10513 read. However, if an @code{END} rule is present,
10514 as part of executing the @code{exit} statement,
10515 the @code{END} rule is executed
10517 If @code{exit} is used as part of an @code{END} rule, it causes
10518 the program to stop immediately.
10520 An @code{exit} statement that is not part of a @code{BEGIN} or @code{END}
10523 @code{END} rule if there is one.
10526 if you don't want the @code{END} rule to do its job, set a variable
10527 to nonzero before the @code{exit} statement and check that variable in
10528 the @code{END} rule.
10532 @cindex dark corner, @code{exit} statement
10533 If an argument is supplied to @code{exit}, its value is used as the exit
10534 status code for the @command{awk} process. If no argument is supplied,
10535 @code{exit} returns status zero (success). In the case where an argument
10536 is supplied to a first @code{exit} statement, and then @code{exit} is
10537 called a second time from an @code{END} rule with no argument,
10541 @cindex programming conventions, @code{exit} statement
10543 impossible to handle. Conventionally, programs report this by
10545 using an @code{exit} statement with a nonzero argument, as shown
10569 Most @command{awk} variables are available to use for your own
10570 purposes; they never change unless your program assigns values to
10574 to tell @command{awk} how to do certain things. Others are set
10576 internal workings of @command{awk} to your program.
10584 * User-modified:: Built-in variables that you change to control
10588 * ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}.
10598 The following is an alphabetical list of variables that you can change to
10600 specific to @command{gawk} are marked with a pound sign@w{ (@samp{#}).}
10602 @table @code
10603 @cindex @code{BINMODE} variable
10611 string values of @code{"r"} or @code{"w"} specify that input files and
10613 A string value of @code{"rw"} or @code{"wr"} indicates that all
10615 Any other string value is equivalent to @code{"rw"}, but @command{gawk}
10617 @code{BINMODE} is described in more detail in
10620 @cindex differences in @command{awk} and @command{gawk}, @code{BINMODE} variable
10629 @cindex @code{CONVFMT} variable
10630 @cindex POSIX @command{awk}, @code{CONVFMT} variable and
10631 @cindex numbers, converting, to strings
10632 @cindex strings, converting, numbers to
10634 This string controls conversion of numbers to
10636 It works by being passed, in effect, as the first argument to the
10637 @code{sprintf} function
10639 Its default value is @code{"%.6g"}.
10640 @code{CONVFMT} was introduced by the POSIX standard.
10642 @cindex @code{FIELDWIDTHS} variable
10643 @cindex differences in @command{awk} and @command{gawk}, @code{FIELDWIDTHS} variable
10644 @cindex field separators, @code{FIELDWIDTHS} variable and
10645 @cindex separators, field, @code{FIELDWIDTHS} variable and
10648 how to split input with fixed columnar boundaries.
10649 Assigning a value to @code{FIELDWIDTHS}
10650 overrides the use of @code{FS} for field splitting.
10653 @cindex @command{gawk}, @code{FIELDWIDTHS} variable in
10655 (@pxref{Options}), then @code{FIELDWIDTHS}
10657 exclusively on the value of @code{FS}.
10659 @cindex @code{FS} variable
10667 record. If the value is the null string (@code{""}), then each
10670 specify the behavior when @code{FS} is the null string.)
10673 @cindex POSIX @command{awk}, @code{FS} variable and
10674 The default value is @w{@code{" "}}, a string consisting of a single
10678 spaces, tabs, and newlines at the beginning and end of a record to be ignored.
10680 You can set the value of @code{FS} on the command line using the
10688 If @command{gawk} is using @code{FIELDWIDTHS} for field splitting,
10689 assigning a value to @code{FS} causes @command{gawk} to return to
10690 the normal, @code{FS}-based field splitting. An easy way to do this
10691 is to simply say @samp{FS = FS}, perhaps with an explanatory comment.
10693 @cindex @code{IGNORECASE} variable
10694 @cindex differences in @command{awk} and @command{gawk}, @code{IGNORECASE} variable
10699 If @code{IGNORECASE} is nonzero or non-null, then all string comparisons
10701 matching with @samp{~} and @samp{!~}, as well as the @code{gensub},
10702 @code{gsub}, @code{index}, @code{match}, @code{split}, and @code{sub}
10703 functions, record termination with @code{RS}, and field splitting with
10704 @code{FS}, all ignore case when doing their particular regexp operations.
10705 However, the value of @code{IGNORECASE} does @emph{not} affect array subscripting
10710 @cindex @command{gawk}, @code{IGNORECASE} variable in
10713 then @code{IGNORECASE} has no special meaning. Thus, string
10716 @cindex @code{LINT} variable
10717 @cindex differences in @command{awk} and @command{gawk}, @code{LINT} variable
10723 With a value of @code{"fatal"}, lint warnings become fatal errors.
10724 With a value of @code{"invalid"}, only warnings about things that are
10727 Assigning a false value to @code{LINT} turns off the lint warnings.
10729 @cindex @command{gawk}, @code{LINT} variable in
10732 changing @code{LINT} does affect the production of lint warnings,
10739 @cindex @code{OFMT} variable
10740 @cindex numbers, converting, to strings
10741 @cindex strings, converting, numbers to
10743 This string controls conversion of numbers to
10745 printing with the @code{print} statement. It works by being passed
10746 as the first argument to the @code{sprintf} function
10748 Its default value is @code{"%.6g"}. Earlier versions of @command{awk}
10749 also used @code{OFMT} to specify the format for converting numbers to
10750 strings in general expressions; this is now done by @code{CONVFMT}.
10752 @cindex @code{sprintf} function, @code{OFMT} variable and
10753 @cindex @code{print} statement, @code{OFMT} variable and
10754 @cindex @code{OFS} variable
10759 output between the fields printed by a @code{print} statement. Its
10760 default value is @w{@code{" "}}, a string consisting of a single space.
10762 @cindex @code{ORS} variable
10765 @code{print} statement. Its default value is @code{"\n"}, the newline
10768 @cindex @code{RS} variable
10781 The ability for @code{RS} to be a regular expression
10786 just the first character of @code{RS}'s value is used.
10788 @cindex @code{SUBSEP} variable
10793 @code{"\034"} and is used to separate the parts of the indices of a
10794 multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}}
10795 really accesses @code{foo["A\034B"]}
10798 @cindex @code{TEXTDOMAIN} variable
10799 @cindex differences in @command{awk} and @command{gawk}, @code{TEXTDOMAIN} variable
10805 @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain} functions
10807 The default value of @code{TEXTDOMAIN} is @code{"messages"}.
10828 sets automatically on certain occasions in order to provide
10829 information to your program. The variables that are specific to
10832 @table @code
10833 @cindex @code{ARGC}/@code{ARGV} variables
10837 The command-line arguments available to @command{awk} programs are stored in
10838 an array called @code{ARGV}. @code{ARGC} is the number of command-line
10841 @code{ARGV} is indexed from 0 to @code{ARGC} @minus{} 1.
10855 @code{ARGV[0]} contains @code{"awk"}, @code{ARGV[1]}
10856 contains @code{"inventory-shipped"}, and @code{ARGV[2]} contains
10857 @code{"BBS-list"}. The value of @code{ARGC} is three, one more than the
10858 index of the last element in @code{ARGV}, because the elements are numbered
10861 @cindex programming conventions, @code{ARGC}/@code{ARGV} variables
10862 The names @code{ARGC} and @code{ARGV}, as well as the convention of indexing
10863 the array from 0 to @code{ARGC} @minus{} 1, are derived from the C language's
10866 The value of @code{ARGV[0]} can vary from system to system.
10868 @code{ARGV}, nor are any of @command{awk}'s command-line options.
10872 @cindex @code{ARGIND} variable
10873 @cindex differences in @command{awk} and @command{gawk}, @code{ARGIND} variable
10875 The index in @code{ARGV} of the current file being processed.
10877 @code{ARGIND} to the index in @code{ARGV} of the @value{FN}.
10882 @cindex files, processing, @code{ARGIND} variable and
10883 This variable is useful in file processing; it allows you to tell how far
10884 along you are in the list of @value{DF}s as well as to distinguish between
10888 While you can change the value of @code{ARGIND} within your @command{awk}
10889 program, @command{gawk} automatically sets it to a new value when the
10898 @cindex @code{ENVIRON} variable
10904 @code{ENVIRON["HOME"]} might be @file{/home/arnold}. Changing this array
10905 does not affect the environment passed on to any programs that
10906 @command{awk} may spawn via redirection or the @code{system} function.
10910 On such systems, the @code{ENVIRON} array is empty (except for
10911 @w{@code{ENVIRON["AWKPATH"]}},
10914 @cindex @code{ERRNO} variable
10915 @cindex differences in @command{awk} and @command{gawk}, @code{ERRNO} variable
10916 @cindex error handling, @code{ERRNO} variable and
10918 If a system error occurs during a redirection for @code{getline},
10919 during a read for @code{getline}, or during a @code{close} operation,
10920 then @code{ERRNO} contains a string describing the error.
10928 @cindex @code{FILENAME} variable
10929 @cindex dark corner, @code{FILENAME} variable
10931 The name of the file that @command{awk} is currently reading.
10933 from the standard input and @code{FILENAME} is set to @code{"-"}.
10934 @code{FILENAME} is changed each time a new file is read
10936 Inside a @code{BEGIN} rule, the value of @code{FILENAME} is
10937 @code{""}, since there are no input files being processed
10939 @code{FILENAME} to @code{"-"}, even if there were @value{DF}s to be
10943 Note, though, that using @code{getline}
10945 inside a @code{BEGIN} rule can give
10946 @code{FILENAME} a value.
10948 @cindex @code{FNR} variable
10950 The current record number in the current file. @code{FNR} is
10953 to zero each time a new input file is started.
10955 @cindex @code{NF} variable
10958 @code{NF} is set each time a new record is read, when a new field is
10959 created or when @code{$0} changes (@pxref{Fields}).
10968 assigning a value to @code{NF} has the potential to affect
10970 to @code{NF} can be used to create or remove fields from the
10973 @cindex @code{NR} variable
10978 @code{NR} is incremented each time a new record is read.
10980 @cindex @code{PROCINFO} array
10981 @cindex differences in @command{awk} and @command{gawk}, @code{PROCINFO} array
10983 The elements of this array provide access to information about the
10986 are guaranteed to be available:
10988 @table @code
10990 The value of the @code{getegid} system call.
10993 The value of the @code{geteuid} system call.
10997 @code{"FS"} if field splitting with @code{FS} is in effect, or it is
10998 @code{"FIELDWIDTHS"} if field splitting with @code{FIELDWIDTHS} is in effect.
11001 The value of the @code{getgid} system call.
11013 The value of the @code{getuid} system call.
11016 On some systems, there may be elements in the array, @code{"group1"}
11017 through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of
11018 supplementary groups that the process has. Use the @code{in} operator
11019 to test for these elements
11020 (@pxref{Reference to Elements}).
11028 @cindex @code{RLENGTH} variable
11031 @code{match} function
11033 @code{RLENGTH} is set by invoking the @code{match} function. Its value
11036 @cindex @code{RSTART} variable
11039 @code{match} function
11041 @code{RSTART} is set by invoking the @code{match} function. Its value
11045 @cindex @code{RT} variable
11046 @cindex differences in @command{awk} and @command{gawk}, @code{RT} variable
11049 that matched the text denoted by @code{RS}, the record separator.
11061 @subheading Advanced Notes: Changing @code{NR} and @code{FNR}
11062 @cindex @code{NR} variable, changing
11063 @cindex @code{FNR} variable, changing
11064 @cindex advanced features, @code{FNR}/@code{NR} variables
11065 @cindex dark corner, @code{FNR}/@code{NR} variables
11066 @command{awk} increments @code{NR} and @code{FNR}
11067 each time it reads a record, instead of setting them to the absolute
11087 Before @code{FNR} was added to the @command{awk} language
11089 many @command{awk} programs used this feature to track the number of
11090 records in a file by resetting @code{NR} to zero when @code{FILENAME}
11094 @subsection Using @code{ARGC} and @code{ARGV}
11095 @cindex @code{ARGC}/@code{ARGV} variables
11100 presented the following program describing the information contained in @code{ARGC}
11101 and @code{ARGV}:
11114 In this example, @code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]}
11115 contains @samp{inventory-shipped}, and @code{ARGV[2]} contains
11117 Notice that the @command{awk} program is not entered in @code{ARGV}. The
11122 treated as arguments and do show up in the @code{ARGV} array:
11140 A program can alter @code{ARGC} and the elements of @code{ARGV}.
11142 element of @code{ARGV} as the name of the next input file. By storing a
11144 Use @code{"-"} to represent the standard input. Storing
11145 additional elements and incrementing @code{ARGC} causes
11146 additional files to be read.
11148 If the value of @code{ARGC} is decreased, that eliminates input files
11149 from the end of the list. By recording the old value of @code{ARGC}
11154 (@code{""}) into @code{ARGV} in place of the file's name. As a
11157 Another option is to
11158 use the @code{delete} statement to remove elements from
11159 @code{ARGV} (@pxref{Delete}).
11161 All of these actions are typically done in the @code{BEGIN} rule,
11165 of each way of removing elements from @code{ARGV}.
11166 The following fragment processes @code{ARGV} in order to examine, and
11168 @c NEXT ED: Add xref to rewind() function
11196 @cindex differences in @command{awk} and @command{gawk}, @code{ARGC}/@code{ARGV} variables
11199 into @code{ARGV} for the @command{awk} program to deal with. As soon
11211 are passed on to the @command{awk} program.
11223 how to use array elements, how to scan through every element in an array,
11224 and how to remove array elements.
11238 Thus, you cannot have a variable and an array with the same name in the
11242 * Array Intro:: Introduction to Arrays
11243 * Reference to Elements:: How to examine one element of an array.
11244 * Assigning Elements:: How to change an element of an array.
11246 * Scanning an Array:: A variation of the @code{for} statement. It
11249 * Delete:: The @code{delete} statement removes an element
11251 * Numeric Array Subscripts:: How to use numbers as subscripts in
11261 @section Introduction to Arrays
11265 Every @command{awk} array must have a name. Array names have the same
11266 syntax as variable names; any valid variable name would also be a valid
11267 array name. But one name cannot be used in both ways (as an array and
11272 isn't necessary to specify the size of an array before starting to use it.
11279 declaration causes a contiguous block of memory to be allocated for that
11284 first element, and so on. It is impossible to add more elements to the
11292 conceptually, if the element values are 8, @code{"foo"},
11293 @code{""}, and 30:
11304 \halign{\strut\hfil\ignorespaces#&&\vrule#&\hbox to\width{\hfil#\unskip\hfil}\cr
11357 at any time. For example, suppose a tenth element is added to the array
11358 whose value is @w{@code{"number ten"}}. The result is:
11375 have to be positive integers. Any number, or even a string, can be
11377 English to French:
11387 Here we decided to translate the number one in both spelled-out and
11393 Here, the number @code{1} isn't double-quoted, since @command{awk}
11394 automatically converts it to a string.
11397 @cindex arrays, @code{IGNORECASE} variable and
11398 @cindex @code{IGNORECASE} variable, array subscripts and
11399 The value of @code{IGNORECASE} has no effect upon array subscripting.
11400 The identical string value used to store an array element must be used
11401 to retrieve it.
11402 When @command{awk} creates an array (e.g., with the @code{split}
11407 @command{awk}'s arrays are efficient---the time to access an element
11412 @node Reference to Elements
11413 @section Referring to an Array Element
11417 The principal way to use an array is to refer to one of its elements.
11425 Here, @var{array} is the name of an array. The expression @var{index} is
11429 element. For example, @code{foo[4.3]} is an expression for the element
11430 of array @code{foo} at index @samp{4.3}.
11432 A reference to an array element that has no recorded value yields a value of
11433 @code{""}, the null string. This includes elements
11440 @c @cindex arrays, @code{in} operator and
11441 @cindex @code{in} operator, arrays and
11453 The expression has the value one (true) if @code{@var{array}[@var{index}]}
11455 For example, this statement tests whether the array @code{frequencies}
11464 @code{frequencies} contains an element whose @emph{value} is two.
11465 There is no way to do that except to scan all the elements. Also, this
11466 @emph{does not} create @code{frequencies[2]}, while the following
11487 @var{array} is the name of an array. The expression
11489 assigned a value. The expression @var{value} is the value to
11490 assign to that element of the array.
11520 it also stores each line into the array @code{arr}, at an index that
11522 The second rule runs after all the input has been read, to print out
11549 Gaps in the line numbers can be handled with an easy improvement to the
11550 program's @code{END} rule, as follows:
11565 In programs that use arrays, it is often necessary to use a loop that
11567 arrays are contiguous and indices are limited to positive integers,
11569 the lowest index up to the highest. This technique won't do the job
11571 So @command{awk} has a special kind of @code{for} statement for scanning
11580 @cindex @code{in} operator, arrays and
11582 program has previously used, with the variable @var{var} set to that index.
11584 @cindex arrays, @code{for} statement and
11585 @cindex @code{for} statement, in arrays
11586 The following program uses this form of the @code{for} statement. The
11588 least once) in the input, by storing a one into the array @code{used} with
11589 the word as index. The second rule scans the elements of @code{used} to
11594 for more information on the built-in function @code{length}.
11622 @command{awk} and cannot be controlled or changed. This can lead to
11623 problems if new elements are added to @var{array} by statements in
11624 the loop body; it is not predictable whether the @code{for} loop will
11626 strange results. It is best to avoid such things.
11629 @section The @code{delete} Statement
11630 @cindex @code{delete} statement
11635 To remove an individual element of an array, use the @code{delete}
11644 been referred to or had been given a value.
11653 This example removes all the elements from the array @code{frequencies}.
11654 Once an element is deleted, a subsequent @code{for} statement to scan the array
11655 does not report that element and the @code{in} operator to check for
11665 It is important to note that deleting an element is @emph{not} the
11666 same as assigning it a null value (the empty string, @code{""}).
11676 It is not an error to delete an element that does not exist.
11686 by leaving off the subscript in the @code{delete} statement,
11696 Using this version of the @code{delete} statement is about three times
11702 The following statement provides a portable but nonobvious way to clear
11703 out an array:@footnote{Thanks to Michael Brennan for pointing this out.}
11710 @cindex @code{split} function, array elements, deleting
11711 The @code{split} function
11713 clears out the target array first. This call asks it to split
11714 apart the null string. Because there is no data to split out, the
11718 delete an array and then use the array's name as a scalar
11726 @section Using Numbers to Subscript Arrays
11731 @cindex @code{CONVFMT} variable, array subscripts and
11732 An important aspect of arrays to remember is that @emph{array subscripts
11734 it is converted to a string value before being used for subscripting
11736 This means that the value of the built-in variable @code{CONVFMT} can
11751 @code{xyz} a numeric value. Assigning to
11752 @code{data[xyz]} subscripts @code{data} with the string value @code{"12.153"}
11753 (using the default conversion value of @code{CONVFMT}, @code{"%.6g"}).
11754 Thus, the array element @code{data["12.153"]} is assigned the value one.
11756 the value of @code{CONVFMT}. The test @samp{(xyz in data)} generates a new
11757 string value from @code{xyz}---this time @code{"12.15"}---because the value of
11758 @code{CONVFMT} only allows two digits after the decimal point. This test fails,
11759 since @code{"12.15"} is a different string from @code{"12.153"}.
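The effect described above can be reproduced with a minimal sketch. It assumes that @code{CONVFMT} is changed to @code{"%2.2f"} between the assignment and the test; the wrapper function @code{convfmt_demo} is a hypothetical name:

```shell
# Sketch only: shows a subscript changing when CONVFMT changes.
convfmt_demo() {
    awk 'BEGIN {
        xyz = 12.153
        data[xyz] = 1        # subscript is "12.153" under the default "%.6g"
        CONVFMT = "%2.2f"    # assumed new setting for this illustration
        # xyz is now converted as "12.15", which was never stored.
        r = (xyz in data) ? "found" : "not found"
        print r
    }'
}
convfmt_demo
```

Keeping @code{CONVFMT} unchanged for the life of the program, or subscripting with explicit strings, avoids this surprise.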
11762 According to the rules for conversions
11764 values are always converted to strings as integers, no matter what the
11765 value of @code{CONVFMT} may happen to be. So the usual case of
11773 The ``integer values always convert to strings as integers'' rule
11780 @code{array[17]},
11781 @code{array[021]},
11783 @code{array[0x11]}
11784 all refer to the same element!
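The integer rule can be observed even with an unusual @code{CONVFMT}. The following sketch deliberately uses only decimal spellings of 17, since octal and hexadecimal source constants are not accepted by every @command{awk}; the names @code{intsub_demo} and @code{array} are illustrative:

```shell
# Sketch only: integral subscripts ignore CONVFMT.
intsub_demo() {
    awk 'BEGIN {
        CONVFMT = "%2.2f"            # would turn 12.153 into "12.15"
        array[17] = "seventeen"      # stored under the string "17"
        # Every spelling below has the integral value 17.
        print array[17.0] "," array[1.7e1] "," array["17"]
    }'
}
intsub_demo
```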
11787 things work as one would expect them to. But it is useful to have a precise
11799 Suppose it's necessary to write a program
11800 to print the input data in reverse order.
11801 A reasonable attempt to do so (with some test
11819 At first glance, this program should have worked. The variable @code{lines}
11821 So, @command{awk} should have printed the value of @code{l[0]}.
11825 value @code{""}, not zero. Thus, @samp{line 1} ends up stored in
11826 @code{l[""]}.
11837 Here, the @samp{++} forces @code{lines} to be numeric, thus making
11838 the ``old value'' numeric zero. This is then converted to @code{"0"}
11845 (@code{""}) is a valid array subscript.
11859 languages, including @command{awk}) to refer to an element of a
11860 two-dimensional array named @code{grid} is with
11861 @code{grid[@var{x},@var{y}]}.
11863 @cindex @code{SUBSEP} variable, multidimensional arrays
11872 variable @code{SUBSEP}.
11875 when the value of @code{SUBSEP} is @code{"@@"}. The numbers 5 and 12 are
11876 converted to strings and
11877 concatenated with an @samp{@@} between them, yielding @code{"5@@12"}; thus,
11878 the array element @code{foo["5@@12"]} is set to @code{"value"}.
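A short sketch of the concatenation just described, with @code{SUBSEP} set to @code{"@@"} as in the text (the wrapper function @code{subsep_demo} is a hypothetical name):

```shell
# Sketch only: a two-part subscript is one string joined by SUBSEP.
subsep_demo() {
    awk 'BEGIN {
        SUBSEP = "@"
        foo[5, 12] = "value"
        print foo["5@12"]    # same element, addressed by the joined string
    }'
}
subsep_demo
```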
11885 The default value of @code{SUBSEP} is the string @code{"\034"},
11886 which contains a nonprinting character that is unlikely to appear in an
11889 that index values that contain a string matching @code{SUBSEP} can lead to
11890 combined strings that are ambiguous. Suppose that @code{SUBSEP} is
11891 @code{"@@"}; then @w{@samp{foo["a@@b", "c"]}} and @w{@samp{foo["a",
11952 There is no special @code{for} statement for scanning a
11961 the scanning @code{for} statement
11963 built-in @code{split} function
11975 This sets the variable @code{combined} to
11978 @code{SUBSEP} appears. The individual indices then become the elements of
11979 the array @code{separate}.
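Recovering the individual indices might look like the following sketch. It reuses the names @code{combined} and @code{separate} from the text; the array contents and the function name @code{split_index_demo} are assumptions made for illustration:

```shell
# Sketch only: split a combined subscript back into its parts.
split_index_demo() {
    awk 'BEGIN {
        array[1, "foo"] = "bar"
        for (combined in array) {
            # SUBSEP keeps its default value here.
            n = split(combined, separate, SUBSEP)
            print n ": " separate[1] " and " separate[2]
        }
    }'
}
split_index_demo
```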
11981 Thus, if a value is previously stored in @code{array[1, "foo"]}; then
11982 an element with index @code{"1\034foo"} exists in @code{array}. (Recall
11983 that the default value of @code{SUBSEP} is the character with code 034.)
11984 Sooner or later, the @code{for} statement finds that index and does an
11985 iteration with the variable @code{combined} set to @code{"1\034foo"}.
11986 Then the @code{split} function is called as follows:
11993 The result is to set @code{separate[1]} to @code{"1"} and
11994 @code{separate[2]} to @code{"foo"}. Presto! The original sequence of
12001 @cindex @code{asort} function (@command{gawk})
12003 @cindex @code{asort} function (@command{gawk}), arrays, sorting
12008 writing a @code{sort} function.
12011 @command{gawk} provides the built-in @code{asort}
12012 and @code{asorti} functions
12023 After the call to @code{asort}, the array @code{data} is indexed from 1
12024 to some number @var{n}, the total number of elements in @code{data}.
12025 (This count is @code{asort}'s return value.)
12026 @code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
12031 @cindex side effects, @code{asort} function
12032 An important side effect of calling @code{asort} is that
12034 As this isn't always desirable, @code{asort} accepts a
12044 In this case, @command{gawk} copies the @code{source} array into the
12045 @code{dest} array and then sorts @code{dest}, destroying its indices.
12046 However, the @code{source} array is not affected.
12048 Often, what's needed is to sort on the values of the @emph{indices}
12051 @code{asorti} function. The interface is identical to that of
12052 @code{asort}, except that the index values are used for sorting, and
12066 have @code{asorti}. Instead, use a helper array
12067 to hold the sorted index values, and then access the original array's
12085 @var{n} down to 1, either over the elements or over the indices.
12089 Internally, @command{gawk} maintains @dfn{reference counts} to data.
12090 For example, when @code{asort} copies the first array to the second one,
12093 @code{data} to @code{ind}, there is only one copy of the actual index
12097 @cindex arrays, sorting, @code{IGNORECASE} variable and
12098 @cindex @code{IGNORECASE} variable, array sorting and
12100 ``usual comparison rules.'' Because @code{IGNORECASE} affects
12101 string comparisons, the value of @code{IGNORECASE} also
12102 affects sorting for both @code{asort} and @code{asorti}.
12116 to work with values that represent time, do
12134 your @command{awk} program to call. This @value{SECTION} defines all
12140 * Calling Built-in:: How to call built-in functions.
12142 @code{int}, @code{sin} and @code{rand}.
12144 @code{split}, @code{match} and @code{sprintf}.
12154 To call one of @command{awk}'s built-in functions, write the name of
12157 is a call to the function @code{atan2} and has two arguments.
12162 Whitespace is ignored between the built-in function name and the
12163 open parenthesis, and it is good practice to avoid using whitespace
12165 it is easier to avoid mistakes by following a simple
12166 convention that always works---no whitespace after a function name.
12174 arguments vary from function to function and are described under the
12176 arguments given to built-in functions are ignored. However, in @command{gawk},
12177 it is a fatal error to give extra arguments to a built-in function.
12181 For example, in the following code fragment:
12192 the variable @code{i} is incremented to the value five before @code{sqrt}
12196 assume that parameters are evaluated from left to right or from
12197 right to left. For example:
12204 If the order of evaluation is left to right, then @code{i} first becomes
12205 6, and then 12, and @code{atan2} is called with the two arguments 6
12206 and 12. But if the order of evaluation is right to left, @code{i}
12207 first becomes 10, then 11, and @code{atan2} is called with the
12217 @table @code
12219 @cindex @code{int} function
12220 This returns the nearest integer to @var{x}, located between @var{x} and zero and
12223 For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)}
12224 is @minus{}3, and @code{int(-3)} is @minus{}3 as well.
12227 @cindex @code{sqrt} function
12230 if @var{x} is negative. Thus, @code{sqrt(4)} is 2.
12233 @cindex @code{exp} function
12234 This returns the exponential of @var{x} (@code{e ^ @var{x}}) or reports
12239 @cindex @code{log} function
12244 @cindex @code{sin} function
12248 @cindex @code{cos} function
12252 @cindex @code{atan2} function
12253 This returns the arctangent of @code{@var{y} / @var{x}} in radians.
12256 @cindex @code{rand} function
12257 @cindex random numbers, @code{rand}/@code{srand} functions
12258 This returns a random number. The values of @code{rand} are
12260 The value could be zero but is never one.@footnote{The C version of @code{rand}
12261 is known to produce fairly poor sequences of random numbers.
12263 @code{rand} to implement the @command{awk} version of @code{rand}.
12264 In fact, @command{gawk} uses the BSD @code{random} function, which is
12265 considerably better than @code{rand}, to produce random numbers.}
12268 that can be used to obtain a random non-negative integer less than @var{n}:
12278 than @code{n}. Using @code{int}, this result is made into
12279 an integer between zero and @code{n} @minus{} 1, inclusive.
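As a rough check of that range, the sketch below draws 1000 values and counts any that fall outside zero through @code{n} @minus{} 1. The function body follows the description above; the seed, the loop count, and the name @code{randint_demo} are arbitrary choices for illustration:

```shell
# Sketch only: verify that int(n * rand()) stays within 0 .. n-1.
randint_demo() {
    awk 'function randint(n) { return int(n * rand()) }
    BEGIN {
        srand(1)     # fixed seed, chosen arbitrarily
        bad = 0
        for (i = 0; i < 1000; i++) {
            r = randint(6)
            if (r < 0 || r > 5 || r != int(r)) bad++
        }
        print "out of range: " bad
    }'
}
randint_demo
```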
12281 The following example uses a similar function to produce random integers
12286 # Function to roll a simulated die.
12301 @code{rand} starts generating numbers from the same
12305 from run to run. This is convenient for debugging, but if you want
12306 a program to do different things each time it is used, you must change
12307 the seed to a value that is different in each run. To do this,
12308 use @code{srand}.
12311 @cindex @code{srand} function
12312 The function @code{srand} sets the starting point, or seed,
12313 for generating random numbers to the value @var{x}.
12315 Each seed value leads to a particular sequence of random
12318 that while the numbers in a sequence appear to be random, you can in
12320 Thus, if the seed is set to the same value a second time,
12325 to produce the same series of random numbers when executed by
12329 date and time of day are used for a seed. This is the way to get random
12332 The return value of @code{srand} is the previous seed. This makes it
12333 easy to keep track of the seeds in case you need to consistently reproduce
12344 specific to @command{gawk} are marked with a pound sign@w{ (@samp{#}):}
12347 * Gory Details:: More than you want to know about @samp{\} and
12348 @samp{&} with @code{sub}, @code{gsub}, and
12349 @code{gensub}.
12352 @table @code
12355 @cindex @code{asort} function (@command{gawk})
12356 @code{asort} is a @command{gawk}-specific extension, returning the number of
12359 (in particular, @code{IGNORECASE} affects the sorting)
12365 For example, if the contents of @code{a} are as follows:
12374 A call to @code{asort}:
12381 results in the following contents of @code{a}:
12389 The @code{asort} function is described in more detail in
12391 @code{asort} is a @command{gawk} extension; it is not available
12395 @cindex @code{asorti} function (@command{gawk})
12396 @code{asorti} is a @command{gawk}-specific extension, returning the number of
12398 It works similarly to @code{asort}; however, the @emph{indices}
12401 @code{IGNORECASE} affects the sorting.)
12403 The @code{asorti} function is described in more detail in
12406 @code{asorti} is a @command{gawk} extension; it is not available
12410 @cindex @code{index} function
12422 If @var{find} is not found, @code{index} returns zero.
12426 @cindex @code{length} function
12429 that number is returned. For example, @code{length("abcde")} is 5. By
12430 contrast, @code{length(15 * 35)} works out to 3. In this example, 15 * 35 =
12431 525, and 525 is then converted to the string @code{"525"}, which has
12434 If no argument is supplied, @code{length} returns the length of @code{$0}.
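The three cases above can be tried together in one sketch. The input line @samp{hello} and the function name @code{length_demo} are made up for illustration:

```shell
# Sketch only: string argument, numeric argument, and no argument.
length_demo() {
    printf 'hello\n' |
        awk '{ print length("abcde"), length(15 * 35), length() }'
}
length_demo
```

The last value is the length of the current record, here the five characters of @samp{hello}.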
12437 @cindex portability, @code{length} function
12438 @cindex POSIX @command{awk}, functions and, @code{length}
12440 In older versions of @command{awk}, the @code{length} function could
12445 version of the standard. Therefore, for programs to be maximally portable,
12449 @cindex @code{match} function
12450 The @code{match} function searches @var{string} for the
12458 In the latter case, the string is treated as a regexp to be matched.
12465 @code{sub} and @code{gsub}. It might help to remember that
12466 for @code{match}, the order is the same as for the @samp{~} operator:
12469 @cindex @code{RSTART} variable, @code{match} function and
12470 @cindex @code{RLENGTH} variable, @code{match} function and
12471 @cindex @code{match} function, @code{RSTART}/@code{RLENGTH} variables
12472 The @code{match} function sets the built-in variable @code{RSTART} to
12473 the index. It also sets the built-in variable @code{RLENGTH} to the
12475 @code{RSTART} is set to zero, and @code{RLENGTH} to @minus{}1.
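A minimal sketch of both outcomes follows. The string @code{"foobar"}, the two regexps, and the function name @code{match_demo} are illustrative choices, not taken from the text:

```shell
# Sketch only: RSTART and RLENGTH after a hit and after a miss.
match_demo() {
    awk 'BEGIN {
        match("foobar", /o+/)      # "oo" starts at position 2
        hit = RSTART " " RLENGTH
        match("foobar", /z/)       # no match at all
        miss = RSTART " " RLENGTH
        print hit " / " miss
    }'
}
match_demo
```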
12496 the variable @code{regex}. This regular expression can be changed. If the
12497 first word on a line is @samp{FIND}, @code{regex} is changed to be the
12520 @cindex differences in @command{awk} and @command{gawk}, @code{match} function
12522 of @var{array} is set to the entire portion of @var{string}
12524 the integer-indexed elements of @var{array} are set to contain the
12555 should be tested for with the @code{in} operator
12556 (@pxref{Reference to Elements}).
12558 @cindex troubleshooting, @code{match} function
12559 The @var{array} argument to @code{match} is a
12565 @cindex @code{split} function
12568 @code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so
12570 a regexp describing where to split @var{string} (much as @code{FS} can
12571 be a regexp describing where to split input records). If
12572 @var{fieldsep} is omitted, the value of @code{FS} is used.
12573 @code{split} returns the number of elements created.
12575 The @code{split} function splits strings into pieces in a
12576 manner similar to the way input lines are split into fields. For example:
12585 separator. It sets the contents of the array @code{a} as follows:
12594 The value returned by this call to @code{split} is three.
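For instance, a call of this kind can be sketched as follows. The input string and the function name @code{split_demo} are assumptions for illustration:

```shell
# Sketch only: split a string on a single-character separator.
split_demo() {
    awk 'BEGIN {
        n = split("cul-de-sac", a, "-")
        print n, a[1], a[2], a[3]
    }'
}
split_demo
```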
12596 @cindex differences in @command{awk} and @command{gawk}, @code{split} function
12598 @w{@code{" "}}, leading and trailing whitespace is ignored, and the elements
12604 Note, however, that @code{RS} has no effect on the way @code{split}
12605 works. Even though @samp{RS = ""} causes newline to also be an input
12606 field separator, this does not affect how @code{split} splits strings.
12608 @cindex dark corner, @code{split} function
12610 the third argument to be a regexp constant (@code{/abc/}) as well as a
12618 Before splitting the string, @code{split} deletes any previously existing
12622 way to delete an entire array with one statement.
12630 @cindex @code{sprintf} function
12631 This returns (without printing) the string that @code{printf} would
12641 assigns the string @w{@code{"pi = 3.14 (approx.)"}} to the variable @code{pival}.
12643 @cindex differences in @command{awk} and @command{gawk}, @code{strtonum} function (@command{gawk})
12644 @cindex @code{strtonum} function (@command{gawk})
12647 begins with a leading @samp{0}, @code{strtonum} assumes that @var{str}
12649 @samp{0X}, @code{strtonum} assumes that @var{str} is a hexadecimal number.
12658 Using the @code{strtonum} function is @emph{not} the same as adding zero
12659 to a string value; the automatic coercion of strings to numbers
12664 @cindex differences in @command{awk} and @command{gawk}, @code{strtonum} function (@command{gawk})
12665 @code{strtonum} is a @command{gawk} extension; it is not available
12669 @cindex @code{sub} function
12670 The @code{sub} function alters the value of @var{target}.
12679 In the latter case, the string is treated as a regexp to be matched.
12685 used to compute a value, and not just any expression will do---it
12686 must be a variable, field, or array element so that @code{sub} can
12688 default is to use and alter @code{$0}.@footnote{Note that this means
12689 that the record will first be regenerated using the value of @code{OFS} if
12701 sets @code{str} to @w{@code{"wither, water, everywhere"}}, by replacing the
12704 The @code{sub} function returns the number of substitutions made (either
12717 changes the first occurrence of @samp{candidate} to @samp{candidate
12736 backslash before it in the string. As usual, to insert one backslash in
12738 in a string constant to include a literal @samp{&} in the replacement.
12739 For example, the following shows how to replace the first @samp{|} on each line with
12746 @cindex @code{sub} function, arguments of
12747 @cindex @code{gsub} function, arguments of
12748 As mentioned, the third argument to @code{sub} must
12750 Some versions of @command{awk} allow the third argument to
12751 be an expression that is not an lvalue. In such a case, @code{sub}
12754 to put it. Such versions of @command{awk} accept expressions
12762 @cindex troubleshooting, @code{gsub}/@code{sub} functions
12763 For historical compatibility, @command{gawk} accepts erroneous code,
12769 string, and then the value of that string is treated as the regexp to match.
12772 @cindex @code{gsub} function
12773 This is similar to the @code{sub} function, except @code{gsub} replaces
12775 substrings it can find. The @samp{g} in @code{gsub} stands for
12786 The @code{gsub} function returns the number of substitutions made. If
12787 the variable to search and alter (@var{target}) is
12788 omitted, then the entire input record (@code{$0}) is used.
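The return value and the role of @samp{&} can be seen in a short sketch. The variable @code{s}, the bracketed replacement, and the function name @code{gsub_demo} are illustrative only:

```shell
# Sketch only: gsub replaces every match and returns the count.
gsub_demo() {
    awk 'BEGIN {
        s = "water, water, everywhere"
        n = gsub(/water/, "[&]", s)    # & stands for the matched text
        print n ": " s
    }'
}
gsub_demo
```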
12789 As in @code{sub}, the characters @samp{&} and @samp{\} are special,
12793 @cindex @code{gensub} function (@command{gawk})
12794 @code{gensub} is a general substitution function. Like @code{sub} and
12795 @code{gsub}, it searches the target string @var{target} for matches of
12796 the regular expression @var{regexp}. Unlike @code{sub} and @code{gsub},
12801 as a number that indicates which match of @var{regexp} to replace. If
12802 no @var{target} is supplied, @code{$0} is used.
12804 @code{gensub} provides an additional feature that is not available
12805 in @code{sub} or @code{gsub}: the ability to specify components of a
12807 the regexp to mark the components and then specifying @samp{\@var{N}}
12808 in the replacement text, where @var{N} is a digit from 1 to 9.
12822 As with @code{sub}, you must type two backslashes in order
12823 to get one into the string.
12827 The following example shows how you can use the third argument to control
12836 In this case, @code{$0} is used as the default target string.
12837 @code{gensub} returns the new string as its result, which is
12838 passed directly to @code{print} for printing.
12843 @samp{G}, or if it is a number that is less than or equal to zero, only one
12847 If @var{regexp} does not match @var{target}, @code{gensub}'s return value
12850 @code{gensub} is a @command{gawk} extension; it is not available
12854 @cindex @code{substr} function
12859 For example, @code{substr("washington", 5, 3)} returns @code{"ing"}.
12863 @code{substr("washington", 5)} returns @code{"ington"}. The whole
12868 If @var{start} is less than one, @code{substr} treats it as
12869 if it were one. (POSIX doesn't specify what to do in this case:
12873 in the string, @code{substr} returns the null string.
12874 Similarly, if @var{length} is present but less than or equal to zero,
12877 @cindex troubleshooting, @code{substr} function
12878 The string returned by @code{substr} @emph{cannot} be
12879 assigned. Thus, it is a mistake to attempt to change a portion of
12884 # try to get "abCDEf", won't work
12889 It is also a mistake to use @code{substr} as the third argument
12890 of @code{sub} or @code{gsub}:
12896 @cindex portability, @code{substr} function
12898 @code{substr} this way, but doing so is not portable.)
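Because the value returned by @code{substr} cannot be assigned to, one portable approach is to rebuild the string by concatenation. The sketch below assumes the goal is to turn @code{"abcdef"} into @code{"abCDEf"}; the function name @code{substr_demo} is hypothetical:

```shell
# Sketch only: replace the middle of a string by concatenating pieces.
substr_demo() {
    awk 'BEGIN {
        string = "abcdef"
        # Keep "ab", insert "CDE", keep everything from position 6 on.
        string = substr(string, 1, 2) "CDE" substr(string, 6)
        print string, substr("washington", 5, 3), substr("washington", 5)
    }'
}
substr_demo
```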
12900 If you need to replace bits and pieces of a string, combine @code{substr}
12912 @cindex @code{tolower} function
12916 @code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}.
12919 @cindex @code{toupper} function
12923 @code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}.
12927 @subsubsection More About @samp{\} and @samp{&} with @code{sub}, @code{gsub}, and @code{gensub}
12929 @cindex escape processing, @code{gsub}/@code{gensub}/@code{sub} functions
12930 @cindex @code{sub} function, escape processing
12931 @cindex @code{gsub} function, escape processing
12932 @cindex @code{gensub} function (@command{gawk}), escape processing
12933 @cindex @code{\} (backslash), @code{gsub}/@code{gensub}/@code{sub} functions and
12934 @cindex backslash (@code{\}), @code{gsub}/@code{gensub}/@code{sub} functions and
12935 @cindex @code{&} (ampersand), @code{gsub}/@code{gensub}/@code{sub} functions and
12936 @cindex ampersand (@code{&}), @code{gsub}/@code{gensub}/@code{sub} functions and
12937 When using @code{sub}, @code{gsub}, or @code{gensub}, and trying to get literal
12938 backslashes and ampersands into the replacement text, you need to remember
12945 replacement string to determine what to generate.
12955 example, @code{"a\qb"} is treated as @code{"aqb"}.
12959 Historically, the @code{sub} and @code{gsub} functions treated the two
12965 @c Thanks to Karl Berry for help with the TeX stuff.
12973 You type!@code{sub} sees!@code{sub} generates@cr
12975 @code{\&}! @code{&}!the matched text@cr
12976 @code{\\&}! @code{\&}!a literal @samp{&}@cr
12977 @code{\\\&}! @code{\&}!a literal @samp{&}@cr
12978 @code{\\\\&}! @code{\\&}!a literal @samp{\&}@cr
12979 @code{\\\\\&}! @code{\\&}!a literal @samp{\&}@cr
12980 @code{\\\\\\&}! @code{\\\&}!a literal @samp{\\&}@cr
12981 @code{\\q}! @code{\q}!a literal @samp{\q}@cr
12987 You type @code{sub} sees @code{sub} generates
12989 @code{\&} @code{&} the matched text
12990 @code{\\&} @code{\&} a literal @samp{&}
12991 @code{\\\&} @code{\&} a literal @samp{&}
12992 @code{\\\\&} @code{\\&} a literal @samp{\&}
12993 @code{\\\\\&} @code{\\&} a literal @samp{\&}
12994 @code{\\\\\\&} @code{\\\&} a literal @samp{\\&}
12995 @code{\\q} @code{\q} a literal @samp{\q}
13002 as well as the runtime processing done by @code{sub}.
13006 The problem with the historical approach is that there is no way to get
13010 @cindex POSIX @command{awk}, functions and, @code{gsub}/@code{sub}
13011 The 1992 POSIX standard attempted to fix this problem. The standard
13012 says that @code{sub} and @code{gsub} look for either a @samp{\} or an @samp{&}
13016 @c thanks to Karl Berry for formatting this table
13024 You type!@code{sub} sees!@code{sub} generates@cr
13026 @code{&}! @code{&}!the matched text@cr
13027 @code{\\&}! @code{\&}!a literal @samp{&}@cr
13028 @code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text@cr
13029 @code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
13035 You type @code{sub} sees @code{sub} generates
13037 @code{&} @code{&} the matched text
13038 @code{\\&} @code{\&} a literal @samp{&}
13039 @code{\\\\&} @code{\\&} a literal @samp{\}, then the matched text
13040 @code{\\\\\\&} @code{\\\&} a literal @samp{\&}
13045 This appears to solve the problem.
13049 such special meaning is undefined. This wording leads to two problems:
13065 reverts to rules that correspond more closely to the original existing
13067 to produce a @samp{\} preceding the matched text:
13076 You type!@code{sub} sees!@code{sub} generates@cr
13078 @code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
13079 @code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text@cr
13080 @code{\\&}! @code{\&}!a literal @samp{&}@cr
13081 @code{\\q}! @code{\q}!a literal @samp{\q}@cr
13087 You type @code{sub} sees @code{sub} generates
13089 @code{\\\\\\&} @code{\\\&} a literal @samp{\&}
13090 @code{\\\\&} @code{\\&} a literal @samp{\}, followed by the matched text
13091 @code{\\&} @code{\&} a literal @samp{&}
13092 @code{\\q} @code{\q} a literal @samp{\q}
13102 @command{gawk} 3.0 and 3.1 follow these proposed POSIX rules for @code{sub} and
13103 @code{gsub}.
13111 However, it was too late to change @command{gawk} for the 3.1 release.
13114 The rules for @code{gensub} are considerably simpler. At the runtime
13128 You type!@code{gensub} sees!@code{gensub} generates@cr
13130 @code{&}! @code{&}!the matched text@cr
13131 @code{\\&}! @code{\&}!a literal @samp{&}@cr
13132 @code{\\\\}! @code{\\}!a literal @samp{\}@cr
13133 @code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text@cr
13134 @code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
13135 @code{\\q}! @code{\q}!a literal @samp{q}@cr
13141 You type @code{gensub} sees @code{gensub} generates
13143 @code{&} @code{&} the matched text
13144 @code{\\&} @code{\&} a literal @samp{&}
13145 @code{\\\\} @code{\\} a literal @samp{\}
13146 @code{\\\\&} @code{\\&} a literal @samp{\}, then the matched text
13147 @code{\\\\\\&} @code{\\\&} a literal @samp{\&}
13148 @code{\\q} @code{\q} a literal @samp{q}
13153 and the special cases for @code{sub} and @code{gsub},
13154 we recommend the use of @command{gawk} and @code{gensub} when you have
13155 to do substitutions.
13164 @cindex @code{*} (asterisk), @code{*} operator, null strings, matching
13165 @cindex asterisk (@code{*}), @code{*} operator, null strings, matching
13168 This is particularly important for the @code{sub}, @code{gsub},
13169 and @code{gensub} functions. For example:
13182 The following functions relate to input/output (I/O).
13185 @table @code
13187 @cindex @code{close} function
13191 for redirecting to or from a pipe; then the coprocess or pipe is closed.
13195 When closing a coprocess, it is occasionally useful to first close
13196 one end of the two-way pipe and then to close the other. This is done
13197 by providing a second argument to @code{close}. This second argument
13198 should be one of the two string values @code{"to"} or @code{"from"},
13199 indicating which end of the pipe to close. Case in the string does
13205 @cindex @code{fflush} function
13207 file opened for writing or a shell command for redirecting output to
13210 @cindex portability, @code{fflush} function and
13214 to write to a disk file or terminal in memory until there is enough
13215 for it to be worthwhile to send the data to the output device.
13218 it is necessary to force a program to @dfn{flush} its buffers; that is,
13219 write the information to its destination, even if a buffer is not full.
13220 This is the purpose of the @code{fflush} function---@command{gawk} also
13221 buffers its output and the @code{fflush} function forces
13222 @command{gawk} to flush its buffers.
13224 @code{fflush} was added to the Bell Laboratories research
13229 @cindex @command{gawk}, @code{fflush} function in
13230 @command{gawk} extends the @code{fflush} function in two ways. The first
13231 is to allow no argument at all. In this case, the buffer for the
13232 standard output is flushed. The second is to allow the null string
13233 (@w{@code{""}}) as the argument. In this case, the buffers for
13238 @cindex troubleshooting, @code{fflush} function
13239 @code{fflush} returns zero if the buffer is successfully flushed;
13245 @command{gawk} also issues a warning message if you attempt to flush
13246 a file or pipe that was opened for reading (such as with @code{getline}),
13248 In such a case, @code{fflush} returns @minus{}1, as well.
13251 @cindex @code{system} function
13254 commands and then returns to the @command{awk} program. The @code{system}
13259 For example, if the following fragment of code is put in your @command{awk}
13272 Note that redirecting @code{print} or @code{printf} into a pipe is often
13273 enough to accomplish your task. If you need to run many commands, it
13274 is more efficient to simply print them down a pipeline to the shell:
13277 while (@var{more stuff to do})
13283 @cindex troubleshooting, @code{system} function
13285 program is interactive, @code{system} is useful for cranking up large
13287 Some operating systems cannot implement the @code{system} function.
13288 @code{system} causes a fatal error if it is not supported.
13300 to a terminal device.}
13302 @c Thanks to Walter.Mecky@dresdnerbank.de for this example, and for
13303 @c motivating me to write this section.
13333 it is all buffered and sent down the pipe to @command{cat} in one shot.
13336 @subheading Advanced Notes: Controlling Output Buffering with @code{system}
13342 The @code{fflush} function provides explicit control over output buffering for
13343 individual files and pipes. However, its use is not portable to many other
13344 @command{awk} implementations. An alternative method to flush output
13345 buffers is to call @code{system} with a null string as its argument:
13352 @command{gawk} treats this use of the @code{system} function as a special
13353 case and is smart enough not to run a shell (or other command
13362 @code{system} should flush any pending output. The following program:
13390 If @command{awk} did not flush its buffers before calling @code{system},
13406 @command{awk} programs are commonly used to process log files
13409 in the form returned by the @code{time} system call, which is the
13415 @math{2^31 - 1}, which is sufficient to represent times through
13422 In order to make it easier to process such log files and to produce
13431 @table @code
13433 @cindex @code{systime} function (@command{gawk})
13442 @cindex @code{mktime} function (@command{gawk})
13444 as is returned by @code{systime}. It is similar to the function of the
13445 same name in ISO C. The argument, @var{datespec}, is a string of the form
13446 @w{@code{"@var{YYYY} @var{MM} @var{DD} @var{HH} @var{MM} @var{SS} [@var{DST}]"}}.
13448 the full year including century, the month from 1 to 12, the day of the month
13449 from 1 to 31, the hour of the day from 0 to 23, the minute from 0 to
13450 59, the second from 0 to 60,@footnote{Occasionally there are
13452 seconds can go up to 60.}
13459 The time is assumed to be in the local timezone.
13460 If the daylight-savings flag is positive, the time is assumed to be
13461 daylight savings time; if zero, the time is assumed to be standard
13462 time; and if negative (the default), @code{mktime} attempts to determine
13466 is out of range, @code{mktime} returns @minus{}1.
13470 @cindex @code{strftime} function (@command{gawk})
13471 This function returns a string. It is similar to the function of the
13472 same name in ISO C. The time specified by @var{timestamp} is used to
13475 @code{systime} function. If no @var{timestamp} argument is supplied,
13477 If no @var{format} argument is supplied, @code{strftime} uses
13478 @code{@w{"%a %b %d %H:%M:%S %Z %Y"}}. This format string produces
13479 output that is (almost) equivalent to that of the @command{date} utility.
13480 (Versions of @command{gawk} prior to 3.0 require the @var{format} argument.)
13483 The @code{systime} function allows you to compare a timestamp from a
13484 log file with the current time of day. In particular, it is easy to
13486 you to produce log records using the ``seconds since the epoch'' format.
13488 @cindex converting, dates to timestamps
13489 @cindex dates, converting to timestamps
13490 @cindex timestamps, converting dates to
13491 The @code{mktime} function allows you to convert a textual representation
13492 of a date and time into a timestamp. This makes it easy to do before/after
13496 The @code{strftime} function allows you to easily turn a timestamp
13497 into human-readable information. It is similar in nature to the @code{sprintf}
13500 in that it copies nonformat specification characters verbatim to the
13504 @cindex format specifiers, @code{strftime} function (@command{gawk})
13505 @code{strftime} is guaranteed by the 1999 ISO C standard@footnote{As this
13506 is a recent standard, not every system's @code{strftime} necessarily
13508 to support the following date format specifications:
13510 @table @code
13512 The locale's abbreviated weekday name.
13515 The locale's full weekday name.
13518 The locale's abbreviated month name.
13521 The locale's full month name.
13525 (This is @samp{%A %B %d %T %Y} in the @code{"C"} locale.)
13528 The century. This is the year divided by 100 and truncated to the next
13535 Equivalent to specifying @samp{%m/%d/%y}.
13541 Equivalent to specifying @samp{%Y-%m-%d}.
13555 Equivalent to @samp{%b}.
13581 (This is @samp{%I:%M:%S %p} in the @code{"C"} locale.)
13584 Equivalent to specifying @samp{%H:%M}.
13593 Equivalent to specifying @samp{%H:%M:%S}.
13620 (This is @samp{%A %B %d %Y} in the @code{"C"} locale.)
13624 (This is @samp{%T} in the @code{"C"} locale.)
13635 The timezone offset in a +HHMM format (e.g., the format necessary to
13639 The time zone name or abbreviation; no characters if
13647 it; these facilities are meant to make it easier to ``internationalize''
13659 behavior of the C version of @code{strftime} undefined and @command{gawk}
13660 uses the system's version of @code{strftime} if it's there.
13666 is meant to run. For example, a common way to abbreviate the date
13669 Thus, the @samp{%x} specification in a @code{"US"} locale might produce
13670 @samp{9/4/91}, while in a @code{"EUROPE"} locale, it might produce
13671 @samp{4.9.91}. The ISO C standard defines a default @code{"C"}
13673 are used to.
13675 A public-domain C version of @code{strftime} is supplied with @command{gawk}
13679 used to compile @command{gawk} (@pxref{Installation}),
13682 @table @code
13692 The ``Emperor/Era'' name.
13693 Equivalent to @code{%C}.
13697 Equivalent to @code{%y}.
13710 @cindex @code{date} utility, POSIX
13711 @cindex POSIX @command{awk}, @code{date} utility and
13715 provide an argument to it that begins with a @samp{+}, @command{date}
13716 copies nonformat specifier characters to the standard output and
13717 interprets the current time according to the format specifiers in
13726 It has a shell ``wrapper'' to handle the @option{-u} option,
13728 is set to UTC:
13741 @c FIXME: One day, change %d to %e, when C 99 is common.
13779 Many languages provide the ability to perform @dfn{bitwise} operations
13840 Finally, two other common operations are to shift the bits left or right.
13844 always true, but in some languages, it's possible to have the left side
13846 @c Purposely decided to use 0's and 1's here. 2/2001.
13854 @table @code
13855 @cindex @code{and} function (@command{gawk})
13859 @cindex @code{or} function (@command{gawk})
13863 @cindex @code{xor} function (@command{gawk})
13867 @cindex @code{compl} function (@command{gawk})
13871 @cindex @code{lshift} function (@command{gawk})
13875 @cindex @code{rshift} function (@command{gawk})
13882 @multitable {@code{rshift(@var{val}, @var{count})}} {Return the value of @var{val}, shifted right b…
13883 @cindex @code{and} function (@command{gawk})
13884 @item @code{and(@var{v1}, @var{v2})}
13887 @cindex @code{or} function (@command{gawk})
13888 @item @code{or(@var{v1}, @var{v2})}
13891 @cindex @code{xor} function (@command{gawk})
13892 @item @code{xor(@var{v1}, @var{v2})}
13895 @cindex @code{compl} function (@command{gawk})
13896 @item @code{compl(@var{val})}
13899 @cindex @code{lshift} function (@command{gawk})
13900 @item @code{lshift(@var{val}, @var{count})}
13903 @cindex @code{rshift} function (@command{gawk})
13904 @item @code{rshift(@var{val}, @var{count})}
13909 converted to the widest C unsigned integer type, then the bitwise operation is
13910 performed and then the result is converted back into a C @code{double}. (If
13917 @cindex @code{bits2str} user-defined function
13918 @cindex @code{testbits.awk} program
13941 @c this is a hack to make testbits.awk self-contained
13990 @cindex numbers, converting, to strings
13991 @cindex strings, converting, numbers to
13992 @cindex converting, numbers, to strings
13993 The @code{bits2str} function turns a binary number into a string.
13994 The number @code{1} represents a binary value where the rightmost bit
13995 is set to 1. Using this mask,
13998 rightmost bit is 1 or not. If so, a @code{"1"} is concatenated onto the front
14000 Otherwise, a @code{"0"} is added.
14004 If the initial value is zero, it returns a simple @code{"0"}.
14005 Otherwise, at the end, it pads the value with zeros to represent multiples
14008 The main code in the @code{BEGIN} rule shows the difference between the
14012 results of the @code{compl}, @code{lshift}, and @code{rshift} functions.
14033 @table @code
14034 @cindex @code{dcgettext} function (@command{gawk})
14038 The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
14039 The default value for @var{category} is @code{"LC_MESSAGES"}.
14041 @cindex @code{dcngettext} function (@command{gawk})
14048 The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
14049 The default value for @var{category} is @code{"LC_MESSAGES"}.
14051 @cindex @code{bindtextdomain} function (@command{gawk})
14053 This function allows you to specify the directory in which
14059 The default @var{domain} is the value of @code{TEXTDOMAIN}.
14060 If @var{directory} is the null string (@code{""}), then
14061 @code{bindtextdomain} returns the current binding for the
14076 built-in ones (@pxref{Function Calls}), but it is up to you to define
14077 them, i.e., to tell @command{awk} what they should do.
14080 * Definition Syntax:: How to write definitions and what they mean.
14083 * Function Caveats:: Things to watch out for.
14095 extended to include sequences of rules @emph{and} user-defined function
14097 There is no need to put the definition of a function
14099 entire program before starting to execute any of it.
14101 The definition of a function named @var{name} looks like this:
14105 function @var{name}(@var{parameter-list})
14115 @var{name} is the name of the function to define. A valid function
14116 name is like a valid variable name: a sequence of letters, digits, and
14118 Within a single @command{awk} program, any particular name can only be
14124 the argument names are used to hold the argument values given in
14125 the call. The local variables are initialized to the empty string.
14126 A function cannot have two parameters with the same name, nor may it
14127 have a parameter with the same name as the function itself.
14131 should actually @emph{do}. The argument names exist to give the body a
14132 way to talk about the arguments; local variables exist to give the body
14133 places to keep temporary values.
14142 to the function, some of the names in @var{parameter-list} may be
14144 way to think of this is that omitted arguments default to the
14148 Usually when you write a function, you know how many names you intend to
14149 use for arguments and how many you intend to use as local variables. It is
14150 conventional to place some extra space between the arguments and
14151 the local variables, in order to document how your function is supposed to be used.
14157 function definition, because there is no way to name them while their
14175 @cindex POSIX @command{awk}, @code{function} keyword in
14177 the keyword @code{function} may be
14178 abbreviated @code{func}. However, POSIX only specifies the use of
14179 the keyword @code{function}. This actually has some practical implications.
14200 keyword @code{function} when defining a function.
14205 Here is an example of a user-defined function, called @code{myprint}, that
14216 To illustrate, here is an @command{awk} rule that uses our @code{myprint}
14234 this program, using our function to format the results, prints:
14251 When working with arrays, it is often necessary to delete all the elements
14255 to repeat this loop everywhere that you need to clear out
14256 an array, your program can just call @code{delarray}.
14257 (This guarantees portability. The use of @samp{delete @var{array}} to delete
14266 @cindex @code{rev} user-defined function
14286 The C @code{ctime} function takes a timestamp and returns it in a string,
14288 The following example uses the built-in @code{strftime} function
14290 to create an @command{awk} version of @code{ctime}:
14292 @cindex @code{ctime} user-defined function
14293 @c FIXME: One day, change %d to %e, when C 99 is common.
14316 @dfn{Calling a function} means causing the function to run and do its job.
14320 A function call consists of the function name followed by the arguments
14324 example, here is a call to @code{foo} with three arguments (the first
14332 between the function name and the open-parenthesis of the argument list.
14334 to concatenate a variable with an expression in parentheses. However, it
14335 notices that you used a function name and not a variable name, and reports
14343 example, if you write the following code:
14351 then you should not think of the argument to @code{myfunc} as being
14352 ``the variable @code{foo}.'' Instead, think of the argument as the
14353 string value @code{"bar"}.
14354 If the function @code{myfunc} alters the values of its local variables,
14355 this has no effect on any other variables. Thus, if @code{myfunc}
14368 to change its first argument variable @code{str}, it does @emph{not}
14369 change the value of @code{foo} in the caller. The role of @code{foo} in
14370 calling @code{myfunc} ended when its value (@code{"bar"}) was computed.
14371 If @code{str} also exists outside of @code{myfunc}, the function body
14373 execution of @code{myfunc} and cannot be seen or changed from there.
14376 @cindex arrays, as parameters to functions
14377 @cindex functions, arrays as parameters to
14378 However, when arrays are the parameters to functions, they are @emph{not}
14381 Changes made to an array parameter inside the body of a function @emph{are}
14404 @code{changeit} stores @code{"two"} in the second element of @code{a}.
14408 Some @command{awk} implementations allow you to call a function that
14410 program actually tries to call the function. For example:
14425 problem that @code{foo} has not been defined. Usually, though, it is a
14431 @command{gawk} reports calls to undefined functions.
14433 @cindex portability, @code{next} statement in user-defined functions
14435 error if you use the @code{next} statement
14442 @subsection The @code{return} Statement
14444 @cindex @code{return} statement, user-defined functions
14446 The body of a user-defined function can contain a @code{return} statement.
14447 This statement returns control to the calling part of the @command{awk} program. It
14448 can also be used to return a value for use in the rest of the @command{awk}
14458 A @code{return} statement with no value expression is assumed at the end of
14463 Sometimes, you want to write a function for what it does, not for
14464 what it returns. Such a function corresponds to a @code{void} function
14465 in C or to a @code{procedure} in Pascal. Thus, it may be appropriate to not
14485 You call @code{maxelt} with one argument, which is an array name. The local
14486 variables @code{i} and @code{ret} are not intended to be arguments;
14487 while there is nothing to stop you from passing more than one argument
14488 to @code{maxelt}, the results would be strange. The extra space before
14489 @code{i} in the function parameter list indicates that @code{i} and
14490 @code{ret} are not supposed to be arguments.
14493 The following program uses the @code{maxelt} function. It loads an
14494 array, calls @code{maxelt}, and then reports the maximum number in that
14529 the program reports (predictably) that @code{99385} is the largest number
14567 countries, they were able to sell more systems.
14575 Until recently, the ability to provide internationalization
14576 was largely restricted to programs written in C and C++.
14583 longer required to write in C when internationalization is
14588 * Explaining gettext:: How GNU @code{gettext} works.
14604 further source-code changes.
14606 internationalized program to work in a particular language.
14607 Most typically, these terms refer to features such as the language
14608 used for printing error messages, the language used to read
14609 responses, and information related to how numerical and
14613 @section GNU @code{gettext}
14617 @cindex @code{gettext} library
14618 The facilities in GNU @code{gettext} focus on messages: strings printed
14619 by a program, either directly or via formatting with @code{printf} or
14620 @code{sprintf}.@footnote{For some operating systems, the @command{gawk}
14621 port doesn't support GNU @code{gettext}. This applies most notably to
14625 @cindex portability, @code{gettext} library and
14626 When using GNU @code{gettext}, each application has its own
14627 @dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk},
14642 For example, @code{"`-F': option required"} is a good candidate for translation.
14647 @cindex @code{textdomain} function (C library)
14650 (@code{"guide"}) to the @code{gettext} library,
14651 by calling the @code{textdomain} function.
14654 Messages from the application are extracted from the source code and
14655 collected into a portable object file (@file{guide.po}),
14661 @cindex @code{.po} files
14662 @cindex files, @code{.po}
14666 For each language with a translator, @file{guide.po}
14669 @cindex @code{.mo} files
14670 @cindex files, @code{.mo}
14674 Each language's @file{.po} file is converted into a binary
14684 @cindex @code{bindtextdomain} function (C library)
14686 For testing and development, it is possible to tell @code{gettext}
14687 to use @file{.mo} files in a different directory than the standard
14688 one by using the @code{bindtextdomain} function.
14690 @cindex @code{.mo} files, specifying directory of
14691 @cindex files, @code{.mo}, specifying directory of
14696 to @code{gettext}. The returned string is the translated string
14700 If necessary, it is possible to access messages from a different
14701 text domain than the one belonging to the application, without
14702 having to switch the application's default text domain back
14706 @cindex @code{gettext} function (C library)
14708 are accomplished by wrapping each string in a call to @code{gettext}:
14714 The tools that extract messages from source code pull out all
14715 strings enclosed in calls to @code{gettext}.
14717 @cindex @code{_} (underscore), @code{_} C macro
14718 @cindex underscore (@code{_}), @code{_} C macro
14719 The GNU @code{gettext} developers, recognizing that typing
14720 @samp{gettext} over and over again is both painful and ugly to look
14721 at, use the macro @samp{_} (an underscore) to make things easier:
14732 @cindex @code{gettext} library, locale categories
14735 This reduces the typing overhead to just three extra characters per string
14736 and is considerably easier to read as well.
14739 The defined locale categories that @code{gettext} knows about are:
14741 @table @code
14742 @cindex @code{LC_MESSAGES} locale category
14744 Text messages. This is the default category for @code{gettext}
14745 operations, but it is possible to supply a different one explicitly,
14746 if necessary. (It is almost never necessary to supply a different category.)
14749 @cindex @code{LC_COLLATE} locale category
14754 @cindex @code{LC_CTYPE} locale category
14760 such as @code{/[[:alnum:]]/}
14765 @cindex @code{LC_MONETARY} locale category
14770 @cindex @code{LC_NUMERIC} locale category
14772 Numeric information, such as which characters to use for the decimal
14776 @code{1,234.56} versus @code{1.234,56}.}
14778 @cindex @code{LC_RESPONSE} locale category
14785 @cindex dates, information related to, localization
14786 @cindex @code{LC_TIME} locale category
14791 @cindex @code{LC_ALL} locale category
14793 All of the above. (Not too useful in the context of @code{gettext}.)
14805 @table @code
14806 @cindex @code{TEXTDOMAIN} variable
14809 For compatibility with GNU @code{gettext}, the default
14810 value is @code{"messages"}.
14819 @cindex @code{dcgettext} function (@command{gawk})
14823 The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
14824 The default value for @var{category} is @code{"LC_MESSAGES"}.
14826 If you supply a value for @var{category}, it must be a string equal to
14834 You must also supply a text domain. Use @code{TEXTDOMAIN} if
14835 you want to use the current domain.
14837 @strong{Caution:} The order of arguments to the @command{awk} version
14838 of the @code{dcgettext} function is purposely different from the order for
14840 chosen to be simple and to allow for reasonable @command{awk}-style
14843 @cindex @code{dcngettext} function (@command{gawk})
14850 The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
14851 The default value for @var{category} is @code{"LC_MESSAGES"}.
14853 The same remarks as for the @code{dcgettext} function apply.
14855 @cindex @code{.mo} files, specifying directory of
14856 @cindex files, @code{.mo}, specifying directory of
14859 @cindex @code{bindtextdomain} function (@command{gawk})
14861 This built-in function allows you to specify the directory in which
14862 @code{gettext} looks for @file{.mo} files, in case they
14867 The default @var{domain} is the value of @code{TEXTDOMAIN}.
14868 If @var{directory} is the null string (@code{""}), then
14869 @code{bindtextdomain} returns the current binding for the
14884 @cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
14885 @cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and
14887 Set the variable @code{TEXTDOMAIN} to the text domain of
14888 your program. This is best done in a @code{BEGIN} rule
14900 @cindex @code{_} (underscore), translatable string
14901 @cindex underscore (@code{_}), translatable string
14904 character. It @emph{must} be adjacent to the opening
14915 still translate them, using the @code{dcgettext}
14924 Here, the call to @code{dcgettext} supplies a different
14925 text domain (@code{"adminprog"}) in which to find the
14926 message, but it uses the default @code{"LC_MESSAGES"} category.
14928 @cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain} function (@command{gawk})
14930 During development, you might want to put the @file{.mo}
14932 with the @code{bindtextdomain} built-in function:
14938 # where to find our files
14950 for an example program showing the steps to create
14956 @cindex @code{.po} files
14957 @cindex files, @code{.po}
14961 be extracted to create the initial @file{.po} file.
14962 As part of translation, it is often helpful to rearrange the order
14963 in which arguments to @code{printf} are output.
14965 @command{gawk}'s @option{--gen-po} command-line option extracts
14967 After that, @code{printf}'s ability to
14968 rearrange the order for @code{printf} arguments at runtime
14973 * Printf Ordering:: Rearranging @code{printf} arguments.
14982 @cindex @code{--gen-po} option
14988 @cindex @code{--gen-po} option
14991 it is time to produce translations.
14992 First, use the @option{--gen-po} command-line option to create
14993 the initial @file{.po} file:
14996 $ gawk --gen-po -f guide.awk > guide.po
14999 @cindex @code{xgettext} utility
15000 When run with @option{--gen-po}, @command{gawk} does not execute your
15002 to standard output in the format of a GNU @code{gettext} Portable Object
15004 appear as the first argument to @code{dcgettext} or as the first and
15005 second argument to @code{dcngettext}.@footnote{Starting with @code{gettext}
15007 @code{gettext} can handle @file{.awk} files.}
15009 for the full list of steps to go through to create and test
15013 @subsection Rearranging @code{printf} Arguments
15015 @cindex @code{printf} statement, positional specifiers
15017 @cindex positional specifiers, @code{printf} statement
15018 Format strings for @code{printf} and @code{sprintf}
15022 from the GNU @code{gettext} manual.}
15038 Even though @code{gettext} can return the translated string
15040 it cannot change the argument order in the call to @code{printf}.
15042 To solve this problem, @code{printf} format specifiers may have
15051 argument to use, and a @samp{$}. Counts are one-based, and the
15084 @cindex @code{printf} statement, positional specifiers, mixing with regular formats
15086 @cindex positional specifiers, @code{printf} statement, mixing with regular formats
15088 @command{gawk} does not allow you to mix regular format specifiers
15096 @strong{Note:} There are some pathological cases that @command{gawk} may fail to
15098 It's still a bad idea to try mixing them, even if @command{gawk}
15102 their primary purpose is to help in producing correct translations of
15111 @command{gawk}'s internationalization features were purposely chosen to
15113 programs that use them to other versions of @command{awk}.
15131 @cindex @code{TEXTDOMAIN} variable, portability and
15133 Assignments to @code{TEXTDOMAIN} won't have any effect,
15134 since @code{TEXTDOMAIN} is not special in other @command{awk} implementations.
15138 as the concatenation of a variable named @code{_} with the string
15140 @command{awk}'' contest.} Typically, the variable @code{_} has
15141 the null string (@code{""}) as its value, leaving the original string constant as
15145 By defining ``dummy'' functions to replace @code{dcgettext}, @code{dcngettext}
15146 and @code{bindtextdomain}, the @command{awk} program can be made to run, but
15150 @cindex @code{bindtextdomain} function (@command{gawk}), portability and
15151 @cindex @code{dcgettext} function (@command{gawk}), portability and
15152 @cindex @code{dcngettext} function (@command{gawk}), portability and
15173 The use of positional specifications in @code{printf} or
15174 @code{sprintf} is @emph{not} portable.
15175 To support @code{gettext} at the C level, many systems' C versions of
15176 @code{sprintf} do support positional specifiers. But it works only if
15178 @command{awk} pass @code{printf} formats and arguments unchanged to the
15179 underlying C library version of @code{sprintf}, but only one format and
15191 Now let's look at a step-by-step example of how to internationalize and
15208 Run @samp{gawk --gen-po} to create the @file{.po} file:
15211 $ gawk --gen-po -f guide.awk > guide.po
15218 @c file eg/data/guide.po
15231 into which the application is translated. The @code{msgid}
15232 is the original string and the @code{msgstr} is the translation.
15235 appear in the @file{guide.po} file.
15238 Here is a translation to a hypothetical dialect of English,
15244 $ cp guide.po guide-mellow.po
15245 @var{Add translations to} guide-mellow.po @dots{}
15253 @c file eg/data/guide-mellow.po
15267 The next step is to make the directory to hold the binary message object
15268 file and then to create the @file{guide.mo} file.
15269 The directory layout shown here is standard for GNU @code{gettext} on
15270 GNU/Linux systems. Other versions of @code{gettext} may use a different
15277 @cindex @code{.po} files, converting to @code{.mo}
15278 @cindex files, @code{.po}, converting to @code{.mo}
15279 @cindex @code{.mo} files, converting from @code{.po}
15280 @cindex files, @code{.mo}, converting from @code{.po}
15281 @cindex portable object files, converting to message object files
15282 @cindex files, portable object, converting to message object files
15287 @file{.po} file to machine-readable @file{.mo} file.
15293 $ msgfmt guide-mellow.po
15297 Finally, we run the program to test it:
15306 If the three replacement functions for @code{dcgettext}, @code{dcngettext}
15307 and @code{bindtextdomain}
15323 using the GNU @code{gettext} package.
15325 (GNU @code{gettext} is described in
15330 (GNU @code{gettext} is described in
15334 As of this writing, the latest version of GNU @code{gettext} is
15341 @cindex @code{--with-included-gettext} configuration option
15342 @cindex configuration option, @code{--with-included-gettext}
15373 to each other.
15374 First, a command-line option allows @command{gawk} to recognize
15379 can @dfn{profile} an @command{awk} program, making it possible to tune
15383 discusses the ability to dynamically add new built-in functions to
15384 @command{gawk}. As this feature is still immature and likely to change,
15385 its description is relegated to an appendix.
15397 @cindex @code{--non-decimal-data} option
15414 For this feature to work, write your program so that
15423 The @code{print} statement treats its expressions as strings.
15425 they are still strings, so @code{print} does not try to treat them
15426 numerically. You may need to add zero to a field to force it to
15437 Because it is common to have decimal data with leading zeros, and because
15438 using it could lead to surprising results, the default is to leave this
15441 @cindex programming conventions, @code{--non-decimal-data} option
15442 @cindex @code{--non-decimal-data} option, @code{strtonum} function and
15443 @cindex @code{strtonum} function (@command{gawk}), @code{--non-decimal-data} option and
15447 Instead, use the @code{strtonum} function to convert your data
15449 This makes your programs easier to write and easier to read, and
15450 leads to less surprising results.
15460 Subject: Re: Learn the SECRET to Attract Women Easily
15473 >Learn the SECRET to Attract Women Easily
15475 >The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
15477 The scent of awk programmers is a lot more attractive to women than
15487 It is often useful to be able to
15488 send data to a separate program for
15510 to be using a temporary file with the same name.
15514 @cindex @code{|} (vertical bar), @code{|&} operator (I/O)
15515 @cindex vertical bar (@code{|}), @code{|&} operator (I/O)
15516 @cindex @command{csh} utility, @code{|&} operator, comparison with
15517 Starting with @value{PVERSION} 3.1 of @command{gawk}, it is possible to
15518 open a @emph{two-way} pipe to another process. The second process is
15528 @} while (@var{data left to process})
15533 operator, @command{gawk} creates a two-way pipeline to a child process
15534 that runs the other program. Output created with @code{print}
15535 or @code{printf} is written to the program's standard input, and
15537 program using @code{getline}.
15542 There are some cautionary items to be aware of:
15546 As the code inside @command{gawk} currently stands, the coprocess's
15547 standard error goes to the same place that the parent @command{gawk}'s
15548 standard error goes. It is not possible to read the child's
15553 @cindex @code{getline} command, deadlock and
15556 flushes all output down the pipe to the child process.
15558 @command{gawk} may hang when doing a @code{getline} in order to read
15559 the coprocess's results. This could lead to a situation
15561 other one to do something.
15564 @cindex @code{close} function, two-way pipes and
15565 It is possible to close just one end of the two-way pipe to
15566 a coprocess, by supplying a second argument to the @code{close}
15567 function of either @code{"to"} or @code{"from"}
15569 These strings tell @command{gawk} to close the end of the pipe
15570 that sends data to the process or the end that reads from it,
15574 This is particularly necessary in order to use
15581 When you have finished writing data to the @command{sort}
15582 utility, you can close the @code{"to"} end of the pipe, and
15583 then start reading sorted data via @code{getline}.
15593 close(command, "to")
15602 per line, down the two-way pipe to @command{sort}. It then closes the
15604 indication. This causes @command{sort} to sort the data and write the
15605 sorted data back to the @command{gawk} program. Once all of the data
15614 in the @code{PROCINFO} array
15628 system's ptys are in use, @command{gawk} automatically falls back to
15637 @cindex @code{/inet/} files (@command{gawk})
15638 @cindex files, @code{/inet/} (@command{gawk})
15639 @cindex @code{EMISTERED}
15641 @code{EMISTERED}: @i{A host is a host from coast to coast,@*
15642 and no-one can talk to host that's close,@*
15647 In addition to being able to open a two-way pipeline to a coprocess
15650 it is possible to make a two-way connection to
15653 You can think of this as just a @emph{very long} two-way pipeline to
15655 The way @command{gawk} decides that you want to use TCP/IP networking is
15664 The protocol to use over IP. This must be either @samp{tcp},
15674 @cindex @code{getservbyname} function (C library)
15675 The local TCP or UDP port number to use. Use a port number of @samp{0}
15676 when you want the system to pick a port. This is what you should do
15678 You may also use a well-known service name, such as @samp{smtp}
15679 or @samp{http}, in which case @command{gawk} attempts to determine
15680 the predefined port number using the C @code{getservbyname} function.
15683 The IP address or fully-qualified domain name of the Internet
15684 host to which you want to connect.
15687 The TCP or UDP port number to use on the given @var{remote-host}.
15689 service name.
15725 @cindex @code{/p} files (@command{gawk})
15726 @cindex files, @code{/p} (@command{gawk})
15727 @cindex @code{--enable-portals} configuration option
15730 Similar to the @file{/inet} special files, if @command{gawk}
15734 files whose pathnames begin with @code{/p} as 4.4 BSD-style portals.
15736 @cindex @code{|} (vertical bar), @code{|&} operator (I/O), two-way communications
15737 @cindex vertical bar (@code{|}), @code{|&} operator (I/O), two-way communications
15759 @cindex @code{awkprof.out} file
15760 @cindex files, @code{awkprof.out}
15761 @cindex @command{pgawk} program, @code{awkprof.out} file
15762 @command{pgawk} is identical in every way to @command{gawk}, except that when
15765 Because it is profiling, it also executes up to 45% slower than
15768 @cindex @code{--profile} option
15770 the @option{--profile} option can be used to change the name of the file
15784 option to @option{--profile} to change the @value{FN}. Here is a sample
15828 programmers sometimes have to work late):
15830 @cindex @code{BEGIN} pattern, @command{pgawk} program
15831 @cindex @code{END} pattern, @command{pgawk} program
15879 The program is printed in the order @code{BEGIN} rule,
15880 pattern/action rules, @code{END} rule and functions, listed
15882 Multiple @code{BEGIN} and @code{END} rules are merged together.
15887 The first count, to the left of the rule, shows how many times
15889 The second count, to the right of the rule's opening left brace
15893 pattern evaluated to false.
15897 the count for an @code{if}-@code{else} statement shows how many times
15899 To the right of the opening left brace for the @code{if}'s body
15901 The count for the @code{else}
15906 The count for a loop header (such as @code{for}
15907 or @code{while}) shows how many times the loop test was executed.
15909 statement in a rule to determine how many times the rule was executed.
15915 For user-defined functions, the count next to the @code{function}
15917 The counts next to the statements in the body show how many times
15920 @cindex @code{@{@}} (braces), @command{pgawk} program
15921 @cindex braces (@code{@{@}}), @command{pgawk} program
15925 the body of an @code{if}, @code{else}, or loop is only a single statement.
15927 @cindex @code{()} (parentheses), @command{pgawk} program
15928 @cindex parentheses (@code{()}), @command{pgawk} program
15942 Parentheses are used around the arguments to @code{print}
15943 and @code{printf} only when
15944 the @code{print} or @code{printf} statement is followed by a redirection.
15950 front of the @code{BEGIN} and @code{END} rules,
15958 the program. The advantage to this is that @command{pgawk} can produce
15959 a standard representation. The disadvantage is that all source-code
15960 comments are lost, as are the distinctions among multiple @code{BEGIN}
15961 and @code{END} rules. Also, things such as:
15984 infinite loop and you want to see what has been executed.
15994 @cindex @code{USR1} signal
15995 @cindex signals, @code{USR1}/@code{SIGUSR1}
15998 Use the @command{kill} command to send the @code{USR1} signal
15999 to @command{pgawk}:
16006 As usual, the profiled version of the program is written to
16007 @file{awkprof.out}, or to a different file if you use the @option{--profile}
16022 You may send @command{pgawk} the @code{USR1} signal as many times as you like.
16023 Each time, the profile and function call trace are appended to the output
16026 @cindex @code{HUP} signal
16027 @cindex signals, @code{HUP}/@code{SIGHUP}
16028 If you use the @code{HUP} signal instead of the @code{USR1} signal,
16031 @cindex @code{INT} signal (MS-DOS)
16032 @cindex signals, @code{INT}/@code{SIGINT} (MS-DOS)
16033 @cindex @code{QUIT} signal (MS-DOS)
16034 @cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-DOS)
16036 @code{INT} and @code{QUIT} signals for producing the profile and, in
16037 the case of the @code{INT} signal, @command{pgawk} exits. This is
16039 only signals you can deliver to a program are those generated by the
16040 keyboard. The @code{INT} signal is generated by the
16042 @code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key.
16052 This @value{CHAPTER} covers how to run @command{awk}, both POSIX-standard
16056 It then proceeds to cover how @command{gawk} searches for source files,
16066 * Command Line:: How to run @command{awk}.
16083 There are two ways to run @command{awk}---with an explicit program or with
16088 awk @r{[@var{options}]} -f progfile @r{[@code{--}]} @var{file} @dots{}
16089 awk @r{[@var{options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{}
16100 It is possible to invoke @command{awk} with an empty program:
16106 @cindex @code{--lint} option
16129 to be uniquely identified. If the option takes an argument, then the
16143 @table @code
16146 @cindex @code{-F} option
16147 @cindex @code{--field-separator} option
16148 @cindex @code{FS} variable, @code{--field-separator} option and
16149 Sets the @code{FS} variable to @var{fs}
16154 @cindex @code{-f} option
16155 @cindex @code{--file} option
16157 Indicates that the @command{awk} program is to be found in @var{source-file}
16162 @cindex @code{-v} option
16163 @cindex @code{--assign} option
16165 Sets the variable @var{var} to the value @var{val} @emph{before}
16167 inside the @code{BEGIN} rule
16175 @cindex built-in variables, @code{-v} option, setting with
16177 @cindex variables, built-in, @code{-v} option, setting with
16178 @strong{Caution:} Using @option{-v} to set the values of the built-in
16179 variables may lead to surprising results. @command{awk} will reset the
16180 values of those variables as it needs to, possibly ignoring any
16185 @cindex @code{-mf}/@code{-mr} options
16187 Sets various memory limits to the value @var{N}. The @samp{f} flag sets
16194 it continues to accept them to avoid breaking old programs.)
16197 @cindex @code{-W} option
16199 options are supplied as arguments to the @option{-W} option. These options
16213 @cindex @code{-} (hyphen), filenames beginning with
16214 @cindex hyphen (@code{-}), filenames beginning with
16226 @table @code
16231 @cindex @code{--compat} option
16232 @cindex @code{--traditional} option
16234 Specifies @dfn{compatibility mode}, in which the GNU extensions to
16244 @cindex @code{--copyright} option
16250 @cindex @code{--copyleft} option
16254 @cindex @code{--dump-variables} option
16255 @cindex @code{awkvars.out} file
16256 @cindex files, @code{awkvars.out}
16261 to @var{file}. If no @var{file} is provided, @command{gawk} prints this
16262 list to the file named @file{awkvars.out} in the current directory.
16266 Having a list of all global variables is a good way to look for
16269 functions, and you want to be sure that your functions don't
16270 inadvertently use global variables that you meant to be local.
16271 (This is a particularly easy mistake to make with simple variable
16272 names like @code{i}, @code{j}, etc.)
16274 @item -W gen-po
16275 @itemx --gen-po
16276 @cindex @code{--gen-po} option
16280 generates a GNU @code{gettext} Portable Object file on standard
16289 @cindex @code{--help} option
16290 @cindex @code{--usage} option
16299 @cindex @code{--lint} option
16302 Warns about constructs that are dubious or nonportable to
16315 @cindex @code{--lint-old} option
16322 @cindex @code{--non-decimal-data} option
16330 @cindex troubleshooting, @code{--non-decimal-data} option
16336 @cindex @code{--posix} option
16349 @code{\x} escape sequences are not recognized
16355 Newlines do not act as whitespace to separate fields when @code{FS} is
16356 equal to a single space
16364 The synonym @code{func} for the keyword @code{function} is not
16367 @cindex @code{*} (asterisk), @code{**} operator
16368 @cindex asterisk (@code{*}), @code{**} operator
16369 @cindex @code{*} (asterisk), @code{**=} operator
16370 @cindex asterisk (@code{*}), @code{**=} operator
16371 @cindex @code{^} (caret), @code{^} operator
16372 @cindex caret (@code{^}), @code{^} operator
16373 @cindex @code{^} (caret), @code{^=} operator
16374 @cindex caret (@code{^}), @code{^=} operator
16380 @cindex @code{FS} variable, as TAB character
16383 of @code{FS} to be a single TAB character
16387 @cindex @code{fflush} function, unsupported
16389 The @code{fflush} built-in function is not supported
16395 @cindex @code{--traditional} option, @code{--posix} option and
16396 @cindex @code{--posix} option, @code{--traditional} option and
16403 @cindex @code{--profile} option
16408 The optional @var{file} argument allows you to specify a different
16418 @cindex @code{--re-interval} option
16429 @cindex @code{--source} option
16430 @cindex source code, mixing
16431 Allows you to mix source code in files with source
16432 code that you enter on the command line.
16433 Program source code is taken from the @var{program-text}.
16435 when you have library functions that you want to use from your command-line
16440 @cindex @code{--version} option
16444 This allows you to determine if your copy of @command{gawk} is up to date
16445 with respect to whatever the Free Software Foundation is currently
16455 @cindex @code{-F} option, @code{-Ft} sets @code{FS} to TAB
16457 to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB
16458 character (@code{"\t"}). This is true only for @option{--traditional} and not
16462 @cindex @code{-f} option, on command line
16468 of having to be included into each individual program.
16475 type @kbd{@value{CTL}-d} (the end-of-file character) to terminate it.
16476 (You may also use @samp{-f -} to read program source from the standard
16477 input but then you will not be able to also use the standard input as a
16480 Because it is clumsy using the standard @command{awk} mechanisms to mix source
16482 @option{--source} option. This does not require you to pre-empt the standard
16483 input for your source code; it allows you to easily mix command-line
16484 and library source code
16487 @cindex @code{--source} option
16490 program source code.
16492 @cindex @code{POSIXLY_CORRECT} environment variable
16493 @cindex lint checking, @code{POSIXLY_CORRECT} environment variable
16498 Many GNU programs look for this environment variable to turn on
16505 lines to the @file{.profile} file in your home directory:
16512 @cindex @command{csh} utility, @code{POSIXLY_CORRECT} environment variable
16515 you would add this line to the @file{.login} file in your home directory:
16521 @cindex portability, @code{POSIXLY_CORRECT} environment variable
16523 but it is good for testing the portability of your programs to other
16534 input files to be processed in the order specified. However, an
16535 argument that has the form @code{@var{var}=@var{value}}, assigns
16536 the value @var{value} to the variable @var{var}---it does not specify a
16541 @cindex @code{ARGIND} variable, command-line arguments
16542 @cindex @code{ARGC}/@code{ARGV} variables, command-line arguments
16543 All these arguments are made available to your @command{awk} program in the
16544 @code{ARGV} array (@pxref{Built-in Variables}). Command-line options
16545 and the program text (if present) are omitted from @code{ARGV}.
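A minimal sketch of what this looks like from a POSIX shell. The names @code{greeting} and @code{sep} and the file name @file{data.txt} are made up for illustration; the file is never opened, because the program consists of only a @code{BEGIN} rule.

```shell
# List what awk leaves in ARGV: the -v option and the program text are
# omitted, while the var=value argument and the file name remain.
args=$(awk -v greeting=hi 'BEGIN {
    out = greeting ":"
    for (i = 1; i < ARGC; i++)
        out = out " [" i "]=" ARGV[i]
    print out
}' sep=: data.txt)
echo "$args"
```

The value set with @option{-v} is already visible inside the @code{BEGIN} rule, whereas @samp{sep=:} is still just an element of @code{ARGV} at that point.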
16547 included. As each element of @code{ARGV} is processed, @command{gawk}
16548 sets the variable @code{ARGIND} to the index in @code{ARGV} of the
16553 arguments is made when @command{awk} is about to open the next input file.
16554 At that point in execution, it checks the @value{FN} to see whether
16561 @code{BEGIN} rule
16572 the @code{BEGIN} rule was executed. @command{awk}'s behavior was thus
16574 @code{BEGIN} rule, while others were not. Unfortunately,
16575 some applications came to depend
16576 upon this ``feature.'' When @command{awk} was changed to be more consistent,
16577 the @option{-v} option was added to accommodate applications that depended
16580 The variable assignment feature is most useful for assigning to variables
16581 such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and
16593 the value of @code{FS} is not
16601 @cindex differences in @command{awk} and @command{gawk}, @code{AWKPATH} environment variable
16607 implementations, you must supply a precise path name for each program
16609 But in @command{gawk}, if the @value{FN} supplied to the @option{-f} option
16612 file with the specified name.
16622 @command{gawk} was configured. You probably don't need to worry about this,
16631 would have to be typed for each file.
16640 @strong{Note:} If you want files in the current directory to be found,
16644 colon or by placing two colons next to each other (@samp{::}).) If the
16647 to the shell's.
16652 @code{ENVIRON["AWKPATH"]}. This makes it easy to determine
16656 While you can change @code{ENVIRON["AWKPATH"]} within your @command{awk}
16658 sense: the @env{AWKPATH} environment variable is used to find the program
16660 found, and @command{gawk} no longer needs to use @env{AWKPATH}.
16676 @cindex @code{next file} statement, deprecated
16677 @cindex @code{nextfile} statement, @code{next file} statement and
16682 The use of @samp{next file} (two words) for @code{nextfile} was deprecated
16691 (Use @code{PROCINFO} instead; see
16723 @table @code
16726 Print the message @code{"awk: bailing out near line 1"} and dump core.
16729 The message is @emph{not} subject to translation in non-English locales.
16733 Early versions of @command{awk} did not require any separator (either
16735 it was common to see one-line programs like:
16742 because it is considered bad style. The correct way to write such a program
16761 You can insert newlines after the @samp{;} in @code{for} loops.
16762 This seems to have been a long-undocumented feature in Unix @command{awk}.
16764 Similarly, you may use @code{print} or @code{printf} statements in the
16765 @var{init} and @var{increment} parts of a @code{for} loop. This is another
16766 long-undocumented ``feature'' of Unix @command{awk}.
16770 then the associative @code{for} loop will go through the array
16774 @code{IGNORECASE} does not affect the comparison either.
16789 @cindex troubleshooting, @code{-F} option
16790 @cindex @code{-F} option, troubleshooting
16791 @cindex @code{FS} variable, changing value of
16793 The @option{-F} option for changing the value of @code{FS}
16799 Syntactically invalid single-character programs tend to overflow
16801 are surprisingly difficult to diagnose in the completely general case,
16802 and the effort to do so really is not worth it.
16811 Part II shows how to use @command{awk} and @command{gawk} for problem solving.
16812 There is lots of code here for you to read and learn from.
16839 @ref{User-defined}, describes how to write
16841 it allows you to encapsulate algorithms and program tasks in a single
16845 One valuable way to learn a new programming language is to @emph{read}
16848 provide a good-sized body of code for you to read,
16849 and hopefully, to learn from.
16855 The functions are presented here in a progression from simple to complex.
16859 presents a program that you can use to extract the source code for
16865 and would like to contribute them to the author's collection of @command{awk}
16875 Diagnostic error messages are sent to @file{/dev/stderr}.
16879 A number of programs use @code{nextfile}
16881 to skip any remaining input in the input file.
16883 shows you how to write a function that does the same thing.
16885 @c 12/2000: Thanks to Nelson Beebe for pointing out the output issue.
16887 @cindex @code{IGNORECASE} variable, in example programs
16888 Finally, some of the programs choose to ignore upper- and lowercase
16889 distinctions in their input. They do so by assigning one to @code{IGNORECASE}.
16892 record will be in all lowercase, while @code{IGNORECASE} preserves the original
16893 contents of the input record.} by adding the following rule to the
16906 * Library Names:: How to best name private global variables in
16925 Due to the way the @command{awk} language evolved, variables are either
16927 a specific function). There is no intermediate state analogous to
16928 @code{static} variables in C.
16933 Library functions often need to have global variables that they can use to
16934 preserve state information between calls to the function---for example,
16935 @code{getopt}'s variable @code{_opti}
16937 Such variables are called @dfn{private}, since the only functions that need to
16940 When writing a library function, you should try to choose names for your
16943 name like @samp{i} or @samp{j} is not a good choice, because user programs
16950 decreases the chances that the variable name will be accidentally shared
16953 @cindex @code{_} (underscore), in names of private variables
16954 @cindex underscore (@code{_}), in names of private variables
16957 @code{_pw_byname} in the user database routines
16963 been rewritten to use this convention, this was not done, in order to
16964 show how my own @command{awk} programming style has evolved and to
16968 available for use by a main program, it is a good convention to start that
16969 variable's name with a capital letter---for
16970 example, @code{getopt}'s @code{Opterr} and @code{Optind} variables
16973 the variable name is not all capital letters indicates that the variable is
16974 not one of @command{awk}'s built-in variables, such as @code{FS}.
16976 @cindex @code{--dump-variables} option
16978 functions that do not need to save state are, in fact, declared
16981 could accidentally be used in the user's program, leading to bugs that
16982 are very difficult to track down:
16997 A different convention, common in the Tcl community, is to use a single
16998 associative array to hold the values needed by the library function(s), or
17002 might have used array elements @code{@w{PW_data["inited"]}}, @code{@w{PW_data["total"]}},
17003 @code{@w{PW_data["count"]}}, and @code{@w{PW_data["awklib"]}}, instead of
17004 @code{@w{_pw_inited}}, @code{@w{_pw_awklib}}, @code{@w{_pw_total}},
17005 and @code{@w{_pw_count}}.
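As an illustration of the single-array convention, here is a small sketch. The library, its array @code{Tally_data}, and the functions @code{tally_add} and @code{tally_mean} are hypothetical and are not part of the user database routines.

```shell
stats=$(awk '
# All state for this hypothetical library lives in one array,
# Tally_data, rather than in several separate _tally_* globals.
function tally_add(n) {
    Tally_data["total"] += n
    Tally_data["count"]++
}
function tally_mean() {
    return Tally_data["count"] ? Tally_data["total"] / Tally_data["count"] : 0
}
BEGIN {
    tally_add(2); tally_add(4); tally_add(9)
    print Tally_data["count"], Tally_data["total"], tally_mean()
}')
echo "$stats"
```

Only one global name can collide with the user's program, at the cost of slightly longer references inside the library.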
17008 that: conventions. You are not required to write your programs this
17018 * Nextfile Function:: Two implementations of a @code{nextfile}
17022 * Round Function:: A function for rounding if @code{sprintf} does
17027 * Join Function:: A function to join an array into a string.
17028 * Gettimeofday Function:: A function to get formatted times.
17032 @subsection Implementing @code{nextfile} as a Function
17036 @cindex libraries of @command{awk} functions, @code{nextfile} statement
17038 @cindex functions, library, @code{nextfile} statement
17040 @cindex @code{nextfile} statement, implementing
17041 @cindex @command{gawk}, @code{nextfile} statement in
17042 The @code{nextfile} statement, presented in
17046 @code{nextfile} function that you can use to simulate @command{gawk}'s
17047 @code{nextfile} statement if you cannot use @command{gawk}.
17049 A first attempt at writing a @code{nextfile} function is as follows:
17059 @cindex programming conventions, @code{nextfile} statement
17062 @value{DF}'s name (which is always in the @code{FILENAME} variable) to
17063 a private variable named @code{_abandon_}. If the @value{FN} matches,
17064 then the action part of the rule executes a @code{next} statement to
17065 go on to the next record. (The use of @samp{_} in the variable name is
17069 The use of the @code{next} statement effectively creates a loop that reads
17072 a new @value{DF} is opened, changing the value of @code{FILENAME}.
17073 Once this happens, the comparison of @code{_abandon_} to @code{FILENAME}
17076 The @code{nextfile} function itself simply sets the value of @code{_abandon_}
17077 and then executes a @code{next} statement to start the
17083 that allows you to
17084 execute @code{next} from within a function body. Some other workaround
17088 @cindex @code{nextfile} user-defined function
17093 this code skips right through the file a second time, even though
17094 it should stop when it gets to the end of the first occurrence.
17095 A second version of @code{nextfile} that remedies this problem
17125 The @code{nextfile} function has not changed. It makes @code{_abandon_}
17126 equal to the current @value{FN} and then executes a @code{next} statement.
17127 The @code{next} statement reads the next record and increments @code{FNR}
17128 so that @code{FNR} is guaranteed to have a value of at least two.
17129 However, if @code{nextfile} is called for the last record in the file,
17130 then @command{awk} closes the current @value{DF} and moves on to the next
17131 one. Upon doing so, @code{FILENAME} is set to the name of the new file
17132 and @code{FNR} is reset to one. If this next file is the same as
17133 the previous one, @code{_abandon_} is still equal to @code{FILENAME}.
17134 However, @code{FNR} is equal to one, telling us that this is a new
17136 @code{nextfile} function was executed. In that case, @code{_abandon_}
17137 is reset to the empty string, so that further executions of this rule
17138 fail (until the next time that @code{nextfile} is called).
17140 If @code{FNR} is not one, then we are still in the original @value{DF}
17141 and the program executes a @code{next} statement to skip through it.
17143 An important question to ask at this point is: given that the
17144 functionality of @code{nextfile} can be provided with a library file,
17146 features for little reason leads to larger, slower programs that are
17147 harder to maintain.
17148 The answer is that building @code{nextfile} into @command{gawk} provides
17149 significant gains in efficiency. If the @code{nextfile} function is executed
17150 at the beginning of a large @value{DF}, @command{awk} still has to scan the entire
17153 just to skip over it. The built-in
17154 @code{nextfile} can simply close the file immediately and proceed to the
17169 @cindex @code{assert} function (C library)
17175 When writing large programs, it is often useful to know
17177 particular computation, you make a statement about what you believe to be
17179 @dfn{assertion}. The C language provides an @code{<assert.h>} header file
17180 and corresponding @code{assert} macro that the programmer can use to make
17181 assertions. If an assertion fails, the @code{assert} macro arranges to
17184 @code{assert} looks like this:
17196 If the assertion fails, the program prints a message similar to this:
17202 @cindex @code{assert} user-defined function
17203 The C language makes it possible to turn the condition into a string for use
17205 this @code{assert} function also requires a string version of the condition
17242 The @code{assert} function tests the @code{condition} parameter. If it
17243 is false, it prints a message to standard error, using the @code{string}
17244 parameter to describe the failed condition. It then sets the variable
17245 @code{_assert_exit} to one and executes the @code{exit} statement.
17246 The @code{exit} statement jumps to the @code{END} rule. If the @code{END}
17247 rule finds @code{_assert_exit} to be true, it exits immediately.
17249 The purpose of the test in the @code{END} rule is to
17250 keep any other @code{END} rules from running. When an assertion fails, the
17252 If no assertions fail, then @code{_assert_exit} is still
17253 false when the @code{END} rule is run normally, and the rest of the
17254 program's @code{END} rules execute.
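The interplay between @code{assert}, @code{exit}, and the @code{END} rule can be sketched as follows. This is an illustration consistent with the description above, not necessarily the exact text of @file{assert.awk}; the message wording and the deliberately false condition are assumptions made for the demonstration.

```shell
status=0
msg=$(awk '
function assert(condition, string) {
    if (! condition) {
        printf("assertion failed: %s\n", string) > "/dev/stderr"
        _assert_exit = 1
        exit 1
    }
}
BEGIN {
    assert(1 + 1 == 3, "1 + 1 == 3")   # deliberately false
    print "not reached"
}
# This END rule comes first, so it can stop the later ones from running.
END { if (_assert_exit) exit 1 }
END { print "other END work" }
' 2>&1) || status=$?
echo "status=$status msg=$msg"
```

Neither @samp{not reached} nor @samp{other END work} is printed: the failed assertion jumps to the first @code{END} rule, which exits at once.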
17255 For all of this to work correctly, @file{assert.awk} must be the
17268 If the assertion fails, you see a message similar to the following:
17274 @cindex @code{END} pattern, @code{assert} user-defined function and
17275 There is a small problem with this version of @code{assert}.
17276 An @code{END} rule is automatically added
17277 to the program calling @code{assert}. Normally, if a program consists
17278 of just a @code{BEGIN} rule, the input files and/or standard input are
17279 not read. However, now that the program has an @code{END} rule, @command{awk}
17280 attempts to read the input @value{DF}s or standard input
17282 most likely causing the program to hang as it waits for input.
17284 @cindex @code{BEGIN} pattern, @code{assert} user-defined function and
17285 There is a simple workaround to this:
17286 make sure the @code{BEGIN} rule always ends
17287 with an @code{exit} statement.
17301 @cindex @code{print} statement, @code{sprintf} function and
17302 @cindex @code{printf} statement, @code{sprintf} function and
17303 @cindex @code{sprintf} function, @code{print}/@code{printf} statements and
17304 The way @code{printf} and @code{sprintf}
17306 perform rounding often depends upon the system's C @code{sprintf}
17307 subroutine. On many machines, @code{sprintf} rounding is ``unbiased,''
17309 to naive expectations. In unbiased rounding, @samp{.5} rounds to even,
17310 rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4. This means
17311 that if you are using a format that does rounding (e.g., @code{"%.0f"}),
17313 traditional rounding; it might be useful if your awk's @code{printf}
17316 @cindex @code{round} user-defined function
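A compact sketch of traditional rounding, in which a fraction of exactly one half always rounds away from zero. The function name @code{round_sketch} is hypothetical, chosen so that it is not mistaken for the manual's own @code{round} listing.

```shell
rounded=$(awk '
function round_sketch(x,    i) {
    i = int(x)                 # int() truncates toward zero
    if (x >= 0)
        return (x - i >= 0.5) ? i + 1 : i
    return (i - x >= 0.5) ? i - 1 : i
}
BEGIN {
    print round_sketch(1.5), round_sketch(4.5), round_sketch(-2.5), round_sketch(2.3)
}')
echo "$rounded"
```

Unlike unbiased rounding, both 1.5 and 4.5 round up here, to 2 and 5.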
17371 It is easily programmed, in fewer than 10 lines of @command{awk} code:
17373 @cindex @code{cliff_rand} user-defined function
17401 If the built-in @code{rand} function
17413 @code{ord}, which takes a character and returns the numeric value for that
17414 character in the machine's character set. If the string passed to
17415 @code{ord} has more than one character, only the first one is used.
17417 The inverse of this function is @code{chr} (from the function of the same
17418 name in Pascal), which takes a number and returns the corresponding character.
17420 reason to build them into the @command{awk} interpreter:
17422 @cindex @code{ord} user-defined function
17423 @cindex @code{chr} user-defined function
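The table-driven approach can be sketched as follows. This version is an assumption-laden simplification: it covers only the ASCII values 1 through 127 and runs in the C locale. NUL is skipped because some implementations may not be able to hold it in a string.

```shell
codes=$(LC_ALL=C awk '
# _ord_init: fill the private table _ord_, mapping character to code
function _ord_init(    i) {
    for (i = 1; i <= 127; i++)
        _ord_[sprintf("%c", i)] = i
}
function ord(str) { return _ord_[substr(str, 1, 1)] }   # first character only
function chr(c)   { return sprintf("%c", c + 0) }       # force c to be numeric
BEGIN {
    _ord_init()
    print ord("A"), chr(97), ord("abc")
}')
echo "$codes"
```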
17430 # _ord_init: function to initialize _ord_
17472 Some explanation of the numbers used by @code{chr} is worthwhile.
17474 8-bit byte can hold 256 distinct values (from 0 to 255), ASCII only
17475 defines characters that use the values from 0 to 127.@footnote{ASCII
17476 has been extended in many countries to use the values from 128 to 255
17478 you can simplify @code{_ord_init} to loop from 0 to 255.}
17484 have numeric values from 128 to 255.
17501 # force c to be numeric by adding 0
17506 #### test code ####
17519 An obvious improvement to these functions is to move the code for the
17520 @code{@w{_ord_init}} function into the body of the @code{BEGIN} rule. It was
17522 There is a ``test program'' in a @code{BEGIN} rule, to test the
17532 When doing string processing, it is often useful to be able to join
17534 @code{join}, accomplishes this task. It is used later in several of
17538 Good function design is important; this function needs to be general but it
17540 as well as the beginning and ending indices of the elements in the array to be
17542 assumption since the array was likely created with @code{split}
17545 @cindex @code{join} user-defined function
17573 An optional additional argument is the separator to use when joining the
17575 @code{join} uses it; if it is not supplied, it has a null
17576 value. In this case, @code{join} uses a single blank as a default
17577 separator for the strings. If the value is equal to @code{SUBSEP},
17578 then @code{join} joins the strings with no separator between them.
17579 @code{SUBSEP} serves as a ``magic'' value to indicate that there should
17583 more difficult than they really need to be.}
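The separator handling described here can be sketched as follows. The test data and the @samp{|} used to delimit the three results are illustrative choices only.

```shell
joined=$(awk '
function join(array, start, end, sep,    result, i) {
    if (sep == "")
        sep = " "              # default: a single blank
    else if (sep == SUBSEP)
        sep = ""               # magic value: no separator at all
    result = array[start]
    for (i = start + 1; i <= end; i++)
        result = result sep array[i]
    return result
}
BEGIN {
    n = split("a b c", parts)
    printf("%s|%s|%s\n", join(parts, 1, n, "-"), join(parts, 1, n), join(parts, 1, n, SUBSEP))
}')
echo "$joined"
```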
17592 The @code{systime} and @code{strftime} functions described in
17595 in human-readable form. While @code{strftime} is extensive, the control
17596 formats are not necessarily easy to remember or intuitively obvious when
17599 The following function, @code{gettimeofday}, populates a user-supplied array
17603 @cindex @code{gettimeofday} user-defined function
17625 # time["monthname"] -- name of the month
17626 # time["shortmonth"] -- short name of the month
17631 # time["dayname"] -- name of weekday
17632 # time["shortdayname"] -- short name of weekday
17634 # time["timezone"] -- abbreviation of timezone name
17650 # fill in values, force numeric values to be
17677 The string indices are easier to use and read than the various formats
17678 required by @code{strftime}. The @code{alarm} program presented in
17681 A more general design for the @code{gettimeofday} function would have
17682 allowed the user to supply an optional timestamp value to use instead
17710 The @code{BEGIN} and @code{END} rules are each executed exactly once at
17714 @code{BEGIN} rule is executed at the beginning of each @value{DF} and the
17715 @code{END} rule is executed at the end of each @value{DF}. When informed
17717 patterns to @command{gawk}, named @code{BEGIN_FILE} and @code{END_FILE}, that
17718 would have the desired behavior. He even supplied us the code to do so.
17720 Adding these special patterns to @command{gawk} wasn't necessary;
17723 It arranges to call two user-supplied functions, @code{beginfile} and
17724 @code{endfile}, at the beginning and end of each @value{DF}.
17725 Besides solving the problem in only nine(!) lines of code, it does so
17734 # that each take the name of the file being started or
17754 This rule relies on @command{awk}'s @code{FILENAME} variable that
17756 saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does
17757 not equal @code{_oldfilename}, then a new @value{DF} is being processed and
17758 it is necessary to call @code{endfile} for the old file. Because
17759 @code{endfile} should only be called if a file has been processed, the
17760 program first checks to make sure that @code{_oldfilename} is not the null
17761 string. The program then assigns the current @value{FN} to
17762 @code{_oldfilename} and calls @code{beginfile} for the file.
17763 Because, like all @command{awk} variables, @code{_oldfilename} is
17764 initialized to the null string, this rule executes correctly even for the
17767 The program also supplies an @code{END} rule to do the final processing for
17768 the last file. Because this @code{END} rule comes before any @code{END} rules
17769 supplied in the ``main'' program, @code{endfile} is called first. Once
17770 again the value of multiple @code{BEGIN} and @code{END} rules should be clear.
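The rule and its companion @code{END} rule can be exercised with a short sketch. The two scratch files and the @code{Trace} variable are assumptions made for the demonstration; a real program would supply its own @code{beginfile} and @code{endfile}.

```shell
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/one"
printf 'c\n' > "$workdir/two"
trace=$(cd "$workdir" && awk '
# User-supplied hooks; here they only record the order of the calls.
function beginfile(name) { Trace = Trace "begin:" name " " }
function endfile(name)   { Trace = Trace "end:" name " " }

FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}
END { endfile(FILENAME); print Trace "done" }
' one two)
rm -rf "$workdir"
echo "$trace"
```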
17772 @cindex @code{beginfile} user-defined function
17773 @cindex @code{endfile} user-defined function
17774 This version has the same problem as the first version of @code{nextfile}
17777 @code{endfile} and @code{beginfile} are not executed at the end of the
17815 Another request for a new built-in function was for a @code{rewind}
17816 function that would make it possible to reread the current file.
17817 The requesting user didn't want to have to use @code{getline}
17821 However, as long as you are not in the @code{END} rule, it is
17822 quite easy to arrange to immediately close the current input file
17824 For lack of a better name, we'll call it @code{rewind}:
17826 @cindex @code{rewind} user-defined function
17846 # make sure gawk knows to keep going
17849 # make current file next to get done
17858 This code relies on the @code{ARGIND} variable
17860 which is specific to @command{gawk}.
17869 to either update @code{ARGIND} on your own
17870 or modify this code as appropriate.
17872 The @code{rewind} function also relies on the @code{nextfile} keyword
17875 for a function version of @code{nextfile}.
17886 might want to just ignore such files and keep going. You can
17887 do this by prepending the following program to your @command{awk}
17890 @cindex @code{readable.awk} program
17893 # readable.awk --- library file to skip over unreadable files
17918 @cindex troubleshooting, @code{getline} function
17919 In @command{gawk}, the @code{getline} won't be fatal (unless
17921 Removing the element from @code{ARGV} with @code{delete}
17924 @c This doesn't handle /dev/stdin etc. Not worth the hassle to mention or fix.
17932 tries to read a record from an empty file, it immediately receives an
17933 end of file indication, closes the file, and proceeds on to the next
17935 @command{awk} program code.
17937 Using @command{gawk}'s @code{ARGIND} variable
17938 (@pxref{Built-in Variables}), it is possible to detect when an empty
17939 @value{DF} has been skipped. Similar to the library file presented
17941 @code{zerofile} that the user must provide. The arguments passed are
17942 the @value{FN} and the position in @code{ARGV} where it was found:
17944 @cindex @code{zerofile.awk} program
17947 # zerofile.awk --- library file to process empty input files
17975 The user-level variable @code{Argind} allows the @command{awk} program
17976 to track its progress through @code{ARGV}. Whenever the program detects
17977 that @code{ARGIND} is greater than @samp{Argind + 1}, it means that one or
17978 more empty files were skipped. The action then calls @code{zerofile} for
17979 each such file, incrementing @code{Argind} along the way.
17981 The @samp{Argind != ARGIND} rule simply keeps @code{Argind} up to date
17984 Finally, the @code{END} rule catches the case of any empty files at
17986 condition of the @code{for} loop uses the @samp{<=} operator,
17987 not @code{<}.
17990 be solved without relying on @command{gawk}'s @code{ARGIND} variable.
17992 As a second exercise, revise this code to handle the case where
17993 an intervening value in @code{ARGV} is a variable assignment.
18033 Occasionally, you might not want @command{awk} to process command-line
18040 to disable command-line assignments. However, some simple programming with
18043 @cindex @code{noassign.awk} program
18046 # noassign.awk --- library file to avoid the need for a
18079 It prepends @samp{./} to
18083 The use of @code{No_command_assign} allows you to disable command-line
18106 the command line that can be used to change the way a program behaves.
18109 Often, options take @dfn{arguments}; i.e., data that the program needs to
18111 @option{-F} option requires a string to use as the field separator.
18115 @cindex @code{getopt} function (C library)
18116 Modern Unix systems provide a C function named @code{getopt} for processing
18119 string with a colon. @code{getopt} is also passed the
18121 @code{getopt} processes the command-line arguments for option letters.
18126 When using @code{getopt}, options that do not take arguments can be
18143 the argument is considered to be the option's argument.
18146 and that @samp{foo} is the argument to the @option{-b} option.
18148 @code{getopt} provides four external variables that the programmer can use:
18150 @table @code
18152 The index in the argument value array (@code{argv}) where the first
18156 The string value of the argument to an option.
18159 Usually @code{getopt} prints an error message when it finds an invalid
18160 option. Setting @code{opterr} to zero disables this feature. (An
18161 application might want to print its own error message.)
18168 The following C fragment shows how @code{getopt} might process command-line
18202 As a side point, @command{gawk} actually uses the GNU @code{getopt_long}
18203 function to process both normal and GNU-style long options
18206 The abstraction provided by @code{getopt} is very useful and is quite
18208 version of @code{getopt}. This function highlights one of the
18210 manipulating single characters. Repeated calls to @code{substr} are
18213 function was written before @command{gawk} acquired the ability to
18214 split strings into single characters using @code{""} as the separator.
18215 We have left it alone, since using @code{substr} is more portable.}
18217 The discussion that follows walks through the code a bit at a time:
18219 @cindex @code{getopt} user-defined function
18237 # Optarg -- string value of argument to current option
18254 are ``private'' to this library function. Such documentation is essential
18257 The @code{getopt} function first checks that it was indeed called with a string of options
18258 (the @code{options} parameter). If @code{options} has a zero length,
18259 @code{getopt} immediately returns @minus{}1:
18261 @cindex @code{getopt} user-defined function
18282 The next thing to check for is the end of the options. A @option{--}
18284 does not begin with a @samp{-}. @code{Optind} is used to step through
18286 to @code{getopt}, because it is a global variable.
18288 The regular expression that is used, @code{@w{/^-[^: \t\n\f\r\v\b]/}}, is
18315 The @code{_opti} variable tracks the position in the current command-line
18316 argument (@code{argv[Optind]}). If multiple options are
18318 to return them to the user one at a time.
18320 If @code{_opti} is equal to zero, it is set to two, which is the index in
18321 the string of the next character to look at (we skip the @samp{-}, which
18322 is at position one). The variable @code{thisopt} holds the character,
18323 obtained with @code{substr}. It is saved in @code{Optopt} for the main
18324 program to use.
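The character-by-character stepping can be illustrated in isolation. This sketch is not the @code{getopt} function itself; the variable names @code{arg}, @code{pos}, and @code{out} are hypothetical. It starts at index two to skip the leading @samp{-}.

```shell
letters=$(awk 'BEGIN {
    arg = "-abc"               # three options bundled in one argument
    out = ""
    for (pos = 2; pos <= length(arg); pos++)
        out = out (out == "" ? "" : " ") substr(arg, pos, 1)
    print out
}')
echo "$letters"
```

In @code{getopt}, the position is kept in a private global between calls rather than in a loop variable, so that each call can hand back one option letter.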
18326 If @code{thisopt} is not in the @code{options} string, then it is an
18327 invalid option. If @code{Opterr} is nonzero, @code{getopt} prints an error
18328 message on the standard error that is similar to the message from the C
18329 version of @code{getopt}.
18331 Because the option is invalid, it is necessary to skip it and move on to the
18332 next option character. If @code{_opti} is greater than or equal to the
18333 length of the current command-line argument, it is necessary to move on
18334 to the next argument, so @code{Optind} is incremented and @code{_opti} is reset
18335 to zero. Otherwise, @code{Optind} is left alone and @code{_opti} is merely
18338 In any case, because the option is invalid, @code{getopt} returns @samp{?}.
18339 The main program can examine @code{Optopt} if it needs to know what the
18357 in the @code{options} string. If there are remaining characters in the
18358 current command-line argument (@code{argv[Optind]}), then the rest of that
18359 string is assigned to @code{Optarg}. Otherwise, the next command-line
18361 @code{_opti} is reset to zero, because there are no more characters left to
18376 Finally, if @code{_opti} is either zero or greater than the length of the
18377 current command-line argument, it means this element in @code{argv} is
18378 through being processed, so @code{Optind} is incremented to point to the
18379 next element in @code{argv}. If neither condition is true, then only
18380 @code{_opti} is incremented, so that the next option letter can be processed
18381 on the next call to @code{getopt}.
18383 The @code{BEGIN} rule initializes both @code{Opterr} and @code{Optind} to one.
18384 @code{Opterr} is set to one, since the default behavior is for @code{getopt}
18385 to print a diagnostic message upon seeing an invalid option. @code{Optind}
18386 is set to one, since there's no reason to look at the program name, which is
18387 in @code{ARGV[0]}:
18392 Opterr = 1 # default is to diagnose
18409 The rest of the @code{BEGIN} rule is a simple test program. Here is the
18431 the first @option{--} terminates the arguments to @command{awk}, so that it does
18432 not try to interpret the @option{-a}, etc., as its own options.
18435 use @code{getopt} to process their arguments.
18454 @cindex @code{PROCINFO} array
18455 The @code{PROCINFO} array
18457 provides access to the current user's real and effective user and group ID
18460 information to the average user. There needs to be some way to find the
18466 @cindex @code{getpwent} function (C library)
18467 @cindex @code{getpwent} user-defined function
18474 kept. Instead, it provides the @code{<pwd.h>} header file
18476 The primary function is @code{getpwent}, for ``get password entry.''
18479 encrypted passwords (hence the name).
18485 information is stored in a network database.} To be sure you are able to
18487 to write a small C program that calls @code{getpwent}. @code{getpwent}
18488 is defined as returning a pointer to a @code{struct passwd}. Each time it
18490 no more entries, it returns @code{NULL}, the null pointer. When this
18491 happens, the C program should call @code{endpwent} to close the database.
18494 @c Use old style function header for portability to old systems (SunOS, HP/UX).
18555 @item Login name
18556 The user's login name.
18563 (On some systems it's a C @code{long}, and not an @code{int}. Thus
18564 we cast it to @code{long} for all cases.)
18568 (Similar comments about @code{long} vs.@: @code{int} apply here.)
18570 @item Full name
18571 The user's full name, and perhaps other information associated with the
18575 The user's login (or ``home'') directory (familiar to shell programmers as
18576 @code{$HOME}).
18585 @item Login name @tab The user's login name.
18593 @item Full name @tab The user's full name, and perhaps other information associated with the
18596 @item Home directory @tab The user's login (or ``home'') directory (familiar to shell programmers as
18597 @code{$HOME}).
18622 information. There are several functions here, corresponding to the C
18626 @c Answer: return foo[key] returns "" if key not there, no need to check with `in'.
18628 @cindex @code{_pw_init} user-defined function
18644 # tailor this to suit your system
18678 @cindex @code{BEGIN} pattern, @code{pwcat} program
18679 The @code{BEGIN} rule sets a private variable to the directory where
18680 @command{pwcat} is stored. Because it is used to help out an @command{awk} library
18681 routine, we have chosen to put it in @file{/usr/local/libexec/awk};
18682 however, you might want it to be in a different directory on your system.
18684 The function @code{_pw_init} keeps three copies of the user information
18686 (@code{_pw_byname}), by user ID number (@code{_pw_byuid}), and by order of
18687 occurrence (@code{_pw_bycount}).
18688 The variable @code{_pw_inited} is used for efficiency; @code{_pw_init}
18689 needs only to be called once.
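The load-once pattern described above can be sketched in a self-contained way. This is only an illustration under stated assumptions: @code{_demo_init}, @code{demo_getpwnam}, the @code{_demo_*} arrays, and the two sample records are all made up, and the data comes from a fixed string rather than from @command{pwcat}.

```awk
# Hypothetical stand-in for _pw_init: loads two made-up records
# instead of reading the output of pwcat.
function _demo_init(    n, i, recs, f)
{
    if (_demo_inited)
        return              # the data is already loaded
    n = split("alice:x:1001:100\nbob:x:1002:100", recs, "\n")
    for (i = 1; i <= n; i++) {
        split(recs[i], f, ":")
        _demo_byname[f[1]] = recs[i]    # indexed by login name
        _demo_byuid[f[3]] = recs[i]     # indexed by user ID number
        _demo_bycount[i] = recs[i]      # indexed by order of occurrence
    }
    _demo_loads++           # counts how often the data was loaded
    _demo_inited = 1
}

# Hypothetical lookup by name; returns "" for an unknown user.
function demo_getpwnam(name)
{
    _demo_init()
    if (name in _demo_byname)
        return _demo_byname[name]
    return ""
}
```

However many lookups are made, @code{_demo_loads} stays at one, which is the point of the guard variable.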
18691 @cindex @code{getline} command, @code{_pw_init} function
18692 Because this function uses @code{getline} to read information from
18693 @command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}.
18694 It notes in the variable @code{using_fw} whether field splitting
18695 with @code{FIELDWIDTHS} is in effect or not.
18701 The @code{using_fw} variable records the result of checking @code{PROCINFO["FS"]}, which
18702 is @code{"FIELDWIDTHS"} if field splitting is being done with
18703 @code{FIELDWIDTHS}. This makes it possible to restore the correct
18705 @command{gawk}. It is false if using @code{FS} or on some other
18708 The main part of the function uses a loop to read database lines, split
18710 When the loop is done, @code{@w{_pw_init}} cleans up by closing the pipeline,
18711 setting @code{@w{_pw_inited}} to one, and restoring @code{FS} (and @code{FIELDWIDTHS}
18712 if necessary), @code{RS}, and @code{$0}.
18713 The use of @code{@w{_pw_count}} is explained shortly.
18717 @cindex @code{getpwnam} function (C library)
18718 The @code{getpwnam} function takes a username as a string argument. If that
18722 @cindex @code{getpwnam} user-defined function
18726 function getpwnam(name)
18729 if (name in _pw_byname)
18730 return _pw_byname[name]
18737 @cindex @code{getpwuid} function (C library)
18739 the @code{getpwuid} function takes a user ID number argument. If that
18743 @cindex @code{getpwuid} user-defined function
18756 @cindex @code{getpwent} function (C library)
18757 The @code{getpwent} function simply steps through the database, one entry at
18758 a time. It uses @code{_pw_count} to track its current position in the
18759 @code{_pw_bycount} array:
18761 @cindex @code{getpwent} user-defined function
18774 @cindex @code{endpwent} function (C library)
18775 The @code{@w{endpwent}} function resets @code{@w{_pw_count}} to zero, so that
18776 subsequent calls to @code{getpwent} start over again:
18778 @cindex @code{endpwent} user-defined function
18789 @code{@w{_pw_init}} to initialize the database arrays. The overhead of running
18790 a separate process to generate the user database, and the I/O to scan it,
18794 (The alternative is to move the body of @code{@w{_pw_init}} into a
18795 @code{BEGIN} rule, which always runs @command{pwcat}. This simplifies the
18796 code but runs an extra process that may never be needed.)
18798 In turn, calling @code{_pw_init} is not too expensive, because the
18799 @code{_pw_inited} variable keeps the program from reading the data more than
18801 @command{awk} program, the check of @code{_pw_inited} could be moved out of
18802 @code{_pw_init} and duplicated in all the other functions. In practice,
18804 clutters up the code.
18824 @cindex @code{PROCINFO} array
18825 @cindex @code{getgrent} function (C library)
18826 @cindex @code{getgrent} user-defined function
18834 applies to the group database as well. Although there has traditionally
18837 (@code{<grp.h>} and @code{getgrent})
18841 to have a small C program that generates the group database as its output.
18917 The name of the group.
18921 usually empty or set to @samp{*}.
18925 (On some systems it's a C @code{long}, and not an @code{int}. Thus
18926 we cast it to @code{long} for all cases.)
18930 Modern Unix systems allow users to be members of several groups
18932 @code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO}
18934 (Note that @code{PROCINFO} is a @command{gawk} extension;
18940 @item Group name @tab The group's name.
18943 it is usually empty or set to @samp{*}.
18950 Modern Unix systems allow users to be members of several groups
18952 @code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO}
18954 (Note that @code{PROCINFO} is a @command{gawk} extension;
18974 @cindex @code{getline} command, @code{_gr_init} user-defined function
18975 @cindex @code{_gr_init} user-defined function
18993 # Change to suit your system
19043 The @code{BEGIN} rule sets a private variable to the directory where
19044 @command{grcat} is stored. Because it is used to help out an @command{awk} library
19045 routine, we have chosen to put it in @file{/usr/local/libexec/awk}. You might
19046 want it to be in a different directory on your system.
19050 The @code{@w{_gr_inited}} variable is used to
19052 The @code{@w{_gr_init}} function first saves @code{FS}, @code{FIELDWIDTHS}, @code{RS}, and
19053 @code{$0}, and then sets @code{FS} and @code{RS} to the correct values for
19057 The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number
19058 (@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}).
19059 There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}),
19060 which is a space-separated list of groups to which each user belongs.
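A minimal sketch of how such a per-user list might be accumulated follows. The function name @code{demo_add_group} and the sample records in the usage note are assumptions for illustration, not the library's own code. The sketch assumes the usual colon-separated record layout with a comma-separated member list in the fourth field, and it tolerates an empty or missing fourth field.

```awk
# demo_add_group: hypothetical helper. Adds the group named in one
# database record to the space-separated list kept for each member.
function demo_add_group(record, groupsbyuser,    f, n, i, members)
{
    split(record, f, ":")
    n = split(f[4], members, ",")   # n is 0 when there are no members
    for (i = 1; i <= n; i++) {
        if (members[i] in groupsbyuser)
            groupsbyuser[members[i]] = groupsbyuser[members[i]] " " f[1]
        else
            groupsbyuser[members[i]] = f[1]
    }
}
```

Feeding it the made-up records @samp{staff:*:10:alice,bob} and @samp{wheel:*:0:alice} leaves @samp{staff wheel} for @code{alice} and @samp{staff} for @code{bob}.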
19062 Unlike the user database, it is possible to have multiple records in the
19071 For this reason, @code{_gr_init} looks to see if a group name or
19074 subtle problem with the code just presented. Suppose that
19075 the first time there were no names. This code adds the names with
19076 a leading comma. It also doesn't check that there is a @code{$4}.)
19078 Finally, @code{_gr_init} closes the pipeline to @command{grcat}, restores
19079 @code{FS} (and @code{FIELDWIDTHS} if necessary), @code{RS}, and @code{$0},
19080 initializes @code{_gr_count} to zero
19081 (it is used later), and makes @code{_gr_inited} nonzero.
19083 @cindex @code{getgrnam} function (C library)
19084 The @code{getgrnam} function takes a group name as its argument, and if that
19085 group exists, it is returned. Otherwise, @code{getgrnam} returns the null
19088 @cindex @code{getgrnam} user-defined function
19101 @cindex @code{getgrgid} function (C library)
19102 The @code{getgrgid} function is similar; it takes a numeric group ID and
19105 @cindex @code{getgrgid} user-defined function
19118 @cindex @code{getgruser} function (C library)
19119 The @code{getgruser} function does not have a C counterpart. It takes a
19122 @cindex @code{getgruser} user-defined function
19135 @cindex @code{getgrent} function (C library)
19136 The @code{getgrent} function steps through the database one entry at a time.
19137 It uses @code{_gr_count} to track its position in the list:
19139 @cindex @code{getgrent} user-defined function
19153 @cindex @code{endgrent} function (C library)
19154 The @code{endgrent} function resets @code{_gr_count} to zero so that @code{getgrent} can
19157 @cindex @code{endgrent} user-defined function
19167 As with the user database routines, each function calls @code{_gr_init} to
19169 @command{grcat} if these functions are used (as opposed to moving the body of
19170 @code{_gr_init} into a @code{BEGIN} rule).
19174 simple, relying on @command{awk}'s associative arrays to do work.
19192 presents the idea that reading programs in a language contributes to
19198 The first describes how to run the programs presented
19212 ability to do a lot in just a few lines of code.
19219 * Running Examples:: How to run these examples.
19234 Here, @var{program} is the name of the @command{awk} program (such as
19246 If your @command{awk} is not @command{gawk}, you may instead need to use this:
19260 because the algorithms can be very clearly expressed, and the code is usually
19263 It should be noted that these programs are not necessarily intended to
19265 purpose is to illustrate @command{awk} language programming for ``real world''
19291 from its standard input and sends them to its standard output.
19293 but you may supply a command-line option to change the field
19297 A common use of @command{cut} might be to pull out just the login name of
19307 @table @code
19309 Use @var{list} as the list of characters to cut out. Items within the list
19315 Use @var{list} as the list of fields to cut out.
19325 The @command{awk} implementation of @command{cut} uses the @code{getopt} library
19327 and the @code{join} library function
19331 functions needed, and a @code{usage} function that prints out a usage
19332 message and exits. @code{usage} is called if invalid arguments are
19335 @cindex @code{cut.awk} program
19372 The variables @code{e1} and @code{e2} are used so that the function
19381 @cindex @code{BEGIN} pattern, running @command{awk} programs and
19382 @cindex @code{FS} variable, running @command{awk} programs and
19383 Next comes a @code{BEGIN} rule that parses the command-line options.
19384 It sets @code{FS} to a single TAB character, because that is @command{cut}'s
19385 default field separator. The output field separator is also set to be the
19386 same as the input field separator. Then @code{getopt} is used to step
19388 @code{by_fields} or @code{by_chars} is set to true, to indicate that
19390 When cutting by characters, the output field separator is set to the null
19430 a single space (@code{@w{" "}}) for the value of @code{FS} is
19432 tabs, and/or newlines, and we want them to be separated with individual
19433 spaces. Also, note that after @code{getopt} is through, we have to
19434 clear out all the elements of @code{ARGV} from 1 to @code{Optind},
19435 so that @command{awk} does not try to process the command-line options
19441 either @code{set_fieldlist} or @code{set_charlist} to pull apart the
19465 @code{set_fieldlist} is used to split the field list apart at the commas
19466 and into an array. Then, for each element of the array, it looks to
19468 is verified to make sure the first number is smaller than the second.
19469 Each number in the list is added to the @code{flist} array, which
19500 The @code{set_charlist} function is more complicated than @code{set_fieldlist}.
19501 The idea here is to use @command{gawk}'s @code{FIELDWIDTHS} variable
19506 Setting up @code{FIELDWIDTHS} is more complicated than simply listing the
19507 fields that need to be printed. We have to keep track of the fields to
19508 print and also the intervening characters that have to be skipped.
19511 for @code{FIELDWIDTHS} is @code{@w{"8 6 1 6 14"}}. This yields five
19512 fields, and the fields to print
19513 are @code{$1}, @code{$3}, and @code{$5}.
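The width computation can be sketched in portable @command{awk}. The function name @code{ranges_to_widths} and the sample list @samp{1-8,15,22-35} are assumptions for illustration; the list is simply one input consistent with the @code{"8 6 1 6 14"} value quoted above. The sketch assumes ascending, non-overlapping ranges and does not record which of the resulting fields are the ones to print.

```awk
# ranges_to_widths: hypothetical helper. Turns a character list such
# as "1-8,15,22-35" into a FIELDWIDTHS-style string, inserting a
# filler width for every run of skipped characters.
function ranges_to_widths(list,    n, i, parts, r, lo, hi, pos, out, sep)
{
    n = split(list, parts, ",")
    pos = 1                     # next character not yet covered
    out = ""
    for (i = 1; i <= n; i++) {
        if (split(parts[i], r, "-") == 2) {
            lo = r[1] + 0
            hi = r[2] + 0
        } else
            lo = hi = parts[i] + 0
        if (lo > pos) {         # filler field for skipped characters
            sep = (out == "") ? "" : " "
            out = out sep (lo - pos)
        }
        sep = (out == "") ? "" : " "
        out = out sep (hi - lo + 1)
        pos = hi + 1
    }
    return out
}
```

With the assumed list, the widths come out as 8 (characters 1 through 8), 6 (skipped), 1 (character 15), 6 (skipped), and 14 (characters 22 through 35).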
19516 @code{flist} lists the fields to print, and @code{t} tracks the
19566 is given, then @code{suppress} is true. The first @code{if} statement
19568 @command{cut} is processing fields, @code{suppress} is true, and the field
19572 into fields, either using the character in @code{FS} or using fixed-length
19573 fields and @code{FIELDWIDTHS}. The loop goes through the list of fields
19596 This version of @command{cut} relies on @command{gawk}'s @code{FIELDWIDTHS}
19597 variable to do the character-based cutting. While it is possible in
19598 other @command{awk} implementations to use @code{substr}
19601 The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem
19620 expressions that are almost identical to those available in @command{awk}
19629 expression is quoted to prevent the shell from expanding any of the
19632 the command line, each output line is preceded by the name of the file
19635 The options to @command{egrep} are as follows:
19637 @table @code
19658 Use @var{pattern} as the regexp to match. The purpose of the @option{-e}
19659 option is to allow patterns that start with a @samp{-}.
19662 This version uses the @code{getopt} library function
19667 The program begins with a descriptive comment and then a @code{BEGIN} rule
19668 that processes the command-line arguments with @code{getopt}. The @option{-i}
19670 @code{IGNORECASE} built-in variable
19673 @cindex @code{egrep.awk} program
19717 Next comes the code that handles the @command{egrep}-specific behavior. If no
19719 command line is used. The @command{awk} command-line arguments up to @code{ARGV[Optind]}
19720 are cleared, so that @command{awk} won't try to process them as files. If no
19722 specified, we make sure to note this so that the @value{FN}s can precede the
19745 @command{gawk}. They should be uncommented if you have to use another version
19756 @c Exercise: Fix this, w/array and new line as key to original line
19767 The @code{beginfile} function is called by the rule in @file{ftrans.awk}
19769 does is initialize a variable @code{fcount} to zero. @code{fcount} tracks
19771 (naming the parameter @code{junk} shows we know that @code{beginfile}
19783 The @code{endfile} function is called after each file has been processed.
19785 matched. @code{no_print} is true only if the exit status is desired.
19786 @code{count_only} is true if line counts are desired. @command{egrep}
19788 The output format must be adjusted depending upon the number of files to
19789 process. Finally, @code{fcount} is added to @code{total}, so that we
19808 @code{matches} is true if the line matched the pattern. If the user
19809 wants lines that did not match, the sense of @code{matches} is inverted
19810 using the @samp{!} operator. @code{fcount} is incremented with the value of
19811 @code{matches}, which is either one or zero, depending upon a
19813 @code{next} statement just moves on to the next record.
19815 @cindex @code{!} (exclamation point), @code{!} operator
19816 @cindex exclamation point (@code{!}), @code{!} operator
19819 (@code{no_print} is true), then it is enough to know that @emph{one}
19820 line in this file matched, and we can skip on to the next file with
19821 @code{nextfile}. Similarly, if we are only printing @value{FN}s, we can
19822 print the @value{FN}, and then skip to the next file with @code{nextfile}.
19826 @cindex @code{!} operator
19857 The @code{END} rule takes care of producing the correct exit status. If
19871 The @code{usage} function prints a usage message in case of invalid options,
19886 The variable @code{e} is used so that the function fits nicely
19889 @cindex @code{END} pattern, backslash continuation and
19890 @cindex @code{\} (backslash), continuing lines and
19891 @cindex backslash (@code{\}), continuing lines and
19892 Just a note on programming style: you may have noticed that the @code{END}
19898 your @code{BEGIN} and @code{END} rules this way
19922 @code{PROCINFO} array (@pxref{Built-in Variables}).
19933 @code{BEGIN} rule. The user and group ID numbers are obtained from
19934 @code{PROCINFO}.
19935 The code is repetitive. The entry in the user database for the real user ID
19936 number is split into parts at the @samp{:}. The name is the first field.
19937 Similar code is used for the effective user ID number and the group
19940 @cindex @code{id.awk} program
20021 @cindex @code{in} operator
20022 The test in the @code{for} loop is worth noting.
20023 Any supplementary groups in the @code{PROCINFO} array have the
20024 indices @code{"group1"} through @code{"group@var{N}"} for some
20030 @code{"group"}, and then using @code{in} to see if that value is
20031 in the array. Eventually, @code{i} is incremented past
20041 information is printed. Modify this version to accept the same
20050 @cindex @code{split} utility
20051 The @code{split} program splits large text files into smaller pieces.
20063 instead of 1000. To change the name of the output files to something like
20067 Here is a version of @code{split} in @command{awk}. It uses the @code{ord} and
20068 @code{chr} functions presented in
20071 The program first sets its defaults, and then tests to make sure there are
20074 to look like a negative number, so it is made positive, and that is the
20078 @cindex @code{split.awk} program
20110 i++ # skip data file name
20122 The next rule does most of the work. @code{tcount} (temporary count) tracks
20123 how many lines have been printed to the output file so far. If it is greater
20124 than @code{count}, it is time to close the current file and start a new one.
20125 @code{s1} and @code{s2} track the current suffixes for the @value{FN}. If
20126 they are both @samp{z}, the file is just too big. Otherwise, @code{s1}
20127 moves to the next letter in the alphabet and @code{s2} starts over again at
20138 printf("split: %s is too large to split\n",
20160 The @code{usage} function simply prints an error message and exits:
20174 The variable @code{e} is used so that the function
20183 This program is a bit sloppy; it relies on @command{awk} to automatically close the last file
20184 instead of doing it in an @code{END} rule.
20196 @cindex @code{tee} utility
20197 The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies
20198 its standard input to its standard output and also duplicates it to the
20205 The @option{-a} option tells @code{tee} to append to the named files, instead of
20208 The @code{BEGIN} rule first makes a copy of all the command-line arguments
20209 into an array named @code{copy}.
20210 @code{ARGV[0]} is not copied, since it is not needed.
20211 @code{tee} cannot use @code{ARGV} directly, since @command{awk} attempts to
20212 process each @value{FN} in @code{ARGV} as input data.
20216 @code{append} is set to true, and both @code{ARGV[1]} and
20217 @code{copy[1]} are deleted. If @code{ARGC} is less than two, then no
20218 @value{FN}s were supplied and @code{tee} prints a usage message and exits.
20219 Finally, @command{awk} is forced to read the standard input by setting
20220 @code{ARGV[1]} to @code{"-"} and @code{ARGC} to two:
20223 @cindex @code{tee.awk} program
20261 line into each file on the command line, and then to the standard output:
20279 It is also possible to write the loop this way:
20295 @var{N}@code{*}@var{M} @samp{if} statements.
20297 Finally, the @code{END} rule cleans up by closing all the output files:
20320 prints unique lines---hence the name. @command{uniq} has a number of
20329 @table @code
20342 is similar to @command{awk}'s default: nonwhitespace characters separated
20354 The generated output is sent to the named output file, instead of to the
20362 @code{getopt} library function
20364 and the @code{join} library function
20367 The program begins with a @code{usage} function and then a brief outline of
20369 The @code{BEGIN} rule deals with the command-line arguments and options. It
20370 uses a trick to get @code{getopt} to handle options of the form @samp{-25},
20372 @samp{5}. If indeed two or more digits are supplied (@code{Optarg} looks
20373 like a number), @code{Optarg} is
20374 concatenated with the option digit and then the result is added to zero to make
20376 @code{Optarg} is not needed. In this case, @code{Optind} must be decremented so that
20377 @code{getopt} processes it next time. This code is admittedly a bit
20380 If no options are supplied, then the default is taken, to print both
20382 to @code{outputfile}. Early on, @code{outputfile} is initialized to the
20385 @cindex @code{uniq.awk} program
20429 # getopt requires args to options
20460 The following function, @code{are_equal}, compares the current line,
20461 @code{$0}, to the
20462 previous line, @code{last}. It handles skipping fields and characters.
20463 If no field count and no character count are specified, @code{are_equal}
20465 comparison of @code{last} and @code{$0}. Otherwise, things get more
20467 If fields have to be skipped, each line is broken into an array using
20468 @code{split}
20470 the desired fields are then joined back into a line using @code{join}.
20471 The joined lines are stored in @code{clast} and @code{cline}.
20472 If no fields are skipped, @code{clast} and @code{cline} are set to
20473 @code{last} and @code{$0}, respectively.
20474 Finally, if characters are skipped, @code{substr} is used to strip off the
20475 leading @code{charcount} characters in @code{clast} and @code{cline}. The
20476 two strings are then compared and @code{are_equal} returns the result:
20505 executed only for the very first line of data. It sets @code{last} equal to
20506 @code{$0}, so that subsequent lines of text have something to be compared to.
20508 The second rule does the work. The variable @code{equal} is one or zero,
20509 depending upon the results of @code{are_equal}'s comparison. If @command{uniq}
20510 is counting repeated lines, and the lines are equal, then it increments the @code{count} variable.
20511 Otherwise, it prints the line and resets @code{count},
20514 If @command{uniq} is not counting, and if the lines are equal, @code{count} is incremented.
20515 Nothing is printed, since the point is to remove duplicates.
20518 and only one line is seen, then the line is printed, and @code{count}
20521 Finally, similar logic is used in the @code{END} rule to print the final
20593 @table @code
20612 This uses the @code{getopt} library function
20623 The @code{BEGIN} rule does the argument processing. The variable
20624 @code{print_total} is true if more than one file is named on the
20627 @cindex @code{wc.awk} program
20646 # Default is to count lines, words, characters
20673 The @code{beginfile} function is simple; it just resets the counts of lines,
20674 words, and characters to zero, and saves the current @value{FN} in
20675 @code{fname}:
20688 The @code{endfile} function adds the current file's numbers to the running
20690 @code{FNR} in @code{endfile}. If you examine
20691 the code in
20694 @code{FNR} has already been reset by the time
20695 @code{endfile} is called.} It then prints out those numbers
20696 for the file that was just read. It relies on @code{beginfile} to reset the
20722 the record, plus one, to @code{chars}. Adding one plus the record length
20724 of @code{RS}) is not part of the record itself, and thus not included
20725 in its length. Next, @code{lines} is incremented for each line read,
20726 and @code{words} is incremented by the value of @code{NF}, which is the
20740 Finally, the @code{END} rule simply prints the totals for all the files:
20773 * Translate Program:: A program similar to the @command{tr} utility.
20775 * Word Sorting:: A program to produce a word usage count.
20793 A common error when writing large amounts of prose is to accidentally
20797 another, making them very difficult to spot.
20802 word on a line (in the variable @code{prev}) for comparison with the first
20807 so that, for example, ``The'' and ``the'' compare equal to each other.
20811 don't create nonsense words (e.g., the Texinfo @samp{@@code@{NF@}}
20818 word, comparing it to the previous one:
20820 @cindex @code{dupword.awk} program
20869 the number of times to repeat the message as well as a delay between
20872 This program uses the @code{gettimeofday} function from
20875 All the work is done in the @code{BEGIN} rule. The first part is argument
20876 checking and setting of defaults: the delay, the count, and the message to
20878 character (known as the ``alert'' character, @code{"\a"}), then it is added to
20881 to itself in case the user is not looking at the computer or terminal.)
20884 @cindex @code{alarm.awk} program
20941 The next @value{SECTION} of code turns the alarm time into hours and minutes,
20942 converts it (if necessary) to a 24-hour clock, and then turns that
20945 is how long to wait before setting off the alarm:
20959 # then add 12 to real hour
20970 # how long to sleep for
20980 Finally, the program uses the @code{system} function
20982 to call the @command{sleep} utility. The @command{sleep} utility simply pauses
20986 message in a loop, again using @command{sleep} to delay for however many
20995 # time to notify!
21018 often used to map uppercase letters into lowercase for further processing:
21031 to prevent the shell from attempting a @value{FN} expansion. This is
21036 in the ``from'' list than in the ``to'' list, the last character of the
21037 ``to'' list is used for the remaining characters in the ``from'' list.
21042 be added to @command{gawk}.
21043 @c Wishing to avoid gratuitous new features,
21045 The following program was written to
21052 painful, requiring repeated use of the @code{substr}, @code{index},
21053 and @code{gsub} built-in functions
21055 program was written before @command{gawk} acquired the ability to
21057 @c Exercise: How might you use this new feature to simplify the program?
21058 There are two functions. The first, @code{stranslate}, takes three
21061 @table @code
21063 A list of characters from which to translate.
21065 @item to
21066 A list of characters to which to translate.
21069 The string on which to do the translation.
21072 Associative arrays make the translation part fairly easy. @code{t_ar} holds
21073 the ``to'' characters, indexed by the ``from'' characters. Then a simple
21074 loop goes through @code{from}, one character at a time. For each character
21075 in @code{from}, if the character appears in @code{target}, @code{gsub}
21076 is used to change it to the corresponding @code{to} character.
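For comparison, here is a sketch of a variant technique: it builds the same kind of ``from''-to-``to'' map but then walks the target one character at a time instead of calling @code{gsub}, so characters that are special in regular expressions need no extra care. The name @code{demo_translate} and its local variables are hypothetical; this is not the @code{stranslate} function itself.

```awk
# demo_translate: hypothetical variant of the translation idea.
# Maps each character of `from' to the character at the same position
# in `to'; if `to' is shorter, its last character is reused.
function demo_translate(from, to, target,    lf, lt, map, i, c, k, result)
{
    lf = length(from)
    lt = length(to)
    for (i = 1; i <= lf; i++) {
        k = (i <= lt) ? i : lt
        map[substr(from, i, 1)] = substr(to, k, 1)
    }
    result = ""
    for (i = 1; i <= length(target); i++) {
        c = substr(target, i, 1)
        if (c in map)
            c = map[c]
        result = result c
    }
    return result
}
```

The cost is one @code{substr} call per target character, in exchange for not treating any ``from'' character as a regexp.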
21078 The @code{translate} function simply calls @code{stranslate} using @code{$0}
21079 as the target. The main program sets two global variables, @code{FROM} and
21080 @code{TO}, from the command line, and then changes @code{ARGV} so that
21083 Finally, the processing rule simply calls @code{translate} for each record:
21085 @cindex @code{translate.awk} program
21100 # to be spelled out. However, if `to' is shorter than `from',
21101 # the last character in `to' is used for the rest of `from'.
21103 function stranslate(from, to, target, lf, lt, t_ar, i, c)
21106 lt = length(to)
21108 t_ar[substr(from, i, 1)] = substr(to, i, 1)
21111 t_ar[substr(from, i, 1)] = substr(to, lt, 1)
21120 function translate(from, to)
21122 return $0 = stranslate(from, to, $0)
21129 print "usage: translate from to" > "/dev/stderr"
21146 While it is possible to do character transliteration in a user-level
21148 authors) started to consider adding a built-in function. However,
21150 @command{awk} had added the @code{toupper} and @code{tolower} functions
21153 cases where character transliteration is necessary, and so we chose to
21154 simply add those functions to @command{gawk} as well and then leave well
21157 An obvious improvement to this program would be to set up the
21158 @code{t_ar} array only once, in a @code{BEGIN} rule. However, this
21159 assumes that the ``from'' and ``to'' lists
21172 ``a program actually used to get something done.''}
21176 on it, 2 across and 10 down. The addresses are guaranteed to be no more
21180 The basic idea is to read 20 labels worth of data. Each line of each label
21181 is stored in the @code{line} array. The single rule takes care of filling
21182 the @code{line} array and printing the page when 20 labels have been read.
21184 The @code{BEGIN} rule simply sets @code{RS} to the empty string, so that
21187 It sets @code{MAXLINES} to 100, since 100 is the maximum number
21190 Most of the work is done in the @code{printpage} function.
21191 The label lines are stored sequentially in the @code{line} array. But they
21192 have to print horizontally; @code{line[1]} next to @code{line[6]},
21193 @code{line[2]} next to @code{line[7]}, and so on. Two loops are used to
21194 accomplish this. The outer loop, controlled by @code{i}, steps through
21196 controlled by @code{j}, goes through the lines within the row.
21197 As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in
21198 the row, and @samp{i+j+5} is the entry next to it. The output ends up
21210 As a final note, an extra blank line is printed at lines 21 and 61, to keep
21215 The @code{END} rule arranges to flush the final page of labels; there may
21218 @cindex @code{labels.awk} program
21298 utility programs to do a useful task of some complexity with a minimum of
21319 (@pxref{Fields}) to pick out the individual words from
21320 the line, and the built-in variable @code{NF} (@pxref{Built-in Variables})
21321 to know how many fields are available.
21322 For each input word, it increments an element of the array @code{freq} to
21325 The second rule, because it has the pattern @code{END}, is not executed
21327 @code{freq} table that has been built up inside the first action.
21335 newlines) don't have any special meaning to @command{awk}. This means that
21339 The @command{awk} language considers upper- and lowercase characters to be
21343 be sensitive to capitalization.
21346 The output does not come out in any useful order. You're more likely to be
21352 The way to solve these problems is to use some of @command{awk}'s more advanced
21353 features. First, we use @code{tolower} to remove
21354 case distinctions. Next, we use @code{gsub} to remove punctuation
21355 characters. Finally, we use the system @command{sort} utility to process the
21359 @cindex @code{wordfreq.awk} program
21392 utility and printed on the terminal. The options given to @command{sort}
21399 the @code{END} action to:
21415 to use the @command{sort} program.
21428 Suppose, however, you need to remove duplicate lines from a @value{DF} but
21429 that you want to preserve the order the lines are in. A good example of
21431 the commands you have entered, and it is not unusual to repeat a command
21432 several times in a row. Occasionally you might want to compact the history
21433 by removing duplicate entries. Yet it is desirable to maintain the order
21436 This simple program does the job. It uses two arrays. The @code{data}
21438 For each line, @code{data[$0]} is incremented.
21440 been seen before, then @code{data[$0]} is zero.
21441 In this case, the text of the line is stored in @code{lines[count]}.
21442 Each element of @code{lines} is a unique command, and the indices of
21443 @code{lines} indicate the order in which those lines are encountered.
21444 The @code{END} rule simply prints out the lines, in order:
21447 @cindex @code{histsort.awk} program
21451 # Thanks to Byron Rakitzis for the general idea
21477 information. For example, using the following @code{print} statement in the
21478 @code{END} rule indicates how often a particular command is used:
21484 This works because @code{data[$0]} is incremented each time a line is
21507 If you want to experiment with these programs, it is tedious to have to type
21515 A single Texinfo source file can be used to produce both
21527 For our purposes, it is enough to know three things about Texinfo input
21545 (Unfortunately, @TeX{} isn't always smart enough to do things exactly right,
21546 and we have to give it some help.)
21553 control line and passing it on to the @code{system} function
21555 Upon seeing @samp{@@c file @var{filename}}, each subsequent line is sent to
21560 @file{extract.awk} uses the @code{join} library function
21566 @file{extract.awk} to extract the sample programs and install many
21572 This program has a @@code@{BEGIN@} rule,
21591 @file{extract.awk} begins by setting @code{IGNORECASE} to one, so that
21594 The first rule handles calling @code{system}, checking that a command is
21595 given (@code{NF} is at least three) and also checking that the command
21598 @cindex @code{extract.awk} program
21637 The variable @code{e} is used so that the function
21653 The @samp{for} loop does the work. It reads lines using @code{getline}
21655 For an unexpected end of file, it calls the @code{@w{unexpected_eof}}
21659 ignores it and goes on to the next line.
21666 the array @code{a}, using the @code{split} function
21669 Each element of @code{a} that is empty indicates two successive @samp{@@}
21671 the original file), we have to add a single @samp{@@} symbol back in.
21673 When the processing of the array is finished, @code{join} is called with the
21674 value of @code{SUBSEP}, to rejoin the pieces back into a single
21675 line. That line is then printed to the output file:
21721 An important thing to note is the use of the @samp{>} redirection.
21723 subsequent output is appended to the file
21725 This makes it easy to mix program text and explanatory prose for the same
21730 Finally, the function @code{@w{unexpected_eof}} prints an appropriate
21732 The @code{END} rule handles the final cleanup, closing the open file:
21760 stream of data, makes changes to it, and passes it on.
21761 It is often used to make global changes to a large file or to a stream
21764 use is to perform global substitutions in the middle of a pipeline:
21770 Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp
21772 @samp{new}, i.e., all the occurrences on a line. This is similar to
21773 @command{awk}'s @code{gsub} function
21777 arguments: the pattern to look for and the text to replace it with. Any
21778 additional arguments are treated as data @value{FN}s to process. If none
21788 # Thanks to Michael Brennan for the idea
21829 The program relies on @command{gawk}'s ability to have @code{RS} be a regexp,
21830 as well as on the setting of @code{RT} to the actual text that terminates the
21833 The idea is to have @code{RS} be the pattern to look for. @command{gawk}
21834 automatically sets @code{$0} to the text between matches of the pattern.
21835 This is text that we want to keep, unmodified. Then, by setting @code{ORS}
21836 to the replacement text, a simple @code{print} statement outputs the
21837 text we want to keep, followed by the replacement text.
21839 There is one wrinkle to this scheme, which is what to do if the last record
21840 doesn't end with text that matches @code{RS}. Using a @code{print}
21842 However, if the file did not end in text that matches @code{RS}, @code{RT}
21843 is set to the null string. In this case, we can print @code{$0} using
21844 @code{printf}
21847 The @code{BEGIN} rule handles the setup, checking for the right number
21848 of arguments and calling @code{usage} if there is a problem. Then it sets
21849 @code{RS} and @code{ORS} from the command-line arguments and sets
21850 @code{ARGV[1]} and @code{ARGV[2]} to the null string, so that they are
21854 The @code{usage} function prints an error message and exits.
21856 using @code{print} or @code{printf} as appropriate, depending upon the
21857 value of @code{RT}.
21880 @subsection An Easy Way to Use Library Functions
21887 encourages code reuse and the writing of general functions. Programs are
21892 environment variable and the ability to put @command{awk} functions into a
21894 It would be nice to be able to write programs in the following manner:
21914 @command{igawk} makes an effort to only include files once, so that nested
21919 including the ability to have multiple source files specified via
21920 @option{-f}, and the ability to mix command-line and library source files.
21925 a good shell programming book if you wish to understand things in more
21931 @command{awk} source code for later, when the expanded program is run.
21944 @samp{@@include @var{filename}} to the shell variable's contents. Since the file-inclusion
21950 Run an @command{awk} program (naturally) over the shell variable's contents to expand
21962 potential problems that might arise were we to use temporary files instead,
21971 @table @code
21973 This ends the arguments to @command{igawk}. Anything else should be passed on
21974 to the user's @command{awk} program without being evaluated.
21977 This indicates that the next option is specific to @command{gawk}. To make
21978 argument processing easier, the @option{-W} is appended to the front of the
21984 These are saved and passed on to @command{gawk}.
21987 The @value{FN} is appended to the shell variable @code{program} with an
21989 The @command{expr} utility is used to remove the leading option part of the
21991 (Typical @command{sh} usage would be to use the @command{echo} and @command{sed}
21992 utilities to do this work. Unfortunately, some versions of @command{echo} evaluate
21997 The source text is appended to @code{program}.
22001 to get the @command{gawk} version information, and then exits.
22008 Otherwise, the first argument is appended to @code{program}.
22010 @code{program} contains the complete text of the original @command{awk}
22015 @cindex @code{igawk.sh} program
22040 # Initialize variables to empty
22104 The @command{awk} program to process @samp{@@include} directives
22105 is stored in the shell variable @code{expand_prog}. Doing this keeps
22107 reads through the user's program, one line at a time, using @code{getline}
22117 The @code{pathto} function does the work of finding the full path to
22122 the @value{FN} is concatenated with the name of each directory in
22123 the path, and an attempt is made to open the generated @value{FN}.
22124 The only way to test if a file can be read in @command{awk} is to go
22125 ahead and try to read it with @code{getline}; this is what @code{pathto}
22132 An alternative way to test for the file's existence would be to call
22133 @samp{system("test -r " t)}, which uses the @command{test} utility to
22134 see if the file exists and is readable. The disadvantage to this method
22163 The main program is contained inside one @code{BEGIN} rule. The first thing it
22164 does is set up the @code{pathlist} array that @code{pathto} uses. After
22165 splitting the path on @samp{:}, null elements are replaced with @code{"."},
22180 The stack is initialized with @code{ARGV[1]}, which will be @file{/dev/stdin}.
22183 If the line does start with @samp{@@include}, the @value{FN} is in @code{$2}.
22184 @code{pathto} is called to generate the full path. If it cannot, then we
22187 The next thing to check is if the file is included already. The
22188 @code{processed} array is indexed by the full @value{FN} of each included
22193 Finally, when @code{getline} encounters the end of the input file, the file
22194 is closed and the stack is popped. When @code{stackptr} is less than zero,
22236 Everything in the shell script up to the @var{marker} is fed to @var{command} as input.
22245 The expanded program is saved in the variable @code{processed_program}.
22251 value of the @code{expand_prog} shell variable) on standard input.
22254 Standard input is the contents of the user's program, from the shell variable @code{program}.
22255 Its contents are fed to @command{gawk} via a here document.
22258 The results of this processing are saved in the shell variable @code{processed_program} by using co…
22261 The last step is to call @command{gawk} with the expanded program,
22267 The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk}
22268 to handle an interesting case. Suppose that the user's program only has
22269 a @code{BEGIN} rule and there are no @value{DF}s to read.
22271 However, suppose that an included library file defines an @code{END}
22273 input. In order to avoid this, @file{/dev/null} is explicitly added to the
22299 Not trying to save the line read with @code{getline}
22300 in the @code{pathto} function when testing for the
22307 Using a @code{getline} loop in the @code{BEGIN} rule does it all in one
22308 place. It is not necessary to call out to a separate loop for processing
22315 of the @command{sh} language, making it harder to follow for those who
22319 Also, this program illustrates that it is often worthwhile to combine
22321 accomplish quite a lot, without having to resort to low-level programming
22322 in C or C++, and it is frequently easier to do certain kinds of string
22325 Finally, @command{igawk} shows that it is not always necessary to add new
22326 features to a program; they can often be layered on top. With @command{igawk},
22327 there is no real reason to build @samp{@@include} processing into
22342 as @code{getopt} and @code{assert}.
22345 This file contains library functions that are specific to a site or
22347 Having a separate file allows @file{default.awk} to change with
22348 new @command{gawk} releases, without requiring the system administrator to
22354 suggested that @command{gawk} be modified to automatically read these files
22355 upon startup. Instead, it would be very simple to modify @command{igawk}
22356 to do this. Since @command{igawk} can process nested @samp{@@include}
22372 the @command{gawk} source code and this @value{DOCUMENT}, respectively.
22416 evolution of the @command{awk} language, with cross-references to other parts
22429 * Contributors:: The major contributors to @command{gawk}.
22442 cross-references to further details:
22446 The requirement for @samp{;} to separate rules on a line
22450 User-defined functions and the @code{return} statement
22454 The @code{delete} statement (@pxref{Delete}).
22457 The @code{do}-@code{while} statement
22461 The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand}, and
22462 @code{srand} (@pxref{Numeric Functions}).
22465 The built-in functions @code{gsub}, @code{sub}, and @code{match}
22469 The built-in functions @code{close} and @code{system}
22473 The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART},
22474 and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}).
22490 Regexps as the value of @code{FS}
22492 third argument to the @code{split} function
22502 (Some vendors have updated their old versions of @command{awk} to
22507 Redirection of input for the @code{getline} function
22511 Multiple @code{BEGIN} and @code{END} rules
22529 The @code{ENVIRON} variable (@pxref{Built-in Variables}).
22551 A defined return value for the @code{srand} built-in function
22555 The @code{toupper} and @code{tolower} built-in string functions
22561 @code{printf} function
22565 The ability to dynamically pass the field width and precision (@code{"%*.*d"})
22566 in the argument list of the @code{printf} function
22570 The use of regexp constants, such as @code{/foo/}, as expressions, where
22571 they are equivalent to using the matching operator, as in @samp{$0 ~ /foo/}
22593 The use of @code{CONVFMT} for controlling the conversion of numbers
22594 to strings (@pxref{Conversion}).
22597 The concept of a numeric string and tighter comparison rules to go
22612 @code{\x} escape sequences are not recognized
22616 Newlines do not act as whitespace to separate fields when @code{FS} is
22617 equal to a single space
22625 The synonym @code{func} for the keyword @code{function} is not
22635 of @code{FS} to be a single TAB character
22639 The @code{fflush} built-in function is not supported
22660 to set the maximum number of fields and the maximum
22664 it continues to accept them to avoid breaking old programs.
22667 The @code{fflush} built-in function for flushing buffered output
22677 The use of @code{func} as an abbreviation for @code{function}
22682 The @code{SYMTAB} array, that allows access to @command{awk}'s internal symbol
22703 The ability for @code{FS} and for the third
22704 argument to @code{split} to be null strings
22708 The @code{nextfile} statement
22712 The ability to delete all of an array at once with @samp{delete @var{array}}
22720 I've tried to follow this general order, esp. for the 3.0 and 3.1 sections:
22739 This @value{SECTION} lists them in the order they were added to @command{gawk}.
22753 The @code{IGNORECASE} variable and its effects
22766 The @code{FIELDWIDTHS} variable and its effects
22770 The @code{systime} and @code{strftime} built-in functions for obtaining
22775 The @option{-W lint} option to provide error and portability checking
22776 for both the source code and at runtime
22780 The @option{-W compat} option to turn off the GNU extensions
22792 The @code{next file} statement for skipping to the next @value{DF}
22800 The @code{ARGIND} variable, which tracks the movement of @code{FILENAME}
22801 through @code{ARGV} (@pxref{Built-in Variables}).
22804 The @code{ERRNO} variable, which contains the system error message when
22805 @code{getline} returns @minus{}1 or @code{close} fails
22814 The ability to delete all of an array at once with @samp{delete @var{array}}
22818 The ability to use GNU-style long-named options that start with @option{--}
22823 source code
22831 @code{IGNORECASE} changed, now applying to string comparison as well
22836 The @code{RT} variable that contains the input text that
22837 matched @code{RS}
22845 The @code{gensub} function for more powerful text manipulation
22849 The @code{strftime} function acquired a default time format,
22850 allowing it to be called with no arguments
22854 The ability for @code{FS} and for the third
22855 argument to @code{split} to be null strings
22859 The ability for @code{RS} to be a regexp
22863 The @code{next file} statement became @code{nextfile}
22867 The @option{--lint-old} option to
22873 The @option{-m} option and the @code{fflush} function from the
22879 The @option{--re-interval} option to provide interval expressions in regexps
22883 The @option{--traditional} option was added as a better name for
22887 The use of GNU Autoconf to control the configuration process
22900 The @code{BINMODE} special variable for non-POSIX systems,
22905 The @code{LINT} special variable, which dynamically controls lint warnings
22909 The @code{PROCINFO} array for providing process-related information
22913 The @code{TEXTDOMAIN} special variable for setting an application's
22920 The ability to use octal and hexadecimal constants in @command{awk}
22921 program source code
22925 The @samp{|&} operator for two-way I/O to a coprocess
22933 The optional second argument to @code{close} that allows closing one end
22934 of a two-way pipe to a coprocess
22938 The optional third argument to the @code{match} function
22943 Positional specifiers in @code{printf} formats for
22948 The @code{asort} and @code{asorti} functions for sorting arrays
22952 The @code{bindtextdomain}, @code{dcgettext} and @code{dcngettext} functions
22957 The @code{extension} built-in function and the ability to add
22962 The @code{mktime} built-in function for creating timestamps
22967 @code{and},
22968 @code{or},
22969 @code{xor},
22970 @code{compl},
22971 @code{lshift},
22972 @code{rshift},
22974 @code{strtonum} built-in
22979 @cindex @code{next file} statement
22984 The @option{--dump-variables} option to print a list of all global variables
22988 The @option{--gen-po} command-line option and the use of a leading
22989 underscore to mark strings that should be translated
22993 The @option{--non-decimal-data} option to allow non-decimal
23004 The @option{--enable-portals} configuration option to enable special treatment of
23009 The use of GNU Automake to help in standardizing the configuration process
23013 The use of GNU @code{gettext} for @command{gawk}'s own message output
23029 The source code now uses new-style function definitions, with
23030 @command{ansi2knr} to convert the code on systems with old compilers.
23033 The @option{--disable-lint} configuration option to disable lint checking
23046 @appendixsec Major Contributors to @command{gawk}
23047 @cindex @command{gawk}, list of contributors to
23053 This @value{SECTION} names the major contributors to @command{gawk}
23082 to around 90 pages.
23094 contributed parts of the code (mostly fixes) in
23118 did the initial ports to MS-DOS with various versions of MSC.
23123 provided help in porting @command{gawk} to Cray systems.
23128 provided the initial port to OS/2 and its documentation.
23133 provided the port to Atari systems and its documentation.
23134 He continues to provide portability checking with DEC Alpha
23135 systems, and has done a lot of work to make sure @command{gawk}
23141 provided the port to Amiga systems and its documentation.
23151 maintains the port to Windows32 systems.
23156 acts as coordinator for the various ports to different PC platforms
23158 He is also instrumental in keeping the documentation up to date for
23164 provided the @code{extension}
23171 code and documentation, and motivated the inclusion of the @samp{|&} operator.
23176 provided the port to Tandem systems and its documentation.
23181 provided the port to BeOS and its documentation.
23186 did the initial work to convert @command{gawk} to use
23187 GNU Automake and @code{gettext}.
23192 provided the initial version of the @code{asort} function
23193 as well as the code for the new optional third argument to the @code{match} function.
23205 Michael Benzinger contributed the initial code for @code{switch} statements.
23208 Patrick T.J.@: McPhee contributed the code for dynamic loading in Windows32
23248 @cindex source code, @command{gawk}
23250 This @value{SECTION} describes how to get the @command{gawk}
23251 distribution, how to extract it, and then what is in the various files and
23255 * Getting:: How to get the distribution.
23256 * Extracting:: How to extract the distribution.
23263 @cindex @command{gawk}, source code, obtaining
23264 There are three ways to get GNU software:
23289 Ordering from the FSF directly contributes to the support of the foundation
23290 and to the production of more free software.
23293 Retrieve @command{gawk} by using anonymous @command{ftp} to the Internet host
23294 @code{ftp.gnu.org}, in the directory @file{/gnu/gawk}.
23298 The up-to-date list of mirror sites is available from
23300 Try to use one of the mirrors; they
23301 will be less busy, and you can usually find one closer to your site.
23305 @command{gawk} is distributed as a @code{tar} file compressed with the
23306 GNU Zip program, @code{gzip}.
23310 use @code{gzip} to expand the
23311 file and then use @code{tar} to extract it. You can use the following
23312 pipeline to produce the @command{gawk} distribution:
23315 # Under System V, add 'o' to the tar options
23331 or equal to 80 denote ``beta'' or nonproduction software; you might not want
23332 to retrieve such a version unless you don't mind experimenting.)
23333 If you are not on a Unix system, you need to make other arrangements
23344 subdirectories, and files related to the configuration process
23346 as well as several subdirectories related to different non-Unix
23351 The actual @command{gawk} source code.
23364 A detailed list of source code changes as bugs are fixed or improvements made.
23367 A list of changes to @command{gawk} since the last release or patch.
23402 needed to produce the color version. See the file @file{README.card}
23412 It should be processed with @TeX{} to produce a printed document, and
23413 with @command{makeinfo} to produce an Info or HTML file.
23426 It should be processed with @TeX{} to produce a printed document and
23427 with @command{makeinfo} to produce an Info or HTML file.
23439 The input file used during the configuration process to generate the
23463 @itemx po/*
23464 The @file{intl} directory provides the GNU @code{gettext} library, which implements
23465 @command{gawk}'s internationalization features, while the @file{po} directory
23474 which can be used to extract the sample programs from the Texinfo
23476 @command{configure} uses to generate a @file{Makefile}.
23477 @file{Makefile.am} is used by GNU Automake to create @file{Makefile.in}.
23482 are included as ready-to-use files in the @command{gawk} distribution.
23509 directory to run your version of @command{gawk} against the test suite.
23520 to configure @command{gawk} for your system yourself.
23525 * Configuration Philosophy:: How it's all supposed to work.
23533 to @file{gawk-@value{VERSION}.@value{PATCHLEVEL}}. Like most GNU software,
23555 This produces a @file{Makefile} and @file{config.h} tailored to your system.
23557 You might want to edit the @file{Makefile} to
23558 change the @code{CFLAGS} variable, which controls
23559 the command-line options that are passed to the C compiler (such as
23563 variables on the command line, such as @code{CC} and @code{CFLAGS}, when
23583 That's all there is to it!
23587 check the files in the @file{README_d} directory to see if you've
23601 @table @code
23602 @cindex @code{--enable-portals} configuration option
23603 @cindex configuration option, @code{--enable-portals}
23610 @cindex @code{--enable-switch} configuration option
23611 @cindex configuration option, @code{--enable-switch}
23613 Enable the recognition and execution of C-style @code{switch} statements
23619 @cindex @code{--with-included-gettext} configuration option
23620 @cindex @code{--with-included-gettext} configuration option, configuring @command{gawk} with
23621 @cindex configuration option, @code{--with-included-gettext}
23623 Use the version of the @code{gettext} library that comes with @command{gawk}.
23628 @cindex @code{--disable-lint} configuration option
23629 @cindex configuration option, @code{--disable-lint}
23631 This option disables all lint checking within @command{gawk}. The
23635 Similarly, setting the @code{LINT} variable
23639 When used with GCC's automatic dead-code-elimination, this option
23642 with other compilers are likely to vary.
23646 to fail. This option may be removed at a later date.
23648 @cindex @code{--disable-nls} configuration option
23649 @cindex configuration option, @code{--disable-nls}
23665 The source code for @command{gawk} generally attempts to adhere to formal
23669 function prototypes are used to help improve the compile-time checking.
23674 most likely to be missing.
23678 where you are attempting to compile @command{gawk}. The three things
23683 @code{st_blksize} element in the @code{stat} structure. In this case,
23686 @cindex @code{custom.h} file
23687 It is possible for your C compiler to lie to @command{configure}. It may
23691 @code{#define} any constants that @command{configure} should have defined but
23692 didn't, or @code{#undef} any constants that @command{configure} defined and
23699 @command{autoconf}. You may be able to change this file and generate a
23702 for information on how to report problems in configuring @command{gawk}).
23703 The same mechanism may be used to send in updates to @file{configure.in}
23709 This @value{SECTION} describes how to install @command{gawk} on
23727 @code{ftp.ninemoons.com} in the directory @file{pub/ade/current}.
23744 Anonymous @command{ftp} site: @code{ftp.ninemoons.com}
23764 Since BeOS DR9, all the tools that you should need to build @command{gawk} are
23765 included with BeOS. The process is basically identical to the Unix process
23770 prefix for the installation directory. For BeOS DR9 and beyond, the best directory to
23807 refers to any of Windows-95/98/ME/NT/2000.
23812 and Windows32 can add to the confusion. For an overview of the
23813 considerations, please refer to @file{README_d/README.pc} in the
23834 This is designed for easy installation to a @file{/gnu} directory on your
23837 @file{igawk.cmd} and @file{igawk.bat} (in @file{gnu/bin}) may need to be
23845 directory of your preferred drive. Set @env{UNIXROOT} to your installation
23846 drive (e.g., @samp{e:}) if you want to install @command{gawk} onto another drive
23871 to build a Windows32 version, and Microsoft C/C++ can be
23872 used to build 16-bit versions for MS-DOS and OS/2.
23884 for @file{ChangeLog}) to the directory with the rest of the @command{gawk}
23886 may need to be edited in order to work with your @command{make} utility.
23890 command is given without a target. As an example, to build @command{gawk}
23893 Using @command{make} to run the standard tests and to install @command{gawk}
23895 @command{cp}. In order to run the tests, the @file{test/*.ok} files may need to
23905 In principle, it is possible to compile @command{gawk} the following way:
23931 To get an FHS-compliant file hierarchy it is recommended to use the additional
23935 The internal @code{gettext} library tends to be problematic. It is therefore recommended
23936 to use either an external one (@option{--without-included-gettext}) or to disable
23939 If you use GCC 2.95 or newer it is recommended to use also:
23946 You can also get an @code{a.out} executable if you prefer:
23960 @strong{Note:} Even if the compiled @command{gawk.exe} (@code{a.out}) executable
23962 that runs under DOS, @code{"-DPIPES_SIMULATED"} must be added to @env{CPPFLAGS}.
23967 but the @code{pid} test are expected to work properly. The @code{pid}
23968 test fails because child processes are not started by @code{fork()}.
23972 @strong{Note:} Most OS/2 ports of GNU @command{make} are not able to handle
23985 uncomment the definitions of @code{DYN_FLAGS}, @code{DYN_EXP},
23986 @code{DYN_OBJ}, and @code{DYN_MAKEXP} in the configuration section of
23987 the @file{Makefile}. There are two definitions for @code{DYN_MAKEXP}:
23990 To build some of the example extension libraries, @command{cd} to the
23991 extension directory and copy @file{Makefile.pc} to @file{Makefile}. You
23993 @command{awk} scripts, you'll need to either change the call to
23994 the @code{extension} function to match the name of the library (for
23995 instance, change @code{"./ordchr.so"} to @code{"ordchr.dll"} or simply
23996 @code{"ordchr"}), or rename the library to match the call (for instance,
23997 rename @file{ordchr.dll} to @file{ordchr.so}).
23999 If you build @command{gawk.exe} with one compiler but want to build
24000 an extension library with the other, you need to copy the import
24003 interoperate if you give them the correct name. The resulting shared
24007 but you're essentially on your own. Post to @code{comp.lang.awk} or
24008 send electronic mail to @email{ptjm@@interlog.com} if you have problems getting
24009 started. If you need to access functions or variables which are not
24010 exported by @command{gawk.exe}, add them to @file{gawkw32.def} and
24011 rebuild. You should also add @code{ATTRIBUTE_EXPORTED} to the declaration
24012 in @file{awk.h} of any variables you add to @file{gawkw32.def}.
24014 Note that extension libraries have the name of the @command{awk}
24017 rename @command{gawk.exe} to @command{awk.exe} or if you try to use
24019 @command{pgawk.exe} to @command{gawk.exe}. You can resolve this problem
24020 by changing the program name in the definition of @code{DYN_MAKEXP}
24043 @cindex @code{;} (semicolon), @code{AWKPATH} variable and
24044 @cindex semicolon (@code{;}), @code{AWKPATH} variable and
24045 @cindex @code{AWKPATH} environment variable
24051 @code{@w{".;c:/lib/awk;c:/gnu/lib/awk"}}.
24058 Additionally, to support binary distributions of @command{gawk} for OS/2
24062 E.g., if @env{UNIXROOT} is set to @file{e:} the complete default search path is
24063 @code{@w{".;c:/usr/share/awk;e:/usr/share/awk"}}.
24065 An @command{sh}-like shell (as opposed to @command{command.com} under MS-DOS
24068 Daisuke Aoyama has ported GNU @command{bash} to MS-DOS using the DJGPP tools,
24073 the setting for @command{gawk} in the shell configuration may need to be
24074 changed and the @code{ignoretype} option may also be of interest.
24076 @cindex differences in @command{awk} and @command{gawk}, @code{BINMODE} variable
24077 @cindex @code{BINMODE} variable
24079 translate end-of-line @code{"\r\n"} to @code{"\n"} on input and @code{"\n"}
24080 to @code{"\r\n"} on output. A special @code{BINMODE} variable allows
24085 If @code{BINMODE} is @code{"r"}, or
24086 @code{(BINMODE & 1)} is nonzero, then
24090 If @code{BINMODE} is @code{"w"}, or
24091 @code{(BINMODE & 2)} is nonzero, then
24095 If @code{BINMODE} is @code{"rw"} or @code{"wr"},
24097 (same as @code{(BINMODE & 3)}).
24100 @code{BINMODE=@var{non-null-string}} is
24103 message if the string is not one of @code{"rw"} or @code{"wr"}.
24110 Setting @code{BINMODE} for standard input or
24113 @code{BINMODE} is set at the time a file or pipe is opened and cannot be
24116 The name @code{BINMODE} was chosen to match @command{mawk}
24118 Both @command{mawk} and @command{gawk} handle @code{BINMODE} similarly; however,
24120 variable that can set @code{BINMODE}, @code{RS}, and @code{ORS}. The
24122 prepared distributions) have been chosen to match @command{mawk}'s @samp{-W
24124 the setting of @code{RS} giving the fewest ``surprises'' is open to debate.
24129 output and other files, and set @code{ORS} as the ``usual'' DOS-style
24146 The following changes the record separator to @code{"\r\n"} and sets binary
24161 With proper quoting, in the first example the setting of @code{RS} can be
24162 moved into the @code{BEGIN} rule.
24181 When compared to GNU/Linux on the same system, the @samp{configure}
24191 @appendixsubsec How to Compile and Install @command{gawk} on VMS
24197 This @value{SUBSECTION} describes how to compile and install @command{gawk} under VMS.
24200 * VMS Compilation:: How to compile @command{gawk} under VMS.
24201 * VMS Installation Details:: How to install @command{gawk} under VMS.
24202 * VMS Running:: How to run @command{gawk} under VMS.
24209 To compile @command{gawk} under VMS, there is a @code{DCL} command procedure that
24210 issues all the necessary @code{CC} and @code{LINK} commands. There is
24211 also a @file{Makefile} for use with the @code{MMS} utility. From the source
24231 @code{CC/OPTIMIZE=NOLINE}, which is essential for Version 3.0.
24235 @file{vmsbuild.com} or @file{descrip.mms} according to the comments in them.
24243 from those for VAX C V2.x but equally straightforward. No changes to
24247 Edit @file{vmsbuild.com} or @file{descrip.mms} according to their comments.
24248 No changes to @file{config.h} are needed.
24258 a @code{DCL} symbol whose value begins with a dollar sign. For example:
24267 @file{login.com} of any user who wants to run @command{gawk},
24271 to run @command{gawk}.
24280 (You may want to substitute a site-specific help library rather than
24292 The logical name @samp{AWK_LIBRARY} can designate a default location
24298 @command{gawk} appends the suffix @samp{.awk} to the filename and retries
24321 The VMS port of @command{gawk} includes a @code{DCL}-style interface in addition
24322 to the original shell-style interface (see the help entry for details).
24326 flag is required to force Unix style rather than @code{DCL} parsing. If any
24327 other dash-type options (or multiple parameters such as @value{DF}s to
24335 by the @option{-f} option, is @code{"SYS$DISK:[],AWK_LIBRARY:"}. The logical
24336 name @samp{AWKPATH} can be used to override this default. The format
24339 translation and not a multitranslation @code{RMS} searchlist.
24358 of templates, using a script to make the C compiler fit @command{configure}'s
24361 unable to redefine @code{CC}. @command{configure} takes a very long
24362 time to execute, but at least it provides incremental feedback as it runs.
24385 included for those who might want to use it but it is no longer being
24397 In order to use @command{gawk}, you need to have a shell, either text or
24398 graphics, that does not map all the characters of a command line to
24403 flags, you need to upgrade your tools. Support for I/O
24404 redirection is necessary to make it easy to import @command{awk} programs
24405 from other environments. Pipes are nice to have but not vital.
24415 A proper compilation of @command{gawk} sources when @code{sizeof(int)}
24416 differs from @code{sizeof(void *)} requires an ISO C compiler. An initial
24418 where @code{int}s are four bytes wide but the other variant works as well.
24420 You may need quite a bit of memory when trying to recompile the @command{gawk}
24431 @file{atari} subdirectory and can be edited and copied to the
24433 @command{configure} produces something, it might be advisable to compare
24436 Some @command{gawk} source code fragments depend on a preprocessor define
24439 environment. Also see the remarks about @env{AWKPATH} and @code{envsep} in
24442 As shipped, the sample @file{config.h} claims that the @code{system}
24447 shell and operating system, you might want to change the file to indicate
24448 that @code{system} is available.
24459 If either one is found, its value is assumed to be a directory for
24461 memory, it is a good idea to put it on a RAM drive. If neither
24468 @code{DEFPATH} defined in @file{Makefile}. The sample @command{gcc}/TOS
24469 @file{Makefile} for the ST in the distribution sets @code{DEFPATH} to
24470 @code{@w{".,c:\lib\awk,c:\gnu\lib\awk"}}. The search path can be
24471 modified by explicitly setting @env{AWKPATH} to whatever you want.
24472 Note that colons cannot be used on the ST to separate elements in the
24474 Instead, you must use a comma to separate elements in the path. When
24476 the @code{envsep} variable in @file{unsupported/atari/gawkmisc.atr} to another
24483 computer to crash and requiring a reboot. Often a warm reboot is
24486 @command{awk} program using @code{print} statements explicitly redirected
24487 to @file{/dev/stdout}, while other @code{print} statements use the
24489 output to a file.
24496 It may also create problems for external programs called via the @code{system}
24500 strings have to be doubled in order to get literal backslashes
24509 The port's contributor no longer has access to a Tandem system.
24513 The port is pretty clean and all facilities seem to work except for
24522 that the @value{FN}s on the Tandem box conform to the restrictions of D20.
24526 distribution) and should be copied to the main source directory before
24529 The file @file{compit} can then be used to compile and bind an executable.
24533 @samp{@}} characters to be escaped with @samp{~} on the command line
24537 on @code{getline}, @code{print} etc., are supported.)
24541 has been ``stolen'' to enable Tandem users to process fixed-length
24543 @command{gawk} to read the input file as fixed 74-byte records.
24552 The Hitchhiker's Guide to the Galaxy
24561 please report it to the developers; we cannot promise to do anything
24562 but we might well want to fix it.
24566 what you're trying to do. If it's not clear whether you should be able
24567 to do something or not, report that too; it's a bug in the documentation!
24569 Before reporting a bug or trying to fix it yourself, try to isolate it
24570 to the smallest possible @command{awk} program and input @value{DF} that
24573 the compiler you used to compile @command{gawk}, and the exact results
24574 @command{gawk} gave you. Also say what you expected to occur; this helps
24577 @cindex @code{bug-gawk@@gnu.org} bug reporting address
24578 @cindex email address for bug reports, @code{bug-gawk@@gnu.org}
24579 @cindex bug reports, email address, @code{bug-gawk@@gnu.org}
24580 Once you have a precise problem, send email to @email{bug-gawk@@gnu.org}.
24586 mail to me. If necessary, I can be reached directly at
24591 @cindex @code{comp.lang.awk} newsgroup
24592 @strong{Caution:} Do @emph{not} try to report bugs in @command{gawk} by
24593 posting to the Usenet/Internet newsgroup @code{comp.lang.awk}.
24600 features, ask me; I will try to help you out, although I
24601 may not have the time to fix the problem. You can send me electronic
24605 an electronic mail message to the person who maintains that port. They
24634 The Unix for OS/2 team, @email{gawk-maintainer@@unixos2.org}.
24661 @item OS/2 @tab The Unix for OS/2 team, @email{gawk-maintainer@@unixos2.org}.
24671 report to the @email{bug-gawk@@gnu.org} email list as well.
24688 @i{It's kind of fun to put comments like this in your awk code.}@*
24689 @ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course}@*
24694 This @value{SECTION} briefly describes where to get them:
24698 @cindex source code, Bell Laboratories @command{awk}
24727 @cindex source code, @command{mawk}
24734 You can get it via anonymous @command{ftp} to the host
24735 @code{@w{ftp.whidbey.net}}. Change directory to @file{/pub/brennan}.
24739 @command{gunzip} may be used to decompress this file. Installation
24740 is similar to @command{gawk}'s
24748 The @code{fflush} built-in function for flushing buffered output
24758 The use of @code{func} as an abbreviation for @code{function}
24769 Use @code{"-"} instead of @code{"/dev/stdin"} with @command{mawk}.
24772 The ability for @code{FS} and for the third
24773 argument to @code{split} to be null strings
24777 The ability to delete all of an array at once with @samp{delete @var{array}}
24781 The ability for @code{RS} to be a regexp
24785 The @code{BINMODE} special variable for non-Unix operating systems
24789 The next version of @command{mawk} will support @code{nextfile}.
24793 @cindex source code, @command{awka}
24804 To get @command{awka}, go to @uref{http://awka.sourceforge.net}.
24811 the Bell Labs @command{awk} to provide timing and profiling information.
24832 This appendix contains information mainly of interest to implementors and
24833 maintainers of @command{gawk}. Everything in it applies specifically to
24834 @command{gawk} and not to other implementations.
24837 * Compatibility Mode:: How to disable certain @command{gawk}
24840 * Dynamic Extensions:: Adding new built-in functions to
24854 for a summary of the GNU extensions to the @command{awk} language and program.
24861 @table @code
24872 @appendixsec Making Additions to @command{gawk}
24874 If you find that you want to enhance @command{gawk} in a significant
24875 fashion, you are perfectly free to do so. That is the point of having
24876 free software; the source code is available and you are free to change
24879 This @value{SECTION} discusses the ways you might want to change @command{gawk}
24883 * Adding Code:: Adding code to the main body of
24885 * New Ports:: Porting @command{gawk} to a new operating
24893 @cindex adding, features to @command{gawk}
24895 @cindex features, adding to @command{gawk}
24898 You are free to add any new features you like to @command{gawk}.
24899 However, if you want your changes to be incorporated into the @command{gawk}
24900 distribution, there are several steps that you need to take in order to
24901 make it possible for me to include your changes:
24912 It is much easier for me to integrate changes if they are relative to
24914 @command{gawk} is very old, I may not be able to integrate them at all.
24926 read it, please do so, preferably @emph{before} starting to modify @command{gawk}.
24940 The C code for @command{gawk} follows the instructions in the
24941 @cite{GNU Coding Standards}, with minor exceptions. The code is formatted
24942 using the traditional ``K&R'' style, particularly as regards the placement
24951 Put the name of the function at the beginning of its own line.
24954 Put the return type of the function, even if it is @code{int}, on the
24955 line above the line with the name and arguments of the function.
24959 (@code{if}, @code{while}, @code{for}, @code{do}, @code{switch},
24960 and @code{return}).
24969 Do not use the comma operator to produce multiple side effects, except
24970 in @code{for} loop initialization and increment parts, and in macro bodies.
24979 Use comparisons against @code{NULL} and @code{'\0'} in the conditions of
24980 @code{if}, @code{while}, and @code{for} statements, as well as in the @code{case}s
24981 of @code{switch} statements, instead of just the
24985 Use the @code{TRUE}, @code{FALSE} and @code{NULL} symbolic constants
24986 and the character constant @code{'\0'} where appropriate, instead of @code{1}
24987 and @code{0}.
24990 Use the @code{ISALPHA}, @code{ISDIGIT}, etc.@: macros, instead of the
25001 Do not use the @code{alloca} function for allocating memory off the stack.
25003 to free the storage. Instead, use @code{malloc} and @code{free}.
25007 If I have to reformat your code to follow the coding style used in
25008 @command{gawk}, I may not bother to integrate your changes at all.
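To make the style points above concrete, here is a small C sketch written to follow them: the return type sits on its own line, the function name starts the next line, conditions compare explicitly against @code{NULL} and @code{'\0'}, and storage comes from @code{malloc} rather than @code{alloca}. The functions themselves are invented for the illustration, and the @code{ISALPHA}, @code{TRUE}, and @code{FALSE} definitions are local stand-ins, not the ones in the @command{gawk} sources.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define TRUE  1
#define FALSE 0

/* local stand-in for gawk's ISALPHA macro; this definition is an assumption */
#define ISALPHA(c) isalpha((unsigned char) (c))

/* all_alpha: return TRUE if s is nonempty and every character is a letter */

int
all_alpha(const char *s)
{
	const char *cp;

	if (s == NULL || *s == '\0')
		return FALSE;

	for (cp = s; *cp != '\0'; cp++) {
		if (! ISALPHA(*cp))
			return FALSE;
	}
	return TRUE;
}

/* copy_string: return a malloc'ed copy of s; the caller must free it */

char *
copy_string(const char *s)
{
	char *copy;

	if (s == NULL)
		return NULL;

	copy = (char *) malloc(strlen(s) + 1);
	if (copy != NULL)
		strcpy(copy, s);
	return copy;
}
```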
25011 Be prepared to sign the appropriate paperwork.
25012 In order for the FSF to distribute your changes, you must either place
25013 those changes in the public domain and submit a signed statement to that
25014 effect, or assign the copyright in your changes to the FSF.
25015 Both of these actions are easy to do and @emph{many} people have done so
25023 Along with your new code, please supply new sections and/or chapters
25027 Conventions to be followed in @cite{@value{TITLE}} are provided
25031 You will also have to sign paperwork for your documentation changes.
25035 Use @samp{diff -c -r -N} or @samp{diff -u -r -N} to compare
25037 (I find context diffs to be more readable but unified diffs are
25040 Send the output produced by either run of @command{diff} to me when you
25045 Using this format makes it easy for me to apply your changes to the
25046 master version of the @command{gawk} source code (using @code{patch}).
25047 If I have to apply the changes manually, using a text editor, I may
25052 This helps further minimize the amount of work I have to do,
25053 making it easier for me to accept patches.
25057 may write the new code, I have to maintain it and support it. If it
25058 isn't possible for me to do that with a minimum of extra work, then I
25065 @appendixsubsec Porting @command{gawk} to a New Operating System
25067 @cindex operating systems, porting @command{gawk} to
25070 If you want to port @command{gawk} to a new operating system, there are
25085 When doing a port, bear in mind that your code must coexist peacefully
25087 changes to the system-independent parts of the code. If at all possible,
25089 code.
25092 code, I probably will not accept them. In such a case, you can, of course,
25101 question, but changes to these files are scrutinized extra carefully.
25107 Be willing to continue to maintain the port.
25109 the code needed to compile and run @command{gawk} on their systems. If no one
25110 volunteers to maintain a port, it becomes unsupported and it may
25111 be necessary to remove it from the distribution.
25117 @samp{#ifdef}s scattered throughout the code. The @file{gawkmisc.c} in
25120 Be sure to update it as well.
25125 @file{gawkmisc.c}, makes it possible to move files from a port's subdirectory
25132 necessary for your operating system. All your code should be in a
25133 separate subdirectory, with a name that is the same as, or reminiscent
25135 try to structure things so that it is not necessary to move files out
25137 possible, then be sure to avoid using names for your files that
25143 installation and compilation steps needed to compile and/or install
25147 Be prepared to sign the appropriate paperwork.
25148 In order for the FSF to distribute your code, you must either place
25149 your code in the public domain and submit a signed statement to that
25150 effect, or assign the copyright in your code to the FSF.
25152 Both of these actions are easy to do and @emph{many} people have done so
25158 Following these steps makes it much easier to integrate your changes
25160 operating systems' code that is already there.
25162 In the code that you supply and maintain, feel free to use a
25166 @appendixsec Adding New Built-in Functions to @command{gawk}
25179 @cindex adding, functions to @command{gawk}
25181 @cindex functions, built-in, adding to @command{gawk}
25182 Beginning with @command{gawk} 3.1, it is possible to add new built-in
25183 functions to @command{gawk} using dynamically loaded libraries. This
25185 the @code{dlopen} and @code{dlsym} functions.
25186 This @value{SECTION} describes how to write and use dynamically
25192 are very much subject to change in the next @command{gawk} release.
25193 Be aware that you may have to re-do everything, perhaps from scratch,
25202 @appendixsubsec A Minimal Introduction to @command{gawk} Internals
25209 brief and simplistic; would-be @command{gawk} hackers are encouraged to
25210 spend some time reading the source code before trying to write
25213 Reading @file{awk.y} in order to see how the parse tree is built
25216 @cindex @code{awk.h} file (internal)
25222 @table @code
25223 @cindex floating-point, numbers, @code{AWKNUM} internal type
25224 @cindex numbers, floating-point, @code{AWKNUM} internal type
25225 @cindex @code{AWKNUM} internal type
25227 An @code{AWKNUM} is the internal type of @command{awk}
25228 floating-point numbers. Typically, it is a C @code{double}.
25230 @cindex @code{NODE} internal type
25231 @cindex strings, @code{NODE} internal type
25232 @cindex numbers, @code{NODE} internal type
25234 Just about everything is done using objects of type @code{NODE}.
25237 @cindex @code{force_number} internal function
25240 This macro forces a value to be numeric. It returns the actual
25244 @cindex @code{force_string} internal function
25246 This macro guarantees that a @code{NODE}'s string value is current.
25252 @cindex @code{param_cnt} internal variable
25256 @cindex @code{stptr} internal variable
25257 @cindex @code{stlen} internal variable
25260 The data and length of a @code{NODE}'s string value, respectively.
25261 The string is @emph{not} guaranteed to be zero-terminated.
25262 If you need to pass the string value to a C library function, save
25263 the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it,
25266 @cindex @code{type} internal variable
25268 The type of the @code{NODE}. This is a C @code{enum}. Values should
25269 be either @code{Node_var} or @code{Node_var_array} for function
25272 @cindex @code{vname} internal variable
25274 The ``variable name'' of a node. This is not of much use inside
25278 @cindex @code{assoc_clear} internal function
25280 Clears the associative array pointed to by @code{n}.
25284 @cindex @code{assoc_lookup} internal function
25287 @code{symbol} is the array, @code{subs} is the subscript.
25288 This is usually a value created with @code{tmp_string} (see below).
25289 @code{reference} should be @code{TRUE} if it is an error to use the
25290 value before it is created. Typically, @code{FALSE} is the
25291 correct value to use from extension functions.
25294 @cindex @code{make_string} internal function
25296 Take a C string and turn it into a pointer to a @code{NODE} that
25301 @cindex @code{make_number} internal function
25303 Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that
25307 @cindex @code{tmp_string} internal function
25309 Take a C string and turn it into a pointer to a @code{NODE} that
25313 @cindex @code{tmp_number} internal function
25315 Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that
25321 @cindex @code{dupnode} internal function
25324 reference count instead of actually duplicating the entire @code{NODE};
25328 @cindex @code{free_temp} internal macro
25330 This macro releases the memory associated with a @code{NODE}
25331 allocated with @code{tmp_string} or @code{tmp_number}.
25334 @cindex @code{make_builtin} internal function
25335 @item void make_builtin(char *name, NODE *(*func)(NODE *), int count)
25336 Register a C function pointed to by @code{func} as new built-in
25337 function @code{name}. @code{name} is a regular C string. @code{count}
25352 @cindex @code{get_argument} internal function
25354 This function is called from within a C extension function to get
25355 the @code{i}-th argument from the function call.
25360 @cindex @code{set_value} internal function
25362 This function is called from within a C extension function to set
25367 @cindex @code{ERRNO} variable
25368 @cindex @code{update_ERRNO} internal function
25370 This function is called from within a C extension function to set
25371 the value of @command{gawk}'s @code{ERRNO} variable, based on the current
25372 value of the C @code{errno} variable.
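The save-and-restore idiom described for @code{stptr} and @code{stlen} can be sketched in C as follows. The structure @code{fake_node} is only a stand-in with the same two member names; it is not the real @code{NODE} type from @file{awk.h}, and @code{node_to_double} is a hypothetical function. The sketch also assumes that the buffer has at least one byte available past @code{stlen}.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the string part of a NODE; not gawk's real definition. */
struct fake_node {
	char *stptr;	/* string data, not necessarily zero-terminated */
	size_t stlen;	/* number of valid bytes in stptr */
};

/*
 * Convert the node's string value with strtod(), which requires a
 * zero-terminated string.
 */
double
node_to_double(struct fake_node *n)
{
	char save;
	double result;

	save = n->stptr[n->stlen];	/* save the byte after the string */
	n->stptr[n->stlen] = '\0';	/* terminate temporarily */
	result = strtod(n->stptr, NULL);
	n->stptr[n->stlen] = save;	/* restore the original byte */

	return result;
}
```

With a buffer holding @code{12.59} and a length of 4, the conversion sees only @code{12.5}, and the buffer is unchanged afterwards.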
25376 An argument that is supposed to be an array needs to be handled with
25377 some extra code, in case the array being passed in is actually
25380 In versions of @command{gawk} up to and including 3.1.2, the
25381 following boilerplate code shows how to do this:
25400 /* force it to be an array, if necessary, clear it */
25407 following boilerplate code now suffices:
25414 /* force it to be an array: */
25422 don't just blindly copy this code.
25429 @cindex @code{chdir} function, implementing in @command{gawk}
25432 @cindex @code{stat} function, implementing in @command{gawk}
25439 Two useful functions that are not in @command{awk} are @code{chdir}
25441 @code{stat} (so that an @command{awk} program can gather information about
25448 * Internal File Ops:: The code for internal file operations.
25449 * Using Internal File Ops:: How to use an external extension.
25453 @appendixsubsubsec Using @code{chdir} and @code{stat}
25455 This @value{SECTION} shows how to use the new functions at the @command{awk}
25458 Using @code{chdir} is very straightforward. It takes one argument,
25459 the new directory to change to:
25466 printf("could not change to %s: %s\n",
25473 The return value is negative if the @code{chdir} failed,
25474 and @code{ERRNO}
25476 is set to a string indicating the error.
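Underneath, this behavior mirrors the C library: @code{chdir} returns a negative value on failure and @code{errno} identifies the reason. The sketch below shows that pattern in plain C on a POSIX system. The name @code{change_dir} is hypothetical, and the code is not the extension's own source; it only illustrates how a return value and an error string can be produced together.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to change directory.  Returns the chdir() result; on failure,
 * *errmsg points at the text for errno, much as ERRNO holds a string
 * at the awk level.  On success, *errmsg is NULL.
 */
int
change_dir(const char *newdir, const char **errmsg)
{
	int ret;

	*errmsg = NULL;
	ret = chdir(newdir);
	if (ret < 0)
		*errmsg = strerror(errno);

	return ret;
}
```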
25478 Using @code{stat} is a bit more complicated.
25479 The C @code{stat} function fills in a structure that has a fair
25481 The right way to model this in @command{awk} is to fill in an associative
25487 fdata[1] = "x" # force `fdata' to be an array
25497 The @code{stat} function always clears the data array, even if
25498 the @code{stat} fails. It fills in the following elements:
25500 @table @code
25501 @item "name"
25502 The name of the file that was @code{stat}'ed.
25531 with @code{strftime}
25537 @samp{ls -l}---for example, @code{"drwxr-xr-x"}.
25543 @table @code
25564 The file is an @code{AF_UNIX} (``Unix domain'') socket in the
25574 program by using the @code{in} operator
25575 (@pxref{Reference to Elements}):
25577 @table @code
25579 The preferred block size for I/O to the file. This field is not
25580 present on all POSIX-like systems in the C @code{stat} structure.
25583 If the file is a symbolic link, this element is the name of the
25584 file the link points to (i.e., the value of the link).
25595 @appendixsubsubsec C Code for @code{chdir} and @code{stat}
25597 Here is the C code for these extensions. They were written for
25598 GNU/Linux. The code needs some more work for complete portability
25599 to other POSIX-compliant systems:@footnote{This version is edited
25622 The file includes the @code{"awk.h"} header file for definitions
25623 for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>}
25624 for access to the @code{major} and @code{minor} macros.
25627 By convention, for an @command{awk} function @code{foo}, the function that
25629 a @samp{NODE *} argument, usually called @code{tree}, that
25630 represents the argument list to the function. The @code{newdir}
25631 variable represents the new directory to change to, retrieved
25632 with @code{get_argument}. Note that the first argument is
25635 This code actually accomplishes the @code{chdir}. It first forces
25636 the argument to be a string and passes the string value to the
25637 @code{chdir} system call. If the @code{chdir} fails, @code{ERRNO}
25639 The result of @code{force_string} has to be freed with @code{free_temp}:
25652 Finally, the function returns the return value to the @command{awk} level,
25653 using @code{set_value}. Then it must return a value from the call to
25660 /* Just to make the interpreter happy */
25665 The @code{stat} built-in is more involved. First comes a function
25682 Next comes the actual @code{do_stat} function itself. First come the
25686 Changed message for page breaking. Used to be:
25713 we use @code{lstat}, in case the file is a symbolic link.
25714 If there's an error, we set @code{ERRNO} and return:
25720 * array to hold results is second
25746 aptr = assoc_lookup(array, tmp_string("name", 4), FALSE);
25766 /* Just to make the interpreter happy */
25772 Finally, it's necessary to provide the ``glue'' that loads the
25774 a routine named @code{dlload} that does the job:
25790 And that's it! As an exercise, consider adding functions to
25791 implement system calls such as @code{chown}, @code{chmod}, and @code{umask}.
25797 @cindex @command{gawk}, interpreter, adding code to
25798 Now that the code is written, it must be possible to add it at
25799 runtime to the running @command{gawk} interpreter. First, the
25800 code must be compiled. Assuming that the functions are in
25811 @cindex @code{extension} function (@command{gawk})
25812 Once the library exists, it is loaded by calling the @code{extension}
25814 This function takes two arguments: the name of the
25815 library to load and the name of a function to call when the library
25816 is first loaded. This function adds the new functions to @command{gawk}.
25827 data[1] = 1 # force `data' to be an array
25854 @print{} data["name"] = testff.awk
25890 : @i{AWK is a language similar to PERL, only considerably more elegant.} @*
25897 : Before I actually release this for publication, I wanted to get your
25898 : permission to quote you. (Hopefully, in the spirit of much of GNU, the
25909 @i{AWK is a language similar to PERL, only considerably more elegant.}@*
25927 It is not clear that the @command{awk}-level interface to the
25928 modules facility is as good as it should be. The interface needs to be
25933 @item @code{RECLEN} variable for fixed-length records
25934 Along with @code{FIELDWIDTHS}, this would speed up the processing of
25936 @code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"},
25939 @item Additional @code{printf} specifiers
25940 The 1999 ISO C standard added a number of additional @code{printf}
25950 It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array.
25956 @item More @code{lint} warnings
25961 source code easier to work with:
25968 to create and integrate a loadable module.
25972 @command{gawk} should be changed to use @command{libtool}.
25975 The API to its internals that @command{gawk} ``exports'' should be revised.
25977 and implemented to make module writing easier.
25981 so that using the same value to index multiple arrays only
25986 but it's a lot of work to do.
25996 parser to convert the script given it into a syntax tree; the syntax
25999 calls to do even the simplest things.
26001 It should be possible for @command{gawk} to convert the script's parse tree
26003 C compiler and a special @command{gawk} library to provide all the needed
26007 @cindex @command{gawk}, interpreter, adding code to
26008 An easier possibility might be for an intermediate phase of @command{gawk} to
26009 convert the parse tree into a linear byte code form like the one used
26011 a straight line byte code interpreter that would be intermediate in speed
26030 This @value{APPENDIX} attempts to define some of the basic concepts
26036 other introductory texts that you should refer to instead.)
26040 * Basic Data Typing:: A very quick intro to data types.
26041 * Floating Point Issues:: Stuff to know about floating-point numbers.
26048 At the most basic level, the job of a program is to process
26089 @c [Figure: diagram with boxes labeled Data, Program, and Results]
26127 instructions in your program to process the data.
26140 @c [Figure: flowchart with boxes labeled Initialization, More Data?, Process, and Clean Up, and branches labeled Yes and No]
26237 These are the things you do before actually starting to process
26239 to work with, and so on.
26240 This step corresponds to @command{awk}'s @code{BEGIN} rule
26251 In most programming languages, you have to manually manage the reading
26252 of data, checking to see if there is more each time you read a chunk.
26257 In baking a cake, the processing corresponds to the actual labor:
26262 Once you've processed all the data, you may have things you need to
26264 This step corresponds to @command{awk}'s @code{END} rule
26267 After the cake comes out of the oven, you still have to wrap it in
26268 plastic wrap to keep anyone from tasting it, as well as wash
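For comparison, here is how the three steps look in a language where the reading loop must be written by hand. This C sketch counts the lines in a file: the initialization corresponds to a @code{BEGIN} rule, the loop is the part that @command{awk} performs for you, and the final adjustment corresponds to an @code{END} rule. The function @code{count_lines} is invented for this illustration.

```c
#include <stdio.h>

/*
 * Count the lines read from fp, following the three basic steps.
 * Returns the number of lines, or -1 if fp is NULL.
 */
long
count_lines(FILE *fp)
{
	long count;
	int c, last;

	/* Initialization: what awk's BEGIN rule would do */
	if (fp == NULL)
		return -1;
	count = 0;
	last = '\n';

	/* Processing: read a chunk, check for more, repeat */
	while ((c = getc(fp)) != EOF) {
		if (c == '\n')
			count++;
		last = c;
	}

	/* Clean up: what awk's END rule would do */
	if (last != '\n')
		count++;	/* final line had no trailing newline */

	return count;
}
```

The equivalent @command{awk} program needs only an @code{END} rule that prints @code{NR}, since reading and counting records happen automatically.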
26273 An @dfn{algorithm} is a detailed set of instructions necessary to accomplish
26275 a cake. Programs implement algorithms. Often, it is up to you to design
26281 similar to the records a company keeps on employees, a school keeps for
26285 to as the @dfn{fields} of the record.
26289 They are often referred to together as ``input/output,''
26297 breaking it up into records and fields. Your program's job is to
26298 tell @command{awk} what to do with the data. You do this by describing
26299 @dfn{patterns} in the data to look for, and @dfn{actions} to execute
26301 @command{awk} programs usually makes them both easier to write
26302 and easier to read.
26310 A variable is just a name for a given value, such as @code{first_name},
26311 @code{last_name}, @code{address}, and so on.
26313 special names to refer to the current input record
26316 associated values under one name, as an array.
26323 String values are essentially anything that's not a number, such as a name.
26324 Strings are sometimes referred to as @dfn{character data}, since they
26327 referred to as @dfn{scalar} values.
26335 In school, integer values were referred to as ``whole'' numbers---that is,
26337 The advantage to integer numbers is that they represent values exactly.
26339 this range is @minus{}2,147,483,648 to 2,147,483,647.
26347 the range is from 0 to 4,294,967,295.
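These limits follow directly from the number of bits. Assuming two's-complement representation, a signed integer of @var{n} bits ranges from @minus{}2^(@var{n}@minus{}1) to 2^(@var{n}@minus{}1)@minus{}1, and an unsigned one from 0 to 2^@var{n}@minus{}1. The C sketch below computes the limits for widths from 1 to 32 bits; the function names are made up for the illustration.

```c
#include <stdint.h>

/* Smallest value of a signed two's-complement integer of `bits' bits. */
int64_t
signed_min(int bits)
{
	return -((int64_t) 1 << (bits - 1));
}

/* Largest value of a signed integer of `bits' bits. */
int64_t
signed_max(int bits)
{
	return ((int64_t) 1 << (bits - 1)) - 1;
}

/* Largest value of an unsigned integer of `bits' bits. */
uint64_t
unsigned_max(int bits)
{
	return ((uint64_t) 1 << bits) - 1;
}
```

For 32 bits, these functions give the same three limits quoted in the text.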
26353 The advantage to floating-point numbers is that they
26365 Advanced applications sometimes have to manipulate bits directly,
26369 While you are probably used to the idea of a number without a value (i.e., zero),
26370 it takes a bit more getting used to the idea of zero-length character data.
26375 like this: @code{""}.
26377 Humans are used to working in decimal; i.e., base 10. In base 10,
26378 numbers go from 0 to 9, and then ``roll over'' into the next
26383 In binary, each column represents two times the value in the column to
26398 There have been several versions of C. The first is often referred to
26403 In the mid-1980s, an effort began to produce an international standard
26417 uses double-precision floating-point numbers to represent all
26436 It is important to note that the string value for a number may not
26457 This program shows the full value of the sum of @code{$2} and @code{$3}
26458 using @code{printf}, and then prints the string values obtained
26459 from both automatic conversion (via @code{CONVFMT}) and
26460 from printing (via @code{OFMT}).
26474 @code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
26475 at most six significant digits. For some applications, you might want to
26476 change it to specify more precision.
26478 17 digits is enough to capture a floating-point number's
26479 value exactly.@footnote{Pathological cases can require up to
26480 752 digits (!), but we doubt that you need to worry about this.}
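Because these formats are ordinary C @code{printf} formats, the effect is easy to check from C. The sketch below assumes IEEE 754 double-precision arithmetic and a correctly rounding C library. It formats a value with @code{"%.6g"} and tests whether a value survives a trip through a 17-digit string. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format value the way the default CONVFMT ("%.6g") would. */
void
default_convfmt(char *buf, size_t buflen, double value)
{
	snprintf(buf, buflen, "%.6g", value);
}

/* Return 1 if value is unchanged after conversion to 17 digits and back. */
int
round_trips(double value)
{
	char buf[64];

	snprintf(buf, sizeof(buf), "%.17g", value);
	return strtod(buf, NULL) == value;
}
```

With these definitions, 3.14159265358979 is formatted as @code{3.14159}, so converting that string back to a number does not recover the original value, whereas the 17-digit form does.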
26516 In particular, it is possible to represent ``minus zero'' as well as
26520 when stored internally, but that they are in fact equal to each other,
26521 as well as to ``regular'' zero:
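The same property can be observed from C. The following sketch assumes IEEE 754 arithmetic and uses the standard @code{signbit} macro to tell the two zeros apart, even though they compare equal. The function @code{is_minus_zero} is a made-up name for the illustration.

```c
#include <math.h>

/* Return 1 if x is a zero whose sign bit is set ("minus zero"). */
int
is_minus_zero(double x)
{
	return x == 0.0 && signbit(x) != 0;
}
```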
26532 It helps to keep this in mind should you process numeric data
26542 A series of @command{awk} statements attached to a rule. If the rule's
26569 to the beginning or end of the string, respectively.
26580 A grouping of multiple values under the same name.
26586 Useful for reasoning about how a program is supposed to behave.
26590 variable or data object. An object that you can assign to is called an
26604 given to the program, the program's rules are all processed in turn.
26608 Another name for an @command{awk} program.
26625 All values in computer memory ultimately reduce to binary digits: values
26635 Computers are often defined by how many bits they use to represent integer
26652 @code{sqrt} (for the square root of a number) and @code{substr} (for a
26659 @code{ARGC},
26660 @code{ARGV},
26661 @code{CONVFMT},
26662 @code{ENVIRON},
26663 @code{FILENAME},
26664 @code{FNR},
26665 @code{FS},
26666 @code{NF},
26667 @code{NR},
26668 @code{OFMT},
26669 @code{OFS},
26670 @code{ORS},
26671 @code{RLENGTH},
26672 @code{RSTART},
26673 @code{RS},
26675 @code{SUBSEP}
26676 are the variables that have special meaning to @command{awk}.
26678 @code{ARGIND},
26679 @code{BINMODE},
26680 @code{ERRNO},
26681 @code{FIELDWIDTHS},
26682 @code{IGNORECASE},
26683 @code{LINT},
26684 @code{PROCINFO},
26685 @code{RT},
26687 @code{TEXTDOMAIN}
26688 are the variables that have special meaning to @command{gawk}.
26696 A computer system allowing users to log in and read and/or leave messages
26705 In general, @command{gawk} attempts to be as similar to the 1990 version
26716 The set of numeric codes used by a computer system to represent the
26735 A program that translates human-readable source code into
26736 machine-executable object code. The object code is then executed
26761 Comparison expressions are used in @code{if}, @code{while}, @code{do},
26762 and @code{for}
26763 statements, and in patterns to select which input records to process.
26774 are) not clear, leading to unexpected or undesirable behavior.
26786 are interested in processing, and what to do when that data is seen.
26795 for the other to perform an action.
26801 @command{awk} stores numeric values. It is the C type @code{double}.
26806 @code{"foo"}, but it may also be an expression whose value can vary.
26810 A collection of strings, of the form @var{name@code{=}val}, that each
26811 program has available to it. Users generally place values into the
26812 environment in order to provide information to various programs. Typical
26839 change by setting the built-in variable @code{FS}). Such pieces are
26841 variable @code{FIELDWIDTHS} to describe their lengths.
26851 Often referred to in mathematical terms as a ``rational'' or real number,
26856 Format strings are used to control the appearance of output in the
26857 @code{strftime} and @code{sprintf} functions, and are used in the
26858 @code{printf} statement as well. Also, data conversions from numbers to strings
26860 @code{CONVFMT}. (@xref{Control Letters}.)
26867 A specialized group of statements used to encapsulate general
26869 functions, and also allows you to define your own.
26880 to the production and distribution of freely distributable software.
26892 code may be distributed. (@xref{Copying}.)
26905 to create a complete, freely distributable, POSIX-compliant computing
26912 been ported to a variety of architectures.
26915 The Linux kernel source code is available under the terms of the GNU General
26922 Base 16 notation, where the digits are @code{0}--@code{9} and
26923 @code{A}--@code{F}, with @samp{A}
26924 representing 10, @samp{B} representing 11, and so on, up to @samp{F} for 15.
26926 to indicate their base. Thus, @code{0x12} is 18 (1 times 16 plus 2).
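The positional arithmetic can be spelled out digit by digit. This is a minimal Python sketch, not part of @command{gawk}; the helper name @code{hex_value} is hypothetical, and it assumes the input always starts with @code{0x}.

```python
def hex_value(text):
    """Expand a hexadecimal constant such as "0x12" one digit at a time."""
    digits = "0123456789ABCDEF"          # position in this string = digit value
    total = 0
    for ch in text[2:].upper():          # skip the leading "0x"
        total = total * 16 + digits.index(ch)
    return total

print(hex_value("0x12"))   # 18, that is 1 times 16 plus 2
print(hex_value("0xFF"))   # 255, that is 15 times 16 plus 15
```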
26943 further source code changes.
26947 A program that reads human-readable source code directly, and uses
26948 the instructions in it to process data and produce results.
26964 This @value{DOCUMENT} refers to Standard C as ``ISO C'' throughout.
26971 @code{BEGIN},
26972 @code{END},
26973 @code{if},
26974 @code{else},
26975 @code{while},
26976 @code{do@dots{}while},
26977 @code{for},
26978 @code{for@dots{}in},
26979 @code{break},
26980 @code{continue},
26981 @code{delete},
26982 @code{next},
26983 @code{nextfile},
26984 @code{function},
26985 @code{func},
26987 @code{exit}.
26995 and their source code may be distributed.
27005 internationalized program to work in a particular language.
27021 regexp describes the contents of the string, it is said to @dfn{match} it.
27030 @command{awk} programs by placing two double quote characters next to
27031 each other (@code{""}). It can appear in input data by having two successive
27032 occurrences of the field separator appear next to each other.
27036 double-precision floating-point to represent numbers.
27040 Base-eight notation, where the digits are @code{0}--@code{7}.
27042 to indicate their base. Thus, @code{013} is 11 (one times 8 plus 3).
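A quick way to check such a value is to convert the text with an explicit base. The sketch below uses Python as a stand-in; the helper name @code{octal_value} is an assumption for the example. Python 3 spells octal constants @code{0o13} rather than @code{013}, so the C-style text is passed as a string.

```python
def octal_value(text):
    """Interpret a C-style octal constant such as "013" as base 8."""
    return int(text, 8)

print(octal_value("013"))    # 11
print(0o13 == 1 * 8 + 3)     # True
```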
27049 Patterns tell @command{awk} which input records are interesting to which
27053 tested. If the condition is satisfied, the pattern is said to @dfn{match}
27058 The name for a series of standards
27065 Informally, this standard is often referred to as simply ``P1003.2.''
27079 can specify ranges of input lines for @command{awk} to process or it can
27084 If this isn't clear, refer to the entry for ``recursion.''
27088 stream, or performing output to something other than the standard output stream.
27090 You can redirect the output of the @code{print} and @code{printf} statements
27091 to a file or a system command, using the @samp{>}, @samp{>>}, @samp{|}, and @samp{|&}
27092 operators. You can redirect input to the @code{getline} statement using
27110 slashes, such as @code{/foo/}. This regular expression is chosen
27115 A segment of an @command{awk} program that specifies how to process single
27131 In @command{gawk}, a list of directories to search for @command{awk} program source files.
27132 In the shell, a list of directories to search for executable programs.
27162 This is the type used by some very old versions of @command{awk} to store
27163 numeric values. It is the C type @code{float}.
27170 directly to the underlying operating system---for example, @file{/dev/stderr}.
27176 expect to read their input files in entirety before starting to do
27188 It usually expands to up to eight spaces upon output.
27191 A unique name that identifies an application.
27198 @code{mktime}, @code{strftime}, and @code{systime}.
27213 versions of Unix, as well as several work-alike systems whose source code
27235 Everyone is permitted to copy and distribute verbatim copies
27242 The licenses for most software are designed to take away your
27243 freedom to share and change it. By contrast, the GNU General Public
27244 License is intended to guarantee your freedom to share and change free
27245 software---to make sure the software is free for all its users. This
27246 General Public License applies to most of the Free Software
27247 Foundation's software and to any other program whose authors commit to
27249 the GNU Library General Public License instead.) You can apply it to
27252 When we speak of free software, we are referring to freedom, not
27253 price. Our General Public Licenses are designed to make sure that you
27254 have the freedom to distribute copies of free software (and charge for
27255 this service if you wish), that you receive source code or can get it
27259 To protect your rights, we need to make restrictions that forbid
27260 anyone to deny you these rights or to ask you to surrender the rights.
27261 These restrictions translate to certain responsibilities for you if you
27267 source code. And you must show them these terms so they know their
27271 (2) offer you this license which gives you legal permission to copy,
27274 Also, for each author's protection and ours, we want to make certain
27277 want its recipients to know that what they have is not the original, so
27282 patents. We wish to avoid the danger that redistributors of a free
27300 This License applies to any program or other work which contains
27303 refers to any such program or work, and a ``work based on the Program''
27305 that is to say, a work containing the Program or a portion of it,
27319 source code as you receive it, in any medium, provided that you
27322 notices that refer to this License and to the absence of any warranty;
27337 You must cause the modified files to carry prominent notices
27343 part thereof, to be licensed as a whole at no charge to all third
27349 interactive use in the most ordinary way, to print or display an
27353 these conditions, and telling the user how to view a copy of this
27356 the Program is not required to print an announcement.)
27359 These requirements apply to the modified work as a whole. If
27362 themselves, then this License, and its terms, do not apply to those
27366 this License, whose permissions for other licensees extend to the
27367 entire whole, and thus to each and every part regardless of who wrote it.
27369 Thus, it is not the intent of this section to claim rights or contest
27370 your rights to work written entirely by you; rather, the intent is to
27371 exercise the right to control the distribution of derivative or
27381 under Section 2) in object code or executable form under the terms of
27387 source code, which must be distributed under the terms of Sections
27392 years, to give any third party, for a charge no more than your
27394 machine-readable copy of the corresponding source code, to be
27399 Accompany it with the information you received as to the offer
27400 to distribute corresponding source code. (This alternative is
27402 received the program in object code or executable form with such
27406 The source code for a work means the preferred form of the work for
27407 making modifications to it. For an executable work, complete source
27408 code means all the source code for all modules it contains, plus any
27409 associated interface definition files, plus the scripts used to
27411 special exception, the source code distributed need not include
27417 If distribution of executable or object code is made by offering
27418 access to copy from a designated place, then offering equivalent
27419 access to copy the source code from the same place counts as
27420 distribution of the source code, even though third parties are not
27421 compelled to copy the source along with the object code.
27426 otherwise to copy, modify, sublicense or distribute the Program is
27433 You are not required to accept this License, since you have not
27434 signed it. However, nothing else grants you permission to modify or
27438 Program), you indicate your acceptance of this License to do so, and
27445 original licensor to copy, distribute or modify the Program subject to
27448 You are not responsible for enforcing compliance by third parties to
27453 infringement or for any other reason (not limited to patent issues),
27457 distribute so as to satisfy simultaneously your obligations under this
27462 the only way you could satisfy both it and this License would be to
27466 any particular circumstance, the balance of the section is intended to
27467 apply and the section as a whole is intended to apply in other
27470 It is not the purpose of this section to induce you to infringe any
27471 patents or other property right claims or to contest validity of any
27475 generous contributions to the wide range of software distributed
27477 system; it is up to the author/donor to decide if he or she is willing
27478 to distribute software through any other system and a licensee cannot
27481 This section is intended to make thoroughly clear what is believed to
27495 of the General Public License from time to time. Such new versions will
27496 be similar in spirit to the present version, but may differ in detail to
27500 specifies a version number of this License which applies to it and ``any
27508 If you wish to incorporate parts of the Program into other free
27509 programs whose distribution conditions are different, write to the author
27510 to ask for permission. For software which is copyrighted by the Free
27511 Software Foundation, write to the Free Software Foundation; we sometimes
27557 @unnumberedsec How to Apply These Terms to Your New Programs
27559 If you develop a new program, and you want it to be of the greatest
27560 possible use to the public, the best way to achieve this is to make it
27563 To do so, attach the following notices to the program. It is safest
27564 to attach them to the start of each source file to most effectively
27566 the ``copyright'' line and a pointer to where the full notice is found.
27569 @var{one line to give the program's name and an idea of what it does.}
27570 Copyright (C) @var{year} @var{name of author}
27583 along with this program; if not, write to the Free Software
27587 Also add information on how to contact you by electronic and paper mail.
27593 Gnomovision version 69, Copyright (C) @var{year} @var{name of author}
27596 to redistribute it under certain conditions; type `show c'
27607 school, if any, to sign a ``copyright disclaimer'' for the program, if
27624 consider it more useful to permit linking proprietary applications with the
27625 library. If this is what you want to do, use the GNU Lesser General
27640 Everyone is permitted to copy and distribute verbatim copies
27648 The purpose of this License is to make a manual, textbook, or other
27649 functional and useful document @dfn{free} in the sense of freedom: to
27650 assure everyone the effective freedom to copy and redistribute it,
27653 to get credit for their work, while not being considered responsible
27661 We have designed this License in order to use it for manuals for free
27664 software does. But this License is not limited to software manuals;
27672 This License applies to any manual or other work, in any medium, that
27675 world-wide, royalty-free license, unlimited in duration, to use that
27677 refers to any such manual or work. Any member of the public is a
27688 publishers or authors of the Document to the Document's overall
27689 subject (or to related matters) and contains nothing that could fall
27701 allowed to be designated as Invariant. The Document may contain zero
27711 represented in a format whose specification is available to the
27715 drawing editor, and that is suitable for input to text formatters or
27716 for automatic translation to a variety of formats suitable for input
27717 to text formatters. A copy made in an otherwise Transparent file
27718 format whose markup, or absence of markup, has been arranged to thwart
27737 plus such following pages as are needed to hold, legibly, the material
27738 this License requires to appear in the title page. For works in
27746 specific section name mentioned below, such as ``Acknowledgements'',
27749 section ``Entitled XYZ'' according to this definition.
27751 The Document may include Warranty Disclaimers next to the notice which
27752 states that this License applies to the Document. These Warranty
27753 Disclaimers are considered to be included by reference in this
27764 to the Document are reproduced in all copies, and that you add no other
27765 conditions whatsoever to those of this License. You may not use
27766 technical measures to obstruct or control the reading or further
27786 Copying with changes limited to the covers, as long as they preserve
27790 If the required texts for either cover are too voluminous to fit
27799 public has access to download using public-standard network protocols
27802 when you begin distribution of Opaque copies in quantity, to ensure
27806 edition to the public.
27809 Document well before redistributing any large number of copies, to give
27810 them a chance to provide you with an updated version of the Document.
27819 and modification of the Modified Version to whoever possesses a copy
27838 State on the Title page the name of the publisher of the
27846 adjacent to the other copyright notices.
27850 giving the public permission to use the Modified Version under the
27862 to it an item stating at least the title, year, new authors, and
27871 public access to a Transparent copy of the Document, and likewise
27876 publisher of the version it refers to gives permission.
27894 Do not retitle any existing section to be Entitled ``Endorsements'' or
27895 to conflict in title with any Invariant Section.
27904 of these sections as invariant. To do this, add their titles to the
27914 You may add a passage of up to five words as a Front-Cover Text, and a
27915 passage of up to 25 words as a Back-Cover Text, to the end of the list
27925 give permission to use their names for publicity for or to assert or
27940 copy. If there are multiple Invariant Sections with the same name but
27942 adding at the end of it, in parentheses, the name of the original
27944 Make the same adjustment to the section titles in the list of
27973 resulting from the compilation is not used to limit the legal rights
27976 apply to the other works in the aggregate which are not themselves
27979 If the Cover Text requirement of section 3 is applicable to these
27994 translations of some or all Invariant Sections in addition to the
28004 ``Dedications'', or ``History'', the requirement (section 4) to Preserve
28012 as expressly provided for under this License. Any other attempt to
28023 of the GNU Free Documentation License from time to time. Such new
28024 versions will be similar in spirit to the present version, but may
28025 differ in detail to address new problems or concerns. See
28030 License ``or any later version'' applies to it, you have the option of
28039 @unnumberedsec ADDENDUM: How to use this License for your documents
28047 Copyright (C) @var{year} @var{your name}.
28048 Permission is granted to copy, distribute and/or modify this document
28069 combination of the three, merge those two alternatives to suit the
28072 If your document contains nontrivial examples of program code, we
28075 to permit their use in free software.
28093 of how to use them. It would be useful to perhaps have a "programming
28097 The default and how this changes needs to be documented.
28100 /.../ regexps are in @code, not @samp
28101 ".." strings are in @code, not @samp
28103 values of expressions in the text (@code{x} has the value 15),
28104 should be in roman, not @code
28107 Use space and not blank to describe the space bar's character
28109 To make dark corners work, the @value{DARKCORNER} has to be outside
28133 Use @code{do}, and not @code{do}-@code{while}, except where
28137 "on", "that", "the", "to", "with", and "without",
28145 ok to use numbers.
28146 In tables, put command-line options in @code, while in the text,
28168 Use numbered lists only to show a sequential series of steps.
28170 Use @code{xxx} for the xxx operator in indexing statements, not @samp.
28177 It's a GNU convention to use the term "file name" for the name of a
28179 which are lists of file names. Using it for a single file name as
28180 well is potentially confusing to users.
28185 Note that "file name" should be two words when it appears as ordinary
28193 Enhance FIELDWIDTHS with some way to indicate "the rest of the record".
28200 % 2. Use @code{foo} for variables and @code{foo()} for functions
28225 % + Introduction to Arrays
28226 % + Referring to an Array Element
28231 % - Using Numbers to Subscript Arrays