\input texinfo   @c -*-texinfo-*-
@c $NetBSD: awk.texi,v 1.1 2010/12/13 06:21:53 mrg Exp $
@c %**start of header (This is for running Texinfo on a region.)
@setfilename awk.info
@settitle The GNU Awk User's Guide
@c %**end of header (This is for running Texinfo on a region.)

@dircategory Text creation and manipulation
@direntry
* Gawk: (awk).                 A text scanning and processing language.
@end direntry
@dircategory Individual utilities
@direntry
* awk: (awk)Invoking gawk.                     Text scanning and processing.
@end direntry

@set xref-automatic-section-title

@c The following information should be updated here only!
@c This sets the edition of the document, the version of gawk it
@c applies to and all the info about who's publishing this edition

@c These apply across the board.
@set UPDATE-MONTH June, 2003
@set VERSION 3.1
@set PATCHLEVEL 3

@set FSF

@set TITLE GAWK: Effective AWK Programming
@set SUBTITLE A User's Guide for GNU Awk
@set EDITION 3

@iftex
@set DOCUMENT book
@set CHAPTER chapter
@set APPENDIX appendix
@set SECTION section
@set SUBSECTION subsection
@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}}
@end iftex
@ifinfo
@set DOCUMENT Info file
@set CHAPTER major node
@set APPENDIX major node
@set SECTION minor node
@set SUBSECTION node
@set DARKCORNER (d.c.)
@end ifinfo
@ifhtml
@set DOCUMENT Web page
@set CHAPTER chapter
@set APPENDIX appendix
@set SECTION section
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@end ifhtml
@ifxml
@set DOCUMENT book
@set CHAPTER chapter
@set APPENDIX appendix
@set SECTION section
@set SUBSECTION subsection
@set DARKCORNER (d.c.)
@end ifxml

@c some special symbols
@iftex
@set LEQ @math{@leq}
@end iftex
@ifnottex
@set LEQ <=
@end ifnottex

@set FN file name
@set FFN File Name
@set DF data file
@set DDF Data File
@set PVERSION version
@set CTL Ctrl

@ignore
Some comments on the layout for TeX.
1. Use at least texinfo.tex 2000-09-06.09
2. I have done A LOT of work to make this look good. There are `@page' commands
   and use of `@group ... @end group' in a number of places. If you muck
   with anything, it's your responsibility not to break the layout.
@end ignore

@c merge the function and variable indexes into the concept index
@ifinfo
@synindex fn cp
@synindex vr cp
@end ifinfo
@iftex
@syncodeindex fn cp
@syncodeindex vr cp
@end iftex
@ifxml
@syncodeindex fn cp
@syncodeindex vr cp
@end ifxml

@c If "finalout" is commented out, the printed output will show
@c black boxes that mark lines that are too long.  Thus, it is
@c unwise to comment it out when running a master in case there are
@c overfulls which are deemed okay.

@iftex
@finalout
@end iftex

@copying
Copyright @copyright{} 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
@sp 2

This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}},
for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU
implementation of AWK.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

@enumerate a
@item
``A GNU Manual''

@item
``You have freedom to copy and modify this GNU Manual, like GNU
software.  Copies published by the Free Software Foundation raise
funds for GNU development.''
@end enumerate
@end copying

@c Comment out the "smallbook" for technical review.  Saves
@c considerable paper.  Remember to turn it back on *before*
@c starting the page-breaking work.

@c 4/2002: Karl Berry recommends commenting out this and the
@c `@setchapternewpage odd', and letting users use `texi2dvi -t'
@c if they want to waste paper.
@c @smallbook


@c Uncomment this for the release.  Leaving it off saves paper
@c during editing and review.
@c @setchapternewpage odd

@titlepage
@title @value{TITLE}
@subtitle @value{SUBTITLE}
@subtitle Edition @value{EDITION}
@subtitle @value{UPDATE-MONTH}
@author Arnold D. Robbins

@c Include the Distribution inside the titlepage environment so
@c that headings are turned off.  Headings on and off do not work.

@page
@vskip 0pt plus 1filll
@ignore
The programs and applications presented in this book have been
included for their instructional value.  They have been tested with care
but are not guaranteed for any particular purpose.  The publisher does not
offer any warranties or representations, nor does it accept any
liabilities with respect to the programs or applications.
So there.
@sp 2
UNIX is a registered trademark of The Open Group in the United States and other countries. @*
Microsoft, MS and MS-DOS are registered trademarks, and Windows is a
trademark of Microsoft Corporation in the United States and other
countries. @*
Atari, 520ST, 1040ST, TT, STE, Mega and Falcon are registered trademarks
or trademarks of Atari Corporation. @*
DEC, Digital, OpenVMS, ULTRIX and VMS are trademarks of Digital Equipment
Corporation. @*
@end ignore
``To boldly go where no man has gone before'' is a
Registered Trademark of Paramount Pictures Corporation. @*
@c sorry, i couldn't resist
@sp 3
Published by:
@sp 1

Free Software Foundation @*
59 Temple Place --- Suite 330 @*
Boston, MA  02111-1307 USA @*
Phone: +1-617-542-5942 @*
Fax: +1-617-542-2652 @*
Email: @email{gnu@@gnu.org} @*
URL: @uref{http://www.gnu.org/} @*

@c This one is correct for gawk 3.1.0 from the FSF
ISBN 1-882114-28-0 @*
@sp 2
@insertcopying
@sp 2
Cover art by Etienne Suvasa.
@end titlepage

@c Thanks to Bob Chassell for directions on doing dedications.
@iftex
@headings off
@page
@w{ }
@sp 9
@center @i{To Miriam, for making me complete.}
@sp 1
@center @i{To Chana, for the joy you bring us.}
@sp 1
@center @i{To Rivka, for the exponential increase.}
@sp 1
@center @i{To Nachum, for the added dimension.}
@sp 1
@center @i{To Malka, for the new beginning.}
@w{ }
@page
@w{ }
@page
@headings on
@end iftex

@iftex
@headings off
@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
@oddheading  @| @| @strong{@thischapter}@ @ @ @thispage
@end iftex

@ifnottex
@ifnotxml
@node Top
@top General Introduction
@c Preface node should come right after the Top
@c node, in `unnumbered' sections, then the chapter, `What is gawk'.
@c Licensing nodes are appendices, they're not central to AWK.

This file documents @command{awk}, a program that you can use to select
particular records in a file and perform operations upon them.

@insertcopying

@end ifnotxml
@end ifnottex

@menu
* Foreword::                       Some nice words about this
                                   @value{DOCUMENT}.
* Preface::                        What this @value{DOCUMENT} is about; brief
                                   history and acknowledgments.
* Getting Started::                A basic introduction to using
                                   @command{awk}. How to run an @command{awk}
                                   program. Command-line syntax.
* Regexp::                         All about matching things using regular
                                   expressions.
* Reading Files::                  How to read files and manipulate fields.
* Printing::                       How to print using @command{awk}. Describes
                                   the @code{print} and @code{printf}
                                   statements. Also describes redirection of
                                   output.
* Expressions::                    Expressions are the basic building blocks
                                   of statements.
* Patterns and Actions::           Overviews of patterns and actions.
* Arrays::                         The description and use of arrays. Also
                                   includes array-oriented control statements.
* Functions::                      Built-in and user-defined functions.
* Internationalization::           Getting @command{gawk} to speak your
                                   language.
* Advanced Features::              Stuff for advanced users, specific to
                                   @command{gawk}.
* Invoking Gawk::                  How to run @command{gawk}.
* Library Functions::              A Library of @command{awk} Functions.
* Sample Programs::                Many @command{awk} programs with complete
                                   explanations.
* Language History::               The evolution of the @command{awk}
                                   language.
* Installation::                   Installing @command{gawk} under various
                                   operating systems.
* Notes::                          Notes about @command{gawk} extensions and
                                   possible future work.
* Basic Concepts::                 A very quick introduction to programming
                                   concepts.
* Glossary::                       An explanation of some unfamiliar terms.
* Copying::                        Your right to copy and distribute
                                   @command{gawk}.
* GNU Free Documentation License:: The license for this @value{DOCUMENT}.
* Index::                          Concept and Variable Index.

@detailmenu
* History::                        The history of @command{gawk} and
                                   @command{awk}.
* Names::                          What name to use to find @command{awk}.
* This Manual::                    Using this @value{DOCUMENT}. Includes
                                   sample input files that you can use.
* Conventions::                    Typographical Conventions.
* Manual History::                 Brief history of the GNU project and this
                                   @value{DOCUMENT}.
* How To Contribute::              Helping to save the world.
* Acknowledgments::                Acknowledgments.
* Running gawk::                   How to run @command{gawk} programs;
                                   includes command-line syntax.
* One-shot::                       Running a short throwaway @command{awk}
                                   program.
* Read Terminal::                  Using no input files (input from terminal
                                   instead).
* Long::                           Putting permanent @command{awk} programs in
                                   files.
* Executable Scripts::             Making self-contained @command{awk}
                                   programs.
* Comments::                       Adding documentation to @command{gawk}
                                   programs.
* Quoting::                        More discussion of shell quoting issues.
* Sample Data Files::              Sample data files for use in the
                                   @command{awk} programs illustrated in this
                                   @value{DOCUMENT}.
* Very Simple::                    A very simple example.
* Two Rules::                      A less simple one-line example using two
                                   rules.
* More Complex::                   A more complex example.
* Statements/Lines::               Subdividing or combining statements into
                                   lines.
* Other Features::                 Other Features of @command{awk}.
* When::                           When to use @command{gawk} and when to use
                                   other things.
* Regexp Usage::                   How to Use Regular Expressions.
* Escape Sequences::               How to write nonprinting characters.
* Regexp Operators::               Regular Expression Operators.
* Character Lists::                What can go between @samp{[...]}.
* GNU Regexp Operators::           Operators specific to GNU software.
* Case-sensitivity::               How to do case-insensitive matching.
* Leftmost Longest::               How much text matches.
* Computed Regexps::               Using Dynamic Regexps.
* Locales::                        How the locale affects things.
* Records::                        Controlling how data is split into records.
* Fields::                         An introduction to fields.
* Nonconstant Fields::             Nonconstant Field Numbers.
* Changing Fields::                Changing the Contents of a Field.
* Field Separators::               The field separator and how to change it.
* Regexp Field Splitting::         Using regexps as the field separator.
* Single Character Fields::        Making each character a separate field.
* Command Line Field Separator::   Setting @code{FS} from the command-line.
* Field Splitting Summary::        Some final points and a summary table.
* Constant Size::                  Reading constant width data.
* Multiple Line::                  Reading multi-line records.
* Getline::                        Reading files under explicit program
                                   control using the @code{getline} function.
* Plain Getline::                  Using @code{getline} with no arguments.
* Getline/Variable::               Using @code{getline} into a variable.
* Getline/File::                   Using @code{getline} from a file.
* Getline/Variable/File::          Using @code{getline} into a variable from a
                                   file.
* Getline/Pipe::                   Using @code{getline} from a pipe.
* Getline/Variable/Pipe::          Using @code{getline} into a variable from a
                                   pipe.
* Getline/Coprocess::              Using @code{getline} from a coprocess.
* Getline/Variable/Coprocess::     Using @code{getline} into a variable from a
                                   coprocess.
* Getline Notes::                  Important things to know about
                                   @code{getline}.
* Getline Summary::                Summary of @code{getline} Variants.
* Print::                          The @code{print} statement.
* Print Examples::                 Simple examples of @code{print} statements.
* Output Separators::              The output separators and how to change
                                   them.
* OFMT::                           Controlling Numeric Output With
                                   @code{print}.
* Printf::                         The @code{printf} statement.
* Basic Printf::                   Syntax of the @code{printf} statement.
* Control Letters::                Format-control letters.
* Format Modifiers::               Format-specification modifiers.
* Printf Examples::                Several examples.
* Redirection::                    How to redirect output to multiple files
                                   and pipes.
* Special Files::                  File name interpretation in @command{gawk}.
                                   @command{gawk} allows access to inherited
                                   file descriptors.
* Special FD::                     Special files for I/O.
* Special Process::                Special files for process information.
* Special Network::                Special files for network communications.
* Special Caveats::                Things to watch out for.
* Close Files And Pipes::          Closing Input and Output Files and Pipes.
* Constants::                      String, numeric and regexp constants.
* Scalar Constants::               Numeric and string constants.
* Nondecimal-numbers::             What are octal and hex numbers.
* Regexp Constants::               Regular Expression constants.
* Using Constant Regexps::         When and how to use a regexp constant.
* Variables::                      Variables give names to values for later
                                   use.
* Using Variables::                Using variables in your programs.
* Assignment Options::             Setting variables on the command-line and a
                                   summary of command-line syntax. This is an
                                   advanced method of input.
* Conversion::                     The conversion of strings to numbers and
                                   vice versa.
* Arithmetic Ops::                 Arithmetic operations (@samp{+}, @samp{-},
                                   etc.)
* Concatenation::                  Concatenating strings.
* Assignment Ops::                 Changing the value of a variable or a
                                   field.
* Increment Ops::                  Incrementing the numeric value of a
                                   variable.
* Truth Values::                   What is ``true'' and what is ``false''.
* Typing and Comparison::          How variables acquire types and how this
                                   affects comparison of numbers and strings
                                   with @samp{<}, etc.
* Boolean Ops::                    Combining comparison expressions using
                                   boolean operators @samp{||} (``or''),
                                   @samp{&&} (``and'') and @samp{!} (``not'').
* Conditional Exp::                Conditional expressions select between two
                                   subexpressions under control of a third
                                   subexpression.
* Function Calls::                 A function call is an expression.
* Precedence::                     How various operators nest.
* Pattern Overview::               What goes into a pattern.
* Regexp Patterns::                Using regexps as patterns.
* Expression Patterns::            Any expression can be used as a pattern.
* Ranges::                         Pairs of patterns specify record ranges.
* BEGIN/END::                      Specifying initialization and cleanup
                                   rules.
* Using BEGIN/END::                How and why to use BEGIN/END rules.
* I/O And BEGIN/END::              I/O issues in BEGIN/END rules.
* Empty::                          The empty pattern, which matches every
                                   record.
* Using Shell Variables::          How to use shell variables with
                                   @command{awk}.
* Action Overview::                What goes into an action.
* Statements::                     Describes the various control statements in
                                   detail.
* If Statement::                   Conditionally execute some @command{awk}
                                   statements.
* While Statement::                Loop until some condition is satisfied.
* Do Statement::                   Do specified action while looping until
                                   some condition is satisfied.
* For Statement::                  Another looping statement, that provides
                                   initialization and increment clauses.
* Switch Statement::               Switch/case evaluation for conditional
                                   execution of statements based on a value.
* Break Statement::                Immediately exit the innermost enclosing
                                   loop.
* Continue Statement::             Skip to the end of the innermost enclosing
                                   loop.
* Next Statement::                 Stop processing the current input record.
* Nextfile Statement::             Stop processing the current file.
* Exit Statement::                 Stop execution of @command{awk}.
* Built-in Variables::             Summarizes the built-in variables.
* User-modified::                  Built-in variables that you change to
                                   control @command{awk}.
* Auto-set::                       Built-in variables where @command{awk}
                                   gives you information.
* ARGC and ARGV::                  Ways to use @code{ARGC} and @code{ARGV}.
* Array Intro::                    Introduction to Arrays.
* Reference to Elements::          How to examine one element of an array.
* Assigning Elements::             How to change an element of an array.
* Array Example::                  Basic Example of an Array.
* Scanning an Array::              A variation of the @code{for} statement. It
                                   loops through the indices of an array's
                                   existing elements.
* Delete::                         The @code{delete} statement removes an
                                   element from an array.
* Numeric Array Subscripts::       How to use numbers as subscripts in
                                   @command{awk}.
* Uninitialized Subscripts::       Using uninitialized variables as
                                   subscripts.
* Multi-dimensional::              Emulating multidimensional arrays in
                                   @command{awk}.
* Multi-scanning::                 Scanning multidimensional arrays.
* Array Sorting::                  Sorting array values and indices.
* Built-in::                       Summarizes the built-in functions.
* Calling Built-in::               How to call built-in functions.
* Numeric Functions::              Functions that work with numbers, including
                                   @code{int}, @code{sin} and @code{rand}.
* String Functions::               Functions for string manipulation, such as
                                   @code{split}, @code{match} and
                                   @code{sprintf}.
* Gory Details::                   More than you want to know about @samp{\}
                                   and @samp{&} with @code{sub}, @code{gsub},
                                   and @code{gensub}.
* I/O Functions::                  Functions for files and shell commands.
* Time Functions::                 Functions for dealing with timestamps.
* Bitwise Functions::              Functions for bitwise operations.
* I18N Functions::                 Functions for string translation.
* User-defined::                   Describes User-defined functions in detail.
* Definition Syntax::              How to write definitions and what they
                                   mean.
* Function Example::               An example function definition and what it
                                   does.
* Function Caveats::               Things to watch out for.
* Return Statement::               Specifying the value a function returns.
* Dynamic Typing::                 How variable types can change at runtime.
* I18N and L10N::                  Internationalization and Localization.
* Explaining gettext::             How GNU @code{gettext} works.
* Programmer i18n::                Features for the programmer.
* Translator i18n::                Features for the translator.
* String Extraction::              Extracting marked strings.
* Printf Ordering::                Rearranging @code{printf} arguments.
* I18N Portability::               @command{awk}-level portability issues.
* I18N Example::                   A simple i18n example.
* Gawk I18N::                      @command{gawk} is also internationalized.
* Nondecimal Data::                Allowing nondecimal input data.
* Two-way I/O::                    Two-way communications with another
                                   process.
* TCP/IP Networking::              Using @command{gawk} for network
                                   programming.
* Portal Files::                   Using @command{gawk} with BSD portals.
* Profiling::                      Profiling your @command{awk} programs.
* Command Line::                   How to run @command{awk}.
* Options::                        Command-line options and their meanings.
* Other Arguments::                Input file names and variable assignments.
* AWKPATH Variable::               Searching directories for @command{awk}
                                   programs.
* Obsolete::                       Obsolete Options and/or features.
* Undocumented::                   Undocumented Options and Features.
* Known Bugs::                     Known Bugs in @command{gawk}.
* Library Names::                  How to best name private global variables
                                   in library functions.
* General Functions::              Functions that are of general use.
* Nextfile Function::              Two implementations of a @code{nextfile}
                                   function.
* Assert Function::                A function for assertions in @command{awk}
                                   programs.
* Round Function::                 A function for rounding if @code{sprintf}
                                   does not do it correctly.
* Cliff Random Function::          The Cliff Random Number Generator.
* Ordinal Functions::              Functions for using characters as numbers
                                   and vice versa.
* Join Function::                  A function to join an array into a string.
* Gettimeofday Function::          A function to get formatted times.
* Data File Management::           Functions for managing command-line data
                                   files.
* Filetrans Function::             A function for handling data file
                                   transitions.
* Rewind Function::                A function for rereading the current file.
* File Checking::                  Checking that data files are readable.
* Empty Files::                    Checking for zero-length files.
* Ignoring Assigns::               Treating assignments as file names.
* Getopt Function::                A function for processing command-line
                                   arguments.
* Passwd Functions::               Functions for getting user information.
* Group Functions::                Functions for getting group information.
* Running Examples::               How to run these examples.
* Clones::                         Clones of common utilities.
* Cut Program::                    The @command{cut} utility.
* Egrep Program::                  The @command{egrep} utility.
* Id Program::                     The @command{id} utility.
* Split Program::                  The @command{split} utility.
* Tee Program::                    The @command{tee} utility.
* Uniq Program::                   The @command{uniq} utility.
* Wc Program::                     The @command{wc} utility.
* Miscellaneous Programs::         Some interesting @command{awk} programs.
* Dupword Program::                Finding duplicated words in a document.
* Alarm Program::                  An alarm clock.
* Translate Program::              A program similar to the @command{tr}
                                   utility.
* Labels Program::                 Printing mailing labels.
* Word Sorting::                   A program to produce a word usage count.
* History Sorting::                Eliminating duplicate entries from a
                                   history file.
* Extract Program::                Pulling out programs from Texinfo source
                                   files.
* Simple Sed::                     A Simple Stream Editor.
* Igawk Program::                  A wrapper for @command{awk} that includes
                                   files.
* V7/SVR3.1::                      The major changes between V7 and System V
                                   Release 3.1.
* SVR4::                           Minor changes between System V Releases 3.1
                                   and 4.
* POSIX::                          New features from the POSIX standard.
* BTL::                            New features from the Bell Laboratories
                                   version of @command{awk}.
* POSIX/GNU::                      The extensions in @command{gawk} not in
                                   POSIX @command{awk}.
* Contributors::                   The major contributors to @command{gawk}.
* Gawk Distribution::              What is in the @command{gawk} distribution.
* Getting::                        How to get the distribution.
* Extracting::                     How to extract the distribution.
* Distribution contents::          What is in the distribution.
* Unix Installation::              Installing @command{gawk} under various
                                   versions of Unix.
583* Quick Installation::             Compiling @command{gawk} under Unix.
584* Additional Configuration Options:: Other compile-time options.
585* Configuration Philosophy::       How it's all supposed to work.
586* Non-Unix Installation::          Installation on Other Operating Systems.
587* Amiga Installation::             Installing @command{gawk} on an Amiga.
588* BeOS Installation::              Installing @command{gawk} on BeOS.
589* PC Installation::                Installing and Compiling @command{gawk} on
590                                   MS-DOS and OS/2.
591* PC Binary Installation::         Installing a prepared distribution.
592* PC Compiling::                   Compiling @command{gawk} for MS-DOS, Windows32,
593                                   and OS/2.
594* PC Using::                       Running @command{gawk} on MS-DOS, Windows32 and
595                                   OS/2.
596* PC Dynamic::                     Compiling @command{gawk} for dynamic
597                                   libraries.
598* Cygwin::                         Building and running @command{gawk} for
599                                   Cygwin.
600* VMS Installation::               Installing @command{gawk} on VMS.
601* VMS Compilation::                How to compile @command{gawk} under VMS.
602* VMS Installation Details::       How to install @command{gawk} under VMS.
603* VMS Running::                    How to run @command{gawk} under VMS.
604* VMS POSIX::                      Alternate instructions for VMS POSIX.
605* Unsupported::                    Systems whose ports are no longer
606                                   supported.
607* Atari Installation::             Installing @command{gawk} on the Atari ST.
608* Atari Compiling::                Compiling @command{gawk} on Atari.
609* Atari Using::                    Running @command{gawk} on Atari.
610* Tandem Installation::            Installing @command{gawk} on a Tandem.
611* Bugs::                           Reporting Problems and Bugs.
612* Other Versions::                 Other freely available @command{awk}
613                                   implementations.
614* Compatibility Mode::             How to disable certain @command{gawk}
615                                   extensions.
616* Additions::                      Making Additions To @command{gawk}.
617* Adding Code::                    Adding code to the main body of
618                                   @command{gawk}.
619* New Ports::                      Porting @command{gawk} to a new operating
620                                   system.
621* Dynamic Extensions::             Adding new built-in functions to
622                                   @command{gawk}.
623* Internals::                      A brief look at some @command{gawk}
624                                   internals.
* Sample Library::                 An example of new functions.
626* Internal File Description::      What the new functions will do.
627* Internal File Ops::              The code for internal file operations.
628* Using Internal File Ops::        How to use an external extension.
629* Future Extensions::              New features that may be implemented one
630                                   day.
631* Basic High Level::               The high level view.
632* Basic Data Typing::              A very quick intro to data types.
633* Floating Point Issues::          Stuff to know about floating-point numbers.
634@end detailmenu
635@end menu
636
637@c dedication for Info file
638@ifinfo
639@center To Miriam, for making me complete.
640@sp 1
641@center To Chana, for the joy you bring us.
642@sp 1
643@center To Rivka, for the exponential increase.
644@sp 1
645@center To Nachum, for the added dimension.
646@sp 1
647@center To Malka, for the new beginning.
648@end ifinfo
649
650@summarycontents
651@contents
652
653@node Foreword
654@unnumbered Foreword
655
656Arnold Robbins and I are good friends. We were introduced 11 years ago
657by circumstances---and our favorite programming language, AWK.
658The circumstances started a couple of years
659earlier. I was working at a new job and noticed an unplugged
660Unix computer sitting in the corner.  No one knew how to use it,
661and neither did I.  However,
662a couple of days later it was running, and
663I was @code{root} and the one-and-only user.
664That day, I began the transition from statistician to Unix programmer.
665
666On one of many trips to the library or bookstore in search of
667books on Unix, I found the gray AWK book, a.k.a. Aho, Kernighan and
668Weinberger, @cite{The AWK Programming Language}, Addison-Wesley,
6691988.  AWK's simple programming paradigm---find a pattern in the
670input and then perform an action---often reduced complex or tedious
data manipulations to a few lines of code.  I was excited to try my
672hand at programming in AWK.
673
Alas, the @command{awk} on my computer was a limited version of the
675language described in the AWK book.  I discovered that my computer
676had ``old @command{awk}'' and the AWK book described ``new @command{awk}.''
677I learned that this was typical; the old version refused to step
678aside or relinquish its name.  If a system had a new @command{awk}, it was
679invariably called @command{nawk}, and few systems had it.
680The best way to get a new @command{awk} was to @command{ftp} the source code for
681@command{gawk} from @code{prep.ai.mit.edu}.  @command{gawk} was a version of
682new @command{awk} written by David Trueman and Arnold, and available under
683the GNU General Public License.
684
685(Incidentally,
686it's no longer difficult to find a new @command{awk}. @command{gawk} ships with
687Linux, and you can download binaries or source code for almost
688any system; my wife uses @command{gawk} on her VMS box.)
689
690My Unix system started out unplugged from the wall; it certainly was not
691plugged into a network.  So, oblivious to the existence of @command{gawk}
692and the Unix community in general, and desiring a new @command{awk}, I wrote
693my own, called @command{mawk}.
694Before I was finished I knew about @command{gawk},
695but it was too late to stop, so I eventually posted
696to a @code{comp.sources} newsgroup.
697
698A few days after my posting, I got a friendly email
699from Arnold introducing
himself.  He suggested we share design and algorithms and
701attached a draft of the POSIX standard so
702that I could update @command{mawk} to support language extensions added
703after publication of the AWK book.
704
705Frankly, if our roles had
706been reversed, I would not have been so open and we probably would
707have never met.  I'm glad we did meet.
708He is an AWK expert's AWK expert and a genuinely nice person.
709Arnold contributes significant amounts of his
710expertise and time to the Free Software Foundation.
711
712This book is the @command{gawk} reference manual, but at its core it
713is a book about AWK programming that
714will appeal to a wide audience.
715It is a definitive reference to the AWK language as defined by the
7161987 Bell Labs release and codified in the 1992 POSIX Utilities
717standard.
718
719On the other hand, the novice AWK programmer can study
720a wealth of practical programs that emphasize
721the power of AWK's basic idioms:
data-driven control flow, pattern matching with regular expressions,
723and associative arrays.
724Those looking for something new can try out @command{gawk}'s
725interface to network protocols via special @file{/inet} files.
726
727The programs in this book make clear that an AWK program is
728typically much smaller and faster to develop than
729a counterpart written in C.
Consequently, there is often a payoff to prototyping an
731algorithm or design in AWK to get it running quickly and expose
732problems early. Often, the interpreted performance is adequate
733and the AWK prototype becomes the product.
734
The new @command{pgawk} (profiling @command{gawk}) produces
program execution counts.
I recently experimented with an algorithm that, for
@math{n} lines of input, exhibited
739@tex
740$\sim\! Cn^2$
741@end tex
742@ifnottex
743~ C n^2
744@end ifnottex
745performance, while
746theory predicted
747@tex
748$\sim\! Cn\log n$
749@end tex
750@ifnottex
751~ C n log n
752@end ifnottex
753behavior. A few minutes poring
754over the @file{awkprof.out} profile pinpointed the problem to
755a single line of code.  @command{pgawk} is a welcome addition to
756my programmer's toolbox.
757
758Arnold has distilled over a decade of experience writing and
759using AWK programs, and developing @command{gawk}, into this book.  If you use
760AWK or want to learn how, then read this book.
761
762@display
763Michael Brennan
764Author of @command{mawk}
765@end display
766
767@node Preface
768@unnumbered Preface
769@c I saw a comment somewhere that the preface should describe the book itself,
770@c and the introduction should describe what the book covers.
771@c
772@c 12/2000: Chuck wants the preface & intro combined.
773
774Several kinds of tasks occur repeatedly
775when working with text files.
776You might want to extract certain lines and discard the rest.
777Or you may need to make changes wherever certain patterns appear,
778but leave the rest of the file alone.
779Writing single-use programs for these tasks in languages such as C, C++, or Pascal
780is time-consuming and inconvenient.
781Such jobs are often easier with @command{awk}.
782The @command{awk} utility interprets a special-purpose programming language
783that makes it easy to handle simple data-reformatting jobs.
784
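As a minimal sketch of the two tasks just described, the following
shell commands first keep only the lines that match a pattern, and then
change only the matching lines while passing the rest through untouched.
The input text and the variable names (@code{notes}, @code{todo_lines},
@code{marked}) are invented for this illustration, and the second
command assumes a ``new'' @command{awk} that provides @code{sub}.

```shell
# Made-up input for the illustration: three lines of text.
notes='TODO buy milk
call the bank
TODO renew passport'

# Task 1: extract the lines that match a pattern and discard the rest.
todo_lines=$(printf '%s\n' "$notes" | awk '/^TODO/ { print }')
printf '%s\n' "$todo_lines"

# Task 2: change the lines that match and leave the others alone.
# The trailing pattern "1" is always true, so every line is printed,
# changed or not.
marked=$(printf '%s\n' "$notes" | awk '/^TODO/ { sub(/^TODO/, "DONE") } 1')
printf '%s\n' "$marked"
```

Each program fits on one line, which is the kind of single-use job
that would take noticeably more code in C or Pascal.
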
785The GNU implementation of @command{awk} is called @command{gawk}; it is fully
786compatible with the System V Release 4 version of
787@command{awk}.  @command{gawk} is also compatible with the POSIX
788specification of the @command{awk} language.  This means that all
789properly written @command{awk} programs should work with @command{gawk}.
790Thus, we usually don't distinguish between @command{gawk} and other
791@command{awk} implementations.
792
793@cindex @command{awk}, POSIX and, See Also POSIX @command{awk}
794@cindex @command{awk}, POSIX and
795@cindex POSIX, @command{awk} and
796@cindex @command{gawk}, @command{awk} and
797@cindex @command{awk}, @command{gawk} and
798@cindex @command{awk}, uses for
799Using @command{awk} allows you to:
800
801@itemize @bullet
802@item
803Manage small, personal databases
804
805@item
806Generate reports
807
808@item
809Validate data
810
811@item
812Produce indexes and perform other document preparation tasks
813
814@item
815Experiment with algorithms that you can adapt later to other computer
816languages
817@end itemize
818
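To make two of these uses concrete, here is a small sketch that
validates data and then generates a report from it.  The ``database''
is hypothetical: each record is assumed to hold a category and an
amount, and the names @code{ledger}, @code{bad}, and @code{report}
exist only for this illustration.

```shell
# A made-up personal "database": one expense per line,
# category first, then amount.  The last record is deliberately broken.
ledger='food 12.50
rent 700
food 7.25
books'

# Validate data: flag any record that does not have exactly two fields,
# or whose second field is not a plain unsigned number.
bad=$(printf '%s\n' "$ledger" |
      awk 'NF != 2 || $2 !~ /^[0-9]+(\.[0-9]+)?$/ { print NR ": " $0 }')
printf 'invalid records:\n%s\n' "$bad"

# Generate a report: total the valid records per category.
# The order of "for (c in total)" is unspecified, so sort the result.
report=$(printf '%s\n' "$ledger" |
         awk 'NF == 2 && $2 ~ /^[0-9]+(\.[0-9]+)?$/ { total[$1] += $2 }
              END { for (c in total) printf "%s %.2f\n", c, total[c] }' |
         sort)
printf '%s\n' "$report"
```

The associative array @code{total} does the bookkeeping; no categories
have to be declared in advance.
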
819@cindex @command{awk}, See Also @command{gawk}
820@cindex @command{gawk}, See Also @command{awk}
821@cindex @command{gawk}, uses for
822In addition,
823@command{gawk}
824provides facilities that make it easy to:
825
826@itemize @bullet
827@item
828Extract bits and pieces of data for processing
829
830@item
831Sort data
832
833@item
834Perform simple network communications
835@end itemize
836
837This @value{DOCUMENT} teaches you about the @command{awk} language and
838how you can use it effectively.  You should already be familiar with basic
839system commands, such as @command{cat} and @command{ls},@footnote{These commands
840are available on POSIX-compliant systems, as well as on traditional
841Unix-based systems. If you are using some other operating system, you still need to
842be familiar with the ideas of I/O redirection and pipes.} as well as basic shell
843facilities, such as input/output (I/O) redirection and pipes.
844
845@cindex GNU @command{awk}, See @command{gawk}
846Implementations of the @command{awk} language are available for many
847different computing environments.  This @value{DOCUMENT}, while describing
848the @command{awk} language in general, also describes the particular
849implementation of @command{awk} called @command{gawk} (which stands for
``GNU awk'').  @command{gawk} runs on a broad range of Unix systems,
from 80386 PC-based computers up through large-scale systems,
852such as Crays. @command{gawk} has also been ported to Mac OS X,
853MS-DOS, Microsoft Windows (all versions) and OS/2 PCs, Atari and Amiga
854microcomputers, BeOS, Tandem D20, and VMS.
855
856@menu
857* History::                     The history of @command{gawk} and
858                                @command{awk}.
859* Names::                       What name to use to find @command{awk}.
860* This Manual::                 Using this @value{DOCUMENT}. Includes sample
861                                input files that you can use.
862* Conventions::                 Typographical Conventions.
863* Manual History::              Brief history of the GNU project and this
864                                @value{DOCUMENT}.
865* How To Contribute::           Helping to save the world.
866* Acknowledgments::             Acknowledgments.
867@end menu
868
869@node History
870@unnumberedsec History of @command{awk} and @command{gawk}
871@cindex recipe for a programming language
872@cindex programming language, recipe for
873@center Recipe For A Programming Language
874
875@multitable {2 parts} {1 part  @code{egrep}} {1 part  @code{snobol}}
876@item @tab 1 part  @code{egrep} @tab 1 part  @code{snobol}
877@item @tab 2 parts @code{ed} @tab 3 parts C
878@end multitable
879
880@quotation
881Blend all parts well using @code{lex} and @code{yacc}.
882Document minimally and release.
883
884After eight years, add another part @code{egrep} and two
885more parts C.  Document very well and release.
886@end quotation
887
888@cindex Aho, Alfred
889@cindex Weinberger, Peter
890@cindex Kernighan, Brian
891@cindex @command{awk}, history of
892The name @command{awk} comes from the initials of its designers: Alfred V.@:
893Aho, Peter J.@: Weinberger and Brian W.@: Kernighan.  The original version of
894@command{awk} was written in 1977 at AT&T Bell Laboratories.
895In 1985, a new version made the programming
896language more powerful, introducing user-defined functions, multiple input
897streams, and computed regular expressions.
898This new version became widely available with Unix System V
899Release 3.1 (SVR3.1).
900The version in SVR4 added some new features and cleaned
901up the behavior in some of the ``dark corners'' of the language.
902The specification for @command{awk} in the POSIX Command Language
903and Utilities standard further clarified the language.
904Both the @command{gawk} designers and the original Bell Laboratories @command{awk}
905designers provided feedback for the POSIX specification.
906
907@cindex Rubin, Paul
908@cindex Fenlason, Jay
909@cindex Trueman, David
910Paul Rubin wrote the GNU implementation, @command{gawk}, in 1986.
911Jay Fenlason completed it, with advice from Richard Stallman.  John Woods
912contributed parts of the code as well.  In 1988 and 1989, David Trueman, with
913help from me, thoroughly reworked @command{gawk} for compatibility
914with the newer @command{awk}.
915Circa 1995, I became the primary maintainer.
916Current development focuses on bug fixes,
917performance improvements, standards compliance, and occasionally, new features.
918
919In May of 1997, J@"urgen Kahrs felt the need for network access
920from @command{awk}, and with a little help from me, set about adding
921features to do this for @command{gawk}.  At that time, he also
922wrote the bulk of
923@cite{TCP/IP Internetworking with @command{gawk}}
924(a separate document, available as part of the @command{gawk} distribution).
925His code finally became part of the main @command{gawk} distribution
926with @command{gawk} @value{PVERSION} 3.1.
927
928@xref{Contributors},
929for a complete list of those who made important contributions to @command{gawk}.
930
931@node Names
932@section A Rose by Any Other Name
933
934@cindex @command{awk}, new vs. old
935The @command{awk} language has evolved over the years. Full details are
936provided in @ref{Language History}.
937The language described in this @value{DOCUMENT}
938is often referred to as ``new @command{awk}'' (@command{nawk}).
939
940@cindex @command{awk}, versions of
941Because of this, many systems have multiple
942versions of @command{awk}.
943Some systems have an @command{awk} utility that implements the
944original version of the @command{awk} language and a @command{nawk} utility
945for the new
946version.
947Others have an @command{oawk} version for the ``old @command{awk}''
948language and plain @command{awk} for the new one.  Still others only
949have one version, which is usually the new one.@footnote{Often, these systems
950use @command{gawk} for their @command{awk} implementation!}
951
952@cindex @command{nawk} utility
953@cindex @command{oawk} utility
954All in all, this makes it difficult for you to know which version of
955@command{awk} you should run when writing your programs.  The best advice
956I can give here is to check your local documentation. Look for @command{awk},
957@command{oawk}, and @command{nawk}, as well as for @command{gawk}.
958It is likely that you already
959have some version of new @command{awk} on your system, which is what
960you should use when running your programs.  (Of course, if you're reading
961this @value{DOCUMENT}, chances are good that you have @command{gawk}!)
962
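If the local documentation is not at hand, one rough way to see which
of these names are installed is to ask the shell directly.  The
following is only a sketch: the function name @code{list_awks} is made
up for this illustration, and it relies on the POSIX
@samp{command -v} facility of the shell.

```shell
# Report where each commonly used awk name lives, if it is installed.
# "command -v" prints the location of a command and fails if the
# command cannot be found.
list_awks() {
    for name in awk oawk nawk gawk; do
        if location=$(command -v "$name" 2>/dev/null); then
            printf '%s: %s\n' "$name" "$location"
        else
            printf '%s: not found\n' "$name"
        fi
    done
}

list_awks
```

Finding a name only shows that a program is installed, not which
version of the language it implements.  If @command{gawk} is among the
names found, running @samp{gawk --version} identifies it; other
implementations may not accept that option.
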
963Throughout this @value{DOCUMENT}, whenever we refer to a language feature
964that should be available in any complete implementation of POSIX @command{awk},
965we simply use the term @command{awk}.  When referring to a feature that is
966specific to the GNU implementation, we use the term @command{gawk}.
967
968@node This Manual
969@section Using This Book
970@cindex @command{awk}, terms describing
971
972The term @command{awk} refers to a particular program as well as to the language you
973use to tell this program what to do.  When we need to be careful, we call
974the language ``the @command{awk} language,''
975and the program ``the @command{awk} utility.''
976This @value{DOCUMENT} explains
977both the @command{awk} language and how to run the @command{awk} utility.
978The term @dfn{@command{awk} program} refers to a program written by you in
979the @command{awk} programming language.
980
981@cindex @command{gawk}, @command{awk} and
982@cindex @command{awk}, @command{gawk} and
983@cindex POSIX @command{awk}
984Primarily, this @value{DOCUMENT} explains the features of @command{awk},
985as defined in the POSIX standard.  It does so in the context of the
986@command{gawk} implementation.  While doing so, it also
987attempts to describe important differences between @command{gawk}
988and other @command{awk} implementations.@footnote{All such differences
989appear in the index under the
990entry ``differences in @command{awk} and @command{gawk}.''}
991Finally, any @command{gawk} features that are not in
992the POSIX standard for @command{awk} are noted.
993
994@ifnotinfo
995This @value{DOCUMENT} has the difficult task of being both a tutorial and a reference.
996If you are a novice, feel free to skip over details that seem too complex.
997You should also ignore the many cross-references; they are for the
998expert user and for the online Info version of the document.
999@end ifnotinfo
1000
1001There are
1002subsections labelled
1003as @strong{Advanced Notes}
1004scattered throughout the @value{DOCUMENT}.
1005They add a more complete explanation of points that are relevant, but not likely
1006to be of interest on first reading.
1007All appear in the index, under the heading ``advanced features.''
1008
1009Most of the time, the examples use complete @command{awk} programs.
1010In some of the more advanced sections, only the part of the @command{awk}
1011program that illustrates the concept currently being described is shown.
1012
1013While this @value{DOCUMENT} is aimed principally at people who have not been
1014exposed
1015to @command{awk}, there is a lot of information here that even the @command{awk}
1016expert should find useful.  In particular, the description of POSIX
1017@command{awk} and the example programs in
1018@ref{Library Functions}, and in
1019@ref{Sample Programs},
1020should be of interest.
1021
1022@ref{Getting Started},
1023provides the essentials you need to know to begin using @command{awk}.
1024
1025@ref{Regexp},
1026introduces regular expressions in general, and in particular the flavors
1027supported by POSIX @command{awk} and @command{gawk}.
1028
1029@ref{Reading Files},
1030describes how @command{awk} reads your data.
1031It introduces the concepts of records and fields, as well
1032as the @code{getline} command.
1033I/O redirection is first described here.
1034
1035@ref{Printing},
1036describes how @command{awk} programs can produce output with
1037@code{print} and @code{printf}.
1038
1039@ref{Expressions},
1040describes expressions, which are the basic building blocks
1041for getting most things done in a program.
1042
1043@ref{Patterns and Actions},
1044describes how to write patterns for matching records, actions for
1045doing something when a record is matched, and the built-in variables
1046@command{awk} and @command{gawk} use.
1047
1048@ref{Arrays},
1049covers @command{awk}'s one-and-only data structure: associative arrays.
1050Deleting array elements and whole arrays is also described, as well as
1051sorting arrays in @command{gawk}.
1052
1053@ref{Functions},
1054describes the built-in functions @command{awk} and
1055@command{gawk} provide, as well as how to define
1056your own functions.
1057
1058@ref{Internationalization},
1059describes special features in @command{gawk} for translating program
1060messages into different languages at runtime.
1061
1062@ref{Advanced Features},
1063describes a number of @command{gawk}-specific advanced features.
1064Of particular note
1065are the abilities to have two-way communications with another process,
1066perform TCP/IP networking, and
1067profile your @command{awk} programs.
1068
1069@ref{Invoking Gawk},
1070describes how to run @command{gawk}, the meaning of its
1071command-line options, and how it finds @command{awk}
1072program source files.
1073
1074@ref{Library Functions}, and
1075@ref{Sample Programs},
1076provide many sample @command{awk} programs.
1077Reading them allows you to see @command{awk}
1078solving real problems.
1079
1080@ref{Language History},
describes how the @command{awk} language has evolved from
its first release to the present.  It also describes how @command{gawk}
1083has acquired features over time.
1084
1085@ref{Installation},
1086describes how to get @command{gawk}, how to compile it
1087under Unix, and how to compile and use it on different
1088non-Unix systems.  It also describes how to report bugs
1089in @command{gawk} and where to get three other freely
1090available implementations of @command{awk}.
1091
1092@ref{Notes},
1093describes how to disable @command{gawk}'s extensions, as
1094well as how to contribute new code to @command{gawk},
1095how to write extension libraries, and some possible
1096future directions for @command{gawk} development.
1097
1098@ref{Basic Concepts},
1099provides some very cursory background material for those who
1100are completely unfamiliar with computer programming.
1101Also centralized there is a discussion of some of the issues
1102surrounding floating-point numbers.
1103
1104The
1105@ref{Glossary},
defines most, if not all, of the significant terms used
1107throughout the book.
1108If you find terms that you aren't familiar with, try looking them up here.
1109
1110@ref{Copying}, and
1111@ref{GNU Free Documentation License},
1112present the licenses that cover the @command{gawk} source code
1113and this @value{DOCUMENT}, respectively.
1114
1115@node Conventions
1116@section Typographical Conventions
1117
1118@cindex Texinfo
1119This @value{DOCUMENT} is written using Texinfo, the GNU documentation
1120formatting language.
1121A single Texinfo source file is used to produce both the printed and online
1122versions of the documentation.
1123@ifnotinfo
1124Because of this, the typographical conventions
1125are slightly different than in other books you may have read.
1126@end ifnotinfo
1127@ifinfo
1128This @value{SECTION} briefly documents the typographical conventions used in Texinfo.
1129@end ifinfo
1130
Examples you would type at the command line are preceded by the common
1132shell primary and secondary prompts, @samp{$} and @samp{>}.
1133Output from the command is preceded by the glyph ``@print{}''.
1134This typically represents the command's standard output.
1135Error messages, and other output on the command's standard error, are preceded
1136by the glyph ``@error{}''.  For example:
1137
1138@example
1139$ echo hi on stdout
1140@print{} hi on stdout
1141$ echo hello on stderr 1>&2
1142@error{} hello on stderr
1143@end example
1144
1145@ifnotinfo
1146In the text, command names appear in @code{this font}, while code segments
1147appear in the same font and quoted, @samp{like this}.  Some things are
1148emphasized @emph{like this}, and if a point needs to be made
1149strongly, it is done @strong{like this}.  The first occurrence of
1150a new term is usually its @dfn{definition} and appears in the same
1151font as the previous occurrence of ``definition'' in this sentence.
1152@value{FN}s are indicated like this: @file{/path/to/ourfile}.
1153@end ifnotinfo
1154
1155Characters that you type at the keyboard look @kbd{like this}.  In particular,
1156there are special characters called ``control characters.''  These are
1157characters that you type by holding down both the @kbd{CONTROL} key and
1158another key, at the same time.  For example, a @kbd{@value{CTL}-d} is typed
1159by first pressing and holding the @kbd{CONTROL} key, next
1160pressing the @kbd{d} key and finally releasing both keys.
1161
1162@c fakenode --- for prepinfo
1163@subsubheading Dark Corners
1164@cindex Kernighan, Brian
1165@quotation
1166@i{Dark corners are basically fractal --- no matter how much
1167you illuminate, there's always a smaller but darker one.}@*
1168Brian Kernighan
1169@end quotation
1170
1171@cindex d.c., See dark corner
1172@cindex dark corner
1173Until the POSIX standard (and @cite{The Gawk Manual}),
1174many features of @command{awk} were either poorly documented or not
1175documented at all.  Descriptions of such features
1176(often called ``dark corners'') are noted in this @value{DOCUMENT} with
1177@iftex
1178the picture of a flashlight in the margin, as shown here.
1179@value{DARKCORNER}
1180@end iftex
1181@ifnottex
1182``(d.c.)''.
1183@end ifnottex
1184They also appear in the index under the heading ``dark corner.''
1185
As noted by the opening quote, though, any
coverage of dark corners
is, by definition, incomplete.
1189
1190@node Manual History
1191@unnumberedsec The GNU Project and This Book
1192
1193@cindex FSF (Free Software Foundation)
1194@cindex Free Software Foundation (FSF)
1195@cindex Stallman, Richard
1196The Free Software Foundation (FSF) is a nonprofit organization dedicated
1197to the production and distribution of freely distributable software.
1198It was founded by Richard M.@: Stallman, the author of the original
1199Emacs editor.  GNU Emacs is the most widely used version of Emacs today.
1200
1201@cindex GNU Project
1202@cindex GPL (General Public License)
1203@cindex General Public License, See GPL
1204@cindex documentation, online
1205The GNU@footnote{GNU stands for ``GNU's not Unix.''}
1206Project is an ongoing effort on the part of the Free Software
1207Foundation to create a complete, freely distributable, POSIX-compliant
1208computing environment.
1209The FSF uses the ``GNU General Public License'' (GPL) to ensure that
1210their software's
1211source code is always available to the end user. A
1212copy of the GPL is included
1213@ifnotinfo
1214in this @value{DOCUMENT}
1215@end ifnotinfo
1216for your reference
1217(@pxref{Copying}).
1218The GPL applies to the C language source code for @command{gawk}.
1219To find out more about the FSF and the GNU Project online,
1220see @uref{http://www.gnu.org, the GNU Project's home page}.
1221This @value{DOCUMENT} may also be read from
1222@uref{http://www.gnu.org/manual/gawk/, their web site}.
1223
1224A shell, an editor (Emacs), highly portable optimizing C, C++, and
1225Objective-C compilers, a symbolic debugger and dozens of large and
small utilities (such as @command{gawk}) have all been completed and are
freely available.  The GNU operating
system kernel (the HURD) has been released but is still in an early
1229stage of development.
1230
1231@cindex Linux
1232@cindex GNU/Linux
1233@cindex operating systems, BSD-based
1234@cindex Alpha (DEC)
1235Until the GNU operating system is more fully developed, you should
1236consider using GNU/Linux, a freely distributable, Unix-like operating
1237system for Intel 80386, DEC Alpha, Sun SPARC, IBM S/390, and other
1238systems.@footnote{The terminology ``GNU/Linux'' is explained
1239in the @ref{Glossary}.}
1240There are
1241many books on GNU/Linux. One that is freely available is @cite{Linux
1242Installation and Getting Started}, by Matt Welsh.
GNU/Linux distributions are often available in computer stores or
1244bundled on CD-ROMs with books about Linux.
1245(There are three other freely available, Unix-like operating systems for
124680386 and other systems: NetBSD, FreeBSD, and OpenBSD. All are based on the
12474.4-Lite Berkeley Software Distribution, and they use recent versions
1248of @command{gawk} for their versions of @command{awk}.)
1249
1250@ifnotinfo
1251The @value{DOCUMENT} you are reading is actually free---at least, the
1252information in it is free to anyone.  The machine-readable
1253source code for the @value{DOCUMENT} comes with @command{gawk}; anyone
1254may take this @value{DOCUMENT} to a copying machine and make as many
1255copies as they like.  (Take a moment to check the Free Documentation
1256License in @ref{GNU Free Documentation License}.)
1257
1258Although you could just print it out yourself, bound books are much
1259easier to read and use.  Furthermore,
1260the proceeds from sales of this book go back to the FSF
1261to help fund development of more free software.
1262@end ifnotinfo
1263
1264@ignore
1265@cindex Close, Diane
1266The @value{DOCUMENT} itself has gone through several previous,
1267preliminary editions.
1268Paul Rubin wrote the very first draft of @cite{The GAWK Manual};
1269it was around 40 pages in size.
1270Diane Close and Richard Stallman improved it, yielding the
1271version which I started working with in the fall of 1988.
1272It was around 90 pages long and barely described the original, ``old''
1273version of @command{awk}. After substantial revision, the first version of
1274the @cite{The GAWK Manual} to be released was Edition 0.11 Beta in
1275October of 1989.  The manual then underwent more substantial revision
1276for Edition 0.13 of December 1991.
1277David Trueman, Pat Rankin and Michal Jaegermann contributed sections
1278of the manual for Edition 0.13.
1279That edition was published by the
1280FSF as a bound book early in 1992.  Since then there were several
1281minor revisions, notably Edition 0.14 of November 1992 that was published
1282by the FSF in January of 1993 and Edition 0.16 of August 1993.
1283
1284Edition 1.0 of @cite{GAWK: The GNU Awk User's Guide} represented a significant re-working
1285of @cite{The GAWK Manual}, with much additional material.
1286The FSF and I agreed that I was now the primary author.
1287@c I also felt that the manual needed a more descriptive title.
1288
1289In January 1996, SSC published Edition 1.0 under the title @cite{Effective AWK Programming}.
1290In February 1997, they published Edition 1.0.3 which had minor changes
1291as a ``second edition.''
1292In 1999, the FSF published this same version as Edition 2
1293of @cite{GAWK: The GNU Awk User's Guide}.
1294
1295Edition @value{EDITION} maintains the basic structure of Edition 1.0,
1296but with significant additional material, reflecting the host of new features
1297in @command{gawk} @value{PVERSION} @value{VERSION}.
1298Of particular note is
1299@ref{Array Sorting},
1300@ref{Bitwise Functions},
1301@ref{Internationalization},
1302@ref{Advanced Features},
1303and
1304@ref{Dynamic Extensions}.
1305@end ignore
1306
1307@cindex Close, Diane
1308The @value{DOCUMENT} itself has gone through a number of previous editions.
1309Paul Rubin wrote the very first draft of @cite{The GAWK Manual};
1310it was around 40 pages in size.
1311Diane Close and Richard Stallman improved it, yielding a
1312version that was
1313around 90 pages long and barely described the original, ``old''
1314version of @command{awk}.
1315
1316I started working with that version in the fall of 1988.
1317As work on it progressed,
1318the FSF published several preliminary versions (numbered 0.@var{x}).
1319In 1996, Edition 1.0 was released with @command{gawk} 3.0.0.
1320The FSF published the first two editions under
1321the title @cite{The GNU Awk User's Guide}.
1322
1323This edition maintains the basic structure of Edition 1.0,
1324but with significant additional material, reflecting the host of new features
1325in @command{gawk} @value{PVERSION} @value{VERSION}.
Of particular note are
1327@ref{Array Sorting},
1328as well as
1329@ref{Bitwise Functions},
1330@ref{Internationalization},
1331and also
1332@ref{Advanced Features},
1333and
1334@ref{Dynamic Extensions}.
1335
1336@cite{@value{TITLE}} will undoubtedly continue to evolve.
1337An electronic version
1338comes with the @command{gawk} distribution from the FSF.
1339If you find an error in this @value{DOCUMENT}, please report it!
1340@xref{Bugs}, for information on submitting
1341problem reports electronically, or write to me in care of the publisher.
1342
1343@node How To Contribute
1344@unnumberedsec How to Contribute
1345
1346As the maintainer of GNU @command{awk},
1347I am starting a collection of publicly available @command{awk}
1348programs.
1349For more information,
1350see @uref{ftp://ftp.freefriends.org/arnold/Awkstuff}.
1351If you have written an interesting @command{awk} program, or have written a
1352@command{gawk} extension that you would like to
1353share with the rest of the world, please contact me (@email{arnold@@gnu.org}).
1354Making things available on the Internet helps keep the
1355@command{gawk} distribution down to manageable size.
1356
1357@node Acknowledgments
1358@unnumberedsec Acknowledgments
1359
1360The initial draft of @cite{The GAWK Manual} had the following acknowledgments:
1361
1362@quotation
1363Many people need to be thanked for their assistance in producing this
1364manual.  Jay Fenlason contributed many ideas and sample programs.  Richard
1365Mlynarik and Robert Chassell gave helpful comments on drafts of this
1366manual.  The paper @cite{A Supplemental Document for @command{awk}} by John W.@:
1367Pierce of the Chemistry Department at UC San Diego, pinpointed several
1368issues relevant both to @command{awk} implementation and to this manual, that
1369would otherwise have escaped us.
1370@end quotation
1371
1372@cindex Stallman, Richard
1373I would like to acknowledge Richard M.@: Stallman, for his vision of a
1374better world and for his courage in founding the FSF and starting the
1375GNU Project.
1376
1377The following people (in alphabetical order)
1378provided helpful comments on various
1379versions of this book, up to and including this edition.
1380Rick Adams,
1381Nelson H.F. Beebe,
1382Karl Berry,
1383Dr.@: Michael Brennan,
1384Rich Burridge,
1385Claire Cloutier,
1386Diane Close,
1387Scott Deifik,
1388Christopher (``Topher'') Eliot,
1389Jeffrey Friedl,
1390Dr.@: Darrel Hankerson,
1391Michal Jaegermann,
1392Dr.@: Richard J.@: LeBlanc,
1393Michael Lijewski,
1394Pat Rankin,
1395Miriam Robbins,
1396Mary Sheehan,
1397and
1398Chuck Toporek.
1399
1400@cindex Berry, Karl
1401@cindex Chassell, Robert J.@:
1402@c @cindex Texinfo
1403Robert J.@: Chassell provided much valuable advice on
1404the use of Texinfo.
1405He also deserves special thanks for
1406convincing me @emph{not} to title this @value{DOCUMENT}
1407@cite{How To Gawk Politely}.
1408Karl Berry helped significantly with the @TeX{} part of Texinfo.
1409
1410@cindex Hartholz, Marshall
1411@cindex Hartholz, Elaine
1412@cindex Schreiber, Bert
1413@cindex Schreiber, Rita
1414I would like to thank Marshall and Elaine Hartholz of Seattle and
1415Dr.@: Bert and Rita Schreiber of Detroit for large amounts of quiet vacation
1416time in their homes, which allowed me to make significant progress on
1417this @value{DOCUMENT} and on @command{gawk} itself.
1418
1419@cindex Hughes, Phil
1420Phil Hughes of SSC
1421contributed in a very important way by loaning me his laptop GNU/Linux
1422system, not once, but twice, which allowed me to do a lot of work while
1423away from home.
1424
1425@cindex Trueman, David
1426David Trueman deserves special credit; he has done a yeoman job
1427of evolving @command{gawk} so that it performs well and without bugs.
1428Although he is no longer involved with @command{gawk},
1429working with him on this project was a significant pleasure.
1430
1431@cindex Drepper, Ulrich
1432@cindex GNITS mailing list
1433@cindex mailing list, GNITS
1434The intrepid members of the GNITS mailing list, and most notably Ulrich
1435Drepper, provided invaluable help and feedback for the design of the
1436internationalization features.
1437
1438@cindex Beebe, Nelson
1439@cindex Brown, Martin
1440@cindex Buening, Andreas
1441@cindex Deifik, Scott
1442@cindex Hankerson, Darrel
1443@cindex Hasegawa, Isamu
1444@cindex Jaegermann, Michal
1445@cindex Kahrs, J@"urgen
1446@cindex Rankin, Pat
1447@cindex Rommel, Kai Uwe
1448@cindex Zaretskii, Eli
1449Nelson Beebe,
1450Martin Brown,
1451Andreas Buening,
1452Scott Deifik,
1453Darrel Hankerson,
1454Isamu Hasegawa,
1455Michal Jaegermann,
1456J@"urgen Kahrs,
1457Pat Rankin,
1458Kai Uwe Rommel,
1459and Eli Zaretskii
1460(in alphabetical order)
1461make up the
1462@command{gawk} ``crack portability team.''  Without their hard work and
1463help, @command{gawk} would not be nearly the fine program it is today.  It
1464has been and continues to be a pleasure working with this team of fine
1465people.
1466
1467@cindex Kernighan, Brian
1468David and I would like to thank Brian Kernighan of Bell Laboratories for
1469invaluable assistance during the testing and debugging of @command{gawk}, and for
1470help in clarifying numerous points about the language.  We could not have
1471done nearly as good a job on either @command{gawk} or its documentation without
1472his help.
1473
Chuck Toporek, Mary Sheehan, and Claire Cloutier of O'Reilly & Associates contributed
1475significant editorial help for this @value{DOCUMENT} for the
14763.1 release of @command{gawk}.
1477
1478@cindex Robbins, Miriam
1479@cindex Robbins, Jean
1480@cindex Robbins, Harry
1481@cindex G-d
1482I must thank my wonderful wife, Miriam, for her patience through
1483the many versions of this project, for her proofreading,
1484and for sharing me with the computer.
1485I would like to thank my parents for their love, and for the grace with
1486which they raised and educated me.
1487Finally, I also must acknowledge my gratitude to G-d, for the many opportunities
1488He has sent my way, as well as for the gifts He has given me with which to
1489take advantage of those opportunities.
1490@sp 2
1491@noindent
1492Arnold Robbins @*
1493Nof Ayalon @*
1494ISRAEL @*
1495March, 2001
1496
1497@ignore
1498@c Try this
1499@iftex
1500@page
1501@headings off
1502@majorheading I@ @ @ @ The @command{awk} Language and @command{gawk}
1503Part I describes the @command{awk} language and @command{gawk} program in detail.
1504It starts with the basics, and continues through all of the features of @command{awk}
1505and @command{gawk}.  It contains the following chapters:
1506
1507@itemize @bullet
1508@item
1509@ref{Getting Started}.
1510
1511@item
1512@ref{Regexp}.
1513
1514@item
1515@ref{Reading Files}.
1516
1517@item
1518@ref{Printing}.
1519
1520@item
1521@ref{Expressions}.
1522
1523@item
1524@ref{Patterns and Actions}.
1525
1526@item
1527@ref{Arrays}.
1528
1529@item
1530@ref{Functions}.
1531
1532@item
1533@ref{Internationalization}.
1534
1535@item
1536@ref{Advanced Features}.
1537
1538@item
1539@ref{Invoking Gawk}.
1540@end itemize
1541
1542@page
1543@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
1544@oddheading  @| @| @strong{@thischapter}@ @ @ @thispage
1545@end iftex
1546@end ignore
1547
1548@node Getting Started
1549@chapter Getting Started with @command{awk}
1550@c @cindex script, definition of
1551@c @cindex rule, definition of
1552@c @cindex program, definition of
1553@c @cindex basic function of @command{awk}
1554@cindex @command{awk}, function of
1555
1556The basic function of @command{awk} is to search files for lines (or other
1557units of text) that contain certain patterns.  When a line matches one
1558of the patterns, @command{awk} performs specified actions on that line.
1559@command{awk} keeps processing input lines in this way until it reaches
1560the end of the input files.
1561
1562@cindex @command{awk}, uses for
1563@c comma here is NOT for secondary
1564@cindex programming languages, data-driven vs. procedural
1565@cindex @command{awk} programs
1566Programs in @command{awk} are different from programs in most other languages,
1567because @command{awk} programs are @dfn{data-driven}; that is, you describe
1568the data you want to work with and then what to do when you find it.
1569Most other languages are @dfn{procedural}; you have to describe, in great
1570detail, every step the program is to take.  When working with procedural
1571languages, it is usually much
1572harder to clearly describe the data your program will process.
1573For this reason, @command{awk} programs are often refreshingly easy to
1574read and write.
1575
1576@cindex program, definition of
1577@cindex rule, definition of
1578When you run @command{awk}, you specify an @command{awk} @dfn{program} that
1579tells @command{awk} what to do.  The program consists of a series of
1580@dfn{rules}.  (It may also contain @dfn{function definitions},
1581an advanced feature that we will ignore for now.
1582@xref{User-defined}.)  Each rule specifies one
1583pattern to search for and one action to perform
1584upon finding the pattern.
1585
1586Syntactically, a rule consists of a pattern followed by an action.  The
1587action is enclosed in curly braces to separate it from the pattern.
1588Newlines usually separate rules.  Therefore, an @command{awk}
1589program looks like this:
1590
1591@example
1592@var{pattern} @{ @var{action} @}
1593@var{pattern} @{ @var{action} @}
1594@dots{}
1595@end example
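
As a minimal sketch of this layout (the two input lines and both rules
are invented here purely for illustration), the following program has
two rules, each on its own line:

@example
$ printf 'apple 3\nbanana 12\n' |
> awk '$2 > 10  @{ print $1 @}
>      /apple/  @{ print "found an apple" @}'
@print{} found an apple
@print{} banana
@end example

@noindent
Each input line is tested against every rule in turn, so the first
line triggers only the second rule and the second line triggers only
the first.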
1596
1597@menu
1598* Running gawk::                How to run @command{gawk} programs; includes
1599                                command-line syntax.
1600* Sample Data Files::           Sample data files for use in the @command{awk}
1601                                programs illustrated in this @value{DOCUMENT}.
1602* Very Simple::                 A very simple example.
1603* Two Rules::                   A less simple one-line example using two
1604                                rules.
1605* More Complex::                A more complex example.
1606* Statements/Lines::            Subdividing or combining statements into
1607                                lines.
1608* Other Features::              Other Features of @command{awk}.
1609* When::                        When to use @command{gawk} and when to use
1610                                other things.
1611@end menu
1612
1613@node Running gawk
1614@section How to Run @command{awk} Programs
1615
1616@cindex @command{awk} programs, running
1617There are several ways to run an @command{awk} program.  If the program is
1618short, it is easiest to include it in the command that runs @command{awk},
1619like this:
1620
1621@example
1622awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
1623@end example
1624
1625@cindex command line, formats
1626When the program is long, it is usually more convenient to put it in a file
1627and run it with a command like this:
1628
1629@example
1630awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{}
1631@end example
1632
1633This @value{SECTION} discusses both mechanisms, along with several
1634variations of each.
1635
1636@menu
1637* One-shot::                    Running a short throwaway @command{awk}
1638                                program.
1639* Read Terminal::               Using no input files (input from terminal
1640                                instead).
1641* Long::                        Putting permanent @command{awk} programs in
1642                                files.
1643* Executable Scripts::          Making self-contained @command{awk} programs.
1644* Comments::                    Adding documentation to @command{gawk}
1645                                programs.
1646* Quoting::                     More discussion of shell quoting issues.
1647@end menu
1648
1649@node One-shot
1650@subsection One-Shot Throwaway @command{awk} Programs
1651
1652Once you are familiar with @command{awk}, you will often type in simple
1653programs the moment you want to use them.  Then you can write the
1654program as the first argument of the @command{awk} command, like this:
1655
1656@example
1657awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
1658@end example
1659
1660@noindent
1661where @var{program} consists of a series of @var{patterns} and
1662@var{actions}, as described earlier.
1663
1664@cindex single quote (@code{'})
1665@cindex @code{'} (single quote)
1666This command format instructs the @dfn{shell}, or command interpreter,
1667to start @command{awk} and use the @var{program} to process records in the
1668input file(s).  There are single quotes around @var{program} so
1669the shell won't interpret any @command{awk} characters as special shell
1670characters.  The quotes also cause the shell to treat all of @var{program} as
1671a single argument for @command{awk}, and allow @var{program} to be more
1672than one line long.
1673
1674@cindex shells, scripts
1675@cindex @command{awk} programs, running, from shell scripts
1676This format is also useful for running short or medium-sized @command{awk}
1677programs from shell scripts, because it avoids the need for a separate
1678file for the @command{awk} program.  A self-contained shell script is more
1679reliable because there are no other files to misplace.
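
For instance, a small shell script might embed its @command{awk}
program directly.  This is only a sketch: the script name
@file{count-lines} is hypothetical, and @code{END} and @code{NR} are
features we haven't discussed yet:

@example
#! /bin/sh
# count-lines: print the total number of lines in the named files
awk 'END @{ print NR @}' "$@@"
@end example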
1680
1681@ref{Very Simple},
1682@ifnotinfo
1683later in this @value{CHAPTER},
1684@end ifnotinfo
1685presents several short,
1686self-contained programs.
1687
1688@c Removed for gawk 3.1, doesn't really add anything here.
1689@ignore
1690As an interesting side point, the command
1691
1692@example
1693awk '/foo/' @var{files} @dots{}
1694@end example
1695
1696@noindent
1697is essentially the same as
1698
1699@cindex @command{egrep} utility
1700@example
1701egrep foo @var{files} @dots{}
1702@end example
1703@end ignore
1704
1705@node Read Terminal
1706@subsection Running @command{awk} Without Input Files
1707
1708@cindex standard input
1709@cindex input, standard
1710@cindex input files, running @command{awk} without
1711You can also run @command{awk} without any input files.  If you type the
1712following command line:
1713
1714@example
1715awk '@var{program}'
1716@end example
1717
1718@noindent
1719@command{awk} applies the @var{program} to the @dfn{standard input},
1720which usually means whatever you type on the terminal.  This continues
1721until you indicate end-of-file by typing @kbd{@value{CTL}-d}.
1722(On other operating systems, the end-of-file character may be different.
1723For example, on OS/2 and MS-DOS, it is @kbd{@value{CTL}-z}.)
1724
1725@cindex files, input, See input files
1726@cindex input files, running @command{awk} without
1727@cindex @command{awk} programs, running, without input files
1728As an example, the following program prints a friendly piece of advice
1729(from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}),
1730to keep you from worrying about the complexities of computer programming
1731(@code{BEGIN} is a feature we haven't discussed yet):
1732
1733@example
1734$ awk "BEGIN @{ print \"Don't Panic!\" @}"
1735@print{} Don't Panic!
1736@end example
1737
1738@cindex quoting
1739@cindex double quote (@code{"})
1740@cindex @code{"} (double quote)
1741@cindex @code{\} (backslash)
1742@cindex backslash (@code{\})
1743This program does not read any input.  The @samp{\} before each of the
1744inner double quotes is necessary because of the shell's quoting
rules---in particular because the program text mixes both single quotes and
1746double quotes.@footnote{Although we generally recommend the use of single
1747quotes around the program text, double quotes are needed here in order to
1748put the single quote into the message.}
1749
1750This next simple @command{awk} program
1751emulates the @command{cat} utility; it copies whatever you type on the
1752keyboard to its standard output (why this works is explained shortly).
1753
1754@example
1755$ awk '@{ print @}'
1756Now is the time for all good men
1757@print{} Now is the time for all good men
1758to come to the aid of their country.
1759@print{} to come to the aid of their country.
1760Four score and seven years ago, ...
1761@print{} Four score and seven years ago, ...
1762What, me worry?
1763@print{} What, me worry?
1764@kbd{@value{CTL}-d}
1765@end example
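
Standard input need not be the terminal.  As a small sketch (the
@command{echo} text is made up for illustration), the same program can
read the output of another command through a shell pipe:

@example
$ echo "What, me worry?" | awk '@{ print @}'
@print{} What, me worry?
@end example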
1766
1767@node Long
1768@subsection Running Long Programs
1769
1770@cindex @command{awk} programs, running
1771@cindex @command{awk} programs, lengthy
1772@cindex files, @command{awk} programs in
1773Sometimes your @command{awk} programs can be very long.  In this case, it is
1774more convenient to put the program into a separate file.  In order to tell
1775@command{awk} to use that file for its program, you type:
1776
1777@example
1778awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{}
1779@end example
1780
1781@cindex @code{-f} option
1782@cindex command line, options
1783@cindex options, command-line
1784The @option{-f} instructs the @command{awk} utility to get the @command{awk} program
1785from the file @var{source-file}.  Any @value{FN} can be used for
1786@var{source-file}.  For example, you could put the program:
1787
1788@example
1789BEGIN @{ print "Don't Panic!" @}
1790@end example
1791
1792@noindent
1793into the file @file{advice}.  Then this command:
1794
1795@example
1796awk -f advice
1797@end example
1798
1799@noindent
1800does the same thing as this one:
1801
1802@example
1803awk "BEGIN @{ print \"Don't Panic!\" @}"
1804@end example
1805
1806@cindex quoting
1807@noindent
1808This was explained earlier
1809(@pxref{Read Terminal}).
1810Note that you don't usually need single quotes around the @value{FN} that you
1811specify with @option{-f}, because most @value{FN}s don't contain any of the shell's
1812special characters.  Notice that in @file{advice}, the @command{awk}
1813program did not have single quotes around it.  The quotes are only needed
1814for programs that are provided on the @command{awk} command line.
1815
1816@c STARTOFRANGE sq1x
1817@cindex single quote (@code{'})
1818@c STARTOFRANGE qs2x
1819@cindex @code{'} (single quote)
1820If you want to identify your @command{awk} program files clearly as such,
1821you can add the extension @file{.awk} to the @value{FN}.  This doesn't
1822affect the execution of the @command{awk} program but it does make
1823``housekeeping'' easier.
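
As a sketch of this convention (the name @file{advice.awk} is simply
@file{advice} with the suffix added, and the shell here-document is
just one convenient way to create the file), the program can be saved
and run like this:

@example
$ cat > advice.awk << 'EOF'
> BEGIN @{ print "Don't Panic!" @}
> EOF
$ awk -f advice.awk
@print{} Don't Panic!
@end example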
1824
1825@node Executable Scripts
1826@subsection Executable @command{awk} Programs
1827@cindex @command{awk} programs
1828@cindex @code{#} (number sign), @code{#!} (executable scripts)
1829@cindex number sign (@code{#}), @code{#!} (executable scripts)
1830@cindex Unix, @command{awk} scripts and
1831@cindex @code{#} (number sign), @code{#!} (executable scripts), portability issues with
1832@cindex number sign (@code{#}), @code{#!} (executable scripts), portability issues with
1833
1834Once you have learned @command{awk}, you may want to write self-contained
1835@command{awk} scripts, using the @samp{#!} script mechanism.  You can do
1836this on many Unix systems@footnote{The @samp{#!} mechanism works on
1837Linux systems,
1838systems derived from the 4.4-Lite Berkeley Software Distribution,
1839and most commercial Unix systems.} as well as on the GNU system.
1840For example, you could update the file @file{advice} to look like this:
1841
1842@example
1843#! /bin/awk -f
1844
1845BEGIN @{ print "Don't Panic!" @}
1846@end example
1847
1848@noindent
1849After making this file executable (with the @command{chmod} utility),
1850simply type @samp{advice}
1851at the shell and the system arranges to run @command{awk}@footnote{The
1852line beginning with @samp{#!} lists the full @value{FN} of an interpreter
1853to run and an optional initial command-line argument to pass to that
1854interpreter.  The operating system then runs the interpreter with the given
1855argument and the full argument list of the executed program.  The first argument
1856in the list is the full @value{FN} of the @command{awk} program.  The rest of the
1857argument list contains either options to @command{awk}, or @value{DF}s,
1858or both.} as if you had
1859typed @samp{awk -f advice}:
1860
1861@example
1862$ chmod +x advice
1863$ advice
1864@print{} Don't Panic!
1865@end example
1866
1867@noindent
1868(We assume you have the current directory in your shell's search
1869path variable (typically @code{$PATH}).  If not, you may need
1870to type @samp{./advice} at the shell.)
1871
1872Self-contained @command{awk} scripts are useful when you want to write a
1873program that users can invoke without their having to know that the program is
1874written in @command{awk}.
1875
1876@c fakenode --- for prepinfo
1877@subheading Advanced Notes: Portability Issues with @samp{#!}
1878@cindex portability, @code{#!} (executable scripts)
1879
1880Some systems limit the length of the interpreter name to 32 characters.
1881Often, this can be dealt with by using a symbolic link.
1882
1883You should not put more than one argument on the @samp{#!}
1884line after the path to @command{awk}. It does not work. The operating system
1885treats the rest of the line as a single argument and passes it to @command{awk}.
1886Doing this leads to confusing behavior---most likely a usage diagnostic
1887of some sort from @command{awk}.
1888
1889@cindex @code{ARGC}/@code{ARGV} variables, portability and
1890@cindex portability, @code{ARGV} variable
1891Finally,
1892the value of @code{ARGV[0]}
1893(@pxref{Built-in Variables})
1894varies depending upon your operating system.
1895Some systems put @samp{awk} there, some put the full pathname
1896of @command{awk} (such as @file{/bin/awk}), and some put the name
1897of your script (@samp{advice}).  Don't rely on the value of @code{ARGV[0]}
1898to provide your script name.
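
To see what your own system does, you can print the value directly.
This is a sketch; no output is shown because, as just noted, the
result differs from system to system:

@example
$ awk 'BEGIN @{ print ARGV[0] @}'
@end example

@noindent
Putting the same @code{print} statement into an executable script such
as @file{advice} shows what that case yields on your system.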
1899
1900@node Comments
1901@subsection Comments in @command{awk} Programs
1902@cindex @code{#} (number sign), commenting
1903@cindex number sign (@code{#}), commenting
1904@cindex commenting
1905@cindex @command{awk} programs, documenting
1906
1907A @dfn{comment} is some text that is included in a program for the sake
1908of human readers; it is not really an executable part of the program.  Comments
1909can explain what the program does and how it works.  Nearly all
1910programming languages have provisions for comments, as programs are
1911typically hard to understand without them.
1912
1913In the @command{awk} language, a comment starts with the sharp sign
1914character (@samp{#}) and continues to the end of the line.
1915The @samp{#} does not have to be the first character on the line. The
1916@command{awk} language ignores the rest of a line following a sharp sign.
1917For example, we could have put the following into @file{advice}:
1918
1919@example
1920# This program prints a nice friendly message.  It helps
1921# keep novice users from being afraid of the computer.
1922BEGIN    @{ print "Don't Panic!" @}
1923@end example
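
A comment may also follow code on the same line, whereas a @samp{#}
inside a string constant is ordinary text.  The following variation on
@file{advice} is a sketch made up to show both cases:

@example
BEGIN @{
    print "Don't Panic!"      # a comment after a statement
    print "# not a comment"   # the first # is inside a string
@}
@end example

@noindent
Run with @samp{awk -f advice}, it prints:

@example
@print{} Don't Panic!
@print{} # not a comment
@end example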
1924
1925You can put comment lines into keyboard-composed throwaway @command{awk}
1926programs, but this usually isn't very useful; the purpose of a
1927comment is to help you or another person understand the program
1928when reading it at a later time.
1929
1930@cindex quoting
1931@cindex single quote (@code{'}), vs. apostrophe
1932@cindex @code{'} (single quote), vs. apostrophe
1933@strong{Caution:} As mentioned in
1934@ref{One-shot},
1935you can enclose small to medium programs in single quotes, in order to keep
1936your shell scripts self-contained.  When doing so, @emph{don't} put
1937an apostrophe (i.e., a single quote) into a comment (or anywhere else
1938in your program). The shell interprets the quote as the closing
1939quote for the entire program. As a result, usually the shell
1940prints a message about mismatched quotes, and if @command{awk} actually
1941runs, it will probably print strange messages about syntax errors.
1942For example, look at the following:
1943
1944@example
1945$ awk '@{ print "hello" @} # let's be cute'
1946>
1947@end example
1948
1949The shell sees that the first two quotes match, and that
1950a new quoted object begins at the end of the command line.
1951It therefore prompts with the secondary prompt, waiting for more input.
1952With Unix @command{awk}, closing the quoted string produces this result:
1953
1954@example
1955$ awk '@{ print "hello" @} # let's be cute'
1956> '
1957@error{} awk: can't open file be
1958@error{}  source line number 1
1959@end example
1960
1961@cindex @code{\} (backslash)
1962@cindex backslash (@code{\})
1963Putting a backslash before the single quote in @samp{let's} wouldn't help,
1964since backslashes are not special inside single quotes.
1965The next @value{SUBSECTION} describes the shell's quoting rules.
1966
1967@node Quoting
1968@subsection Shell-Quoting Issues
1969@cindex quoting, rules for
1970
1971For short to medium length @command{awk} programs, it is most convenient
1972to enter the program on the @command{awk} command line.
1973This is best done by enclosing the entire program in single quotes.
1974This is true whether you are entering the program interactively at
1975the shell prompt, or writing it as part of a larger shell script:
1976
1977@example
1978awk '@var{program text}' @var{input-file1} @var{input-file2} @dots{}
1979@end example
1980
1981@cindex shells, quoting, rules for
1982@cindex Bourne shell, quoting rules for
1983Once you are working with the shell, it is helpful to have a basic
1984knowledge of shell quoting rules.  The following rules apply only to
1985POSIX-compliant, Bourne-style shells (such as @command{bash}, the GNU Bourne-Again
1986Shell).  If you use @command{csh}, you're on your own.
1987
1988@itemize @bullet
1989@item
1990Quoted items can be concatenated with nonquoted items as well as with other
1991quoted items.  The shell turns everything into one argument for
1992the command.
1993
1994@item
1995Preceding any single character with a backslash (@samp{\}) quotes
1996that character.  The shell removes the backslash and passes the quoted
1997character on to the command.
1998
1999@item
2000@cindex @code{\} (backslash)
2001@cindex backslash (@code{\})
2002@cindex single quote (@code{'})
2003@cindex @code{'} (single quote)
2004Single quotes protect everything between the opening and closing quotes.
2005The shell does no interpretation of the quoted text, passing it on verbatim
2006to the command.
2007It is @emph{impossible} to embed a single quote inside single-quoted text.
2008Refer back to
2009@ref{Comments},
2010for an example of what happens if you try.
2011
2012@item
2013@cindex double quote (@code{"})
2014@cindex @code{"} (double quote)
2015Double quotes protect most things between the opening and closing quotes.
2016The shell does at least variable and command substitution on the quoted text.
2017Different shells may do additional kinds of processing on double-quoted text.
2018
2019Since certain characters within double-quoted text are processed by the shell,
2020they must be @dfn{escaped} within the text.  Of note are the characters
2021@samp{$}, @samp{`}, @samp{\}, and @samp{"}, all of which must be preceded by
2022a backslash within double-quoted text if they are to be passed on literally
2023to the program.  (The leading backslash is stripped first.)
2024Thus, the example seen
2025@ifnotinfo
2026previously
2027@end ifnotinfo
2028in @ref{Read Terminal},
2029is applicable:
2030
2031@example
2032$ awk "BEGIN @{ print \"Don't Panic!\" @}"
2033@print{} Don't Panic!
2034@end example
2035
2036@cindex single quote (@code{'}), with double quotes
2037@cindex @code{'} (single quote), with double quotes
2038Note that the single quote is not special within double quotes.
2039
2040@item
Null strings are removed when they occur as part of a non-null
command-line argument, while explicit null objects are kept.
2043For example, to specify that the field separator @code{FS} should
2044be set to the null string, use:
2045
2046@example
2047awk -F "" '@var{program}' @var{files} # correct
2048@end example
2049
2050@noindent
2051@cindex null strings, quoting and
2052Don't use this:
2053
2054@example
2055awk -F"" '@var{program}' @var{files}  # wrong!
2056@end example
2057
2058@noindent
2059In the second case, @command{awk} will attempt to use the text of the program
2060as the value of @code{FS}, and the first @value{FN} as the text of the program!
2061This results in syntax errors at best, and confusing behavior at worst.
2062@end itemize
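
Several of these rules can be checked with any command that echoes its
arguments.  The following sketch uses @command{printf}, which is not
otherwise part of this discussion, to print each argument it receives
on a line of its own:

@example
$ printf '%s\n' 'one'"two"three
@print{} onetwothree
$ printf '%s\n' \$HOME '$HOME' "\$HOME"
@print{} $HOME
@print{} $HOME
@print{} $HOME
@end example

@noindent
In the first command, the single-quoted, double-quoted, and unquoted
pieces are joined into one argument.  In the second, a backslash,
single quotes, and a backslash inside double quotes each keep the
shell from expanding the variable.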
2063
2064@cindex quoting, tricks for
2065Mixing single and double quotes is difficult.  You have to resort
2066to shell quoting tricks, like this:
2067
2068@example
2069$ awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}'
2070@print{} Here is a single quote <'>
2071@end example
2072
2073@noindent
2074This program consists of three concatenated quoted strings.  The first and the
2075third are single-quoted, the second is double-quoted.
2076
2077This can be ``simplified'' to:
2078
2079@example
2080$ awk 'BEGIN @{ print "Here is a single quote <'\''>" @}'
2081@print{} Here is a single quote <'>
2082@end example
2083
2084@noindent
2085Judge for yourself which of these two is the more readable.
2086
2087Another option is to use double quotes, escaping the embedded, @command{awk}-level
2088double quotes:
2089
2090@example
2091$ awk "BEGIN @{ print \"Here is a single quote <'>\" @}"
2092@print{} Here is a single quote <'>
2093@end example
2094
2095@noindent
2096@c ENDOFRANGE sq1x
2097@c ENDOFRANGE qs2x
2098This option is also painful, because double quotes, backslashes, and dollar signs
2099are very common in @command{awk} programs.
2100
2101If you really need both single and double quotes in your @command{awk}
2102program, it is probably best to move it into a separate file, where
2103the shell won't be part of the picture, and you can say what you mean.
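
Two other techniques avoid the nested quoting altogether.  Both are
sketches: the variable name @code{sq} is arbitrary, and the
@option{-v} option and octal escape sequences are features we haven't
discussed yet.  The first passes the single quote in through a
variable assigned with @option{-v}; the second writes it as the octal
escape @samp{\047}, which is the code for a single quote on
ASCII-based systems:

@example
$ awk -v sq="'" 'BEGIN @{ print "Here is a single quote <" sq ">" @}'
@print{} Here is a single quote <'>
$ awk 'BEGIN @{ print "Here is a single quote <\047>" @}'
@print{} Here is a single quote <'>
@end example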
2104
2105@node Sample Data Files
2106@section @value{DDF}s for the Examples
2107@c For gawk >= 3.2, update these data files. No-one has such slow modems!
2108
2109@cindex input files, examples
2110@cindex @code{BBS-list} file
2111Many of the examples in this @value{DOCUMENT} take their input from two sample
2112@value{DF}s.  The first, @file{BBS-list}, represents a list of
2113computer bulletin board systems together with information about those systems.
2114The second @value{DF}, called @file{inventory-shipped}, contains
2115information about monthly shipments.  In both files,
2116each line is considered to be one @dfn{record}.
2117
2118In the @value{DF} @file{BBS-list}, each record contains the name of a computer
2119bulletin board, its phone number, the board's baud rate(s), and a code for
2120the number of hours it is operational.  An @samp{A} in the last column
2121means the board operates 24 hours a day.  A @samp{B} in the last
2122column means the board only operates on evening and weekend hours.
2123A @samp{C} means the board operates only on weekends:
2124
2125@c 2e: Update the baud rates to reflect today's faster modems
2126@example
2127@c system if test ! -d eg      ; then mkdir eg      ; fi
2128@c system if test ! -d eg/lib  ; then mkdir eg/lib  ; fi
2129@c system if test ! -d eg/data ; then mkdir eg/data ; fi
2130@c system if test ! -d eg/prog ; then mkdir eg/prog ; fi
2131@c system if test ! -d eg/misc ; then mkdir eg/misc ; fi
2132@c file eg/data/BBS-list
2133aardvark     555-5553     1200/300          B
2134alpo-net     555-3412     2400/1200/300     A
2135barfly       555-7685     1200/300          A
2136bites        555-1675     2400/1200/300     A
2137camelot      555-0542     300               C
2138core         555-2912     1200/300          C
2139fooey        555-1234     2400/1200/300     B
2140foot         555-6699     1200/300          B
2141macfoo       555-6480     1200/300          A
2142sdace        555-3430     2400/1200/300     A
2143sabafoo      555-2127     1200/300          C
2144@c endfile
2145@end example
2146
2147@cindex @code{inventory-shipped} file
2148The @value{DF} @file{inventory-shipped} represents
2149information about shipments during the year.
2150Each record contains the month, the number
2151of green crates shipped, the number of red boxes shipped, the number of
2152orange bags shipped, and the number of blue packages shipped,
2153respectively.  There are 16 entries, covering the 12 months of last year
2154and the first four months of the current year.
2155
2156@example
2157@c file eg/data/inventory-shipped
2158Jan  13  25  15 115
2159Feb  15  32  24 226
2160Mar  15  24  34 228
2161Apr  31  52  63 420
2162May  16  34  29 208
2163Jun  31  42  75 492
2164Jul  24  34  67 436
2165Aug  15  34  47 316
2166Sep  13  55  37 277
2167Oct  29  54  68 525
2168Nov  20  87  82 577
2169Dec  17  35  61 401
2170
2171Jan  21  36  64 620
2172Feb  26  58  80 652
2173Mar  24  75  70 495
2174Apr  21  70  74 514
2175@c endfile
2176@end example
2177
2178@ifinfo
2179If you are reading this in GNU Emacs using Info, you can copy the regions
2180of text showing these sample files into your own test files.  This way you
2181can try out the examples shown in the remainder of this document.  You do
2182this by using the command @kbd{M-x write-region} to copy text from the Info
2183file into a file for use with @command{awk}
2184(@xref{Misc File Ops, , Miscellaneous File Operations, emacs, GNU Emacs Manual},
2185for more information).  Using this information, create your own
2186@file{BBS-list} and @file{inventory-shipped} files and practice what you
2187learn in this @value{DOCUMENT}.
2188
2189@cindex Texinfo
2190If you are using the stand-alone version of Info,
2191see @ref{Extract Program},
2192for an @command{awk} program that extracts these @value{DF}s from
2193@file{gawk.texi}, the Texinfo source file for this Info file.
2194@end ifinfo
2195
2196@node Very Simple
2197@section Some Simple Examples
2198
2199The following command runs a simple @command{awk} program that searches the
2200input file @file{BBS-list} for the character string @samp{foo} (a
2201grouping of characters is usually called a @dfn{string};
2202the term @dfn{string} is based on similar usage in English, such
2203as ``a string of pearls,'' or ``a string of cars in a train''):
2204
2205@example
2206awk '/foo/ @{ print $0 @}' BBS-list
2207@end example
2208
2209@noindent
2210When lines containing @samp{foo} are found, they are printed because
2211@w{@samp{print $0}} means print the current line.  (Just @samp{print} by
2212itself means the same thing, so we could have written that
2213instead.)
2214
2215You will notice that slashes (@samp{/}) surround the string @samp{foo}
2216in the @command{awk} program.  The slashes indicate that @samp{foo}
2217is the pattern to search for.  This type of pattern is called a
2218@dfn{regular expression}, which is covered in more detail later
2219(@pxref{Regexp}).
2220The pattern is allowed to match parts of words.
2221There are
2222single quotes around the @command{awk} program so that the shell won't
2223interpret any of it as special shell characters.
2224
2225Here is what this program prints:
2226
2227@example
2228$ awk '/foo/ @{ print $0 @}' BBS-list
2229@print{} fooey        555-1234     2400/1200/300     B
2230@print{} foot         555-6699     1200/300          B
2231@print{} macfoo       555-6480     1200/300          A
2232@print{} sabafoo      555-2127     1200/300          C
2233@end example
2234
2235@cindex actions, default
2236@cindex patterns, default
2237In an @command{awk} rule, either the pattern or the action can be omitted,
2238but not both.  If the pattern is omitted, then the action is performed
2239for @emph{every} input line.  If the action is omitted, the default
2240action is to print all lines that match the pattern.
2241
2242@cindex actions, empty
2243Thus, we could leave out the action (the @code{print} statement and the curly
2244braces) in the previous example and the result would be the same: all
2245lines matching the pattern @samp{foo} are printed.  By comparison,
2246omitting the @code{print} statement but retaining the curly braces makes an
2247empty action that does nothing (i.e., no lines are printed).
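
As a quick check of these defaults, the following sketch runs the same
search on @file{BBS-list} with the action left out, and then with an
empty action.  The first command prints the same four lines as before;
the second command prints nothing:

@example
$ awk '/foo/' BBS-list
@print{} fooey        555-1234     2400/1200/300     B
@print{} foot         555-6699     1200/300          B
@print{} macfoo       555-6480     1200/300          A
@print{} sabafoo      555-2127     1200/300          C
$ awk '/foo/ @{ @}' BBS-list
@end example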
2248
2249@cindex @command{awk} programs, one-line examples
2250Many practical @command{awk} programs are just a line or two.  Following is a
2251collection of useful, short programs to get you started.  Some of these
2252programs contain constructs that haven't been covered yet. (The description
2253of the program will give you a good idea of what is going on, but please
2254read the rest of the @value{DOCUMENT} to become an @command{awk} expert!)
2255Most of the examples use a @value{DF} named @file{data}.  This is just a
2256placeholder; if you use these programs yourself, substitute
2257your own @value{FN}s for @file{data}.
2258For future reference, note that there is often more than
2259one way to do things in @command{awk}.  At some point, you may want
2260to look back at these examples and see if
2261you can come up with different ways to do the same things shown here:
2262
2263@itemize @bullet
2264@item
2265Print the length of the longest input line:
2266
2267@example
2268awk '@{ if (length($0) > max) max = length($0) @}
2269     END @{ print max @}' data
2270@end example
2271
2272@item
2273Print every line that is longer than 80 characters:
2274
2275@example
2276awk 'length($0) > 80' data
2277@end example
2278
2279The sole rule has a relational expression as its pattern and it has no
2280action---so the default action, printing the record, is used.
2281
2282@cindex @command{expand} utility
2283@item
2284Print the length of the longest line in @file{data}:
2285
2286@example
2287expand data | awk '@{ if (x < length()) x = length() @}
2288              END @{ print "maximum line length is " x @}'
2289@end example
2290
2291The input is processed by the @command{expand} utility to change tabs
2292into spaces, so the widths compared are actually the right-margin columns.
2293
2294@item
2295Print every line that has at least one field:
2296
2297@example
2298awk 'NF > 0' data
2299@end example
2300
2301This is an easy way to delete blank lines from a file (or rather, to
2302create a new file similar to the old file but from which the blank lines
2303have been removed).
2304
2305@item
2306Print seven random numbers from 0 to 100, inclusive:
2307
2308@example
2309awk 'BEGIN @{ for (i = 1; i <= 7; i++)
2310                 print int(101 * rand()) @}'
2311@end example
2312
2313@item
2314Print the total number of bytes used by @var{files}:
2315
2316@example
2317ls -l @var{files} | awk '@{ x += $5 @}
2318                  END @{ print "total bytes: " x @}'
2319@end example
2320
2321@item
2322Print the total number of kilobytes used by @var{files}:
2323
2324@c Don't use \ continuation, not discussed yet
2325@example
2326ls -l @var{files} | awk '@{ x += $5 @}
2327   END @{ print "total K-bytes: " (x + 1023)/1024 @}'
2328@end example
2329
2330@item
2331Print a sorted list of the login names of all users:
2332
2333@example
2334awk -F: '@{ print $1 @}' /etc/passwd | sort
2335@end example
2336
2337@item
2338Count the lines in a file:
2339
2340@example
2341awk 'END @{ print NR @}' data
2342@end example
2343
2344@item
2345Print the even-numbered lines in the @value{DF}:
2346
2347@example
2348awk 'NR % 2 == 0' data
2349@end example
2350
2351If you use the expression @samp{NR % 2 == 1} instead,
2352the program would print the odd-numbered lines.
2353@end itemize
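
As one illustration of there being more than one way to do things, the
line count can also be kept in a variable of your own instead of using
@code{NR}.  This is only a sketch: the variable name @code{n} is an
arbitrary choice, and the @samp{n + 0} makes the program print a numeric
zero if the input is empty.  Run on the sample file
@file{inventory-shipped}, which has 17 lines counting the blank one,
both forms print the same number:

@example
$ awk 'END @{ print NR @}' inventory-shipped
@print{} 17
$ awk '@{ n++ @} END @{ print n + 0 @}' inventory-shipped
@print{} 17
@end example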
2354
2355@node Two Rules
2356@section An Example with Two Rules
2357@cindex @command{awk} programs
2358
2359The @command{awk} utility reads the input files one line at a
2360time.  For each line, @command{awk} tries the patterns of each of the rules.
2361If several patterns match, then several actions are run in the order in
2362which they appear in the @command{awk} program.  If no patterns match, then
2363no actions are run.
2364
2365After processing all the rules that match the line (and perhaps there are none),
2366@command{awk} reads the next line.  (However,
2367@pxref{Next Statement},
2368and also @pxref{Nextfile Statement}).
2369This continues until the program reaches the end of the file.
2370For example, the following @command{awk} program contains two rules:
2371
2372@example
2373/12/  @{ print $0 @}
2374/21/  @{ print $0 @}
2375@end example
2376
2377@noindent
2378The first rule has the string @samp{12} as the
2379pattern and @samp{print $0} as the action.  The second rule has the
2380string @samp{21} as the pattern and also has @samp{print $0} as the
2381action.  Each rule's action is enclosed in its own pair of braces.
2382
2383This program prints every line that contains the string
2384@samp{12} @emph{or} the string @samp{21}.  If a line contains both
2385strings, it is printed twice, once by each rule.
2386
2387This is what happens if we run this program on our two sample @value{DF}s,
2388@file{BBS-list} and @file{inventory-shipped}:
2389
2390@example
2391$ awk '/12/ @{ print $0 @}
2392>      /21/ @{ print $0 @}' BBS-list inventory-shipped
2393@print{} aardvark     555-5553     1200/300          B
2394@print{} alpo-net     555-3412     2400/1200/300     A
2395@print{} barfly       555-7685     1200/300          A
2396@print{} bites        555-1675     2400/1200/300     A
2397@print{} core         555-2912     1200/300          C
2398@print{} fooey        555-1234     2400/1200/300     B
2399@print{} foot         555-6699     1200/300          B
2400@print{} macfoo       555-6480     1200/300          A
2401@print{} sdace        555-3430     2400/1200/300     A
2402@print{} sabafoo      555-2127     1200/300          C
2403@print{} sabafoo      555-2127     1200/300          C
2404@print{} Jan  21  36  64 620
2405@print{} Apr  21  70  74 514
2406@end example
2407
2408@noindent
2409Note how the line beginning with @samp{sabafoo}
2410in @file{BBS-list} was printed twice, once for each rule.
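
If you would rather have each matching line printed only once, one
possible approach is to combine the two regexps into a single pattern
with the @samp{||} (``or'') operator, which has not been covered yet.
The following sketch omits the action, so the default action prints
each selected line exactly once:

@example
$ awk '/12/ || /21/' BBS-list inventory-shipped
@print{} aardvark     555-5553     1200/300          B
@print{} alpo-net     555-3412     2400/1200/300     A
@print{} barfly       555-7685     1200/300          A
@print{} bites        555-1675     2400/1200/300     A
@print{} core         555-2912     1200/300          C
@print{} fooey        555-1234     2400/1200/300     B
@print{} foot         555-6699     1200/300          B
@print{} macfoo       555-6480     1200/300          A
@print{} sdace        555-3430     2400/1200/300     A
@print{} sabafoo      555-2127     1200/300          C
@print{} Jan  21  36  64 620
@print{} Apr  21  70  74 514
@end example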
2411
2412@node More Complex
2413@section A More Complex Example
2414
2415Now that we've mastered some simple tasks, let's look at
2416what typical @command{awk}
2417programs do.  This example shows how @command{awk} can be used to
2418summarize, select, and rearrange the output of another utility.  It uses
2419features that haven't been covered yet, so don't worry if you don't
2420understand all the details:
2421
2422@example
2423ls -l | awk '$6 == "Nov" @{ sum += $5 @}
2424             END @{ print sum @}'
2425@end example
2426
2427@cindex @command{csh} utility, backslash continuation and
2428@cindex @command{ls} utility
2429@cindex backslash (@code{\}), continuing lines and, in @command{csh}
2430@cindex @code{\} (backslash), continuing lines and, in @command{csh}
2431This command prints the total number of bytes in all the files in the
2432current directory that were last modified in November (of any year).
2433@footnote{In the C shell (@command{csh}), you need to type
2434a semicolon and then a backslash at the end of the first line; see
2435@ref{Statements/Lines}, for an
2436explanation.  In a POSIX-compliant shell, such as the Bourne
2437shell or @command{bash}, you can type the example as shown.  If the command
2438@samp{echo $path} produces an empty output line, you are most likely
2439using a POSIX-compliant shell.  Otherwise, you are probably using the
2440C shell or a shell derived from it.}
2441The @w{@samp{ls -l}} part of this example is a system command that gives
2442you a listing of the files in a directory, including each file's size and the date
2443the file was last modified. Its output looks like this:
2444
2445@example
2446-rw-r--r--  1 arnold   user   1933 Nov  7 13:05 Makefile
2447-rw-r--r--  1 arnold   user  10809 Nov  7 13:03 awk.h
2448-rw-r--r--  1 arnold   user    983 Apr 13 12:14 awk.tab.h
2449-rw-r--r--  1 arnold   user  31869 Jun 15 12:20 awk.y
2450-rw-r--r--  1 arnold   user  22414 Nov  7 13:03 awk1.c
2451-rw-r--r--  1 arnold   user  37455 Nov  7 13:03 awk2.c
2452-rw-r--r--  1 arnold   user  27511 Dec  9 13:07 awk3.c
2453-rw-r--r--  1 arnold   user   7989 Nov  7 13:03 awk4.c
2454@end example
2455
2456@noindent
2457@cindex line continuations, with C shell
2458The first field contains read-write permissions, the second field contains
2459the number of links to the file, and the third field identifies the owner of
2460the file. The fourth field identifies the group of the file.
2461The fifth field contains the size of the file in bytes.  The
2462sixth, seventh, and eighth fields contain the month, day, and time,
2463respectively, that the file was last modified.  Finally, the ninth field
2464contains the name of the file.@footnote{On some
2465very old systems, you may need to use @samp{ls -lg} to get this output.}
2466
2467@c @cindex automatic initialization
2468@cindex initialization, automatic
2469The @samp{$6 == "Nov"} in our @command{awk} program is an expression that
2470tests whether the sixth field of the output from @w{@samp{ls -l}}
2471matches the string @samp{Nov}.  Each time a line has the string
2472@samp{Nov} for its sixth field, the action @samp{sum += $5} is
2473performed.  This adds the fifth field (the file's size) to the variable
2474@code{sum}.  As a result, when @command{awk} has finished reading all the
2475input lines, @code{sum} is the total of the sizes of the files whose
2476lines matched the pattern.  (This works because @command{awk} variables
2477are automatically initialized to zero.)
2478
2479After the last line of output from @command{ls} has been processed, the
2480@code{END} rule executes and prints the value of @code{sum}.
2481In this example, the value of @code{sum} is 80600.
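
To check the arithmetic: the five @samp{Nov} lines in the sample listing
have sizes 1933, 10809, 22414, 37455, and 7989, which add up to 80600.
As a small variation, offered only as a sketch, the program can also
count the matching files.  The variable name @code{n} is an arbitrary
choice:

@example
ls -l | awk '$6 == "Nov" @{ sum += $5; n++ @}
             END @{ print n, sum @}'
@end example

@noindent
If @samp{ls -l} produced exactly the listing shown earlier, this
variation would print @samp{5 80600}.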
2482
2483These more advanced @command{awk} techniques are covered in later sections
2484(@pxref{Action Overview}).  Before you can move on to more
2485advanced @command{awk} programming, you have to know how @command{awk} interprets
2486your input and displays your output.  By manipulating fields and using
2487@code{print} statements, you can produce some very useful and
2488impressive-looking reports.
2489
2490@node Statements/Lines
2491@section @command{awk} Statements Versus Lines
2492@cindex line breaks
2493@cindex newlines
2494
2495Most often, each line in an @command{awk} program is a separate statement or
2496separate rule, like this:
2497
2498@example
2499awk '/12/  @{ print $0 @}
2500     /21/  @{ print $0 @}' BBS-list inventory-shipped
2501@end example
2502
2503@cindex @command{gawk}, newlines in
2504However, @command{gawk} ignores newlines after any of the following
2505symbols and keywords:
2506
2507@example
2508,    @{    ?    :    ||    &&    do    else
2509@end example
2510
2511@noindent
2512A newline at any other point is considered the end of the
statement.@footnote{The @samp{?} and @samp{:} referred to here are the two parts of the
2514three-operand conditional expression described in
2515@ref{Conditional Exp}.
2516Splitting lines after @samp{?} and @samp{:} is a minor @command{gawk}
2517extension; if @option{--posix} is specified
2518(@pxref{Options}), then this extension is disabled.}
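
For example, a long pattern can be broken after @samp{&&} without any
backslash.  The following sketch selects the records of
@file{inventory-shipped} that show more than 20 green crates and more
than 50 red boxes; the two thresholds are arbitrary values chosen for
illustration:

@example
$ awk '$2 > 20 &&
>      $3 > 50' inventory-shipped
@print{} Apr  31  52  63 420
@print{} Oct  29  54  68 525
@print{} Feb  26  58  80 652
@print{} Mar  24  75  70 495
@print{} Apr  21  70  74 514
@end example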
2519
2520@cindex @code{\} (backslash), continuing lines and
2521@cindex backslash (@code{\}), continuing lines and
2522If you would like to split a single statement into two lines at a point
2523where a newline would terminate it, you can @dfn{continue} it by ending the
2524first line with a backslash character (@samp{\}).  The backslash must be
2525the final character on the line in order to be recognized as a continuation
2526character.  A backslash is allowed anywhere in the statement, even
2527in the middle of a string or regular expression.  For example:
2528
2529@example
2530awk '/This regular expression is too long, so continue it\
2531 on the next line/ @{ print $1 @}'
2532@end example
2533
2534@noindent
2535@cindex portability, backslash continuation and
2536We have generally not used backslash continuation in the sample programs
2537in this @value{DOCUMENT}.  In @command{gawk}, there is no limit on the
2538length of a line, so backslash continuation is never strictly necessary;
2539it just makes programs more readable.  For this same reason, as well as
2540for clarity, we have kept most statements short in the sample programs
2541presented throughout the @value{DOCUMENT}.  Backslash continuation is
2542most useful when your @command{awk} program is in a separate source file
2543instead of entered from the command line.  You should also note that
2544many @command{awk} implementations are more particular about where you
2545may use backslash continuation. For example, they may not allow you to
2546split a string constant using backslash continuation.  Thus, for maximum
2547portability of your @command{awk} programs, it is best not to split your
2548lines in the middle of a regular expression or a string.
2549@c 10/2000: gawk, mawk, and current bell labs awk allow it,
2550@c solaris 2.7 nawk does not. Solaris /usr/xpg4/bin/awk does though!  sigh.
2551
2552@cindex @command{csh} utility
2553@cindex backslash (@code{\}), continuing lines and, in @command{csh}
2554@cindex @code{\} (backslash), continuing lines and, in @command{csh}
2555@strong{Caution:} @emph{Backslash continuation does not work as described
2556with the C shell.}  It works for @command{awk} programs in files and
2557for one-shot programs, @emph{provided} you are using a POSIX-compliant
2558shell, such as the Unix Bourne shell or @command{bash}.  But the C shell behaves
2559differently!  There, you must use two backslashes in a row, followed by
2560a newline.  Note also that when using the C shell, @emph{every} newline
in your @command{awk} program must be escaped with a backslash. To illustrate:
2562
2563@example
2564% awk 'BEGIN @{ \
2565?   print \\
2566?       "hello, world" \
2567? @}'
2568@print{} hello, world
2569@end example
2570
2571@noindent
2572Here, the @samp{%} and @samp{?} are the C shell's primary and secondary
2573prompts, analogous to the standard shell's @samp{$} and @samp{>}.
2574
2575Compare the previous example to how it is done with a POSIX-compliant shell:
2576
2577@example
2578$ awk 'BEGIN @{
2579>   print \
2580>       "hello, world"
2581> @}'
2582@print{} hello, world
2583@end example
2584
2585@command{awk} is a line-oriented language.  Each rule's action has to
2586begin on the same line as the pattern.  To have the pattern and action
2587on separate lines, you @emph{must} use backslash continuation; there
2588is no other option.
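
For instance, the following sketch keeps the pattern on one line and
the action on the next, joined by a backslash.  It is typed here in a
POSIX-compliant shell:

@example
$ awk '/foo/ \
>      @{ print $1 @}' BBS-list
@print{} fooey
@print{} foot
@print{} macfoo
@print{} sabafoo
@end example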
2589
2590@cindex backslash (@code{\}), continuing lines and, comments and
2591@cindex @code{\} (backslash), continuing lines and, comments and
2592@cindex commenting, backslash continuation and
2593Another thing to keep in mind is that backslash continuation and
2594comments do not mix. As soon as @command{awk} sees the @samp{#} that
2595starts a comment, it ignores @emph{everything} on the rest of the
2596line. For example:
2597
2598@example
2599$ gawk 'BEGIN @{ print "dont panic" # a friendly \
2600>                                    BEGIN rule
2601> @}'
2602@error{} gawk: cmd. line:2:                BEGIN rule
2603@error{} gawk: cmd. line:2:                ^ parse error
2604@end example
2605
2606@noindent
2607In this case, it looks like the backslash would continue the comment onto the
2608next line. However, the backslash-newline combination is never even
2609noticed because it is ``hidden'' inside the comment. Thus, the
2610@code{BEGIN} is noted as a syntax error.
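
One way to avoid the problem, shown here as a sketch, is to keep the
whole comment on a single line so that no continuation is needed:

@example
$ gawk 'BEGIN @{ print "dont panic"  # a friendly BEGIN rule
> @}'
@print{} dont panic
@end example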
2611
2612@cindex statements, multiple
2613@cindex @code{;} (semicolon)
2614@cindex semicolon (@code{;})
2615When @command{awk} statements within one rule are short, you might want to put
2616more than one of them on a line.  This is accomplished by separating the statements
2617with a semicolon (@samp{;}).
2618This also applies to the rules themselves.
2619Thus, the program shown at the start of this @value{SECTION}
2620could also be written this way:
2621
2622@example
2623/12/ @{ print $0 @} ; /21/ @{ print $0 @}
2624@end example
2625
2626@noindent
@strong{Note:} The requirement that rules on the same line must be
2628separated with a semicolon was not in the original @command{awk}
2629language; it was added for consistency with the treatment of statements
2630within an action.
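
The same separator works inside an action.  In the following sketch,
two short statements share one line; the variable name @code{n} is an
arbitrary choice:

@example
$ awk '/foo/ @{ n++; print n, $1 @}' BBS-list
@print{} 1 fooey
@print{} 2 foot
@print{} 3 macfoo
@print{} 4 sabafoo
@end example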
2631
2632@node Other Features
2633@section Other Features of @command{awk}
2634
2635@cindex variables
2636The @command{awk} language provides a number of predefined, or
2637@dfn{built-in}, variables that your programs can use to get information
2638from @command{awk}.  There are other variables your program can set
2639as well to control how @command{awk} processes your data.
2640
2641In addition, @command{awk} provides a number of built-in functions for doing
2642common computational and string-related operations.
2643@command{gawk} provides built-in functions for working with timestamps,
2644performing bit manipulation, and for runtime string translation.
2645
2646As we develop our presentation of the @command{awk} language, we introduce
2647most of the variables and many of the functions. They are defined
2648systematically in @ref{Built-in Variables}, and
2649@ref{Built-in}.
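
As a small preview, the following sketch uses the built-in variables
@code{NR} (the number of records read so far) and @code{NF} (the number
of fields in the current record), together with the built-in function
@code{toupper}, on the first three records of @file{BBS-list}:

@example
$ awk 'NR <= 3 @{ print NR, NF, toupper($1) @}' BBS-list
@print{} 1 4 AARDVARK
@print{} 2 4 ALPO-NET
@print{} 3 4 BARFLY
@end example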
2650
2651@node When
2652@section When to Use @command{awk}
2653
2654@cindex @command{awk}, uses for
2655Now that you've seen some of what @command{awk} can do,
2656you might wonder how @command{awk} could be useful for you.  By using
2657utility programs, advanced patterns, field separators, arithmetic
2658statements, and other selection criteria, you can produce much more
2659complex output.  The @command{awk} language is very useful for producing
2660reports from large amounts of raw data, such as summarizing information
2661from the output of other utility programs like @command{ls}.
2662(@xref{More Complex}.)
2663
2664Programs written with @command{awk} are usually much smaller than they would
2665be in other languages.  This makes @command{awk} programs easy to compose and
2666use.  Often, @command{awk} programs can be quickly composed at your terminal,
2667used once, and thrown away.  Because @command{awk} programs are interpreted, you
2668can avoid the (usually lengthy) compilation part of the typical
2669edit-compile-test-debug cycle of software development.
2670
2671Complex programs have been written in @command{awk}, including a complete
2672retargetable assembler for eight-bit microprocessors (@pxref{Glossary}, for
2673more information), and a microcode assembler for a special-purpose Prolog
2674computer.  However, @command{awk}'s capabilities are strained by tasks of
2675such complexity.
2676
2677@cindex @command{awk} programs, complex
2678If you find yourself writing @command{awk} scripts of more than, say, a few
2679hundred lines, you might consider using a different programming
2680language.  Emacs Lisp is a good choice if you need sophisticated string
2681or pattern matching capabilities.  The shell is also good at string and
2682pattern matching; in addition, it allows powerful use of the system
2683utilities.  More conventional languages, such as C, C++, and Java, offer
2684better facilities for system programming and for managing the complexity
2685of large programs.  Programs in these languages may require more lines
2686of source code than the equivalent @command{awk} programs, but they are
2687easier to maintain and usually run more efficiently.
2688
2689@node Regexp
2690@chapter Regular Expressions
2691@cindex regexp, See regular expressions
2692@c STARTOFRANGE regexp
2693@cindex regular expressions
2694
2695A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
2696set of strings.
2697Because regular expressions are such a fundamental part of @command{awk}
2698programming, their format and use deserve a separate @value{CHAPTER}.
2699
2700@cindex forward slash (@code{/})
2701@cindex @code{/} (forward slash)
2702A regular expression enclosed in slashes (@samp{/})
2703is an @command{awk} pattern that matches every input record whose text
2704belongs to that set.
2705The simplest regular expression is a sequence of letters, numbers, or
2706both.  Such a regexp matches any string that contains that sequence.
2707Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
2708Therefore, the pattern @code{/foo/} matches any input record containing
2709the three characters @samp{foo} @emph{anywhere} in the record.  Other
2710kinds of regexps let you specify more complicated classes of strings.
2711
2712@ifnotinfo
2713Initially, the examples in this @value{CHAPTER} are simple.
2714As we explain more about how
2715regular expressions work, we will present more complicated instances.
2716@end ifnotinfo
2717
2718@menu
2719* Regexp Usage::                How to Use Regular Expressions.
2720* Escape Sequences::            How to write nonprinting characters.
2721* Regexp Operators::            Regular Expression Operators.
2722* Character Lists::             What can go between @samp{[...]}.
2723* GNU Regexp Operators::        Operators specific to GNU software.
2724* Case-sensitivity::            How to do case-insensitive matching.
2725* Leftmost Longest::            How much text matches.
2726* Computed Regexps::            Using Dynamic Regexps.
2727* Locales::                     How the locale affects things.
2728@end menu
2729
2730@node Regexp Usage
2731@section How to Use Regular Expressions
2732
2733@cindex regular expressions, as patterns
2734A regular expression can be used as a pattern by enclosing it in
2735slashes.  Then the regular expression is tested against the
2736entire text of each record.  (Normally, it only needs
2737to match some part of the text in order to succeed.)  For example, the
2738following prints the second field of each record that contains the string
2739@samp{foo} anywhere in it:
2740
2741@example
2742$ awk '/foo/ @{ print $2 @}' BBS-list
2743@print{} 555-1234
2744@print{} 555-6699
2745@print{} 555-6480
2746@print{} 555-2127
2747@end example
2748
2749@cindex regular expressions, operators
2750@cindex operators, string-matching
2751@c @cindex operators, @code{~}
2752@cindex string-matching operators
@cindex @code{~} (tilde), @code{~} operator
2754@cindex tilde (@code{~}), @code{~} operator
2755@cindex @code{!} (exclamation point), @code{!~} operator
2756@cindex exclamation point (@code{!}), @code{!~} operator
2757@c @cindex operators, @code{!~}
2758@cindex @code{if} statement
2759@cindex @code{while} statement
2760@cindex @code{do}-@code{while} statement
2761@c @cindex statements, @code{if}
2762@c @cindex statements, @code{while}
2763@c @cindex statements, @code{do}
2764Regular expressions can also be used in matching expressions.  These
2765expressions allow you to specify the string to match against; it need
2766not be the entire current input record.  The two operators @samp{~}
2767and @samp{!~} perform regular expression comparisons.  Expressions
2768using these operators can be used as patterns, or in @code{if},
2769@code{while}, @code{for}, and @code{do} statements.
2770(@xref{Statements}.)
2771For example:
2772
2773@example
2774@var{exp} ~ /@var{regexp}/
2775@end example
2776
2777@noindent
2778is true if the expression @var{exp} (taken as a string)
2779matches @var{regexp}.  The following example matches, or selects,
2780all input records with the uppercase letter @samp{J} somewhere in the
2781first field:
2782
2783@example
2784$ awk '$1 ~ /J/' inventory-shipped
2785@print{} Jan  13  25  15 115
2786@print{} Jun  31  42  75 492
2787@print{} Jul  24  34  67 436
2788@print{} Jan  21  36  64 620
2789@end example
2790
2791So does this:
2792
2793@example
2794awk '@{ if ($1 ~ /J/) print @}' inventory-shipped
2795@end example
2796
2797This next example is true if the expression @var{exp}
2798(taken as a character string)
2799does @emph{not} match @var{regexp}:
2800
2801@example
2802@var{exp} !~ /@var{regexp}/
2803@end example
2804
2805The following example matches,
2806or selects, all input records whose first field @emph{does not} contain
2807the uppercase letter @samp{J}:
2808
2809@example
2810$ awk '$1 !~ /J/' inventory-shipped
2811@print{} Feb  15  32  24 226
2812@print{} Mar  15  24  34 228
2813@print{} Apr  31  52  63 420
2814@print{} May  16  34  29 208
2815@dots{}
2816@end example
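
Because the left-hand operand can be any expression, a match can be
limited to a single field.  As a sketch of the difference, the pattern
@code{/12/} by itself selects almost every record in @file{BBS-list},
because most of the baud-rate columns contain @samp{1200}.  The
following command selects only the records whose phone number (the
second field) contains @samp{12}:

@example
$ awk '$2 ~ /12/' BBS-list
@print{} alpo-net     555-3412     2400/1200/300     A
@print{} core         555-2912     1200/300          C
@print{} fooey        555-1234     2400/1200/300     B
@print{} sabafoo      555-2127     1200/300          C
@end example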
2817
2818@cindex regexp constants
2819@cindex regular expressions, constants, See regexp constants
2820When a regexp is enclosed in slashes, such as @code{/foo/}, we call it
2821a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and
2822@code{"foo"} is a string constant.
2823
2824@node Escape Sequences
2825@section Escape Sequences
2826
2827@cindex escape sequences
2828@cindex backslash (@code{\}), in escape sequences
2829@cindex @code{\} (backslash), in escape sequences
2830Some characters cannot be included literally in string constants
2831(@code{"foo"}) or regexp constants (@code{/foo/}).
2832Instead, they should be represented with @dfn{escape sequences},
2833which are character sequences beginning with a backslash (@samp{\}).
2834One use of an escape sequence is to include a double-quote character in
2835a string constant.  Because a plain double quote ends the string, you
2836must use @samp{\"} to represent an actual double-quote character as a
2837part of the string.  For example:
2838
2839@example
2840$ awk 'BEGIN @{ print "He said \"hi!\" to her." @}'
2841@print{} He said "hi!" to her.
2842@end example
2843
The backslash character itself is another character that cannot be
2845included normally; you must write @samp{\\} to put one backslash in the
2846string or regexp.  Thus, the string whose contents are the two characters
2847@samp{"} and @samp{\} must be written @code{"\"\\"}.
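
For example, this one-line sketch prints exactly those two characters:

@example
$ awk 'BEGIN @{ print "\"\\" @}'
@print{} "\
@end example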
2848
2849Backslash also represents unprintable characters
2850such as TAB or newline.  While there is nothing to stop you from entering most
2851unprintable characters directly in a string constant or regexp constant,
2852they may look ugly.
2853
2854The following table lists
2855all the escape sequences used in @command{awk} and
2856what they represent. Unless noted otherwise, all these escape
2857sequences apply to both string constants and regexp constants:
2858
2859@table @code
2860@item \\
2861A literal backslash, @samp{\}.
2862
2863@c @cindex @command{awk} language, V.4 version
2864@cindex @code{\} (backslash), @code{\a} escape sequence
2865@cindex backslash (@code{\}), @code{\a} escape sequence
2866@item \a
2867The ``alert'' character, @kbd{@value{CTL}-g}, ASCII code 7 (BEL).
2868(This usually makes some sort of audible noise.)
2869
2870@cindex @code{\} (backslash), @code{\b} escape sequence
2871@cindex backslash (@code{\}), @code{\b} escape sequence
2872@item \b
2873Backspace, @kbd{@value{CTL}-h}, ASCII code 8 (BS).
2874
2875@cindex @code{\} (backslash), @code{\f} escape sequence
2876@cindex backslash (@code{\}), @code{\f} escape sequence
2877@item \f
2878Formfeed, @kbd{@value{CTL}-l}, ASCII code 12 (FF).
2879
2880@cindex @code{\} (backslash), @code{\n} escape sequence
2881@cindex backslash (@code{\}), @code{\n} escape sequence
2882@item \n
2883Newline, @kbd{@value{CTL}-j}, ASCII code 10 (LF).
2884
2885@cindex @code{\} (backslash), @code{\r} escape sequence
2886@cindex backslash (@code{\}), @code{\r} escape sequence
2887@item \r
2888Carriage return, @kbd{@value{CTL}-m}, ASCII code 13 (CR).
2889
2890@cindex @code{\} (backslash), @code{\t} escape sequence
2891@cindex backslash (@code{\}), @code{\t} escape sequence
2892@item \t
2893Horizontal TAB, @kbd{@value{CTL}-i}, ASCII code 9 (HT).
2894
2895@c @cindex @command{awk} language, V.4 version
2896@cindex @code{\} (backslash), @code{\v} escape sequence
2897@cindex backslash (@code{\}), @code{\v} escape sequence
2898@item \v
2899Vertical tab, @kbd{@value{CTL}-k}, ASCII code 11 (VT).
2900
2901@cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence
2902@cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence
2903@item \@var{nnn}
2904The octal value @var{nnn}, where @var{nnn} stands for 1 to 3 digits
2905between @samp{0} and @samp{7}.  For example, the code for the ASCII ESC
2906(escape) character is @samp{\033}.
2907
2908@c @cindex @command{awk} language, V.4 version
2909@c @cindex @command{awk} language, POSIX version
2910@cindex @code{\} (backslash), @code{\x} escape sequence
2911@cindex backslash (@code{\}), @code{\x} escape sequence
2912@item \x@var{hh}@dots{}
2913The hexadecimal value @var{hh}, where @var{hh} stands for a sequence
2914of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F}
2915or @samp{a}--@samp{f}).  Like the same construct
2916in ISO C, the escape sequence continues until the first nonhexadecimal
2917digit is seen.  However, using more than two hexadecimal digits produces
2918undefined results. (The @samp{\x} escape sequence is not allowed in
2919POSIX @command{awk}.)
2920
2921@cindex @code{\} (backslash), @code{\/} escape sequence
2922@cindex backslash (@code{\}), @code{\/} escape sequence
2923@item \/
2924A literal slash (necessary for regexp constants only).
2925This expression is used when you want to write a regexp
2926constant that contains a slash. Because the regexp is delimited by
2927slashes, you need to escape the slash that is part of the pattern,
2928in order to tell @command{awk} to keep processing the rest of the regexp.
2929
2930@cindex @code{\} (backslash), @code{\"} escape sequence
2931@cindex backslash (@code{\}), @code{\"} escape sequence
2932@item \"
2933A literal double quote (necessary for string constants only).
2934This expression is used when you want to write a string
2935constant that contains a double quote. Because the string is delimited by
2936double quotes, you need to escape the quote that is part of the string,
2937in order to tell @command{awk} to keep processing the rest of the string.
2938@end table
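
As a small illustration of the last two entries, the following sketch
uses @samp{\/} inside a regexp constant and @samp{\"} inside a string
constant.  The input line and the message text are made up for the
example:

@example
echo '/usr/bin/awk' | awk '/\/bin\//'
awk 'BEGIN @{ print "say \"hi\"" @}'
@end example

@noindent
The first command should print its input line, because the regexp
matches the text @samp{/bin/}.  The second should print the word
@samp{say} followed by @samp{hi} in double quotes.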
2939
2940In @command{gawk}, a number of additional two-character sequences that begin
2941with a backslash have special meaning in regexps.
2942@xref{GNU Regexp Operators}.
2943
2944In a regexp, a backslash before any character that is not in the previous list
2945and not listed in
2946@ref{GNU Regexp Operators},
2947means that the next character should be taken literally, even if it would
2948normally be a regexp operator.  For example, @code{/a\+b/} matches the three
2949characters @samp{a+b}.
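
To see the difference the backslash makes, compare the following two
commands.  This is only a sketch, and the two input lines are invented
for the illustration:

@example
printf 'a+b\naab\n' | awk '/a\+b/'
printf 'a+b\naab\n' | awk '/a+b/'
@end example

@noindent
The first regexp should select only the line @samp{a+b}.  In the second,
the @samp{+} acts as an operator, so only @samp{aab} should be selected.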
2950
2951@cindex backslash (@code{\}), in escape sequences
2952@cindex @code{\} (backslash), in escape sequences
2953@cindex portability
2954For complete portability, do not use a backslash before any character not
2955shown in the previous list.
2956
2957To summarize:
2958
2959@itemize @bullet
2960@item
2961The escape sequences in the table above are always processed first,
2962for both string constants and regexp constants. This happens very early,
2963as soon as @command{awk} reads your program.
2964
2965@item
2966@command{gawk} processes both regexp constants and dynamic regexps
2967(@pxref{Computed Regexps}),
2968for the special operators listed in
2969@ref{GNU Regexp Operators}.
2970
2971@item
2972A backslash before any other character means to treat that character
2973literally.
2974@end itemize
2975
2976@c fakenode --- for prepinfo
2977@subheading Advanced Notes: Backslash Before Regular Characters
2978@cindex portability, backslash in escape sequences
2979@cindex POSIX @command{awk}, backslashes in string constants
2980@cindex backslash (@code{\}), in escape sequences, POSIX and
2981@cindex @code{\} (backslash), in escape sequences, POSIX and
2982
2983@cindex troubleshooting, backslash before nonspecial character
2984If you place a backslash in a string constant before something that is
2985not one of the characters previously listed, POSIX @command{awk} purposely
2986leaves what happens as undefined.  There are two choices:
2987
2988@c @cindex automatic warnings
2989@c @cindex warnings, automatic
2990@table @asis
2991@item Strip the backslash out
2992This is what Unix @command{awk} and @command{gawk} both do.
2993For example, @code{"a\qc"} is the same as @code{"aqc"}.
2994(Because this is such an easy bug both to introduce and to miss,
2995@command{gawk} warns you about it.)
Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}}, which is intended to use vertical
bars surrounded by whitespace as the field separator. The string needs
two backslashes: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.
2999@c I did this!  This is why I added the warning.
3000
3001@cindex @command{gawk}, escape sequences
3002@cindex Unix @command{awk}, backslashes in escape sequences
3003@item Leave the backslash alone
3004Some other @command{awk} implementations do this.
3005In such implementations, typing @code{"a\qc"} is the same as typing
3006@code{"a\\qc"}.
3007@end table
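
The field separator with two backslashes can be tried out as follows.
This is a minimal sketch, and the sample input is made up.  The regexp
that @command{awk} finally sees contains @samp{\|}, a literal vertical
bar, so the second field should be @samp{beta}:

@example
echo 'alpha | beta | gamma' |
awk 'BEGIN @{ FS = "[ \t]+\\|[ \t]+" @} @{ print $2 @}'
@end example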
3008
3009@c fakenode --- for prepinfo
3010@subheading Advanced Notes: Escape Sequences for Metacharacters
3011@cindex metacharacters, escape sequences for
3012
3013Suppose you use an octal or hexadecimal
3014escape to represent a regexp metacharacter.
3015(See @ref{Regexp Operators}.)
3016Does @command{awk} treat the character as a literal character or as a regexp
3017operator?
3018
3019@cindex dark corner, escape sequences, for metacharacters
3020Historically, such characters were taken literally.
3021@value{DARKCORNER}
3022However, the POSIX standard indicates that they should be treated
3023as real metacharacters, which is what @command{gawk} does.
3024In compatibility mode (@pxref{Options}),
3025@command{gawk} treats the characters represented by octal and hexadecimal
3026escape sequences literally when used in regexp constants. Thus,
3027@code{/a\52b/} is equivalent to @code{/a\*b/}.
3028
3029@node Regexp Operators
3030@section Regular Expression Operators
3031@c STARTOFRANGE regexpo
3032@cindex regular expressions, operators
3033
3034You can combine regular expressions with special characters,
3035called @dfn{regular expression operators} or @dfn{metacharacters}, to
3036increase the power and versatility of regular expressions.
3037
3038The escape sequences described
3039@ifnotinfo
3040earlier
3041@end ifnotinfo
3042in @ref{Escape Sequences},
3043are valid inside a regexp.  They are introduced by a @samp{\} and
3044are recognized and converted into corresponding real characters as
3045the very first step in processing regexps.
3046
3047Here is a list of metacharacters.  All characters that are not escape
3048sequences and that are not listed in the table stand for themselves:
3049
3050@table @code
3051@cindex backslash (@code{\})
3052@cindex @code{\} (backslash)
3053@item \
3054This is used to suppress the special meaning of a character when
3055matching.  For example, @samp{\$}
3056matches the character @samp{$}.
3057
3058@cindex regular expressions, anchors in
3059@cindex Texinfo, chapter beginnings in files
3060@cindex @code{^} (caret)
3061@cindex caret (@code{^})
3062@item ^
3063This matches the beginning of a string.  For example, @samp{^@@chapter}
3064matches @samp{@@chapter} at the beginning of a string and can be used
3065to identify chapter beginnings in Texinfo source files.
3066The @samp{^} is known as an @dfn{anchor}, because it anchors the pattern to
3067match only at the beginning of the string.
3068
3069It is important to realize that @samp{^} does not match the beginning of
3070a line embedded in a string.
3071The condition is not true in the following example:
3072
3073@example
3074if ("line1\nLINE 2" ~ /^L/) @dots{}
3075@end example
3076
3077@cindex @code{$} (dollar sign)
3078@cindex dollar sign (@code{$})
3079@item $
3080This is similar to @samp{^}, but it matches only at the end of a string.
3081For example, @samp{p$}
3082matches a record that ends with a @samp{p}.  The @samp{$} is an anchor
3083and does not match the end of a line embedded in a string.
3084The condition in the following example is not true:
3085
3086@example
3087if ("line1\nLINE 2" ~ /1$/) @dots{}
3088@end example
3089
3090@cindex @code{.} (period)
3091@cindex period (@code{.})
3092@item .
3093This matches any single character,
3094@emph{including} the newline character.  For example, @samp{.P}
3095matches any single character followed by a @samp{P} in a string.  Using
3096concatenation, we can make a regular expression such as @samp{U.A}, which
3097matches any three-character sequence that begins with @samp{U} and ends
3098with @samp{A}.
3099
3100@c comma before using does NOT do tertiary
3101@cindex POSIX @command{awk}, period (@code{.}), using
3102In strict POSIX mode (@pxref{Options}),
3103@samp{.} does not match the @sc{nul}
3104character, which is a character with all bits equal to zero.
3105Otherwise, @sc{nul} is just another character. Other versions of @command{awk}
3106may not be able to match the @sc{nul} character.
3107
3108@cindex @code{[]} (square brackets)
3109@cindex square brackets (@code{[]})
3110@cindex character lists
3111@cindex character sets, See Also character lists
3112@cindex bracket expressions, See character lists
3113@item [@dots{}]
3114This is called a @dfn{character list}.@footnote{In other literature,
3115you may see a character list referred to as either a
3116@dfn{character set}, a @dfn{character class}, or a @dfn{bracket expression}.}
3117It matches any @emph{one} of the characters that are enclosed in
3118the square brackets.  For example, @samp{[MVX]} matches any one of
3119the characters @samp{M}, @samp{V}, or @samp{X} in a string.  A full
3120discussion of what can be inside the square brackets of a character list
3121is given in
3122@ref{Character Lists}.
3123
3124@cindex character lists, complemented
3125@item [^ @dots{}]
3126This is a @dfn{complemented character list}.  The first character after
the @samp{[} @emph{must} be a @samp{^}.  It matches any character
3128@emph{except} those in the square brackets.  For example, @samp{[^awk]}
3129matches any character that is not an @samp{a}, @samp{w},
3130or @samp{k}.
3131
3132@cindex @code{|} (vertical bar)
3133@cindex vertical bar (@code{|})
3134@item |
3135This is the @dfn{alternation operator} and it is used to specify
3136alternatives.
3137The @samp{|} has the lowest precedence of all the regular
3138expression operators.
3139For example, @samp{^P|[[:digit:]]}
3140matches any string that matches either @samp{^P} or @samp{[[:digit:]]}.  This
3141means it matches any string that starts with @samp{P} or contains a digit.
3142
3143The alternation applies to the largest possible regexps on either side.
3144
3145@cindex @code{()} (parentheses)
3146@cindex parentheses @code{()}
3147@item (@dots{})
3148Parentheses are used for grouping in regular expressions, as in
3149arithmetic.  They can be used to concatenate regular expressions
3150containing the alternation operator, @samp{|}.  For example,
3151@samp{@@(samp|code)\@{[^@}]+\@}} matches both @samp{@@code@{foo@}} and
3152@samp{@@samp@{bar@}}.
3153(These are Texinfo formatting control sequences. The @samp{+} is
3154explained further on in this list.)
3155
3156@cindex @code{*} (asterisk), @code{*} operator, as regexp operator
3157@cindex asterisk (@code{*}), @code{*} operator, as regexp operator
3158@item *
3159This symbol means that the preceding regular expression should be
3160repeated as many times as necessary to find a match.  For example, @samp{ph*}
3161applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
3162of one @samp{p} followed by any number of @samp{h}s.  This also matches
3163just @samp{p} if no @samp{h}s are present.
3164
3165The @samp{*} repeats the @emph{smallest} possible preceding expression.
3166(Use parentheses if you want to repeat a larger expression.)  It finds
3167as many repetitions as possible.  For example,
3168@samp{awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample}
3169prints every record in @file{sample} containing a string of the form
3170@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.
3171Notice the escaping of the parentheses by preceding them
3172with backslashes.
3173
3174@cindex @code{+} (plus sign)
3175@cindex plus sign (@code{+})
3176@item +
3177This symbol is similar to @samp{*}, except that the preceding expression must be
3178matched at least once.  This means that @samp{wh+y}
3179would match @samp{why} and @samp{whhy}, but not @samp{wy}, whereas
3180@samp{wh*y} would match all three of these strings.
3181The following is a simpler
3182way of writing the last @samp{*} example:
3183
3184@example
3185awk '/\(c[ad]+r x\)/ @{ print @}' sample
3186@end example
3187
3188@cindex @code{?} (question mark)
3189@cindex question mark (@code{?})
3190@item ?
3191This symbol is similar to @samp{*}, except that the preceding expression can be
3192matched either once or not at all.  For example, @samp{fe?d}
3193matches @samp{fed} and @samp{fd}, but nothing else.
3194
3195@cindex interval expressions
3196@item @{@var{n}@}
3197@itemx @{@var{n},@}
3198@itemx @{@var{n},@var{m}@}
3199One or two numbers inside braces denote an @dfn{interval expression}.
3200If there is one number in the braces, the preceding regexp is repeated
3201@var{n} times.
3202If there are two numbers separated by a comma, the preceding regexp is
3203repeated @var{n} to @var{m} times.
3204If there is one number followed by a comma, then the preceding regexp
3205is repeated at least @var{n} times:
3206
3207@table @code
3208@item wh@{3@}y
3209Matches @samp{whhhy}, but not @samp{why} or @samp{whhhhy}.
3210
3211@item wh@{3,5@}y
3212Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy}, only.
3213
3214@item wh@{2,@}y
3215Matches @samp{whhy} or @samp{whhhy}, and so on.
3216@end table
3217
3218@cindex POSIX @command{awk}, interval expressions in
3219Interval expressions were not traditionally available in @command{awk}.
3220They were added as part of the POSIX standard to make @command{awk}
3221and @command{egrep} consistent with each other.
3222
3223@cindex @command{gawk}, interval expressions and
3224However, because old programs may use @samp{@{} and @samp{@}} in regexp
3225constants, by default @command{gawk} does @emph{not} match interval expressions
in regexps.  If either @option{--posix} or @option{--re-interval} is specified
3227(@pxref{Options}), then interval expressions
3228are allowed in regexps.
3229
3230For new programs that use @samp{@{} and @samp{@}} in regexp constants,
3231it is good practice to always escape them with a backslash.  Then the
3232regexp constants are valid and work the way you want them to, using
3233any version of @command{awk}.@footnote{Use two backslashes if you're
3234using a string constant with a regexp operator or function.}
3235@end table
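
The repetition operators @samp{*}, @samp{+}, and @samp{?} described in
the list above can be compared side by side.  The following commands are
a sketch using made-up input, with @samp{^} and @samp{$} added so that
each regexp has to match a whole line:

@example
printf 'why\nwhhy\nwy\n' | awk '/^wh+y$/'
printf 'why\nwhhy\nwy\n' | awk '/^wh*y$/'
printf 'fed\nfd\nfeed\n' | awk '/^fe?d$/'
@end example

@noindent
The first command should print @samp{why} and @samp{whhy}, the second
should print all three lines, and the third should print @samp{fed} and
@samp{fd} but not @samp{feed}.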
3236
3237@cindex precedence, regexp operators
3238@cindex regular expressions, operators, precedence of
3239In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators,
3240as well as the braces @samp{@{} and @samp{@}},
3241have
3242the highest precedence, followed by concatenation, and finally by @samp{|}.
3243As in arithmetic, parentheses can change how operators are grouped.
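
Because alternation binds most loosely, an anchor applies only to the
alternative in which it is written.  The following sketch illustrates
this.  The input lines are invented, and @samp{[0-9]} is used in place of
a character class only to keep the example portable:

@example
printf 'Pat\nx1\nabc\n' | awk '/^P|[0-9]/'
printf 'Pat\nx1\nabc\n' | awk '/^(P|[0-9])/'
@end example

@noindent
The first command should print @samp{Pat} and @samp{x1}, because the
digit may appear anywhere in the line.  With the parentheses, the anchor
covers both alternatives, so only @samp{Pat} should be printed.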
3244
3245@cindex POSIX @command{awk}, regular expressions and
3246@cindex @command{gawk}, regular expressions, precedence
3247In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and @samp{?} operators
3248stand for themselves when there is nothing in the regexp that precedes them.
3249For example, @samp{/+/} matches a literal plus sign.  However, many other versions of
3250@command{awk} treat such a usage as a syntax error.
3251
3252If @command{gawk} is in compatibility mode
3253(@pxref{Options}),
3254POSIX character classes and interval expressions are not available in
3255regular expressions.
3256@c ENDOFRANGE regexpo
3257
3258@node Character Lists
3259@section Using Character Lists
3260@c STARTOFRANGE charlist
3261@cindex character lists
3262@cindex character lists, range expressions
3263@cindex range expressions
3264
3265Within a character list, a @dfn{range expression} consists of two
3266characters separated by a hyphen.  It matches any single character that
3267sorts between the two characters, using the locale's
3268collating sequence and character set.  For example, in the default C
3269locale, @samp{[a-dx-z]} is equivalent to @samp{[abcdxyz]}.  Many locales
3270sort characters in dictionary order, and in these locales,
3271@samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]}; instead it
3272might be equivalent to @samp{[aBbCcDdxXyYz]}, for example.  To obtain
3273the traditional interpretation of bracket expressions, you can use the C
3274locale by setting the @env{LC_ALL} environment variable to the value
3275@samp{C}.
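
For instance, the following sketch forces the C locale for a single
command, so that the range @samp{[a-d]} should match only the four
lowercase letters.  The two input lines are made up:

@example
printf 'b\nB\n' | LC_ALL=C awk '/^[a-d]$/'
@end example

@noindent
Only the line @samp{b} should be printed.  Setting the variable on the
command line this way affects just that one invocation of @command{awk}.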
3276
3277@cindex @code{\} (backslash), in character lists
3278@cindex backslash (@code{\}), in character lists
3279@cindex @code{^} (caret), in character lists
3280@cindex caret (@code{^}), in character lists
3281@cindex @code{-} (hyphen), in character lists
3282@cindex hyphen (@code{-}), in character lists
3283To include one of the characters @samp{\}, @samp{]}, @samp{-}, or @samp{^} in a
3284character list, put a @samp{\} in front of it.  For example:
3285
3286@example
3287[d\]]
3288@end example
3289
3290@noindent
3291matches either @samp{d} or @samp{]}.
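
A quick way to check this, using input invented for the purpose, is:

@example
printf 'd\n]\nx\n' | awk '/^[d\]]$/'
@end example

@noindent
This should print the first two lines and skip the line @samp{x}.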
3292
3293@cindex POSIX @command{awk}, character lists and
3294@cindex Extended Regular Expressions (EREs)
3295@cindex EREs (Extended Regular Expressions)
3296@cindex @command{egrep} utility
3297This treatment of @samp{\} in character lists
3298is compatible with other @command{awk}
3299implementations and is also mandated by POSIX.
3300The regular expressions in @command{awk} are a superset
3301of the POSIX specification for Extended Regular Expressions (EREs).
3302POSIX EREs are based on the regular expressions accepted by the
3303traditional @command{egrep} utility.
3304
3305@cindex character lists, character classes
3306@cindex POSIX @command{awk}, character lists and, character classes
3307@dfn{Character classes} are a new feature introduced in the POSIX standard.
3308A character class is a special notation for describing
3309lists of characters that have a specific attribute, but the
3310actual characters can vary from country to country and/or
3311from character set to character set.  For example, the notion of what
3312is an alphabetic character differs between the United States and France.
3313
3314A character class is only valid in a regexp @emph{inside} the
3315brackets of a character list.  Character classes consist of @samp{[:},
3316a keyword denoting the class, and @samp{:]}.  Here are the character
3317classes defined by the POSIX standard.
3318
3319@c the regular table is commented out while trying out the multitable.
3320@c leave it here in case we need to go back, but make sure the text
3321@c still corresponds!
3322
3323@ignore
3324@table @code
3325@item [:alnum:]
3326Alphanumeric characters.
3327
3328@item [:alpha:]
3329Alphabetic characters.
3330
3331@item [:blank:]
3332Space and TAB characters.
3333
3334@item [:cntrl:]
3335Control characters.
3336
3337@item [:digit:]
3338Numeric characters.
3339
3340@item [:graph:]
3341Characters that are printable and visible.
3342(A space is printable but not visible, whereas an @samp{a} is both.)
3343
3344@item [:lower:]
3345Lowercase alphabetic characters.
3346
3347@item [:print:]
3348Printable characters (characters that are not control characters).
3349
3350@item [:punct:]
3351Punctuation characters (characters that are not letters, digits,
3352control characters, or space characters).
3353
3354@item [:space:]
3355Space characters (such as space, TAB, and formfeed, to name a few).
3356
3357@item [:upper:]
3358Uppercase alphabetic characters.
3359
3360@item [:xdigit:]
3361Characters that are hexadecimal digits.
3362@end table
3363@end ignore
3364
3365@multitable {@code{[:xdigit:]}} {Characters that are both printable and visible.  (A space is}
3366@item @code{[:alnum:]} @tab Alphanumeric characters.
3367@item @code{[:alpha:]} @tab Alphabetic characters.
3368@item @code{[:blank:]} @tab Space and TAB characters.
3369@item @code{[:cntrl:]} @tab Control characters.
3370@item @code{[:digit:]} @tab Numeric characters.
3371@item @code{[:graph:]} @tab Characters that are both printable and visible.
3372(A space is printable but not visible, whereas an @samp{a} is both.)
3373@item @code{[:lower:]} @tab Lowercase alphabetic characters.
3374@item @code{[:print:]} @tab Printable characters (characters that are not control characters).
3375@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits,
3376control characters, or space characters).
3377@item @code{[:space:]} @tab Space characters (such as space, TAB, and formfeed, to name a few).
3378@item @code{[:upper:]} @tab Uppercase alphabetic characters.
3379@item @code{[:xdigit:]} @tab Characters that are hexadecimal digits.
3380@end multitable
3381
3382For example, before the POSIX standard, you had to write @code{/[A-Za-z0-9]/}
3383to match alphanumeric characters.  If your
3384character set had other alphabetic characters in it, this would not
3385match them, and if your character set collated differently from
3386ASCII, this might not even match the ASCII alphanumeric characters.
3387With the POSIX character classes, you can write
3388@code{/[[:alnum:]]/} to match the alphabetic
3389and numeric characters in your character set.
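
The following sketch shows the character class in use.  It assumes an
@command{awk} that supports POSIX character classes, and the two input
lines are made up:

@example
printf 'abc123\n+-*/\n' | awk '/[[:alnum:]]/'
@end example

@noindent
Only @samp{abc123} should be printed, because the second line contains
no alphabetic or numeric characters.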
3390
3391@cindex character lists, collating elements
3392@cindex character lists, non-ASCII
3393@cindex collating elements
3394Two additional special sequences can appear in character lists.
3395These apply to non-ASCII character sets, which can have single symbols
3396(called @dfn{collating elements}) that are represented with more than one
3397character. They can also have several characters that are equivalent for
3398@dfn{collating}, or sorting, purposes.  (For example, in French, a plain ``e''
3399and a grave-accented ``@`e'' are equivalent.)
3400These sequences are:
3401
3402@table @asis
3403@cindex character lists, collating symbols
3404@cindex collating symbols
3405@item Collating symbols
3406Multicharacter collating elements enclosed between
3407@samp{[.} and @samp{.]}.  For example, if @samp{ch} is a collating element,
3408then @code{[[.ch.]]} is a regexp that matches this collating element, whereas
3409@code{[ch]} is a regexp that matches either @samp{c} or @samp{h}.
3410
3411@cindex character lists, equivalence classes
3412@item Equivalence classes
3413Locale-specific names for a list of
3414characters that are equal. The name is enclosed between
3415@samp{[=} and @samp{=]}.
3416For example, the name @samp{e} might be used to represent all of
3417``e,'' ``@`e,'' and ``@'e.'' In this case, @code{[[=e=]]} is a regexp
3418that matches any of @samp{e}, @samp{@'e}, or @samp{@`e}.
3419@end table
3420
3421These features are very valuable in non-English-speaking locales.
3422
3423@cindex internationalization, localization, character classes
3424@cindex @command{gawk}, character classes and
3425@cindex POSIX @command{awk}, character lists and, character classes
3426@strong{Caution:} The library functions that @command{gawk} uses for regular
3427expression matching currently recognize only POSIX character classes;
3428they do not recognize collating symbols or equivalence classes.
3429@c maybe one day ...
3430@c ENDOFRANGE charlist
3431
3432@node GNU Regexp Operators
3433@section @command{gawk}-Specific Regexp Operators
3434
3435@c This section adapted (long ago) from the regex-0.12 manual
3436
3437@c STARTOFRANGE regexpg
3438@cindex regular expressions, operators, @command{gawk}
3439@c STARTOFRANGE gregexp
3440@cindex @command{gawk}, regular expressions, operators
3441@cindex operators, GNU-specific
3442@cindex regular expressions, operators, for words
3443@cindex word, regexp definition of
3444GNU software that deals with regular expressions provides a number of
3445additional regexp operators.  These operators are described in this
3446@value{SECTION} and are specific to @command{gawk};
3447they are not available in other @command{awk} implementations.
3448Most of the additional operators deal with word matching.
3449For our purposes, a @dfn{word} is a sequence of one or more letters, digits,
3450or underscores (@samp{_}):
3451
3452@table @code
3453@c @cindex operators, @code{\w} (@command{gawk})
3454@cindex backslash (@code{\}), @code{\w} operator (@command{gawk})
3455@cindex @code{\} (backslash), @code{\w} operator (@command{gawk})
3456@item \w
3457Matches any word-constituent character---that is, it matches any
3458letter, digit, or underscore. Think of it as shorthand for
3459@w{@code{[[:alnum:]_]}}.
3460
3461@c @cindex operators, @code{\W} (@command{gawk})
3462@cindex backslash (@code{\}), @code{\W} operator (@command{gawk})
3463@cindex @code{\} (backslash), @code{\W} operator (@command{gawk})
3464@item \W
3465Matches any character that is not word-constituent.
3466Think of it as shorthand for
3467@w{@code{[^[:alnum:]_]}}.
3468
3469@c @cindex operators, @code{\<} (@command{gawk})
3470@cindex backslash (@code{\}), @code{\<} operator (@command{gawk})
3471@cindex @code{\} (backslash), @code{\<} operator (@command{gawk})
3472@item \<
3473Matches the empty string at the beginning of a word.
3474For example, @code{/\<away/} matches @samp{away} but not
3475@samp{stowaway}.
3476
3477@c @cindex operators, @code{\>} (@command{gawk})
3478@cindex backslash (@code{\}), @code{\>} operator (@command{gawk})
3479@cindex @code{\} (backslash), @code{\>} operator (@command{gawk})
3480@item \>
3481Matches the empty string at the end of a word.
3482For example, @code{/stow\>/} matches @samp{stow} but not @samp{stowaway}.
3483
3484@c @cindex operators, @code{\y} (@command{gawk})
3485@cindex backslash (@code{\}), @code{\y} operator (@command{gawk})
3486@cindex @code{\} (backslash), @code{\y} operator (@command{gawk})
3487@c comma before using does NOT do secondary
3488@cindex word boundaries, matching
3489@item \y
3490Matches the empty string at either the beginning or the
3491end of a word (i.e., the word boundar@strong{y}).  For example, @samp{\yballs?\y}
3492matches either @samp{ball} or @samp{balls}, as a separate word.
3493
3494@c @cindex operators, @code{\B} (@command{gawk})
3495@cindex backslash (@code{\}), @code{\B} operator (@command{gawk})
3496@cindex @code{\} (backslash), @code{\B} operator (@command{gawk})
3497@item \B
3498Matches the empty string that occurs between two
3499word-constituent characters. For example,
3500@code{/\Brat\B/} matches @samp{crate} but it does not match @samp{dirty rat}.
3501@samp{\B} is essentially the opposite of @samp{\y}.
3502@end table
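
The word-matching operators can be tried from the shell with the sample
words used in the table.  This sketch assumes that @command{gawk} is
installed under that name and is not running in POSIX or compatibility
mode.  The extra word @samp{balloon} is added only for contrast:

@example
printf 'away\nstowaway\n' | gawk '/\<away/'
printf 'ball\nballs\nballoon\n' | gawk '/\yballs?\y/'
printf 'crate\ndirty rat\n' | gawk '/\Brat\B/'
@end example

@noindent
The first command should print only @samp{away}, the second should print
@samp{ball} and @samp{balls}, and the third should print only
@samp{crate}.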
3503
3504@cindex buffers, operators for
3505@cindex regular expressions, operators, for buffers
3506@cindex operators, string-matching, for buffers
3507There are two other operators that work on buffers.  In Emacs, a
3508@dfn{buffer} is, naturally, an Emacs buffer.  For other programs,
3509@command{gawk}'s regexp library routines consider the entire
3510string to match as the buffer.
3511The operators are:
3512
3513@table @code
3514@item \`
3515@c @cindex operators, @code{\`} (@command{gawk})
3516@cindex backslash (@code{\}), @code{\`} operator (@command{gawk})
3517@cindex @code{\} (backslash), @code{\`} operator (@command{gawk})
3518Matches the empty string at the
3519beginning of a buffer (string).
3520
3521@c @cindex operators, @code{\'} (@command{gawk})
3522@cindex backslash (@code{\}), @code{\'} operator (@command{gawk})
3523@cindex @code{\} (backslash), @code{\'} operator (@command{gawk})
3524@item \'
3525Matches the empty string at the
3526end of a buffer (string).
3527@end table
3528
3529@cindex @code{^} (caret)
3530@cindex caret (@code{^})
@cindex @code{$} (dollar sign)
@cindex dollar sign (@code{$})
3533Because @samp{^} and @samp{$} always work in terms of the beginning
3534and end of strings, these operators don't add any new capabilities
3535for @command{awk}.  They are provided for compatibility with other
3536GNU software.
3537
3538@cindex @command{gawk}, word-boundary operator
3539@cindex word-boundary operator (@command{gawk})
3540@cindex operators, word-boundary (@command{gawk})
3541In other GNU software, the word-boundary operator is @samp{\b}. However,
3542that conflicts with the @command{awk} language's definition of @samp{\b}
3543as backspace, so @command{gawk} uses a different letter.
3544An alternative method would have been to require two backslashes in the
3545GNU operators, but this was deemed too confusing. The current
3546method of using @samp{\y} for the GNU @samp{\b} appears to be the
3547lesser of two evils.
3548
3549@c NOTE!!! Keep this in sync with the same table in the summary appendix!
3550@c
3551@c Should really do this with file inclusion.
3552@cindex regular expressions, @command{gawk}, command-line options
3553@cindex @command{gawk}, command-line options
3554The various command-line options
3555(@pxref{Options})
3556control how @command{gawk} interprets characters in regexps:
3557
3558@table @asis
3559@item No options
3560In the default case, @command{gawk} provides all the facilities of
3561POSIX regexps and the
3562@ifnotinfo
3563previously described
3564GNU regexp operators.
3565@end ifnotinfo
3566@ifnottex
3567GNU regexp operators described
3568in @ref{Regexp Operators}.
3569@end ifnottex
3570However, interval expressions are not supported.
3571
3572@item @code{--posix}
3573Only POSIX regexps are supported; the GNU operators are not special
3574(e.g., @samp{\w} matches a literal @samp{w}).  Interval expressions
3575are allowed.
3576
3577@item @code{--traditional}
3578Traditional Unix @command{awk} regexps are matched. The GNU operators
3579are not special, interval expressions are not available, nor
3580are the POSIX character classes (@code{[[:alnum:]]}, etc.).
3581Characters described by octal and hexadecimal escape sequences are
3582treated literally, even if they represent regexp metacharacters.
3583
3584@item @code{--re-interval}
3585Allow interval expressions in regexps, even if @option{--traditional}
3586has been provided.  (@option{--posix} automatically enables
3587interval expressions, so @option{--re-interval} is redundant
when @option{--posix} is used.)
3589@end table
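
As an illustration of the last option, the following sketch enables
interval expressions explicitly and reuses the words from the interval
expression examples.  It assumes that @command{gawk} is installed under
that name:

@example
printf 'why\nwhhhy\nwhhhhy\n' | gawk --re-interval '/wh@{3@}y/'
@end example

@noindent
Only @samp{whhhy} should be printed, because the interval requires
exactly three @samp{h}s between the @samp{w} and the @samp{y}.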
3590@c ENDOFRANGE gregexp
3591@c ENDOFRANGE regexpg
3592
3593@node Case-sensitivity
3594@section Case Sensitivity in Matching
3595
3596@c STARTOFRANGE regexpcs
3597@cindex regular expressions, case sensitivity
3598@c STARTOFRANGE csregexp
3599@cindex case sensitivity, regexps and
3600Case is normally significant in regular expressions, both when matching
ordinary characters (i.e., not metacharacters) and inside character
lists.  Thus, a @samp{w} in a regular expression matches only a lowercase
3603@samp{w} and not an uppercase @samp{W}.
3604
3605The simplest way to do a case-independent match is to use a character
3606list---for example, @samp{[Ww]}.  However, this can be cumbersome if
3607you need to use it often, and it can make the regular expressions harder
3608to read.  There are two alternatives that you might prefer.
3609
3610One way to perform a case-insensitive match at a particular point in the
3611program is to convert the data to a single case, using the
3612@code{tolower} or @code{toupper} built-in string functions (which we
3613haven't discussed yet;
3614@pxref{String Functions}).
3615For example:
3616
3617@example
3618tolower($1) ~ /foo/  @{ @dots{} @}
3619@end example
3620
3621@noindent
3622converts the first field to lowercase before matching against it.
3623This works in any POSIX-compliant @command{awk}.
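
Here is a complete command along those lines, offered as a sketch with
made-up input.  The rule has no action, so @command{awk} prints each
record for which the pattern is true:

@example
printf 'FOO bar\nbaz foo\n' | awk 'tolower($1) ~ /foo/'
@end example

@noindent
Only @samp{FOO bar} should be printed.  The second line contains
@samp{foo}, but not in its first field.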
3624
3625@cindex @command{gawk}, regular expressions, case sensitivity
3626@cindex case sensitivity, @command{gawk}
3627@cindex differences in @command{awk} and @command{gawk}, regular expressions
3628@cindex @code{~} (tilde), @code{~} operator
3629@cindex tilde (@code{~}), @code{~} operator
3630@cindex @code{!} (exclamation point), @code{!~} operator
3631@cindex exclamation point (@code{!}), @code{!~} operator
3632@cindex @code{IGNORECASE} variable
3633@c @cindex variables, @code{IGNORECASE}
3634Another method, specific to @command{gawk}, is to set the variable
3635@code{IGNORECASE} to a nonzero value (@pxref{Built-in Variables}).
3636When @code{IGNORECASE} is not zero, @emph{all} regexp and string
3637operations ignore case.  Changing the value of
3638@code{IGNORECASE} dynamically controls the case-sensitivity of the
3639program as it runs.  Case is significant by default because
3640@code{IGNORECASE} (like most variables) is initialized to zero:
3641
3642@example
3643x = "aB"
3644if (x ~ /ab/) @dots{}   # this test will fail
3645
3646IGNORECASE = 1
3647if (x ~ /ab/) @dots{}   # now it will succeed
3648@end example

In general, you cannot use @code{IGNORECASE} to make certain rules
case-insensitive and other rules case-sensitive, because there is no
straightforward way
to set @code{IGNORECASE} just for the pattern of
a particular rule.@footnote{Experienced C and C++ programmers will note
that it is possible, using something like
@samp{IGNORECASE = 1 && /foObAr/ @{ @dots{} @}}
and
@samp{IGNORECASE = 0 || /foobar/ @{ @dots{} @}}.
However, this is somewhat obscure and we don't recommend it.}
To do this, use either character lists or @code{tolower}.  However, one
thing you can do only with @code{IGNORECASE} is dynamically turn
case-sensitivity on or off for all the rules at once.

@code{IGNORECASE} can be set on the command line or in a @code{BEGIN} rule
(@pxref{Other Arguments}; also
@pxref{Using BEGIN/END}).
Setting @code{IGNORECASE} from the command line is a way to make
a program case-insensitive without having to edit it.
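
As a sketch of the command-line approach (this assumes @command{gawk} is
installed under that name; the input is made up), the program text below
contains no reference to @code{IGNORECASE} at all:

@example
$ echo aB | gawk '/ab/ @{ print "matched:", $0 @}' IGNORECASE=1
@print{} matched: aB
@end example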

Prior to @command{gawk} 3.0, the value of @code{IGNORECASE}
affected regexp operations only. It did not affect string comparison
with @samp{==}, @samp{!=}, and so on.
Beginning with @value{PVERSION} 3.0, both regexp and string comparison
operations are affected by @code{IGNORECASE}.

@c @cindex ISO 8859-1
@c @cindex ISO Latin-1
Beginning with @command{gawk} 3.0,
the equivalences between upper-
and lowercase characters are based on the ISO-8859-1 (ISO Latin-1)
character set. This character set is a superset of the traditional 128
ASCII characters and also provides a number of characters suitable
for use with European languages.

The value of @code{IGNORECASE} has no effect if @command{gawk} is in
compatibility mode (@pxref{Options}).
Case is always significant in compatibility mode.
@c ENDOFRANGE csregexp
@c ENDOFRANGE regexpcs

@node Leftmost Longest
@section How Much Text Matches?

@cindex regular expressions, leftmost longest match
@c @cindex matching, leftmost longest
Consider the following:

@example
echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
@end example

This example uses the @code{sub} function (which we haven't discussed yet;
@pxref{String Functions})
to make a change to the input record. Here, the regexp @code{/a+/}
indicates ``one or more @samp{a} characters,'' and the replacement
text is @samp{<A>}.

The input contains four @samp{a} characters.
@command{awk} (and POSIX) regular expressions always match
the leftmost, @emph{longest} sequence of input characters that can
match.  Thus, all four @samp{a} characters are
replaced with @samp{<A>} in this example:

@example
$ echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
@print{} <A>bcd
@end example

For simple match/no-match tests, this is not so important. But when doing
text matching and substitutions with the @code{match}, @code{sub}, @code{gsub},
and @code{gensub} functions, it is very important.
@ifinfo
@xref{String Functions},
for more information on these functions.
@end ifinfo
Understanding this principle is also important for regexp-based record
and field splitting (@pxref{Records},
and also @pxref{Field Separators}).
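
The ``leftmost'' half of the rule takes priority over the ``longest'' half.
As a sketch, the following uses the @code{match} function
(@pxref{String Functions}), which records the position and length of the
match in @code{RSTART} and @code{RLENGTH}.  The made-up input contains a
run of four @samp{a} characters followed later by a run of six:

@example
$ echo xaaaabcaaaaaa | awk '@{ match($0, /a+/); print RSTART, RLENGTH @}'
@print{} 2 4
@end example

@noindent
The match starts at the earliest possible position (character two) and
extends as far as it can from there; the longer run further to the right
is never considered.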

@node Computed Regexps
@section Using Dynamic Regexps

@c STARTOFRANGE dregexp
@cindex regular expressions, computed
@c STARTOFRANGE regexpd
@cindex regular expressions, dynamic
@cindex @code{~} (tilde), @code{~} operator
@cindex tilde (@code{~}), @code{~} operator
@cindex @code{!} (exclamation point), @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator
@c @cindex operators, @code{~}
@c @cindex operators, @code{!~}
The righthand side of a @samp{~} or @samp{!~} operator need not be a
regexp constant (i.e., a string of characters between slashes).  It may
be any expression.  The expression is evaluated and converted to a string
if necessary; the contents of the string are used as the
regexp.  A regexp that is computed in this way is called a @dfn{dynamic
regexp}:

@example
BEGIN @{ digits_regexp = "[[:digit:]]+" @}
$0 ~ digits_regexp    @{ print @}
@end example

@noindent
This sets @code{digits_regexp} to a regexp that describes one or more digits,
and tests whether the input record matches this regexp.
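
Because the regexp is an ordinary string value, it can also come from
outside the program.  In this sketch, the variable name @code{re} and the
sample input are made up for illustration, and the value is supplied with
the @option{-v} option (@pxref{Options}):

@example
$ printf 'abc\n123\nx9\n' | awk -v re='[0-9]+' '$0 ~ re @{ print @}'
@print{} 123
@print{} x9
@end example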

@c @strong{Caution:}
@strong{Caution:} When using the @samp{~} and @samp{!~}
operators, there is a difference between a regexp constant
enclosed in slashes and a string constant enclosed in double quotes.
If you are going to use a string constant, you have to understand that
the string is, in essence, scanned @emph{twice}: the first time when
@command{awk} reads your program, and the second time when it goes to
match the string on the lefthand side of the operator with the pattern
on the right.  This is true of any string-valued expression (such as
@code{digits_regexp}, shown previously), not just string constants.

@cindex regexp constants, slashes vs. quotes
@cindex @code{\} (backslash), regexp constants
@cindex backslash (@code{\}), regexp constants
@cindex @code{"} (double quote), regexp constants
@cindex double quote (@code{"}), regexp constants
What difference does it make if the string is
scanned twice? The answer has to do with escape sequences, and particularly
with backslashes.  To get a backslash into a regular expression inside a
string, you have to type two backslashes.

For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
Only one backslash is needed.  To do the same thing with a string,
you have to type @code{"\\*"}.  The first backslash escapes the
second one so that the string actually contains the
two characters @samp{\} and @samp{*}.
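
As a sketch that can be run from the shell (the input @samp{a*b} is made
up), both of the following rules match, showing that the two spellings
describe the same regexp:

@example
$ echo 'a*b' | awk '$0 ~ /\*/  @{ print "regexp constant matched" @}
>                   $0 ~ "\\*" @{ print "string constant matched" @}'
@print{} regexp constant matched
@print{} string constant matched
@end example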

@cindex troubleshooting, regexp constants vs. string constants
@cindex regexp constants, vs. string constants
@cindex string constants, vs. regexp constants
Given that you can use both regexp and string constants to describe
regular expressions, which should you use?  The answer is ``regexp
constants,'' for several reasons:

@itemize @bullet
@item
String constants are more complicated to write and
more difficult to read. Using regexp constants makes your programs
less error-prone.  Not understanding the difference between the two
kinds of constants is a common source of errors.

@item
It is more efficient to use regexp constants. @command{awk} can note
that you have supplied a regexp and store it internally in a form that
makes pattern matching more efficient.  When using a string constant,
@command{awk} must first convert the string into this internal form and
then perform the pattern matching.

@item
Using regexp constants is better form; it shows clearly that you
intend a regexp match.
@end itemize

@c fakenode --- for prepinfo
@subheading Advanced Notes: Using @code{\n} in Character Lists of Dynamic Regexps
@cindex regular expressions, dynamic, with embedded newlines
@cindex newlines, in dynamic regexps

Some commercial versions of @command{awk} do not allow the newline
character to be used inside a character list for a dynamic regexp:

@example
$ awk '$0 ~ "[ \t\n]"'
@error{} awk: newline in character class [
@error{} ]...
@error{}  source line number 1
@error{}  context is
@error{}          >>>  <<<
@end example

@cindex newlines, in regexp constants
But a newline in a regexp constant works with no problem:

@example
$ awk '$0 ~ /[ \t\n]/'
here is a sample line
@print{} here is a sample line
@kbd{@value{CTL}-d}
@end example

@command{gawk} does not have this problem, and it isn't likely to
occur often in practice, but it's worth noting for future reference.
@c ENDOFRANGE dregexp
@c ENDOFRANGE regexpd
@c ENDOFRANGE regexp

@node Locales
@section Where You Are Makes A Difference

Modern systems support the notion of @dfn{locales}: a way to tell
the system about the local character set and language.  The current
locale setting can affect the way regexp matching works, often
in surprising ways.  In particular, many locales do case-insensitive
matching, even when you may have specified characters of only
one particular case.

The following example uses the @code{sub} function, which
does text replacement
(@pxref{String Functions}).
Here, the intent is to remove trailing uppercase characters:

@example
$ echo something1234abc | gawk '@{ sub("[A-Z]*$", ""); print @}'
@print{} something1234
@end example

@noindent
This output is unexpected, since the @samp{abc} at the end of @samp{something1234abc}
should not normally match @samp{[A-Z]*}.  This result is due to the
locale setting (and thus you may not see it on your system).
There are two fixes.  The first is to use the POSIX character
class @samp{[[:upper:]]}, instead of @samp{[A-Z]}.
The second is to change the locale setting in the environment,
before running @command{gawk},
by using the shell statements:

@example
LANG=C LC_ALL=C
export LANG LC_ALL
@end example

The setting @samp{C} forces @command{gawk} to behave in the traditional
Unix manner, where case distinctions do matter.
You may wish to put these statements into your shell startup file,
e.g., @file{$HOME/.profile}.
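
If you would rather not change the whole session, the shell also lets you
set the variable for a single command.  This sketch assumes a
Bourne-compatible shell and uses plain @command{awk}:

@example
$ echo something1234abc | LC_ALL=C awk '@{ sub("[A-Z]*$", ""); print @}'
@print{} something1234abc
@end example

@noindent
With the @samp{C} locale in effect for that one command, the lowercase
@samp{abc} is left alone.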

Similar considerations apply to other ranges.  For example,
@samp{["-/]} is perfectly valid in ASCII, but is not valid in many
Unicode locales, such as @samp{en_US.UTF-8}.  (In general, such
ranges should be avoided; either list the characters individually,
or use a POSIX character class such as @samp{[[:punct:]]}.)

For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant.
For other single-byte record separators, using @samp{LC_ALL=C} will give you
much better performance when reading records.  Otherwise, @command{gawk} has
to make several function calls @emph{per input character} to find the record
terminator.

@node Reading Files
@chapter Reading Input Files

@c STARTOFRANGE infir
@cindex input files, reading
@cindex input files
@cindex @code{FILENAME} variable
In the typical @command{awk} program, all input is read either from the
standard input (by default, this is the keyboard, but often it is a pipe from another
command) or from files whose names you specify on the @command{awk}
command line.  If you specify input files, @command{awk} reads them
in order, processing all the data from one before going on to the next.
The name of the current input file can be found in the built-in variable
@code{FILENAME}
(@pxref{Built-in Variables}).

@cindex records
@cindex fields
The input is read in units called @dfn{records}, and is processed by the
rules of your program one record at a time.
By default, each record is one line.  Each
record is automatically split into chunks called @dfn{fields}.
This makes it more convenient for programs to work on the parts of a record.

@cindex @code{getline} command
On rare occasions, you may need to use the @code{getline} command.
The @code{getline} command is valuable, both because it
can do explicit input from any number of files, and because the files
used with it do not have to be named on the @command{awk} command line
(@pxref{Getline}).

@menu
* Records::                     Controlling how data is split into records.
* Fields::                      An introduction to fields.
* Nonconstant Fields::          Nonconstant Field Numbers.
* Changing Fields::             Changing the Contents of a Field.
* Field Separators::            The field separator and how to change it.
* Constant Size::               Reading constant width data.
* Multiple Line::               Reading multi-line records.
* Getline::                     Reading files under explicit program control
                                using the @code{getline} function.
@end menu

@node Records
@section How Input Is Split into Records

@c STARTOFRANGE inspl
@cindex input, splitting into records
@c STARTOFRANGE recspl
@cindex records, splitting input into
@cindex @code{NR} variable
@cindex @code{FNR} variable
The @command{awk} utility divides the input for your @command{awk}
program into records and fields.
@command{awk} keeps track of the number of records that have
been read
so far
from the current input file.  This value is stored in a
built-in variable called @code{FNR}.  It is reset to zero when a new
file is started.  Another built-in variable, @code{NR}, is the total
number of input records read so far from all @value{DF}s.  It starts at zero,
but is never automatically reset to zero.
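
The difference between the two counters shows up as soon as there is more
than one input file.  In this sketch, @file{f1} and @file{f2} are
hypothetical scratch files created just for the demonstration:

@example
$ printf 'a\nb\n' > f1
$ printf 'c\n' > f2
$ awk '@{ print FILENAME, FNR, NR @}' f1 f2
@print{} f1 1 1
@print{} f1 2 2
@print{} f2 1 3
@end example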

@cindex separators, for records
@cindex record separators
Records are separated by a character called the @dfn{record separator}.
By default, the record separator is the newline character.
This is why records are, by default, single lines.
A different character can be used for the record separator by
assigning the character to the built-in variable @code{RS}.

@cindex newlines, as record separators
@cindex @code{RS} variable
Like any other variable,
the value of @code{RS} can be changed in the @command{awk} program
with the assignment operator, @samp{=}
(@pxref{Assignment Ops}).
The new record-separator character should be enclosed in quotation marks,
which indicate a string constant.  Often the right time to do this is
at the beginning of execution, before any input is processed,
so that the very first record is read with the proper separator.
To do this, use the special @code{BEGIN} pattern
(@pxref{BEGIN/END}).
For example:

@cindex @code{BEGIN} pattern
@example
awk 'BEGIN @{ RS = "/" @}
     @{ print $0 @}' BBS-list
@end example

@noindent
changes the value of @code{RS} to @code{"/"}, before reading any input.
This is a string whose first character is a slash; as a result, records
are separated by slashes.  Then the input file is read, and the second
rule in the @command{awk} program (the action with no pattern) prints each
record.  Because each @code{print} statement adds a newline at the end of
its output, this @command{awk} program copies the input
with each slash changed to a newline.  Here are the results of running
the program on @file{BBS-list}:

@example
$ awk 'BEGIN @{ RS = "/" @}
>      @{ print $0 @}' BBS-list
@print{} aardvark     555-5553     1200
@print{} 300          B
@print{} alpo-net     555-3412     2400
@print{} 1200
@print{} 300     A
@print{} barfly       555-7685     1200
@print{} 300          A
@print{} bites        555-1675     2400
@print{} 1200
@print{} 300     A
@print{} camelot      555-0542     300               C
@print{} core         555-2912     1200
@print{} 300          C
@print{} fooey        555-1234     2400
@print{} 1200
@print{} 300     B
@print{} foot         555-6699     1200
@print{} 300          B
@print{} macfoo       555-6480     1200
@print{} 300          A
@print{} sdace        555-3430     2400
@print{} 1200
@print{} 300     A
@print{} sabafoo      555-2127     1200
@print{} 300          C
@print{}
@end example

@noindent
Note that the entry for the @samp{camelot} BBS is not split.
In the original @value{DF}
(@pxref{Sample Data Files}),
the line looks like this:

@example
camelot      555-0542     300               C
@end example

@noindent
It has one baud rate only, so there are no slashes in the record,
unlike the others which have two or more baud rates.
In fact, this record is treated as part of the record
for the @samp{core} BBS; the newline separating them in the output
is the original newline in the @value{DF}, not the one added by
@command{awk} when it printed the record!

@cindex record separators, changing
@cindex separators, for records
Another way to change the record separator is on the command line,
using the variable-assignment feature
(@pxref{Other Arguments}):

@example
awk '@{ print $0 @}' RS="/" BBS-list
@end example

@noindent
This sets @code{RS} to @samp{/} before processing @file{BBS-list}.

Using an unusual character such as @samp{/} for the record separator
produces correct behavior in the vast majority of cases.  However,
the following (extreme) pipeline prints a surprising @samp{1}:

@example
$ echo | awk 'BEGIN @{ RS = "a" @} ; @{ print NF @}'
@print{} 1
@end example

There is one field, consisting of a newline.  The value of the built-in
variable @code{NF} is the number of fields in the current record.

@cindex dark corner, input files
Reaching the end of an input file terminates the current input record,
even if the last character in the file is not the character in @code{RS}.
@value{DARKCORNER}
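
For instance, in this sketch (the input is made up) the data ends
with neither a slash nor a newline, yet the final @samp{c} is still
delivered as a complete record:

@example
$ printf 'a/b/c' | awk 'BEGIN @{ RS = "/" @} @{ print NR ": " $0 @}'
@print{} 1: a
@print{} 2: b
@print{} 3: c
@end example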

@cindex null strings
@cindex strings, empty, See null strings
The empty string @code{""} (a string without any characters)
has a special meaning
as the value of @code{RS}. It means that records are separated
by one or more blank lines and nothing else.
@xref{Multiple Line}, for more details.
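
As a brief sketch (the input is made up), two groups of lines separated
by blank lines are read as two records:

@example
$ printf 'a\nb\n\n\nc\n' | awk 'BEGIN @{ RS = "" @} @{ print NR ": " $1 @}'
@print{} 1: a
@print{} 2: c
@end example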

If you change the value of @code{RS} in the middle of an @command{awk} run,
the new value is used to delimit subsequent records, but the record
currently being processed, as well as records already processed, are not
affected.

@cindex @code{RT} variable
@cindex records, terminating
@cindex terminating records
@cindex differences in @command{awk} and @command{gawk}, record separators
@cindex regular expressions, as record separators
@cindex record separators, regular expressions as
@cindex separators, for records, regular expressions as
After the end of the record has been determined, @command{gawk}
sets the variable @code{RT} to the text in the input that matched
@code{RS}.
When using @command{gawk},
the value of @code{RS} is not limited to a one-character
string.  It can be any regular expression
(@pxref{Regexp}).
In general, each record
ends at the next string that matches the regular expression; the next
record starts at the end of the matching string.  This general rule is
actually at work in the usual case, where @code{RS} contains just a
newline: a record ends at the beginning of the next matching string (the
next newline in the input), and the following record starts just after
the end of this string (at the first character of the following line).
The newline, because it matches @code{RS}, is not part of either record.

When @code{RS} is a single character, @code{RT}
contains the same single character. However, when @code{RS} is a
regular expression, @code{RT} contains
the actual input text that matched the regular expression.

The following example illustrates both of these features.
It sets @code{RS} equal to a regular expression that
matches either a newline or a series of one or more uppercase letters
with optional leading and/or trailing whitespace:

@example
$ echo record 1 AAAA record 2 BBBB record 3 |
> gawk 'BEGIN @{ RS = "\n|( *[[:upper:]]+ *)" @}
>             @{ print "Record =", $0, "and RT =", RT @}'
@print{} Record = record 1 and RT =  AAAA
@print{} Record = record 2 and RT =  BBBB
@print{} Record = record 3 and RT =
@print{}
@end example

@noindent
The final line of output has an extra blank line. This is because the
value of @code{RT} is a newline, and the @code{print} statement
supplies its own terminating newline.
@xref{Simple Sed}, for a more useful example
of @code{RS} as a regexp and @code{RT}.

If you set @code{RS} to a regular expression that allows optional
trailing text, such as @samp{RS = "abc(XYZ)?"}, it is possible, due
to implementation constraints, that @command{gawk} may match the leading
part of the regular expression, but not the trailing part, particularly
if the input text that could match the trailing part is fairly long.
@command{gawk} attempts to avoid this problem, but currently, there's
no guarantee that this will never happen.

@cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables
The use of @code{RS} as a regular expression and the @code{RT}
variable are @command{gawk} extensions; they are not available in
compatibility mode
(@pxref{Options}).
In compatibility mode, only the first character of the value of
@code{RS} is used to determine the end of the record.

@c fakenode --- for prepinfo
@subheading Advanced Notes: @code{RS = "\0"} Is Not Portable

@cindex advanced features, @value{DF}s as single record
@cindex portability, @value{DF}s as single record
There are times when you might want to treat an entire @value{DF} as a
single record.  The only way to make this happen is to give @code{RS}
a value that you know doesn't occur in the input file.  This is hard
to do in a general way, such that a program always works for arbitrary
input files.
@c can you say `understatement' boys and girls?

You might think that for text files, the @sc{nul} character, which
consists of a character with all bits equal to zero, is a good
value to use for @code{RS} in this case:

@example
BEGIN @{ RS = "\0" @}  # whole file becomes one record?
@end example

@cindex differences in @command{awk} and @command{gawk}, strings, storing
@command{gawk} in fact accepts this, and uses the @sc{nul}
character for the record separator.
However, this usage is @emph{not} portable
to other @command{awk} implementations.

@cindex dark corner, strings, storing
All other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings.  C strings use the
@sc{nul} character as the string terminator.  In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER}

@cindex records, treating files as
@cindex files, as single records
The best way to treat a whole file as a single record is to
simply read the file in, one record at a time, concatenating each
record onto the end of the previous ones.
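
A minimal sketch of this technique follows.  The variable name
@code{whole} is made up, and the sketch assumes the default
@code{RS}, so it puts back the newline that @command{awk} removed
from the end of each record:

@example
$ printf 'one\ntwo\n' |
> awk '@{ whole = whole $0 "\n" @}
>      END @{ printf "%d characters\n", length(whole) @}'
@print{} 8 characters
@end example

@noindent
By the time the @code{END} rule runs, @code{whole} holds the complete
text of the input, and the program can treat it as a single string.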
@c ENDOFRANGE inspl
@c ENDOFRANGE recspl

@node Fields
@section Examining Fields

@cindex examining fields
@cindex fields
@cindex accessing fields
@c STARTOFRANGE fiex
@cindex fields, examining
@cindex POSIX @command{awk}, field separators and
@cindex field separators, POSIX and
@cindex separators, field, POSIX and
When @command{awk} reads an input record, the record is
automatically @dfn{parsed} or separated by the interpreter into chunks
called @dfn{fields}.  By default, fields are separated by @dfn{whitespace},
like words in a line.
Whitespace in @command{awk} means any string of one or more spaces,
tabs, or newlines;@footnote{In POSIX @command{awk}, newlines are not
considered whitespace for separating fields.} other characters, such as
formfeed, vertical tab, etc.@: that are
considered whitespace by other languages, are @emph{not} considered
whitespace by @command{awk}.

The purpose of fields is to make it more convenient for you to refer to
these pieces of the record.  You don't have to use them---you can
operate on the whole record if you want---but fields are what make
simple @command{awk} programs so powerful.

@cindex @code{$} field operator
@cindex field operator @code{$}
@cindex @code{$} (dollar sign), @code{$} field operator
@cindex dollar sign (@code{$}), @code{$} field operator
@c The comma here does NOT mark a secondary term:
@cindex field operators, dollar sign as
A dollar-sign (@samp{$}) is used
to refer to a field in an @command{awk} program,
followed by the number of the field you want.  Thus, @code{$1}
refers to the first field, @code{$2} to the second, and so on.
(Unlike the Unix shells, the field numbers are not limited to single digits.
@code{$127} is the one hundred twenty-seventh field in the record.)
For example, suppose the following is a line of input:

@example
This seems like a pretty nice example.
@end example

@noindent
Here the first field, or @code{$1}, is @samp{This}, the second field, or
@code{$2}, is @samp{seems}, and so on.  Note that the last field,
@code{$7}, is @samp{example.}.  Because there is no space between the
@samp{e} and the @samp{.}, the period is considered part of the seventh
field.

@cindex @code{NF} variable
@cindex fields, number of
@code{NF} is a built-in variable whose value is the number of fields
in the current record.  @command{awk} automatically updates the value
of @code{NF} each time it reads a record.  No matter how many fields
there are, the last field in a record can be represented by @code{$NF}.
So, @code{$NF} is the same as @code{$7}, which is @samp{example.}.
If you try to reference a field beyond the last
one (such as @code{$8} when the record has only seven fields), you get
the empty string.  (If used in a numeric operation, you get zero.)
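
These points can be checked directly from the shell.  The following
sketch uses the sample line shown earlier; the square brackets are added
only to make the empty string visible:

@example
$ echo 'This seems like a pretty nice example.' |
> awk '@{ print NF; print $NF; print "[" $8 "]"; print $8 + 0 @}'
@print{} 7
@print{} example.
@print{} []
@print{} 0
@end example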

The use of @code{$0}, which looks like a reference to the ``zero-th'' field, is
a special case: it represents the whole input record
when you are not interested in specific fields.
Here are some more examples:

@example
$ awk '$1 ~ /foo/ @{ print $0 @}' BBS-list
@print{} fooey        555-1234     2400/1200/300     B
@print{} foot         555-6699     1200/300          B
@print{} macfoo       555-6480     1200/300          A
@print{} sabafoo      555-2127     1200/300          C
@end example

@noindent
This example prints each record in the file @file{BBS-list} whose first
field contains the string @samp{foo}.  The operator @samp{~} is called a
@dfn{matching operator}
(@pxref{Regexp Usage});
it tests whether a string (here, the field @code{$1}) matches a given regular
expression.

By contrast, the following example
looks for @samp{foo} in @emph{the entire record} and prints the first
field and the last field for each matching input record:

@example
$ awk '/foo/ @{ print $1, $NF @}' BBS-list
@print{} fooey B
@print{} foot B
@print{} macfoo A
@print{} sabafoo C
@end example
@c ENDOFRANGE fiex

@node Nonconstant Fields
@section Nonconstant Field Numbers
@cindex fields, numbers
@cindex field numbers

The number of a field does not need to be a constant.  Any expression in
the @command{awk} language can be used after a @samp{$} to refer to a
field.  The value of the expression specifies the field number.  If the
value is a string, rather than a number, it is converted to a number.
Consider this example:

@example
awk '@{ print $NR @}'
@end example

@noindent
Recall that @code{NR} is the number of records read so far: one in the
first record, two in the second, etc.  So this example prints the first
field of the first record, the second field of the second record, and so
on.  For the twentieth record, field number 20 is printed; most likely,
the record has fewer than 20 fields, so this prints a blank line.
Here is another example of using expressions as field numbers:

@example
awk '@{ print $(2*2) @}' BBS-list
@end example

@command{awk} evaluates the expression @samp{(2*2)} and uses
its value as the number of the field to print.  The @samp{*} sign
represents multiplication, so the expression @samp{2*2} evaluates to four.
The parentheses are used so that the multiplication is done before the
@samp{$} operation; they are necessary whenever there is a binary
operator in the field-number expression.  This example, then, prints the
hours of operation (the fourth field) for every line of the file
@file{BBS-list}.  (All of the @command{awk} operators are listed, in
order of decreasing precedence, in
@ref{Precedence}.)
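
The effect of leaving out the parentheses is easy to see in a sketch
(the input line is made up):

@example
$ echo '10 20 30' | awk '@{ print $(NF-1); print $NF-1 @}'
@print{} 20
@print{} 29
@end example

@noindent
The first expression selects the next-to-last field.  The second is
parsed as @samp{($NF)-1}, the value of the last field minus one.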

If the field number you compute is zero, you get the entire record.
Thus, @samp{$(2-2)} has the same value as @code{$0}.  Negative field
numbers are not allowed; trying to reference one usually terminates
the program.  (The POSIX standard does not define
what happens when you reference a negative field number.  @command{gawk}
notices this and terminates your program.  Other @command{awk}
implementations may behave differently.)

As mentioned in @ref{Fields},
@command{awk} stores the current record's number of fields in the built-in
variable @code{NF} (also @pxref{Built-in Variables}).  The expression
@code{$NF} is not a special feature---it is the direct consequence of
evaluating @code{NF} and using its value as a field number.

@node Changing Fields
@section Changing the Contents of a Field

@c STARTOFRANGE ficon
@cindex fields, changing contents of
The contents of a field, as seen by @command{awk}, can be changed within an
@command{awk} program; this changes what @command{awk} perceives as the
current input record.  (The actual input is untouched; @command{awk} @emph{never}
modifies the input file.)
Consider the following example and its output:

@example
$ awk '@{ nboxes = $3 ; $3 = $3 - 10
>        print nboxes, $3 @}' inventory-shipped
@print{} 25 15
@print{} 32 22
@print{} 24 14
@dots{}
@end example

@noindent
The program first saves the original value of field three in the variable
@code{nboxes}.
The @samp{-} sign represents subtraction, so this program reassigns
field three, @code{$3}, as the original value of field three minus ten:
@samp{$3 - 10}.  (@xref{Arithmetic Ops}.)
Then it prints the original and new values for field three.
(Someone in the warehouse made a consistent mistake while inventorying
the red boxes.)

For this to work, the text in field @code{$3} must make sense
as a number; the string of characters must be converted to a number
for the computer to do arithmetic on it.  The number resulting
from the subtraction is converted back to a string of characters that
then becomes field three.
@xref{Conversion}.

When the value of a field is changed (as perceived by @command{awk}), the
text of the input record is recalculated to contain the new field where
the old one was.  In other words, @code{$0} changes to reflect the altered
field.  Thus, this program
prints a copy of the input file, with 10 subtracted from the second
field of each line:

@example
$ awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped
@print{} Jan 3 25 15 115
@print{} Feb 5 32 24 226
@print{} Mar 5 24 34 228
@dots{}
@end example
4398
It is also possible to assign contents to fields that are out
of range.  For example:
4401
4402@example
4403$ awk '@{ $6 = ($5 + $4 + $3 + $2)
4404>        print $6 @}' inventory-shipped
4405@print{} 168
4406@print{} 297
4407@print{} 301
4408@dots{}
4409@end example
4410
4411@cindex adding, fields
4412@cindex fields, adding
4413@noindent
4414We've just created @code{$6}, whose value is the sum of fields
4415@code{$2}, @code{$3}, @code{$4}, and @code{$5}.  The @samp{+} sign
4416represents addition.  For the file @file{inventory-shipped}, @code{$6}
4417represents the total number of parcels shipped for a particular month.
4418
4419Creating a new field changes @command{awk}'s internal copy of the current
4420input record, which is the value of @code{$0}.  Thus, if you do @samp{print $0}
4421after adding a field, the record printed includes the new field, with
4422the appropriate number of field separators between it and the previously
4423existing fields.
4424
4425@cindex @code{OFS} variable
4426@cindex output field separator, See @code{OFS} variable
4427@cindex field separators, See Also @code{OFS}
4428This recomputation affects and is affected by
4429@code{NF} (the number of fields; @pxref{Fields}).
4430For example, the value of @code{NF} is set to the number of the highest
4431field you create.
4432The exact format of @code{$0} is also affected by a feature that has not been discussed yet:
4433the @dfn{output field separator}, @code{OFS},
4434used to separate the fields (@pxref{Output Separators}).
4435
4436Note, however, that merely @emph{referencing} an out-of-range field
4437does @emph{not} change the value of either @code{$0} or @code{NF}.
4438Referencing an out-of-range field only produces an empty string.  For
4439example:
4440
4441@example
4442if ($(NF+1) != "")
4443    print "can't happen"
4444else
4445    print "everything is normal"
4446@end example
4447
4448@noindent
4449should print @samp{everything is normal}, because @code{NF+1} is certain
4450to be out of range.  (@xref{If Statement},
4451for more information about @command{awk}'s @code{if-else} statements.
4452@xref{Typing and Comparison},
4453for more information about the @samp{!=} operator.)
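
The fragment above can be wrapped in a complete rule to check this behavior.
The following sketch (the input line is arbitrary) reads a field beyond
@code{NF}, then prints @code{NF}, @code{$0}, and the length of the value
that was read:

```shell
# Reading $(NF + 2) yields an empty string and leaves NF and $0 alone.
out=$(echo 'a b c' |
      awk '{ x = $(NF + 2); print NF; print $0; print length(x) }')
echo "$out"
```

The three output lines should be @samp{3}, @samp{a b c}, and @samp{0}.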
4454
4455It is important to note that making an assignment to an existing field
4456changes the
4457value of @code{$0} but does not change the value of @code{NF},
4458even when you assign the empty string to a field.  For example:
4459
4460@example
4461$ echo a b c d | awk '@{ OFS = ":"; $2 = ""
4462>                       print $0; print NF @}'
4463@print{} a::c:d
4464@print{} 4
4465@end example
4466
4467@noindent
4468The field is still there; it just has an empty value, denoted by
4469the two colons between @samp{a} and @samp{c}.
4470This example shows what happens if you create a new field:
4471
4472@example
4473$ echo a b c d | awk '@{ OFS = ":"; $2 = ""; $6 = "new"
4474>                       print $0; print NF @}'
4475@print{} a::c:d::new
4476@print{} 6
4477@end example
4478
4479@noindent
4480The intervening field, @code{$5}, is created with an empty value
4481(indicated by the second pair of adjacent colons),
4482and @code{NF} is updated with the value six.
4483
4484@c FIXME: Verify that this is in POSIX
4485@cindex dark corner, @code{NF} variable, decrementing
4486@cindex @code{NF} variable, decrementing
4487Decrementing @code{NF} throws away the values of the fields
4488after the new value of @code{NF} and recomputes @code{$0}.
4489@value{DARKCORNER}
4490Here is an example:
4491
4492@example
4493$ echo a b c d e f | awk '@{ print "NF =", NF;
4494>                            NF = 3; print $0 @}'
4495@print{} NF = 6
4496@print{} a b c
4497@end example
4498
4499@c the comma before decrementing does NOT represent a tertiary entry
4500@cindex portability, @code{NF} variable, decrementing
4501@strong{Caution:} Some versions of @command{awk} don't
4502rebuild @code{$0} when @code{NF} is decremented. Caveat emptor.
4503
4504Finally, there are times when it is convenient to force
4505@command{awk} to rebuild the entire record, using the current
4506value of the fields and @code{OFS}.  To do this, use the
4507seemingly innocuous assignment:
4508
4509@example
4510$1 = $1   # force record to be reconstituted
4511print $0  # or whatever else with $0
4512@end example
4513
4514@noindent
This forces @command{awk} to rebuild the record.  It does help
4516to add a comment, as we've shown here.
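
One common use of this idiom, sketched here with an arbitrary input line,
is converting whitespace-separated data to another delimiter.  Setting
@code{OFS} in a @code{BEGIN} rule and then assigning @code{$1} to itself
makes @command{awk} rebuild @code{$0} with the new separator:

```shell
# Rebuild the record so that the fields are joined by the new OFS.
out=$(echo 'a  b   c' |
      awk 'BEGIN { OFS = "," } { $1 = $1; print $0 }')
echo "$out"
```

This should print @samp{a,b,c}.  Without the assignment, @samp{print $0}
prints the record exactly as it was read, because changing @code{OFS} by
itself does not rebuild @code{$0}.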
4517
4518There is a flip side to the relationship between @code{$0} and
4519the fields.  Any assignment to @code{$0} causes the record to be
4520reparsed into fields using the @emph{current} value of @code{FS}.
4521This also applies to any built-in function that updates @code{$0},
4522such as @code{sub} and @code{gsub}
4523(@pxref{String Functions}).
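
Both cases can be seen in a short sketch (the input lines are arbitrary).
The first command changes @code{FS} and then assigns @code{$0} to itself;
the second lets @code{gsub} modify @code{$0}:

```shell
# Split with the default FS first, then reparse the same text with FS = ":".
reparsed=$(echo 'a:b:c d' |
           awk '{ n = NF; FS = ":"; $0 = $0; print n, NF, $1 }')
echo "$reparsed"

# gsub() with no third argument changes $0, so the fields are split again.
resplit=$(echo 'a-b-c' |
          awk '{ n = NF; gsub(/-/, " "); print n, NF, $2 }')
echo "$resplit"
```

The first command should print @samp{2 3 a} and the second @samp{1 3 b}:
in each case @code{NF} is recomputed after @code{$0} changes.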
4524@c ENDOFRANGE ficon
4525
4526@node Field Separators
4527@section Specifying How Fields Are Separated
4528
4529@menu
4530* Regexp Field Splitting::       Using regexps as the field separator.
4531* Single Character Fields::      Making each character a separate field.
* Command Line Field Separator:: Setting @code{FS} from the command line.
4533* Field Splitting Summary::      Some final points and a summary table.
4534@end menu
4535
4536@cindex @code{FS} variable
4537@cindex fields, separating
4538@c STARTOFRANGE fisepr
4539@cindex field separators
4540@c STARTOFRANGE fisepg
4541@cindex fields, separating
4542The @dfn{field separator}, which is either a single character or a regular
4543expression, controls the way @command{awk} splits an input record into fields.
4544@command{awk} scans the input record for character sequences that
4545match the separator; the fields themselves are the text between the matches.
4546
4547In the examples that follow, we use the bullet symbol (@bullet{}) to
4548represent spaces in the output.
4549If the field separator is @samp{oo}, then the following line:
4550
4551@example
4552moo goo gai pan
4553@end example
4554
4555@noindent
4556is split into three fields: @samp{m}, @samp{@bullet{}g}, and
4557@samp{@bullet{}gai@bullet{}pan}.
4558Note the leading spaces in the values of the second and third fields.
4559
4560@cindex troubleshooting, @command{awk} uses @code{FS} not @code{IFS}
4561The field separator is represented by the built-in variable @code{FS}.
4562Shell programmers take note:  @command{awk} does @emph{not} use the
4563name @code{IFS} that is used by the POSIX-compliant shells (such as
4564the Unix Bourne shell, @command{sh}, or @command{bash}).
4565
4566@cindex @code{FS} variable, changing value of
4567The value of @code{FS} can be changed in the @command{awk} program with the
4568assignment operator, @samp{=} (@pxref{Assignment Ops}).
4569Often the right time to do this is at the beginning of execution
4570before any input has been processed, so that the very first record
4571is read with the proper separator.  To do this, use the special
4572@code{BEGIN} pattern
4573(@pxref{BEGIN/END}).
4574For example, here we set the value of @code{FS} to the string
4575@code{","}:
4576
4577@example
4578awk 'BEGIN @{ FS = "," @} ; @{ print $2 @}'
4579@end example
4580
4581@cindex @code{BEGIN} pattern
4582@noindent
4583Given the input line:
4584
4585@example
4586John Q. Smith, 29 Oak St., Walamazoo, MI 42139
4587@end example
4588
4589@noindent
4590this @command{awk} program extracts and prints the string
4591@samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
4592
4593@cindex field separators, choice of
4594@cindex regular expressions as field separators
4595@cindex field separators, regular expressions as
4596Sometimes the input data contains separator characters that don't
4597separate fields the way you thought they would.  For instance, the
4598person's name in the example we just used might have a title or
4599suffix attached, such as:
4600
4601@example
4602John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
4603@end example
4604
4605@noindent
4606The same program would extract @samp{@bullet{}LXIX}, instead of
4607@samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
4608If you were expecting the program to print the
4609address, you would be surprised.  The moral is to choose your data layout and
4610separator characters carefully to prevent such problems.
4611(If the data is not in a form that is easy to process, perhaps you
4612can massage it first with a separate @command{awk} program.)
4613
4614@cindex newlines, as field separators
4615@cindex whitespace, as field separators
4616Fields are normally separated by whitespace sequences
4617(spaces, tabs, and newlines), not by single spaces.  Two spaces in a row do not
4618delimit an empty field.  The default value of the field separator @code{FS}
4619is a string containing a single space, @w{@code{" "}}.  If @command{awk}
4620interpreted this value in the usual way, each space character would separate
4621fields, so two spaces in a row would make an empty field between them.
4622The reason this does not happen is that a single space as the value of
4623@code{FS} is a special case---it is taken to specify the default manner
4624of delimiting fields.
4625
4626If @code{FS} is any other single character, such as @code{","}, then
4627each occurrence of that character separates two fields.  Two consecutive
4628occurrences delimit an empty field.  If the character occurs at the
4629beginning or the end of the line, that too delimits an empty field.  The
4630space character is the only single character that does not follow these
4631rules.
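
A quick way to see these rules is the following sketch, which uses an
arbitrary input line with a comma at each end and two commas in a row in
the middle:

```shell
# Leading, doubled, and trailing commas each delimit an empty field.
out=$(echo ',a,,b,' |
      awk 'BEGIN { FS = "," }
           { print NF; for (i = 1; i <= NF; i++) print i ": [" $i "]" }')
echo "$out"
```

The record should split into five fields, of which the first, third, and
fifth are empty.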
4632
4633@node Regexp Field Splitting
4634@subsection Using Regular Expressions to Separate Fields
4635
4636@c STARTOFRANGE regexpfs
4637@cindex regular expressions, as field separators
4638@c STARTOFRANGE fsregexp
4639@cindex field separators, regular expressions as
4640The previous @value{SUBSECTION}
4641discussed the use of single characters or simple strings as the
4642value of @code{FS}.
4643More generally, the value of @code{FS} may be a string containing any
4644regular expression.  In this case, each match in the record for the regular
4645expression separates fields.  For example, the assignment:
4646
4647@example
4648FS = ", \t"
4649@end example
4650
4651@noindent
4652makes every area of an input line that consists of a comma followed by a
4653space and a TAB into a field separator.
4654@ifinfo
4655(@samp{\t}
4656is an @dfn{escape sequence} that stands for a TAB;
4657@pxref{Escape Sequences},
4658for the complete list of similar escape sequences.)
4659@end ifinfo
4660
4661For a less trivial example of a regular expression, try using
4662single spaces to separate fields the way single commas are used.
4663@code{FS} can be set to @w{@code{"[@ ]"}} (left bracket, space, right
4664bracket).  This regular expression matches a single space and nothing else
4665(@pxref{Regexp}).
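
The difference from the default separator shows up as soon as the input
contains two spaces in a row.  A small sketch, with an arbitrary input line
and hypothetical shell variable names:

```shell
# Default FS: the run of two spaces is one separator.
by_runs=$(echo 'a  b c' | awk '{ print NF }')
# FS = "[ ]": every single space is a separator, so an empty field appears.
by_space=$(echo 'a  b c' | awk 'BEGIN { FS = "[ ]" } { print NF }')
echo "$by_runs $by_space"
```

This should print @samp{3 4}.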
4666
4667There is an important difference between the two cases of @samp{FS = @w{" "}}
4668(a single space) and @samp{FS = @w{"[ \t\n]+"}}
4669(a regular expression matching one or more spaces, tabs, or newlines).
4670For both values of @code{FS}, fields are separated by @dfn{runs}
4671(multiple adjacent occurrences) of spaces, tabs,
4672and/or newlines.  However, when the value of @code{FS} is @w{@code{" "}},
4673@command{awk} first strips leading and trailing whitespace from
4674the record and then decides where the fields are.
4675For example, the following pipeline prints @samp{b}:
4676
4677@example
4678$ echo ' a b c d ' | awk '@{ print $2 @}'
4679@print{} b
4680@end example
4681
4682@noindent
4683However, this pipeline prints @samp{a} (note the extra spaces around
4684each letter):
4685
4686@example
4687$ echo ' a  b  c  d ' | awk 'BEGIN @{ FS = "[ \t\n]+" @}
4688>                                  @{ print $2 @}'
4689@print{} a
4690@end example
4691
4692@noindent
4693@cindex null strings
4694@cindex strings, null
4695@cindex empty strings, See null strings
4696In this case, the first field is @dfn{null} or empty.
4697
4698The stripping of leading and trailing whitespace also comes into
4699play whenever @code{$0} is recomputed.  For instance, study this pipeline:
4700
4701@example
4702$ echo '   a b c d' | awk '@{ print; $2 = $2; print @}'
4703@print{}    a b c d
4704@print{} a b c d
4705@end example
4706
4707@noindent
4708The first @code{print} statement prints the record as it was read,
4709with leading whitespace intact.  The assignment to @code{$2} rebuilds
4710@code{$0} by concatenating @code{$1} through @code{$NF} together,
4711separated by the value of @code{OFS}.  Because the leading whitespace
4712was ignored when finding @code{$1}, it is not part of the new @code{$0}.
4713Finally, the last @code{print} statement prints the new @code{$0}.
4714@c ENDOFRANGE regexpfs
4715@c ENDOFRANGE fsregexp
4716
4717@node Single Character Fields
4718@subsection Making Each Character a Separate Field
4719
4720@cindex differences in @command{awk} and @command{gawk}, single-character fields
4721@cindex single-character fields
4722@cindex fields, single-character
4723There are times when you may want to examine each character
4724of a record separately.  This can be done in @command{gawk} by
4725simply assigning the null string (@code{""}) to @code{FS}. In this case,
4726each individual character in the record becomes a separate field.
4727For example:
4728
4729@example
4730$ echo a b | gawk 'BEGIN @{ FS = "" @}
4731>                  @{
4732>                      for (i = 1; i <= NF; i = i + 1)
4733>                          print "Field", i, "is", $i
4734>                  @}'
4735@print{} Field 1 is a
4736@print{} Field 2 is
4737@print{} Field 3 is b
4738@end example
4739
4740@cindex dark corner, @code{FS} as null string
@cindex @code{FS} variable, as null string
4742Traditionally, the behavior of @code{FS} equal to @code{""} was not defined.
4743In this case, most versions of Unix @command{awk} simply treat the entire record
4744as only having one field.
4745@value{DARKCORNER}
4746In compatibility mode
4747(@pxref{Options}),
4748if @code{FS} is the null string, then @command{gawk} also
4749behaves this way.
4750
4751@node Command Line Field Separator
4752@subsection Setting @code{FS} from the Command Line
4753@cindex @code{-F} option
4754@cindex options, command-line
4755@cindex command line, options
4756@cindex field separators, on command line
4757@c The comma before "setting" does NOT represent a tertiary
4758@cindex command line, @code{FS} on, setting
4759@cindex @code{FS} variable, setting from command line
4760
4761@code{FS} can be set on the command line.  Use the @option{-F} option to
4762do so.  For example:
4763
4764@example
4765awk -F, '@var{program}' @var{input-files}
4766@end example
4767
4768@noindent
4769sets @code{FS} to the @samp{,} character.  Notice that the option uses
4770an uppercase @samp{F} instead of a lowercase @samp{f}. The latter
4771option (@option{-f}) specifies a file
4772containing an @command{awk} program.  Case is significant in command-line
4773options:
4774the @option{-F} and @option{-f} options have nothing to do with each other.
4775You can use both options at the same time to set the @code{FS} variable
4776@emph{and} get an @command{awk} program from a file.
4777
4778The value used for the argument to @option{-F} is processed in exactly the
4779same way as assignments to the built-in variable @code{FS}.
4780Any special characters in the field separator must be escaped
4781appropriately.  For example, to use a @samp{\} as the field separator
4782on the command line, you would have to type:
4783
4784@example
4785# same as FS = "\\"
4786awk -F\\\\ '@dots{}' files @dots{}
4787@end example
4788
4789@noindent
4790@cindex @code{\} (backslash), as field separators
4791@cindex backslash (@code{\}), as field separators
4792Because @samp{\} is used for quoting in the shell, @command{awk} sees
4793@samp{-F\\}.  Then @command{awk} processes the @samp{\\} for escape
4794characters (@pxref{Escape Sequences}), finally yielding
4795a single @samp{\} to use for the field separator.
4796
4797@c @cindex historical features
4798As a special case, in compatibility mode
4799(@pxref{Options}),
4800if the argument to @option{-F} is @samp{t}, then @code{FS} is set to
4801the TAB character.  If you type @samp{-F\t} at the
4802shell, without any quotes, the @samp{\} gets deleted, so @command{awk}
4803figures that you really want your fields to be separated with tabs and
4804not @samp{t}s.  Use @samp{-v FS="t"} or @samp{-F"[t]"} on the command line
4805if you really do want to separate your fields with @samp{t}s.
4806
4807For example, let's use an @command{awk} program file called @file{baud.awk}
4808that contains the pattern @code{/300/} and the action @samp{print $1}:
4809
4810@example
4811/300/   @{ print $1 @}
4812@end example
4813
4814Let's also set @code{FS} to be the @samp{-} character and run the
4815program on the file @file{BBS-list}.  The following command prints a
4816list of the names of the bulletin boards that operate at 300 baud and
4817the first three digits of their phone numbers:
4818
4819@c tweaked to make the tex output look better in @smallbook
4820@example
4821$ awk -F- -f baud.awk BBS-list
4822@print{} aardvark     555
4823@print{} alpo
4824@print{} barfly       555
4825@print{} bites        555
4826@print{} camelot      555
4827@print{} core         555
4828@print{} fooey        555
4829@print{} foot         555
4830@print{} macfoo       555
4831@print{} sdace        555
4832@print{} sabafoo      555
4833@end example
4834
4835@noindent
4836Note the second line of output.  The second line
4837in the original file looked like this:
4838
4839@example
4840alpo-net     555-3412     2400/1200/300     A
4841@end example
4842
4843The @samp{-} as part of the system's name was used as the field
4844separator, instead of the @samp{-} in the phone number that was
4845originally intended.  This demonstrates why you have to be careful in
4846choosing your field and record separators.
4847
4848@c The comma after "password files" does NOT start a tertiary
4849@cindex Unix @command{awk}, password files, field separators and
4850Perhaps the most common use of a single character as the field
4851separator occurs when processing the Unix system password file.
4852On many Unix systems, each user has a separate entry in the system password
4853file, one line per user.  The information in these lines is separated
4854by colons.  The first field is the user's logon name and the second is
4855the user's (encrypted or shadow) password.  A password file entry might look
4856like this:
4857
4858@cindex Robbins, Arnold
4859@example
4860arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/bash
4861@end example
4862
4863The following program searches the system password file and prints
4864the entries for users who have no password:
4865
4866@example
4867awk -F: '$2 == ""' /etc/passwd
4868@end example
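
To try this without reading the real password file, the same test can be
run on inline sample data.  In this sketch the @samp{guest} entry is
hypothetical, and an action is added so that only the login name is printed:

```shell
# Two sample entries; the second has an empty password field.
out=$(printf '%s\n' \
        'arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/bash' \
        'guest::2077:10:Guest Account:/home/guest:/bin/sh' |
      awk -F: '$2 == "" { print $1 }')
echo "$out"
```

Only @samp{guest} should be printed.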
4869
4870@node Field Splitting Summary
4871@subsection Field-Splitting Summary
4872
4873It is important to remember that when you assign a string constant
4874as the value of @code{FS}, it undergoes normal @command{awk} string
4875processing.  For example, with Unix @command{awk} and @command{gawk},
4876the assignment @samp{FS = "\.."} assigns the character string @code{".."}
4877to @code{FS} (the backslash is stripped).  This creates a regexp meaning
4878``fields are separated by occurrences of any two characters.''
4879If instead you want fields to be separated by a literal period followed
4880by any single character, use @samp{FS = "\\.."}.
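
The doubled backslash can be checked with a short sketch.  The input line
is arbitrary; in it, each period is followed by one character that should
be consumed as part of the separator:

```shell
# In the awk string "\\..", the doubled backslash yields the regexp \..
# (a literal period followed by any one character).
out=$(echo 'a.xb.yc' |
      awk 'BEGIN { FS = "\\.." } { print NF, $1, $2, $3 }')
echo "$out"
```

This should print @samp{3 a b c}, because @samp{.x} and @samp{.y} are the
two separator matches.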
4881
4882The following table summarizes how fields are split, based on the value
4883of @code{FS} (@samp{==} means ``is equal to''):
4884
4885@table @code
4886@item FS == " "
4887Fields are separated by runs of whitespace.  Leading and trailing
4888whitespace are ignored.  This is the default.
4889
4890@item FS == @var{any other single character}
4891Fields are separated by each occurrence of the character.  Multiple
4892successive occurrences delimit empty fields, as do leading and
4893trailing occurrences.
4894The character can even be a regexp metacharacter; it does not need
4895to be escaped.
4896
4897@item FS == @var{regexp}
4898Fields are separated by occurrences of characters that match @var{regexp}.
4899Leading and trailing matches of @var{regexp} delimit empty fields.
4900
4901@item FS == ""
4902Each individual character in the record becomes a separate field.
4903(This is a @command{gawk} extension; it is not specified by the
4904POSIX standard.)
4905@end table
4906
4907@c fakenode --- for prepinfo
4908@subheading Advanced Notes: Changing @code{FS} Does Not Affect the Fields
4909
4910@cindex POSIX @command{awk}, field separators and
4911@cindex field separators, POSIX and
4912According to the POSIX standard, @command{awk} is supposed to behave
4913as if each record is split into fields at the time it is read.
4914In particular, this means that if you change the value of @code{FS}
4915after a record is read, the value of the fields (i.e., how they were split)
4916should reflect the old value of @code{FS}, not the new one.
4917
4918@cindex dark corner, field separators
4919@cindex @command{sed} utility
4920@cindex stream editors
4921However, many implementations of @command{awk} do not work this way.  Instead,
4922they defer splitting the fields until a field is actually
4923referenced.  The fields are split
4924using the @emph{current} value of @code{FS}!
4925@value{DARKCORNER}
4926This behavior can be difficult
4927to diagnose. The following example illustrates the difference
4928between the two methods.
4929(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.''
4930Its behavior is also defined by the POSIX standard.}
4931command prints just the first line of @file{/etc/passwd}.)
4932
4933@example
4934sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}'
4935@end example
4936
4937@noindent
4938which usually prints:
4939
4940@example
4941root
4942@end example
4943
4944@noindent
4945on an incorrect implementation of @command{awk}, while @command{gawk}
4946prints something like:
4947
4948@example
4949root:nSijPlPhZZwgE:0:0:Root:/:
4950@end example
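
A portable way to sidestep the difference is to set @code{FS} before the
first record is read, either in a @code{BEGIN} rule or with @option{-F}.
A sketch, using a made-up password entry as input:

```shell
# FS is already ":" when the first record is split, in any implementation.
from_begin=$(echo 'root:x:0:0:Root:/:' | awk 'BEGIN { FS = ":" } { print $1 }')
from_option=$(echo 'root:x:0:0:Root:/:' | awk -F: '{ print $1 }')
echo "$from_begin $from_option"
```

Both commands should print @samp{root}.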
4951
4952@c fakenode --- for prepinfo
4953@subheading Advanced Notes: @code{FS} and @code{IGNORECASE}
4954
4955The @code{IGNORECASE} variable
4956(@pxref{User-modified})
4957affects field splitting @emph{only} when the value of @code{FS} is a regexp.
4958It has no effect when @code{FS} is a single character, even if
4959that character is a letter.  Thus, in the following code:
4960
4961@example
4962FS = "c"
4963IGNORECASE = 1
4964$0 = "aCa"
4965print $1
4966@end example
4967
4968@noindent
the output is @samp{aCa}.  If you really want to split fields on an
4970alphabetic character while ignoring case, use a regexp that will
4971do it for you.  E.g., @samp{FS = "[c]"}.  In this case, @code{IGNORECASE}
4972will take effect.
4973
4974@c ENDOFRANGE fisepr
4975@c ENDOFRANGE fisepg
4976
4977@node Constant Size
4978@section Reading Fixed-Width Data
4979
4980@ifnotinfo
4981@strong{Note:} This @value{SECTION} discusses an advanced
4982feature of @command{gawk}.  If you are a novice @command{awk} user,
4983you might want to skip it on the first reading.
4984@end ifnotinfo
4985
4986@ifinfo
(This @value{SECTION} discusses an advanced feature of @command{gawk}.
4988If you are a novice @command{awk} user, you might want to skip it on
4989the first reading.)
4990@end ifinfo
4991
4992@cindex data, fixed-width
4993@cindex fixed-width data
4994@cindex advanced features, fixed-width data
4995@command{gawk} @value{PVERSION} 2.13 introduced a facility for dealing with
4996fixed-width fields with no distinctive field separator.  For example,
4997data of this nature arises in the input for old Fortran programs where
4998numbers are run together, or in the output of programs that did not
4999anticipate the use of their output as input for other programs.
5000
5001An example of the latter is a table where all the columns are lined up by
5002the use of a variable number of spaces and @emph{empty fields are just
5003spaces}.  Clearly, @command{awk}'s normal field splitting based on @code{FS}
5004does not work well in this case.  Although a portable @command{awk} program
5005can use a series of @code{substr} calls on @code{$0}
5006(@pxref{String Functions}),
5007this is awkward and inefficient for a large number of fields.
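
For comparison, here is a sketch of the portable @code{substr} approach.
The column positions (1 through 9, 10 through 15, and 26 through 31) and
the sample line built by @command{printf} are assumptions chosen for
illustration:

```shell
# Build one fixed-width line: widths 9, 6, 10, and 6 columns.
line=$(printf '%-9s%-6s%10s%6s' hzang ttyV3 6:37pm 50)

out=$(echo "$line" | awk '{
    user = substr($0, 1, 9)     # columns 1-9
    tty  = substr($0, 10, 6)    # columns 10-15
    idle = substr($0, 26, 6)    # columns 26-31
    # Remove the padding spaces from each extracted piece.
    gsub(/ /, "", user); gsub(/ /, "", tty); gsub(/ /, "", idle)
    print user "|" tty "|" idle
}')
echo "$out"
```

This should print @samp{hzang|ttyV3|50}.  Every field needs its own
@code{substr} call with hand-computed offsets, which is why this style
becomes awkward as the number of fields grows.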
5008
5009@c comma before specifying is part of tertiary
5010@cindex troubleshooting, fatal errors, field widths, specifying
5011@cindex @command{w} utility
5012@cindex @code{FIELDWIDTHS} variable
5013The splitting of an input record into fixed-width fields is specified by
5014assigning a string containing space-separated numbers to the built-in
5015variable @code{FIELDWIDTHS}.  Each number specifies the width of the field,
5016@emph{including} columns between fields.  If you want to ignore the columns
5017between fields, you can specify the width as a separate field that is
5018subsequently ignored.
5019It is a fatal error to supply a field width that is not a positive number.
5020The following data is the output of the Unix @command{w} utility.  It is useful
5021to illustrate the use of @code{FIELDWIDTHS}:
5022
5023@example
5024@group
5025 10:06pm  up 21 days, 14:04,  23 users
5026User     tty       login@  idle   JCPU   PCPU  what
5027hzuo     ttyV0     8:58pm            9      5  vi p24.tex
5028hzang    ttyV3     6:37pm    50                -csh
5029eklye    ttyV5     9:53pm            7      1  em thes.tex
5030dportein ttyV6     8:17pm  1:47                -csh
5031gierd    ttyD3    10:00pm     1                elm
5032dave     ttyD4     9:47pm            4      4  w
5033brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
5034dave     ttyq4    26Jun9115days     46     46  wnewmail
5035@end group
5036@end example
5037
5038The following program takes the above input, converts the idle time to
5039number of seconds, and prints out the first two fields and the calculated
5040idle time:
5041
5042@strong{Note:}
5043This program uses a number of @command{awk} features that
5044haven't been introduced yet.
5045
5046@example
5047BEGIN  @{ FIELDWIDTHS = "9 6 10 6 7 7 35" @}
5048NR > 2 @{
5049    idle = $4
5050    sub(/^  */, "", idle)   # strip leading spaces
5051    if (idle == "")
5052        idle = 0
5053    if (idle ~ /:/) @{
5054        split(idle, t, ":")
5055        idle = t[1] * 60 + t[2]
5056    @}
5057    if (idle ~ /days/)
5058        idle *= 24 * 60 * 60
5059
5060    print $1, $2, idle
5061@}
5062@end example
5063
5064Running the program on the data produces the following results:
5065
5066@example
5067hzuo      ttyV0  0
5068hzang     ttyV3  50
5069eklye     ttyV5  0
5070dportein  ttyV6  107
5071gierd     ttyD3  1
5072dave      ttyD4  0
5073brent     ttyp0  286
5074dave      ttyq4  1296000
5075@end example
5076
5077Another (possibly more practical) example of fixed-width input data
5078is the input from a deck of balloting cards.  In some parts of
5079the United States, voters mark their choices by punching holes in computer
5080cards.  These cards are then processed to count the votes for any particular
5081candidate or on any particular issue.  Because a voter may choose not to
5082vote on some issue, any column on the card may be empty.  An @command{awk}
5083program for processing such data could use the @code{FIELDWIDTHS} feature
5084to simplify reading the data.  (Of course, getting @command{gawk} to run on
5085a system with card readers is another story!)
5086
5087@ignore
5088Exercise: Write a ballot card reading program
5089@end ignore
5090
5091@cindex @command{gawk}, splitting fields and
5092Assigning a value to @code{FS} causes @command{gawk} to use
5093@code{FS} for field splitting again.  Use @samp{FS = FS} to make this happen,
5094without having to know the current value of @code{FS}.
5095In order to tell which kind of field splitting is in effect,
5096use @code{PROCINFO["FS"]}
5097(@pxref{Auto-set}).
5098The value is @code{"FS"} if regular field splitting is being used,
5099or it is @code{"FIELDWIDTHS"} if fixed-width field splitting is being used:
5100
5101@example
5102if (PROCINFO["FS"] == "FS")
5103    @var{regular field splitting} @dots{}
5104else
5105    @var{fixed-width field splitting} @dots{}
5106@end example
5107
5108This information is useful when writing a function
5109that needs to temporarily change @code{FS} or @code{FIELDWIDTHS},
5110read some records, and then restore the original settings
5111(@pxref{Passwd Functions},
5112for an example of such a function).
5113
5114@node Multiple Line
5115@section Multiple-Line Records
5116
5117@c STARTOFRANGE recm
5118@cindex records, multiline
5119@c STARTOFRANGE imr
5120@cindex input, multiline records
5121@c STARTOFRANGE frm
5122@cindex files, reading, multiline records
5123@cindex input, files, See input files
5124In some databases, a single line cannot conveniently hold all the
5125information in one entry.  In such cases, you can use multiline
5126records.  The first step in doing this is to choose your data format.
5127
5128@cindex record separators, with multiline records
5129One technique is to use an unusual character or string to separate
5130records.  For example, you could use the formfeed character (written
5131@samp{\f} in @command{awk}, as in C) to separate them, making each record
5132a page of the file.  To do this, just set the variable @code{RS} to
5133@code{"\f"} (a string containing the formfeed character).  Any
5134other character could equally well be used, as long as it won't be part
5135of the data in a record.
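
As a small sketch of this technique, the following commands use
@command{printf} to produce two made-up ``pages,'' each terminated by a
formfeed, and number the resulting records:

```shell
# Each formfeed ends one record; NR counts the pages.
out=$(printf 'page one\fpage two\f' |
      awk 'BEGIN { RS = "\f" } { print NR ": " $0 }')
echo "$out"
```

The output should be @samp{1: page one} followed by @samp{2: page two}.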
5136
5137@cindex @code{RS} variable, multiline records and
5138Another technique is to have blank lines separate records.  By a special
5139dispensation, an empty string as the value of @code{RS} indicates that
5140records are separated by one or more blank lines.  When @code{RS} is set
5141to the empty string, each record always ends at the first blank line
5142encountered.  The next record doesn't start until the first nonblank
5143line that follows.  No matter how many blank lines appear in a row, they
5144all act as one record separator.
5145(Blank lines must be completely empty; lines that contain only
5146whitespace do not count.)
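
The following sketch illustrates this with two made-up entries separated
by three blank lines.  It prints the record number and the first two
whitespace-separated fields of each record:

```shell
# Three blank lines in a row still act as a single record separator.
out=$(printf 'Jane Doe\n123 Main Street\n\n\n\nJohn Smith\n456 Tree-lined Avenue\n' |
      awk 'BEGIN { RS = "" } { print NR ": " $1, $2 }')
echo "$out"
```

Only two records should be found: @samp{1: Jane Doe} and
@samp{2: John Smith}.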
5147
5148@cindex leftmost longest match
5149@cindex matching, leftmost longest
5150You can achieve the same effect as @samp{RS = ""} by assigning the
5151string @code{"\n\n+"} to @code{RS}. This regexp matches the newline
5152at the end of the record and one or more blank lines after the record.
5153In addition, a regular expression always matches the longest possible
5154sequence when there is a choice
5155(@pxref{Leftmost Longest}).
5156So the next record doesn't start until
5157the first nonblank line that follows---no matter how many blank lines
5158appear in a row, they are considered one record separator.
5159
5160@cindex dark corner, multiline records
5161There is an important difference between @samp{RS = ""} and
5162@samp{RS = "\n\n+"}. In the first case, leading newlines in the input
5163@value{DF} are ignored, and if a file ends without extra blank lines
5164after the last record, the final newline is removed from the record.
5165In the second case, this special processing is not done.
5166@value{DARKCORNER}
5167
5168@cindex field separators, in multiline records
5169Now that the input is separated into records, the second step is to
5170separate the fields in the record.  One way to do this is to divide each
5171of the lines into fields in the normal manner.  This happens by default
5172as the result of a special feature.  When @code{RS} is set to the empty
5173string, @emph{and} @code{FS} is a set to a single character,
5174the newline character @emph{always} acts as a field separator.
5175This is in addition to whatever field separations result from
5176@code{FS}.@footnote{When @code{FS} is the null string (@code{""})
5177or a regexp, this special feature of @code{RS} does not apply.
5178It does apply to the default field separator of a single space:
5179@samp{FS = " "}.}
5180
5181The original motivation for this special exception was probably to provide
5182useful behavior in the default case (i.e., @code{FS} is equal
5183to @w{@code{" "}}).  This feature can be a problem if you really don't
5184want the newline character to separate fields, because there is no way to
5185prevent it.  However, you can work around this by using the @code{split}
5186function to break up the record manually
5187(@pxref{String Functions}).
If you have a single-character field separator, you can work around
5189the special feature in a different way, by making @code{FS} into a
5190regexp for that single character.  For example, if the field
5191separator is a percent character, instead of
5192@samp{FS = "%"}, use @samp{FS = "[%]"}.
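
The following sketch shows the effect of this workaround by printing the
number of fields in a two-line record.  It assumes @command{gawk},
behaving as described above, and the sample input is made up for
illustration:

@example
printf 'a%%b\nc%%d\n' | gawk 'BEGIN @{ RS = ""; FS = "%" @} @{ print NF @}'
printf 'a%%b\nc%%d\n' | gawk 'BEGIN @{ RS = ""; FS = "[%]" @} @{ print NF @}'
@end example

@noindent
In the first command the newline also separates fields, so the record
should have four fields (@samp{a}, @samp{b}, @samp{c}, and @samp{d}).
In the second, only the percent characters separate fields, so there
should be three, and the middle one contains an embedded newline.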
5193
5194Another way to separate fields is to
5195put each field on a separate line: to do this, just set the
variable @code{FS} to the string @code{"\n"}.  (This single-character
separator matches a single newline.)
5198A practical example of a @value{DF} organized this way might be a mailing
5199list, where each entry is separated by blank lines.  Consider a mailing
5200list in a file named @file{addresses}, which looks like this:
5201
5202@example
5203Jane Doe
5204123 Main Street
5205Anywhere, SE 12345-6789
5206
5207John Smith
5208456 Tree-lined Avenue
5209Smallville, MW 98765-4321
5210@dots{}
5211@end example
5212
5213@noindent
5214A simple program to process this file is as follows:
5215
@example
# addrs.awk --- simple mailing list program

# Records are separated by blank lines.
# Each line is one field.
BEGIN @{ RS = "" ; FS = "\n" @}

@{
      print "Name is:", $1
      print "Address is:", $2
      print "City and State are:", $3
      print ""
@}
@end example
5230
5231Running the program produces the following output:
5232
5233@example
5234$ awk -f addrs.awk addresses
5235@print{} Name is: Jane Doe
5236@print{} Address is: 123 Main Street
5237@print{} City and State are: Anywhere, SE 12345-6789
5238@print{}
5239@print{} Name is: John Smith
5240@print{} Address is: 456 Tree-lined Avenue
5241@print{} City and State are: Smallville, MW 98765-4321
5242@print{}
5243@dots{}
5244@end example
5245
5246@xref{Labels Program}, for a more realistic
5247program that deals with address lists.
5248The following
5249table
5250summarizes how records are split, based on the
5251value of
5252@ifinfo
5253@code{RS}.
5254(@samp{==} means ``is equal to.'')
5255@end ifinfo
5256@ifnotinfo
5257@code{RS}:
5258@end ifnotinfo
5259
5260@table @code
5261@item RS == "\n"
5262Records are separated by the newline character (@samp{\n}).  In effect,
5263every line in the @value{DF} is a separate record, including blank lines.
5264This is the default.
5265
5266@item RS == @var{any single character}
5267Records are separated by each occurrence of the character.  Multiple
5268successive occurrences delimit empty records.
5269
5270@item RS == ""
5271Records are separated by runs of blank lines.  The newline character
5272always serves as a field separator, in addition to whatever value
5273@code{FS} may have. Leading and trailing newlines in a file are ignored.
5274
5275@item RS == @var{regexp}
5276Records are separated by occurrences of characters that match @var{regexp}.
5277Leading and trailing matches of @var{regexp} delimit empty records.
5278(This is a @command{gawk} extension; it is not specified by the
5279POSIX standard.)
5280@end table
5281
5282@cindex @code{RT} variable
5283In all cases, @command{gawk} sets @code{RT} to the input text that matched the
5284value specified by @code{RS}.
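
For instance, the following sketch prints each record next to the text
that terminated it.  It assumes @command{gawk}, since both a regexp
@code{RS} and @code{RT} are @command{gawk} extensions, and the input
string is made up for illustration:

@example
printf 'a1b22c' |
gawk 'BEGIN @{ RS = "[0-9]+" @} @{ print $0 " <" RT ">" @}'
@end example

@noindent
Each record should be followed by the digits that ended it.  The last
record is ended by the end of the input rather than by a match, so for
it @code{RT} is the null string.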
5285@c ENDOFRANGE recm
5286@c ENDOFRANGE imr
5287@c ENDOFRANGE frm
5288
5289@node Getline
5290@section Explicit Input with @code{getline}
5291
5292@c STARTOFRANGE getl
5293@cindex @code{getline} command, explicit input with
5294@cindex input, explicit
5295So far we have been getting our input data from @command{awk}'s main
5296input stream---either the standard input (usually your terminal, sometimes
the output from another program) or the
5298files specified on the command line.  The @command{awk} language has a
5299special built-in command called @code{getline} that
5300can be used to read input under your explicit control.
5301
5302The @code{getline} command is used in several different ways and should
5303@emph{not} be used by beginners.
5304The examples that follow the explanation of the @code{getline} command
5305include material that has not been covered yet.  Therefore, come back
5306and study the @code{getline} command @emph{after} you have reviewed the
5307rest of this @value{DOCUMENT} and have a good knowledge of how @command{awk} works.
5308
5309@cindex @code{ERRNO} variable
5310@cindex differences in @command{awk} and @command{gawk}, @code{getline} command
5311@cindex @code{getline} command, return values
5312The @code{getline} command returns one if it finds a record and zero if
5313it encounters the end of the file.  If there is some error in getting
5314a record, such as a file that cannot be opened, then @code{getline}
5315returns @minus{}1.  In this case, @command{gawk} sets the variable
5316@code{ERRNO} to a string describing the error that occurred.
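
A minimal sketch of checking all three return values follows.  The
@value{FN} @file{data.txt} and the variable names are placeholders chosen
for illustration, and the redirected form of @code{getline} used here is
described later in this @value{SECTION}:

@example
BEGIN @{
     file = "data.txt"    # hypothetical file name
     while ((status = (getline line < file)) > 0)
          count++
     if (status < 0)      # -1: the file could not be read
          print "error reading " file ": " ERRNO > "/dev/stderr"
     else                 # 0: end of file was reached
          print count + 0, "records read from", file
     close(file)
@}
@end example

@noindent
Testing only for a nonzero value, as in @samp{while (getline line < file)},
would loop forever on an unreadable file, because @minus{}1 counts as true.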
5317
5318In the following examples, @var{command} stands for a string value that
5319represents a shell command.
5320
5321@menu
5322* Plain Getline::               Using @code{getline} with no arguments.
5323* Getline/Variable::            Using @code{getline} into a variable.
5324* Getline/File::                Using @code{getline} from a file.
5325* Getline/Variable/File::       Using @code{getline} into a variable from a
5326                                file.
5327* Getline/Pipe::                Using @code{getline} from a pipe.
5328* Getline/Variable/Pipe::       Using @code{getline} into a variable from a
5329                                pipe.
5330* Getline/Coprocess::           Using @code{getline} from a coprocess.
5331* Getline/Variable/Coprocess::  Using @code{getline} into a variable from a
5332                                coprocess.
5333* Getline Notes::               Important things to know about @code{getline}.
5334* Getline Summary::             Summary of @code{getline} Variants.
5335@end menu
5336
5337@node Plain Getline
5338@subsection Using @code{getline} with No Arguments
5339
5340The @code{getline} command can be used without arguments to read input
5341from the current input file.  All it does in this case is read the next
5342input record and split it up into fields.  This is useful if you've
5343finished processing the current record, but want to do some special
5344processing on the next record @emph{right now}.  For example:
5345
@example
@{
     if ((t = index($0, "/*")) != 0) @{
          # value of `tmp' will be "" if t is 1
          tmp = substr($0, 1, t - 1)
          u = index(substr($0, t + 2), "*/")
          while (u == 0) @{
               if (getline <= 0) @{
                    m = "unexpected EOF or error"
                    m = (m ": " ERRNO)
                    print m > "/dev/stderr"
                    exit
               @}
               # on a new line, u is relative to the start of $0
               t = -1
               u = index($0, "*/")
          @}
          # t + u + 3 is the first character after the */
          # substr expression will be "" if */
          # occurred at end of line
          $0 = tmp substr($0, t + u + 3)
     @}
     print $0
@}
@end example
5369
5370This @command{awk} program deletes all C-style comments (@samp{/* @dots{}
5371*/}) from the input.  By replacing the @samp{print $0} with other
5372statements, you could perform more complicated processing on the
5373decommented input, such as searching for matches of a regular
5374expression.  (This program has a subtle problem---it does not work if one
5375comment ends and another begins on the same line.)
5376
5377@ignore
5378Exercise,
5379write a program that does handle multiple comments on the line.
5380@end ignore
5381
5382This form of the @code{getline} command sets @code{NF},
5383@code{NR}, @code{FNR}, and the value of @code{$0}.
5384
5385@strong{Note:} The new value of @code{$0} is used to test
5386the patterns of any subsequent rules.  The original value
5387of @code{$0} that triggered the rule that executed @code{getline}
5388is lost.
5389By contrast, the @code{next} statement reads a new record
5390but immediately begins processing it normally, starting with the first
5391rule in the program.  @xref{Next Statement}.
5392
5393@node Getline/Variable
5394@subsection Using @code{getline} into a Variable
5395@c comma before using is NOT for tertiary
5396@cindex variables, @code{getline} command into, using
5397
5398You can use @samp{getline @var{var}} to read the next record from
5399@command{awk}'s input into the variable @var{var}.  No other processing is
5400done.
5401For example, suppose the next line is a comment or a special string,
5402and you want to read it without triggering
5403any rules.  This form of @code{getline} allows you to read that line
5404and store it in a variable so that the main
5405read-a-line-and-check-each-rule loop of @command{awk} never sees it.
5406The following example swaps every two lines of input:
5407
@example
@{
     if ((getline tmp) > 0) @{
          print tmp
          print $0
     @} else
          print $0
@}
@end example
5417
5418@noindent
5419It takes the following list:
5420
5421@example
5422wan
5423tew
5424free
5425phore
5426@end example
5427
5428@noindent
5429and produces these results:
5430
5431@example
5432tew
5433wan
5434phore
5435free
5436@end example
5437
5438The @code{getline} command used in this way sets only the variables
5439@code{NR} and @code{FNR} (and of course, @var{var}).  The record is not
5440split into fields, so the values of the fields (including @code{$0}) and
5441the value of @code{NF} do not change.
5442
5443@node Getline/File
5444@subsection Using @code{getline} from a File
5445
5446@cindex input redirection
5447@cindex redirection of input
5448@cindex @code{<} (left angle bracket), @code{<} operator (I/O)
5449@cindex left angle bracket (@code{<}), @code{<} operator (I/O)
5450@cindex operators, input/output
5451Use @samp{getline < @var{file}} to read the next record from @var{file}.
5452Here @var{file} is a string-valued expression that
5453specifies the @value{FN}.  @samp{< @var{file}} is called a @dfn{redirection}
5454because it directs input to come from a different place.
5455For example, the following
5456program reads its input record from the file @file{secondary.input} when it
5457encounters a first field with a value equal to 10 in the current input
5458file:
5459
@example
@{
    if ($1 == 10) @{
         getline < "secondary.input"
         print
    @} else
         print
@}
@end example
5469
5470Because the main input stream is not used, the values of @code{NR} and
5471@code{FNR} are not changed. However, the record it reads is split into fields in
5472the normal manner, so the values of @code{$0} and the other fields are
5473changed, resulting in a new value of @code{NF}.
5474
5475@cindex POSIX @command{awk}, @code{<} operator and
5476@c Thanks to Paul Eggert for initial wording here
5477According to POSIX, @samp{getline < @var{expression}} is ambiguous if
5478@var{expression} contains unparenthesized operators other than
5479@samp{$}; for example, @samp{getline < dir "/" file} is ambiguous
5480because the concatenation operator is not parenthesized.  You should
5481write it as @samp{getline < (dir "/" file)} if you want your program
5482to be portable to other @command{awk} implementations.
5483
5484@node Getline/Variable/File
5485@subsection Using @code{getline} into a Variable from a File
5486@c comma before using is NOT for tertiary
5487@cindex variables, @code{getline} command into, using
5488
5489Use @samp{getline @var{var} < @var{file}} to read input
5490from the file
5491@var{file}, and put it in the variable @var{var}.  As above, @var{file}
5492is a string-valued expression that specifies the file from which to read.
5493
5494In this version of @code{getline}, none of the built-in variables are
5495changed and the record is not split into fields.  The only variable
5496changed is @var{var}.
5497For example, the following program copies all the input files to the
5498output, except for records that say @w{@samp{@@include @var{filename}}}.
5499Such a record is replaced by the contents of the file
5500@var{filename}:
5501
@example
@{
     if (NF == 2 && $1 == "@@include") @{
          while ((getline line < $2) > 0)
               print line
          close($2)
     @} else
          print
@}
@end example
5512
5513Note here how the name of the extra input file is not built into
5514the program; it is taken directly from the data, specifically from the second field on
5515the @samp{@@include} line.
5516
5517@cindex @code{close} function
5518The @code{close} function is called to ensure that if two identical
5519@samp{@@include} lines appear in the input, the entire specified file is
5520included twice.
5521@xref{Close Files And Pipes}.
5522
5523One deficiency of this program is that it does not process nested
5524@samp{@@include} statements
5525(i.e., @samp{@@include} statements in included files)
5526the way a true macro preprocessor would.
5527@xref{Igawk Program}, for a program
5528that does handle nested @samp{@@include} statements.
5529
5530@node Getline/Pipe
5531@subsection Using @code{getline} from a Pipe
5532
5533@cindex @code{|} (vertical bar), @code{|} operator (I/O)
5534@cindex vertical bar (@code{|}), @code{|} operator (I/O)
5535@cindex input pipeline
5536@cindex pipes, input
5537@cindex operators, input/output
5538The output of a command can also be piped into @code{getline}, using
5539@samp{@var{command} | getline}.  In
5540this case, the string @var{command} is run as a shell command and its output
5541is piped into @command{awk} to be used as input.  This form of @code{getline}
5542reads one record at a time from the pipe.
5543For example, the following program copies its input to its output, except for
5544lines that begin with @samp{@@execute}, which are replaced by the output
5545produced by running the rest of the line as a shell command:
5546
@example
@{
     if ($1 == "@@execute") @{
          tmp = substr($0, 10)
          while ((tmp | getline) > 0)
               print
          close(tmp)
     @} else
          print
@}
@end example
5558
5559@noindent
5560@cindex @code{close} function
5561The @code{close} function is called to ensure that if two identical
5562@samp{@@execute} lines appear in the input, the command is run for
5563each one.
5564@ifnottex
5565@xref{Close Files And Pipes}.
5566@end ifnottex
5567@c Exercise!!
5568@c This example is unrealistic, since you could just use system
5569Given the input:
5570
5571@example
5572foo
5573bar
5574baz
5575@@execute who
5576bletch
5577@end example
5578
5579@noindent
5580the program might produce:
5581
5582@cindex Robbins, Bill
5583@cindex Robbins, Miriam
5584@cindex Robbins, Arnold
5585@example
5586foo
5587bar
5588baz
5589arnold     ttyv0   Jul 13 14:22
5590miriam     ttyp0   Jul 13 14:23     (murphy:0)
5591bill       ttyp1   Jul 13 14:23     (murphy:0)
5592bletch
5593@end example
5594
5595@noindent
Notice that this program ran the command @command{who} and printed its output
in place of the @samp{@@execute} line.
5597(If you try this program yourself, you will of course get different results,
5598depending upon who is logged in on your system.)
5599
5600This variation of @code{getline} splits the record into fields, sets the
5601value of @code{NF}, and recomputes the value of @code{$0}.  The values of
5602@code{NR} and @code{FNR} are not changed.
5603
5604@cindex POSIX @command{awk}, @code{|} I/O operator and
5605@c Thanks to Paul Eggert for initial wording here
5606According to POSIX, @samp{@var{expression} | getline} is ambiguous if
5607@var{expression} contains unparenthesized operators other than
5608@samp{$}---for example, @samp{@w{"echo "} "date" | getline} is ambiguous
5609because the concatenation operator is not parenthesized.  You should
5610write it as @samp{(@w{"echo "} "date") | getline} if you want your program
5611to be portable to other @command{awk} implementations.
5612
5613@node Getline/Variable/Pipe
5614@subsection Using @code{getline} into a Variable from a Pipe
5615@c comma before using is NOT for tertiary
5616@cindex variables, @code{getline} command into, using
5617
5618When you use @samp{@var{command} | getline @var{var}}, the
5619output of @var{command} is sent through a pipe to
5620@code{getline} and into the variable @var{var}.  For example, the
5621following program reads the current date and time into the variable
5622@code{current_time}, using the @command{date} utility, and then
5623prints it:
5624
@example
BEGIN @{
     "date" | getline current_time
     close("date")
     print "Report printed on " current_time
@}
@end example
5632
5633In this version of @code{getline}, none of the built-in variables are
5634changed and the record is not split into fields.
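
Because each call reads only one record, a loop is needed to collect all
of a command's output.  The following sketch assumes a system that
provides the @command{who} utility; the names @code{cmd}, @code{entry},
and @code{users} are made up for illustration:

@example
BEGIN @{
     cmd = "who"
     while ((cmd | getline entry) > 0)
          users[++n] = entry      # save each output line
     close(cmd)
     print n + 0, "login sessions found"
@}
@end example

@noindent
Keeping the command in a variable ensures that the string passed to
@code{close} is exactly the one used to open the pipe.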
5635
5636@ifinfo
5637@c Thanks to Paul Eggert for initial wording here
5638According to POSIX, @samp{@var{expression} | getline @var{var}} is ambiguous if
5639@var{expression} contains unparenthesized operators other than
5640@samp{$}; for example, @samp{@w{"echo "} "date" | getline @var{var}} is ambiguous
5641because the concatenation operator is not parenthesized. You should
5642write it as @samp{(@w{"echo "} "date") | getline @var{var}} if you want your
5643program to be portable to other @command{awk} implementations.
5644@end ifinfo
5645
5646@node Getline/Coprocess
5647@subsection Using @code{getline} from a Coprocess
5648@cindex coprocesses, @code{getline} from
5649@c comma before using is NOT for tertiary
5650@cindex @code{getline} command, coprocesses, using from
5651@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
5652@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
5653@cindex operators, input/output
5654@cindex differences in @command{awk} and @command{gawk}, input/output operators
5655
5656Input into @code{getline} from a pipe is a one-way operation.
5657The command that is started with @samp{@var{command} | getline} only
5658sends data @emph{to} your @command{awk} program.
5659
5660On occasion, you might want to send data to another program
5661for processing and then read the results back.
@command{gawk} allows you to start a @dfn{coprocess}, with which two-way
5663communications are possible.  This is done with the @samp{|&}
5664operator.
5665Typically, you write data to the coprocess first and then
5666read results back, as shown in the following:
5667
@example
print "@var{some query}" |& "db_server"
"db_server" |& getline
@end example
5672
5673@noindent
5674which sends a query to @command{db_server} and then reads the results.
5675
5676The values of @code{NR} and
5677@code{FNR} are not changed,
5678because the main input stream is not used.
5679However, the record is split into fields in
5680the normal manner, thus changing the values of @code{$0}, of the other fields,
5681and of @code{NF}.
5682
5683Coprocesses are an advanced feature. They are discussed here only because
5684this is the @value{SECTION} on @code{getline}.
5685@xref{Two-way I/O},
5686where coprocesses are discussed in more detail.
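
As a brief sketch of the write-then-read pattern, the following program
sends three lines to @command{sort} and reads them back.  It assumes
@command{gawk} and a system @command{sort} utility, and the data values
are made up.  The second argument to @code{close} closes only the sending
half of the two-way pipe, so that @command{sort} sees the end of its
input and starts producing output:

@example
BEGIN @{
     cmd = "sort"
     print "pear"  |& cmd
     print "apple" |& cmd
     print "fig"   |& cmd
     close(cmd, "to")      # done writing; let sort finish
     while ((cmd |& getline line) > 0)
          print "got", line
     close(cmd)
@}
@end example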
5687
5688@node Getline/Variable/Coprocess
5689@subsection Using @code{getline} into a Variable from a Coprocess
5690@c comma before using is NOT for tertiary
5691@cindex variables, @code{getline} command into, using
5692
5693When you use @samp{@var{command} |& getline @var{var}}, the output from
5694the coprocess @var{command} is sent through a two-way pipe to @code{getline}
5695and into the variable @var{var}.
5696
5697In this version of @code{getline}, none of the built-in variables are
5698changed and the record is not split into fields.  The only variable
5699changed is @var{var}.
5700
5701@ifinfo
5702Coprocesses are an advanced feature. They are discussed here only because
5703this is the @value{SECTION} on @code{getline}.
5704@xref{Two-way I/O},
5705where coprocesses are discussed in more detail.
5706@end ifinfo
5707
5708@node Getline Notes
5709@subsection Points to Remember About @code{getline}
5710Here are some miscellaneous points about @code{getline} that
5711you should bear in mind:
5712
5713@itemize @bullet
5714@item
5715When @code{getline} changes the value of @code{$0} and @code{NF},
5716@command{awk} does @emph{not} automatically jump to the start of the
5717program and start testing the new record against every pattern.
5718However, the new record is tested against any subsequent rules.
5719
5720@cindex differences in @command{awk} and @command{gawk}, implementation limitations
5721@cindex implementation issues, @command{gawk}, limits
5722@cindex @command{awk}, implementations, limits
5723@cindex @command{gawk}, implementation issues, limits
5724@item
5725Many @command{awk} implementations limit the number of pipelines that an @command{awk}
5726program may have open to just one.  In @command{gawk}, there is no such limit.
5727You can open as many pipelines (and coprocesses) as the underlying operating
5728system permits.
5729
5730@cindex side effects, @code{FILENAME} variable
5731@c The comma before "setting with" does NOT represent a tertiary
5732@cindex @code{FILENAME} variable, @code{getline}, setting with
5733@cindex dark corner, @code{FILENAME} variable
5734@cindex @code{getline} command, @code{FILENAME} variable and
5735@cindex @code{BEGIN} pattern, @code{getline} and
5736@item
5737An interesting side effect occurs if you use @code{getline} without a
5738redirection inside a @code{BEGIN} rule. Because an unredirected @code{getline}
5739reads from the command-line @value{DF}s, the first @code{getline} command
5740causes @command{awk} to set the value of @code{FILENAME}. Normally,
5741@code{FILENAME} does not have a value inside @code{BEGIN} rules, because you
5742have not yet started to process the command-line @value{DF}s.
5743@value{DARKCORNER}
5744(@xref{BEGIN/END},
5745also @pxref{Auto-set}.)
5746
5747@item
Using @code{FILENAME} with @code{getline}
(@samp{getline < FILENAME})
is likely to be a source of
confusion.  @command{awk} opens a separate input stream from the
current input file.  However, by not using a variable, @code{$0}
and @code{NF} are still updated.  If you're doing this, it's
probably by accident, and you should reconsider what it is you're
trying to accomplish.
5756@end itemize
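
The point above about @code{FILENAME} inside a @code{BEGIN} rule can be
observed with a sketch such as the following, run with at least one
@value{DF} named on the command line (the variable name @code{line} is
arbitrary):

@example
BEGIN @{
     print "before getline: [" FILENAME "]"
     if ((getline line) > 0)
          print "after getline:  [" FILENAME "]"
@}
@end example

@noindent
The first line of output should show empty brackets, and the second
should show the name of the first @value{DF}.  Note that the record read
by @code{getline} here is consumed; the main input loop starts with the
record after it.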
5757
5758@node Getline Summary
5759@subsection Summary of @code{getline} Variants
5760@cindex @code{getline} command, variants
5761
5762The following table summarizes the eight variants of @code{getline},
5763listing which built-in variables are set by each one.
5764
5765@multitable {@var{command} @code{|& getline} @var{var}} {1234567890123456789012345678901234567890}
5766@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, and @code{NR}
5767
5768@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, and @code{NR}
5769
5770@item @code{getline <} @var{file} @tab Sets @code{$0} and @code{NF}
5771
5772@item @code{getline @var{var} < @var{file}} @tab Sets @var{var}
5773
5774@item @var{command} @code{| getline} @tab Sets @code{$0} and @code{NF}
5775
5776@item @var{command} @code{| getline} @var{var} @tab Sets @var{var}
5777
@item @var{command} @code{|& getline} @tab Sets @code{$0} and @code{NF}.
This is a @command{gawk} extension.

@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var}.
This is a @command{gawk} extension.
5783@end multitable
5784@c ENDOFRANGE getl
5785@c ENDOFRANGE inex
5786@c ENDOFRANGE infir
5787
5788@node Printing
5789@chapter Printing Output
5790
5791@c STARTOFRANGE prnt
5792@cindex printing
5793@cindex output, printing, See printing
5794One of the most common programming actions is to @dfn{print}, or output,
5795some or all of the input.  Use the @code{print} statement
5796for simple output, and the @code{printf} statement
5797for fancier formatting.
5798The @code{print} statement is not limited when
5799computing @emph{which} values to print. However, with two exceptions,
5800you cannot specify @emph{how} to print them---how many
5801columns, whether to use exponential notation or not, and so on.
5802(For the exceptions, @pxref{Output Separators}, and
5803@ref{OFMT}.)
5804For printing with specifications, you need the @code{printf} statement
5805(@pxref{Printf}).
5806
5807@c STARTOFRANGE prnts
5808@cindex @code{print} statement
5809@cindex @code{printf} statement
5810Besides basic and formatted printing, this @value{CHAPTER}
5811also covers I/O redirections to files and pipes, introduces
5812the special @value{FN}s that @command{gawk} processes internally,
5813and discusses the @code{close} built-in function.
5814
5815@menu
5816* Print::                       The @code{print} statement.
5817* Print Examples::              Simple examples of @code{print} statements.
5818* Output Separators::           The output separators and how to change them.
5819* OFMT::                        Controlling Numeric Output With @code{print}.
5820* Printf::                      The @code{printf} statement.
5821* Redirection::                 How to redirect output to multiple files and
5822                                pipes.
5823* Special Files::               File name interpretation in @command{gawk}.
5824                                @command{gawk} allows access to inherited file
5825                                descriptors.
5826* Close Files And Pipes::       Closing Input and Output Files and Pipes.
5827@end menu
5828
5829@node Print
5830@section The @code{print} Statement
5831
5832The @code{print} statement is used to produce output with simple, standardized
5833formatting.  Specify only the strings or numbers to print, in a
5834list separated by commas.  They are output, separated by single spaces,
5835followed by a newline.  The statement looks like this:
5836
5837@example
5838print @var{item1}, @var{item2}, @dots{}
5839@end example
5840
5841@noindent
5842The entire list of items may be optionally enclosed in parentheses.  The
5843parentheses are necessary if any of the item expressions uses the @samp{>}
5844relational operator; otherwise it could be confused with a redirection
5845(@pxref{Redirection}).
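
The following sketch illustrates the difference.  The values are
arbitrary, and note that the first statement really does create a file
named @file{2} in the current directory:

@example
BEGIN @{
     print 1 > 2      # redirection: writes "1" to a file named "2"
     print (1 > 2)    # comparison: prints 0, because 1 > 2 is false
@}
@end example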
5846
5847The items to print can be constant strings or numbers, fields of the
5848current record (such as @code{$1}), variables, or any @command{awk}
5849expression.  Numeric values are converted to strings and then printed.
5850
5851@cindex records, printing
5852@cindex lines, blank, printing
5853@cindex text, printing
5854The simple statement @samp{print} with no items is equivalent to
5855@samp{print $0}: it prints the entire current record.  To print a blank
5856line, use @samp{print ""}, where @code{""} is the empty string.
5857To print a fixed piece of text, use a string constant, such as
5858@w{@code{"Don't Panic"}}, as one item.  If you forget to use the
5859double-quote characters, your text is taken as an @command{awk}
5860expression, and you will probably get an error.  Keep in mind that a
5861space is printed between any two items.
5862
5863@node Print Examples
5864@section Examples of @code{print} Statements
5865
5866Each @code{print} statement makes at least one line of output.  However, it
5867isn't limited to only one line.  If an item value is a string that contains a
5868newline, the newline is output along with the rest of the string.  A
5869single @code{print} statement can make any number of lines this way.
5870
5871@cindex newlines, printing
5872The following is an example of printing a string that contains embedded newlines
5873(the @samp{\n} is an escape sequence, used to represent the newline
5874character; @pxref{Escape Sequences}):
5875
5876@example
5877$ awk 'BEGIN @{ print "line one\nline two\nline three" @}'
5878@print{} line one
5879@print{} line two
5880@print{} line three
5881@end example
5882
5883@cindex fields, printing
5884The next example, which is run on the @file{inventory-shipped} file,
5885prints the first two fields of each input record, with a space between
5886them:
5887
5888@example
5889$ awk '@{ print $1, $2 @}' inventory-shipped
5890@print{} Jan 13
5891@print{} Feb 15
5892@print{} Mar 15
5893@dots{}
5894@end example
5895
5896@cindex @code{print} statement, commas, omitting
5897@c comma does NOT start tertiary
5898@cindex troubleshooting, @code{print} statement, omitting commas
5899A common mistake in using the @code{print} statement is to omit the comma
5900between two items.  This often has the effect of making the items run
5901together in the output, with no space.  The reason for this is that
5902juxtaposing two string expressions in @command{awk} means to concatenate
5903them.  Here is the same program, without the comma:
5904
5905@example
5906$ awk '@{ print $1 $2 @}' inventory-shipped
5907@print{} Jan13
5908@print{} Feb15
5909@print{} Mar15
5910@dots{}
5911@end example
5912
5913@c comma does NOT start tertiary
5914@cindex @code{BEGIN} pattern, headings, adding
5915To someone unfamiliar with the @file{inventory-shipped} file, neither
5916example's output makes much sense.  A heading line at the beginning
5917would make it clearer.  Let's add some headings to our table of months
5918(@code{$1}) and green crates shipped (@code{$2}).  We do this using the
5919@code{BEGIN} pattern
5920(@pxref{BEGIN/END})
5921so that the headings are only printed once:
5922
5923@example
5924awk 'BEGIN @{  print "Month Crates"
5925              print "----- ------" @}
5926           @{  print $1, $2 @}' inventory-shipped
5927@end example
5928
5929@noindent
5930When run, the program prints the following:
5931
5932@example
5933Month Crates
5934----- ------
5935Jan 13
5936Feb 15
5937Mar 15
5938@dots{}
5939@end example
5940
5941@noindent
5942The only problem, however, is that the headings and the table data
5943don't line up!  We can fix this by printing some spaces between the
5944two fields:
5945
5946@example
5947@group
5948awk 'BEGIN @{ print "Month Crates"
5949             print "----- ------" @}
5950           @{ print $1, "     ", $2 @}' inventory-shipped
5951@end group
5952@end example
5953
5954@c comma does NOT start tertiary
5955@cindex @code{printf} statement, columns, aligning
5956@cindex columns, aligning
5957Lining up columns this way can get pretty
5958complicated when there are many columns to fix.  Counting spaces for two
5959or three columns is simple, but any more than this can take up
5960a lot of time. This is why the @code{printf} statement was
5961created (@pxref{Printf});
5962one of its specialties is lining up columns of data.
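
As a preview, here is one way the same report might be written with
@code{printf}.  This is only a sketch; the field widths are chosen by
hand to match the headings used in this particular example:

@example
awk 'BEGIN @{ print "Month Crates"
             print "----- ------" @}
           @{ printf "%-5s %6d\n", $1, $2 @}' inventory-shipped
@end example

@noindent
Here @samp{%-5s} left-justifies the month name in a field five characters
wide, and @samp{%6d} right-justifies the crate count in a field six
characters wide, so both columns line up under their headings.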
5963
5964@cindex line continuations, in @code{print} statement
5965@cindex @code{print} statement, line continuations and
5966@strong{Note:} You can continue either a @code{print} or
5967@code{printf} statement simply by putting a newline after any comma
5968(@pxref{Statements/Lines}).
5969@c ENDOFRANGE prnts
5970
@node Output Separators
@section Output Separators

@cindex @code{OFS} variable
As mentioned previously, a @code{print} statement contains a list
of items separated by commas.  In the output, the items are normally
separated by single spaces.  However, this doesn't need to be the case;
a single space is only the default.  Any string of
characters may be used as the @dfn{output field separator} by setting the
built-in variable @code{OFS}.  The initial value of this variable
is the string @w{@code{" "}}---that is, a single space.

The output from an entire @code{print} statement is called an
@dfn{output record}.  Each @code{print} statement outputs one output
record, and then outputs a string called the @dfn{output record separator}
(or @code{ORS}).  The initial
value of @code{ORS} is the string @code{"\n"}; i.e., a newline
character.  Thus, each @code{print} statement normally makes a separate line.

@cindex output, records
@cindex output record separator, See @code{ORS} variable
@cindex @code{ORS} variable
@cindex @code{BEGIN} pattern, @code{OFS}/@code{ORS} variables, assigning values to
In order to change how output fields and records are separated, assign
new values to the variables @code{OFS} and @code{ORS}.  The usual
place to do this is in the @code{BEGIN} rule
(@pxref{BEGIN/END}), so
that it happens before any input is processed.  It can also be done
with assignments on the command line, before the names of the input
files, or using the @option{-v} command-line option
(@pxref{Options}).
The following example prints the first and second fields of each input
record, separated by a semicolon, with a blank line added after each
line of output:

@ignore
Exercise,
Rewrite the
@example
awk 'BEGIN @{ print "Month Crates"
             print "----- ------" @}
           @{ print $1, "     ", $2 @}' inventory-shipped
@end example
program by using a new value of @code{OFS}.
@end ignore

@example
$ awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}
>            @{ print $1, $2 @}' BBS-list
@print{} aardvark;555-5553
@print{}
@print{} alpo-net;555-3412
@print{}
@print{} barfly;555-7685
@dots{}
@end example

If the value of @code{ORS} does not contain a newline, the program's output
is run together on a single line.

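The two command-line alternatives to a @code{BEGIN} rule mentioned
above can be sketched as follows.  Both commands are only illustrations
that reuse the @file{BBS-list} file; each should print lines such as
@samp{aardvark;555-5553}:

@example
awk -v OFS=";" '@{ print $1, $2 @}' BBS-list
awk '@{ print $1, $2 @}' OFS=";" BBS-list
@end example

@noindent
One difference worth keeping in mind is that an assignment made with
@option{-v} is already in effect inside a @code{BEGIN} rule, whereas an
assignment placed among the @value{FN}s takes effect only when
@command{awk} reaches it while working through its arguments.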
@node OFMT
@section Controlling Numeric Output with @code{print}
@cindex numeric, output format
@c the comma does NOT start a secondary
@cindex formats, numeric output
When the @code{print} statement is used to print numeric values,
@command{awk} internally converts the number to a string of characters
and prints that string.  @command{awk} uses the @code{sprintf} function
to do this conversion
(@pxref{String Functions}).
For now, it suffices to say that the @code{sprintf}
function accepts a @dfn{format specification} that tells it how to format
numbers (or strings), and that there are a number of different ways in which
numbers can be formatted.  The different format specifications are discussed
more fully in
@ref{Control Letters}.

@cindex @code{sprintf} function
@cindex @code{OFMT} variable
@c the comma before OFMT does NOT start a tertiary
@cindex output, format specifier, @code{OFMT}
The built-in variable @code{OFMT} contains the default format specification
that @code{print} uses with @code{sprintf} when it wants to convert a
number to a string for printing.
The default value of @code{OFMT} is @code{"%.6g"}.
The way @code{print} prints numbers can be changed
by supplying different format specifications
as the value of @code{OFMT}, as shown in the following example:

@example
$ awk 'BEGIN @{
>   OFMT = "%.0f"  # print numbers as integers (rounds)
>   print 17.23, 17.54 @}'
@print{} 17 18
@end example

@noindent
@cindex dark corner, @code{OFMT} variable
@cindex POSIX @command{awk}, @code{OFMT} variable and
@cindex @code{OFMT} variable, POSIX @command{awk} and
According to the POSIX standard, @command{awk}'s behavior is undefined
if @code{OFMT} contains anything but a floating-point conversion specification.
@value{DARKCORNER}

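One detail that the example above does not show is that @code{OFMT}
applies only to values that are not integers; a number with an integral
value is printed as an integer.  The following sketch, with arbitrarily
chosen values, illustrates the difference:

@example
$ awk 'BEGIN @{ OFMT = "%.2f"; print 3.14159, 42 @}'
@print{} 3.14 42
@end example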
@node Printf
@section Using @code{printf} Statements for Fancier Printing

@c STARTOFRANGE printfs
@cindex @code{printf} statement
@cindex output, formatted
@cindex formatting output
For more precise control over the output format than what is
normally provided by @code{print}, use @code{printf}.
@code{printf} can be used to
specify the width to use for each item, as well as various
formatting choices for numbers (such as what output base to use, whether to
print an exponent, whether to print a sign, and how many digits to print
after the decimal point).  This is done by supplying a string, called
the @dfn{format string}, that controls how and where to print the other
arguments.

@menu
* Basic Printf::                Syntax of the @code{printf} statement.
* Control Letters::             Format-control letters.
* Format Modifiers::            Format-specification modifiers.
* Printf Examples::             Several examples.
@end menu

@node Basic Printf
@subsection Introduction to the @code{printf} Statement

@cindex @code{printf} statement, syntax of
A simple @code{printf} statement looks like this:

@example
printf @var{format}, @var{item1}, @var{item2}, @dots{}
@end example

@noindent
The entire list of arguments may optionally be enclosed in parentheses.  The
parentheses are necessary if any of the item expressions use the @samp{>}
relational operator; otherwise, it can be confused with a redirection
(@pxref{Redirection}).

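As a brief sketch of the parenthesized form, the following statement
prints the result of a comparison.  The values are arbitrary.  Without
the parentheses, the @samp{>} would be taken as a redirection, and the
output would go to a file named @file{2}:

@example
$ awk 'BEGIN @{ printf("%d\n", 3 > 2) @}'
@print{} 1
@end example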
@cindex format strings
The difference between @code{printf} and @code{print} is the @var{format}
argument.  This is an expression whose value is taken as a string; it
specifies how to output each of the other arguments.  It is called the
@dfn{format string}.

The format string is very similar to that in the ISO C library function
@code{printf}.  Most of @var{format} is text to output verbatim.
Scattered among this text are @dfn{format specifiers}---one per item.
Each format specifier says to output the next item in the argument list
at that place in the format.

The @code{printf} statement does not automatically append a newline
to its output.  It outputs only what the format string specifies.
So if a newline is needed, you must include one in the format string.
The output separator variables @code{OFS} and @code{ORS} have no effect
on @code{printf} statements. For example:

@example
$ awk 'BEGIN @{
>    ORS = "\nOUCH!\n"; OFS = "+"
>    msg = "Dont Panic!"
>    printf "%s\n", msg
> @}'
@print{} Dont Panic!
@end example

@noindent
Here, neither the @samp{+} nor the @samp{OUCH} appears when
the message is printed.

@node Control Letters
@subsection Format-Control Letters
@cindex @code{printf} statement, format-control characters
@cindex format specifiers, @code{printf} statement

A format specifier starts with the character @samp{%} and ends with
a @dfn{format-control letter}---it tells the @code{printf} statement
how to output one item.  The format-control letter specifies what @emph{kind}
of value to print.  The rest of the format specifier is made up of
optional @dfn{modifiers} that control @emph{how} to print the value, such as
the field width.  Here is a list of the format-control letters:

@table @code
@item %c
This prints a number as an ASCII character; thus, @samp{printf "%c",
65} outputs the letter @samp{A}. (The output for a string value is
the first character of the string.)

@item %d@r{,} %i
These are equivalent; they both print a decimal integer.
(The @samp{%i} specification is for compatibility with ISO C.)

@item %e@r{,} %E
These print a number in scientific (exponential) notation;
for example:

@example
printf "%4.3e\n", 1950
@end example

@noindent
prints @samp{1.950e+03}, with a total of four significant figures, three of
which follow the decimal point.
(The @samp{4.3} represents two modifiers,
discussed in the next @value{SUBSECTION}.)
@samp{%E} uses @samp{E} instead of @samp{e} in the output.

@item %f
This prints a number in floating-point notation.
For example:

@example
printf "%4.3f", 1950
@end example

@noindent
prints @samp{1950.000}, with three digits following the decimal point.
The width of four is only a minimum, so it has no effect here because
the result needs eight characters.
(The @samp{4.3} represents two modifiers,
discussed in the next @value{SUBSECTION}.)

@item %g@r{,} %G
These print a number in either scientific notation or in floating-point
notation, whichever uses fewer characters; if the result is printed in
scientific notation, @samp{%G} uses @samp{E} instead of @samp{e}.

@item %o
This prints an unsigned octal integer.

@item %s
This prints a string.

@item %u
This prints an unsigned decimal integer.
(This format is of marginal use, because all numbers in @command{awk}
are floating-point; it is provided primarily for compatibility with C.)

@item %x@r{,} %X
These print an unsigned hexadecimal integer;
@samp{%X} uses the letters @samp{A} through @samp{F}
instead of @samp{a} through @samp{f}.

@item %%
This isn't a format-control letter, but it does have meaning---the
sequence @samp{%%} outputs one @samp{%}; it does not consume an
argument and it ignores any modifiers.
@end table

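To see several of the control letters from the table side by side, the
following sketch prints the same (arbitrarily chosen) number in
different notations, and the number 65 as a character:

@example
$ awk 'BEGIN @{ printf "%d %o %x %X\n", 255, 255, 255, 255
>              printf "%e %E\n", 255, 255
>              printf "%c %s %%\n", 65, "ok" @}'
@print{} 255 377 ff FF
@print{} 2.550000e+02 2.550000E+02
@print{} A ok %
@end example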
@cindex dark corner, format-control characters
@cindex @command{gawk}, format-control characters
@strong{Note:}
When using the integer format-control letters for values that are
outside the range of the widest C integer type, @command{gawk} switches to
the @samp{%g} format specifier. If @option{--lint} is provided on the
command line (@pxref{Options}), @command{gawk}
warns about this.  Other versions of @command{awk} may print invalid
values or do something else entirely.
@value{DARKCORNER}

@node Format Modifiers
@subsection Modifiers for @code{printf} Formats

@c STARTOFRANGE pfm
@cindex @code{printf} statement, modifiers
@c the comma here does NOT start a secondary
@cindex modifiers, in format specifiers
A format specification can also include @dfn{modifiers} that can control
how much of the item's value is printed, as well as how much space it gets.
The modifiers come between the @samp{%} and the format-control letter.
We will use the bullet symbol ``@bullet{}'' in the following examples to
represent
spaces in the output. Here are the possible modifiers, in the order in
which they may appear:

@table @code
@cindex differences in @command{awk} and @command{gawk}, @code{print}/@code{printf} statements
@cindex @code{printf} statement, positional specifiers
@c the command does NOT start a secondary
@cindex positional specifiers, @code{printf} statement
@item @var{N}$
An integer constant followed by a @samp{$} is a @dfn{positional specifier}.
Normally, format specifications are applied to arguments in the order
given in the format string.  With a positional specifier, the format
specification is applied to a specific argument, instead of what
would be the next argument in the list.  Positional specifiers begin
counting with one. Thus:

@example
printf "%s %s\n", "don't", "panic"
printf "%2$s %1$s\n", "panic", "don't"
@end example

@noindent
prints the famous friendly message twice.

At first glance, this feature doesn't seem to be of much use.
It is in fact a @command{gawk} extension, intended for use in translating
messages at runtime.
@xref{Printf Ordering},
which describes how and why to use positional specifiers.
For now, we will not use them.

@item -
The minus sign, used before the width modifier (see later on in
this table),
says to left-justify
the argument within its specified width.  Normally, the argument
is printed right-justified in the specified width.  Thus:

@example
printf "%-4s", "foo"
@end example

@noindent
prints @samp{foo@bullet{}}.

@item @var{space}
For numeric conversions, prefix positive values with a space and
negative values with a minus sign.

@item +
The plus sign, used before the width modifier (see later on in
this table),
says to always supply a sign for numeric conversions, even if the data
to format is positive. The @samp{+} overrides the space modifier.

@item #
Use an ``alternate form'' for certain control letters.
For @samp{%o}, supply a leading zero.
For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for
a nonzero result.
For @samp{%e}, @samp{%E}, and @samp{%f}, the result always contains a
decimal point.
For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result.

@cindex dark corner
@item 0
A leading @samp{0} (zero) acts as a flag that indicates that output should be
padded with zeros instead of spaces.
This applies even to non-numeric output formats.
@value{DARKCORNER}
This flag only has an effect when the field width is wider than the
value to print.

@item @var{width}
This is a number specifying the desired minimum width of a field.  Inserting any
number between the @samp{%} sign and the format-control character forces the
field to expand to this width.  The default way to do this is to
pad with spaces on the left.  For example:

@example
printf "%4s", "foo"
@end example

@noindent
prints @samp{@bullet{}foo}.

The value of @var{width} is a minimum width, not a maximum.  If the item
value requires more than @var{width} characters, it can be as wide as
necessary.  Thus, the following:

@example
printf "%4s", "foobar"
@end example

@noindent
prints @samp{foobar}.

Preceding the @var{width} with a minus sign causes the output to be
padded with spaces on the right, instead of on the left.

@item .@var{prec}
A period followed by an integer constant
specifies the precision to use when printing.
The meaning of the precision varies by control letter:

@table @asis
@item @code{%e}, @code{%E}, @code{%f}
Number of digits to the right of the decimal point.

@item @code{%g}, @code{%G}
Maximum number of significant digits.

@item @code{%d}, @code{%i}, @code{%o}, @code{%u}, @code{%x}, @code{%X}
Minimum number of digits to print.

@item @code{%s}
Maximum number of characters from the string that should print.
@end table

Thus, the following:

@example
printf "%.4s", "foobar"
@end example

@noindent
prints @samp{foob}.
@end table

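The modifiers in the preceding table can be combined with the integer
control letters, as the following sketch shows.  The value 42 is
arbitrary, and the square brackets are there only to make the padding
visible:

@example
$ awk 'BEGIN @{
>   printf "[%5d] [%-5d] [%05d]\n", 42, 42, 42
>   printf "[%+d] [% d] [%#o] [%#x] [%.3d]\n", 42, 42, 42, 42, 42
> @}'
@print{} [   42] [42   ] [00042]
@print{} [+42] [ 42] [052] [0x2a] [042]
@end example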
The C library @code{printf}'s dynamic @var{width} and @var{prec}
capability (for example, @code{"%*.*s"}) is supported.  Instead of
appearing explicitly in the format string, the @var{width} and/or
@var{prec} values are passed in the argument list.  For example:

@example
w = 5
p = 3
s = "abcdefg"
printf "%*.*s\n", w, p, s
@end example

@noindent
is exactly equivalent to:

@example
s = "abcdefg"
printf "%5.3s\n", s
@end example

@noindent
Both programs output @samp{@w{@bullet{}@bullet{}abc}}.
Earlier versions of @command{awk} did not support this capability.
If you must use such a version, you may simulate this feature by using
concatenation to build up the format string, like so:

@example
w = 5
p = 3
s = "abcdefg"
printf "%" w "." p "s\n", s
@end example

@noindent
This is not particularly easy to read, but it does work.

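One place where a dynamic width is handy is when the column width is
not known until the data has been read.  The following is only a sketch
of that idea: it saves the first two fields of each record of
@file{BBS-list} in the arrays @code{name} and @code{phone} (names chosen
here for illustration), remembers the longest name in @code{w}, and uses
@code{w} as the width in the @code{END} rule.  It assumes a version of
@command{awk} that supports @samp{*} in format strings:

@example
awk '@{ name[NR] = $1; phone[NR] = $2
       if (length($1) > w)
           w = length($1) @}
     END @{ for (i = 1; i <= NR; i++)
             printf "%-*s %s\n", w, name[i], phone[i] @}' BBS-list
@end example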
@c @cindex lint checks
@cindex troubleshooting, fatal errors, @code{printf} format strings
@cindex POSIX @command{awk}, @code{printf} format strings and
C programmers may be used to supplying additional
@samp{l}, @samp{L}, and @samp{h}
modifiers in @code{printf} format strings. These are not valid in @command{awk}.
Most @command{awk} implementations silently ignore these modifiers.
If @option{--lint} is provided on the command line
(@pxref{Options}),
@command{gawk} warns about their use. If @option{--posix} is supplied,
their use is a fatal error.
@c ENDOFRANGE pfm

@node Printf Examples
@subsection Examples Using @code{printf}

The following is a simple example of
how to use @code{printf} to make an aligned table:

@example
awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list
@end example

@noindent
This command
prints the names of the bulletin boards (@code{$1}) in the file
@file{BBS-list} as a string of 10 characters that are left-justified.  It also
prints the phone numbers (@code{$2}) next on the line.  This
produces an aligned two-column table of names and phone numbers,
as shown here:

@example
$ awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list
@print{} aardvark   555-5553
@print{} alpo-net   555-3412
@print{} barfly     555-7685
@print{} bites      555-1675
@print{} camelot    555-0542
@print{} core       555-2912
@print{} fooey      555-1234
@print{} foot       555-6699
@print{} macfoo     555-6480
@print{} sdace      555-3430
@print{} sabafoo    555-2127
@end example

In this case, the phone numbers had to be printed as strings because
the numbers are separated by a dash.  Printing the phone numbers as
numbers would have produced just the first three digits: @samp{555}.
This would have been pretty confusing.

It wasn't necessary to specify a width for the phone numbers because
they are last on their lines.  They don't need to have spaces
after them.

The table could be made to look even nicer by adding headings to the
tops of the columns.  This is done using the @code{BEGIN} pattern
(@pxref{BEGIN/END})
so that the headers are only printed once, at the beginning of
the @command{awk} program:

@example
awk 'BEGIN @{ print "Name       Number"
             print "----       ------" @}
     @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
@end example

The above example mixed @code{print} and @code{printf} statements in
the same program.  Using just @code{printf} statements can produce the
same results:

@example
awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number"
             printf "%-10s %s\n", "----", "------" @}
     @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
@end example

@noindent
Printing each column heading with the same format specification
used for the column elements ensures that the headings
are aligned just like the columns.

The fact that the same format specification is used three times can be
emphasized by storing it in a variable, like this:

@example
awk 'BEGIN @{ format = "%-10s %s\n"
             printf format, "Name", "Number"
             printf format, "----", "------" @}
     @{ printf format, $1, $2 @}' BBS-list
@end example

@c !!! exercise
At this point, it would be a worthwhile exercise to use the
@code{printf} statement to line up the headings and table data for the
@file{inventory-shipped} example that was covered earlier in the @value{SECTION}
on the @code{print} statement
(@pxref{Print}).
@c ENDOFRANGE printfs

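One possible solution to the exercise is sketched here; it is by no
means the only one.  The widths of five and six characters are simply
the lengths of the two headings, and right-justifying the second column
is a matter of taste:

@example
$ awk 'BEGIN @{ format = "%-5s %6s\n"
>              printf format, "Month", "Crates"
>              printf format, "-----", "------" @}
>            @{ printf format, $1, $2 @}' inventory-shipped
@print{} Month Crates
@print{} ----- ------
@print{} Jan       13
@print{} Feb       15
@print{} Mar       15
@dots{}
@end example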
@node Redirection
@section Redirecting Output of @code{print} and @code{printf}

@cindex output redirection
@cindex redirection of output
So far, the output from @code{print} and @code{printf} has gone
to the standard
output, usually the terminal.  Both @code{print} and @code{printf} can
also send their output to other places.
This is called @dfn{redirection}.

A redirection appears after the @code{print} or @code{printf} statement.
Redirections in @command{awk} are written just like redirections in shell
commands, except that they are written inside the @command{awk} program.

@c the commas here are part of the see also
@cindex @code{print} statement, See Also redirection, of output
@cindex @code{printf} statement, See Also redirection, of output
There are four forms of output redirection: output to a file, output
appended to a file, output through a pipe to another command, and output
to a coprocess.  They are all shown for the @code{print} statement,
but they work identically for @code{printf}:

@table @code
@cindex @code{>} (right angle bracket), @code{>} operator (I/O)
@cindex right angle bracket (@code{>}), @code{>} operator (I/O)
@cindex operators, input/output
@item print @var{items} > @var{output-file}
This type of redirection prints the items into the output file named
@var{output-file}.  The @value{FN} @var{output-file} can be any
expression.  Its value is changed to a string and then used as a
@value{FN} (@pxref{Expressions}).

When this type of redirection is used, the @var{output-file} is erased
before the first output is written to it.  Subsequent writes to the same
@var{output-file} do not erase @var{output-file}, but append to it.
(This is different from how you use redirections in shell scripts.)
If @var{output-file} does not exist, it is created.  For example, here
is how an @command{awk} program can write a list of BBS names to one
file named @file{name-list}, and a list of phone numbers to another file
named @file{phone-list}:

@example
$ awk '@{ print $2 > "phone-list"
>        print $1 > "name-list" @}' BBS-list
$ cat phone-list
@print{} 555-5553
@print{} 555-3412
@dots{}
$ cat name-list
@print{} aardvark
@print{} alpo-net
@dots{}
@end example

@noindent
Each output file contains one name or number per line.

@cindex @code{>} (right angle bracket), @code{>>} operator (I/O)
@cindex right angle bracket (@code{>}), @code{>>} operator (I/O)
@item print @var{items} >> @var{output-file}
This type of redirection prints the items into the pre-existing output file
named @var{output-file}.  The difference between this and the
single-@samp{>} redirection is that the old contents (if any) of
@var{output-file} are not erased.  Instead, the @command{awk} output is
appended to the file.
If @var{output-file} does not exist, then it is created.

@cindex @code{|} (vertical bar), @code{|} operator (I/O)
@cindex pipes, output
@cindex output, pipes
@item print @var{items} | @var{command}
It is also possible to send output to another program through a pipe
instead of into a file.   This type of redirection opens a pipe to
@var{command}, and writes the values of @var{items} through this pipe
to another process created to execute @var{command}.

The redirection argument @var{command} is actually an @command{awk}
expression.  Its value is converted to a string whose contents give
the shell command to be run.  For example, the following produces two
files, one unsorted list of BBS names, and one list sorted in reverse
alphabetical order:

@ignore
10/2000:
This isn't the best style, since COMMAND is assigned for each
record.  It's done to avoid overfull hboxes in TeX.  Leave it
alone for now and let's hope no-one notices.
@end ignore

@example
awk '@{ print $1 > "names.unsorted"
       command = "sort -r > names.sorted"
       print $1 | command @}' BBS-list
@end example

The unsorted list is written with an ordinary redirection, while
the sorted list is written by piping through the @command{sort} utility.

The next example uses redirection to mail a message to the mailing
list @samp{bug-system}.  This might be useful when trouble is encountered
in an @command{awk} script run periodically for system maintenance:

@example
report = "mail bug-system"
print "Awk script failed:", $0 | report
m = ("at record number " FNR " of " FILENAME)
print m | report
close(report)
@end example

The message is built using string concatenation and saved in the variable
@code{m}.  It's then sent down the pipeline to the @command{mail} program.
(The parentheses group the items to concatenate---see
@ref{Concatenation}.)

The @code{close} function is called here because it's a good idea to close
the pipe as soon as all the intended output has been sent to it.
@xref{Close Files And Pipes},
for more information.

This example also illustrates the use of a variable to represent
a @var{file} or @var{command}---it is not necessary to always
use a string constant.  Using a variable is generally a good idea,
because @command{awk} requires that the string value be spelled identically
every time.

@cindex coprocesses
@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
@cindex operators, input/output
@cindex differences in @command{awk} and @command{gawk}, input/output operators
@item print @var{items} |& @var{command}
This type of redirection prints the items to the input of @var{command}.
The difference between this and the
single-@samp{|} redirection is that the output from @var{command}
can be read with @code{getline}.
Thus @var{command} is a @dfn{coprocess}, which works together with,
but subsidiary to, the @command{awk} program.

This feature is a @command{gawk} extension, and is not available in
POSIX @command{awk}.
@xref{Two-way I/O},
for a more complete discussion.
@end table

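As a small sketch of the coprocess form, the following @command{gawk}
program sends two lines to @command{sort} and reads the sorted result
back.  The sample words and the variable names @code{cmd} and
@code{line} are arbitrary.  Closing only the @code{"to"} end of the
coprocess tells @command{sort} that its input is complete, so that it
can produce its output:

@example
$ gawk 'BEGIN @{ cmd = "sort"
>               print "pear"  |& cmd
>               print "apple" |& cmd
>               close(cmd, "to")    # end of input for sort
>               while ((cmd |& getline line) > 0)
>                   print line
>               close(cmd) @}'
@print{} apple
@print{} pear
@end example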
Redirecting output using @samp{>}, @samp{>>}, @samp{|}, or @samp{|&}
asks the system to open a file, pipe, or coprocess only if the particular
@var{file} or @var{command} you specify has not already been written
to by your program or if it has been closed since it was last written to.

@cindex troubleshooting, printing
It is a common error to use @samp{>} redirection for the first @code{print}
to a file, and then to use @samp{>>} for subsequent output:

@example
# clear the file
print "Don't panic" > "guide.txt"
@dots{}
# append
print "Avoid improbability generators" >> "guide.txt"
@end example

@noindent
This is indeed how redirections must be used from the shell.  But in
@command{awk}, it isn't necessary.  In this kind of case, a program should
use @samp{>} for all the @code{print} statements, since the output file
is only opened once.

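The point can be checked with a short sketch such as the following,
which uses @samp{>} for both statements.  The @value{FN} @file{out.txt}
is arbitrary.  The second @code{print} does not erase what the first one
wrote, because the file is already open:

@example
$ awk 'BEGIN @{ print "first"  > "out.txt"
>              print "second" > "out.txt" @}'
$ cat out.txt
@print{} first
@print{} second
@end example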
@cindex differences in @command{awk} and @command{gawk}, implementation limitations
@c the comma here does NOT start a secondary
@cindex implementation issues, @command{gawk}, limits
@cindex @command{awk}, implementation issues, pipes
@cindex @command{gawk}, implementation issues, pipes
@ifnotinfo
As mentioned earlier
(@pxref{Getline Notes}),
many
@end ifnotinfo
@ifinfo
Many
@end ifinfo
6693@command{awk} implementations limit the number of pipelines that an @command{awk}
6694program may have open to just one!  In @command{gawk}, there is no such limit.
6695@command{gawk} allows a program to
6696open as many pipelines as the underlying operating system permits.
6697
6698@c fakenode --- for prepinfo
6699@subheading Advanced Notes: Piping into @command{sh}
6700@cindex advanced features, piping into @command{sh}
6701@cindex shells, piping commands into
6702
6703A particularly powerful way to use redirection is to build command lines
6704and pipe them into the shell, @command{sh}.  For example, suppose you
6705have a list of files brought over from a system where all the @value{FN}s
6706are stored in uppercase, and you wish to rename them to have names in
6707all lowercase.  The following program is both simple and efficient:
6708
6709@c @cindex @command{mv} utility
6710@example
6711@{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @}
6712
6713END @{ close("sh") @}
6714@end example
6715
6716The @code{tolower} function returns its argument string with all
6717uppercase characters converted to lowercase
6718(@pxref{String Functions}).
6719The program builds up a list of command lines,
6720using the @command{mv} utility to rename the files.
6721It then sends the list to the shell for execution.
6722@c ENDOFRANGE outre
6723@c ENDOFRANGE reout
6724
6725@node Special Files
6726@section Special @value{FFN}s in @command{gawk}
6727@c STARTOFRANGE gfn
6728@cindex @command{gawk}, @value{FN}s in
6729
6730@command{gawk} provides a number of special @value{FN}s that it interprets
6731internally.  These @value{FN}s provide access to standard file descriptors,
6732process-related information, and TCP/IP networking.
6733
6734@menu
6735* Special FD::                  Special files for I/O.
6736* Special Process::             Special files for process information.
6737* Special Network::             Special files for network communications.
6738* Special Caveats::             Things to watch out for.
6739@end menu
6740
6741@node Special FD
6742@subsection Special Files for Standard Descriptors
6743@cindex standard input
6744@cindex input, standard
6745@cindex standard output
6746@cindex output, standard
6747@cindex error output
6748@cindex file descriptors
6749@cindex files, descriptors, See file descriptors
6750
6751Running programs conventionally have three input and output streams
6752already available to them for reading and writing.  These are known as
6753the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
6754output}.  These streams are, by default, connected to your terminal, but
6755they are often redirected with the shell, via the @samp{<}, @samp{<<},
6756@samp{>}, @samp{>>}, @samp{>&}, and @samp{|} operators.  Standard error
6757is typically used for writing error messages; the reason there are two separate
6758streams, standard output and standard error, is so that they can be
6759redirected separately.
6760
6761@cindex differences in @command{awk} and @command{gawk}, error messages
6762@cindex error handling
6763In other implementations of @command{awk}, the only way to write an error
6764message to standard error in an @command{awk} program is as follows:
6765
6766@example
6767print "Serious error detected!" | "cat 1>&2"
6768@end example
6769
6770@noindent
6771This works by opening a pipeline to a shell command that can access the
6772standard error stream that it inherits from the @command{awk} process.
6773This is far from elegant, and it is also inefficient, because it requires a
6774separate process.  So people writing @command{awk} programs often
6775don't do this.  Instead, they send the error messages to the
6776terminal, like this:
6777
6778@example
6779print "Serious error detected!" > "/dev/tty"
6780@end example
6781
6782@noindent
6783This usually has the same effect but not always: although the
6784standard error stream is usually the terminal, it can be redirected; when
6785that happens, writing to the terminal is not correct.  In fact, if
6786@command{awk} is run from a background job, it may not have a terminal at all.
6787Then opening @file{/dev/tty} fails.
6788
6789@command{gawk} provides special @value{FN}s for accessing the three standard
6790streams, as well as any other inherited open files.  If the @value{FN} matches
6791one of these special names when @command{gawk} redirects input or output,
6792then it directly uses the stream that the @value{FN} stands for.
6793These special @value{FN}s work for all operating systems that @command{gawk}
6794has been ported to, not just those that are POSIX-compliant:
6795
6796@cindex @value{FN}s, standard streams in @command{gawk}
6797@cindex @code{/dev/@dots{}} special files (@command{gawk})
6798@cindex files, @code{/dev/@dots{}} special files
6799@c @cindex @code{/dev/stdin} special file
6800@c @cindex @code{/dev/stdout} special file
6801@c @cindex @code{/dev/stderr} special file
6802@c @cindex @code{/dev/fd} special files
6803@table @file
6804@item /dev/stdin
6805The standard input (file descriptor 0).
6806
6807@item /dev/stdout
6808The standard output (file descriptor 1).
6809
6810@item /dev/stderr
6811The standard error output (file descriptor 2).
6812
6813@item /dev/fd/@var{N}
6814The file associated with file descriptor @var{N}.  Such a file must
6815be opened by the program initiating the @command{awk} execution (typically
6816the shell).  Unless special pains are taken in the shell from which
6817@command{gawk} is invoked, only descriptors 0, 1, and 2 are available.
6818@end table
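
As a rough illustration of the @file{/dev/fd/@var{N}} entry, the
following shell sketch opens descriptor 3 in the shell before the
program starts, which is the kind of ``special pains'' mentioned above.
The descriptor number and the message text are arbitrary choices made
for this example.  The command is written as @command{awk}; on a system
without a real @file{/dev/fd} directory, only @command{gawk} should be
expected to handle the name.

```shell
# Descriptor 3 is opened by the shell and inherited by awk.
# Inside the command substitution, 3>&1 points descriptor 3 at the
# capture pipe, and 1>/dev/null then discards ordinary output.
fd3=$(awk 'BEGIN {
    print "normal output"                       # stdout, discarded here
    print "sent to descriptor 3" > "/dev/fd/3"  # inherited descriptor
}' 3>&1 1>/dev/null)
printf '%s\n' "$fd3"
```

Only the text written to @file{/dev/fd/3} ends up in the shell variable,
because standard output was redirected elsewhere.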
6819
6820The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
6821are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
6822respectively. However, they are more self-explanatory.
6823The proper way to write an error message in a @command{gawk} program
6824is to use @file{/dev/stderr}, like this:
6825
@example
print "Serious error detected!" > "/dev/stderr"
@end example
6829
6830@cindex troubleshooting, quotes with @value{FN}s
6831Note the use of quotes around the @value{FN}.
6832Like any other redirection, the value must be a string.
6833It is a common error to omit the quotes, which leads
6834to confusing results.
6835@c Exercise: What does it do?  :-)
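
The point of having two streams can be checked from the shell.  The
following sketch runs the same small program twice, capturing only
standard error the first time and only standard output the second time.
The variable names @code{err} and @code{out} and the message
@samp{regular result} are made up for the illustration.

```shell
# Capture standard error only: 2>&1 sends it into the capture pipe,
# then 1>/dev/null discards standard output.
err=$(awk 'BEGIN {
    print "regular result"
    print "Serious error detected!" > "/dev/stderr"
}' 2>&1 1>/dev/null)

# Capture standard output only; the error message is discarded.
out=$(awk 'BEGIN {
    print "regular result"
    print "Serious error detected!" > "/dev/stderr"
}' 2>/dev/null)

printf 'stderr: %s\nstdout: %s\n' "$err" "$out"
```

Because the message goes to standard error rather than to the terminal,
it follows whatever redirection the caller has set up.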
6836
6837@node Special Process
6838@subsection Special Files for Process-Related Information
6839
6840@cindex files, for process information
6841@cindex process information, files for
6842@command{gawk} also provides special @value{FN}s that give access to information
6843about the running @command{gawk} process.  Each of these ``files'' provides
6844a single record of information.  To read them more than once, they must
6845first be closed with the @code{close} function
6846(@pxref{Close Files And Pipes}).
6847The @value{FN}s are:
6848
6849@c @cindex @code{/dev/pid} special file
6850@c @cindex @code{/dev/pgrpid} special file
6851@c @cindex @code{/dev/ppid} special file
6852@c @cindex @code{/dev/user} special file
6853@table @file
6854@item /dev/pid
6855Reading this file returns the process ID of the current process,
6856in decimal form, terminated with a newline.
6857
6858@item /dev/ppid
6859Reading this file returns the parent process ID of the current process,
6860in decimal form, terminated with a newline.
6861
6862@item /dev/pgrpid
6863Reading this file returns the process group ID of the current process,
6864in decimal form, terminated with a newline.
6865
6866@item /dev/user
6867Reading this file returns a single record terminated with a newline.
6868The fields are separated with spaces.  The fields represent the
6869following information:
6870
6871@table @code
6872@item $1
6873The return value of the @code{getuid} system call
6874(the real user ID number).
6875
6876@item $2
6877The return value of the @code{geteuid} system call
6878(the effective user ID number).
6879
6880@item $3
6881The return value of the @code{getgid} system call
6882(the real group ID number).
6883
6884@item $4
6885The return value of the @code{getegid} system call
6886(the effective group ID number).
6887@end table
6888
6889If there are any additional fields, they are the group IDs returned by
6890the @code{getgroups} system call.
6891(Multiple groups may not be supported on all systems.)
6892@end table
6893
6894These special @value{FN}s may be used on the command line as @value{DF}s,
6895as well as for I/O redirections within an @command{awk} program.
6896They may not be used as source files with the @option{-f} option.
6897
6898@c @cindex automatic warnings
6899@c @cindex warnings, automatic
6900@strong{Note:}
6901The special files that provide process-related information are now considered
6902obsolete and will disappear entirely
6903in the next release of @command{gawk}.
6904@command{gawk} prints a warning message every time you use one of
6905these files.
6906To obtain process-related information, use the @code{PROCINFO} array.
6907@xref{Auto-set}.
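
As a brief sketch of the replacement, the following command prints the
values that @file{/dev/pid}, @file{/dev/ppid}, and @file{/dev/pgrpid}
used to supply, taken from @code{PROCINFO} instead.  It requires
@command{gawk}, since @code{PROCINFO} is a @command{gawk} extension; the
shell variable @code{ids} is just a name chosen for the example.

```shell
# Requires gawk: PROCINFO is not available in other awk versions.
ids=$(gawk 'BEGIN {
    # Process ID, parent process ID, and process group ID.
    print PROCINFO["pid"], PROCINFO["ppid"], PROCINFO["pgrpid"]
}')
printf '%s\n' "$ids"
```

The real and effective user and group IDs are available in the same
way, as the @code{"uid"}, @code{"euid"}, @code{"gid"}, and @code{"egid"}
elements.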
6908
6909@node Special Network
6910@subsection Special Files for Network Communications
6911@cindex networks, support for
6912@cindex TCP/IP, support for
6913
6914Starting with @value{PVERSION} 3.1 of @command{gawk}, @command{awk} programs
6915can open a two-way
6916TCP/IP connection, acting as either a client or a server.
6917This is done using a special @value{FN} of the form:
6918
@example
@file{/inet/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}
@end example
6922
6923The @var{protocol} is one of @samp{tcp}, @samp{udp}, or @samp{raw},
6924and the other fields represent the other essential pieces of information
6925for making a networking connection.
6926These @value{FN}s are used with the @samp{|&} operator for communicating
6927with a coprocess
6928(@pxref{Two-way I/O}).
6929This is an advanced feature, mentioned here only for completeness.
6930Full discussion is delayed until
6931@ref{TCP/IP Networking}.
6932
6933@node Special Caveats
6934@subsection Special @value{FFN} Caveats
6935
6936Here is a list of things to bear in mind when using the
6937special @value{FN}s that @command{gawk} provides:
6938
6939@itemize @bullet
6940@cindex compatibility mode (@command{gawk}), @value{FN}s
6941@cindex @value{FN}s, in compatibility mode
6942@item
6943Recognition of these special @value{FN}s is disabled if @command{gawk} is in
6944compatibility mode (@pxref{Options}).
6945
6946@c @cindex automatic warnings
6947@c @cindex warnings, automatic
6948@cindex @code{PROCINFO} array
6949@item
@ifinfo
The
@end ifinfo
6953@ifnotinfo
6954As mentioned earlier, the
6955@end ifnotinfo
6956special files that provide process-related information are now considered
6957obsolete and will disappear entirely
6958in the next release of @command{gawk}.
6959@command{gawk} prints a warning message every time you use one of
6960these files.
6961@ifnottex
6962To obtain process-related information, use the @code{PROCINFO} array.
6963@xref{Built-in Variables}.
6964@end ifnottex
6965
6966@item
6967Starting with @value{PVERSION} 3.1, @command{gawk} @emph{always}
6968interprets these special @value{FN}s.@footnote{Older versions of
6969@command{gawk} would interpret these names internally only if the system
6970did not actually have a @file{/dev/fd} directory or any of the other
6971special files listed earlier.  Usually this didn't make a difference,
6972but sometimes it did; thus, it was decided to make @command{gawk}'s
6973behavior consistent on all systems and to have it always interpret
6974the special @value{FN}s itself.}
6975For example, using @samp{/dev/fd/4}
6976for output actually writes on file descriptor 4, and not on a new
6977file descriptor that is @code{dup}'ed from file descriptor 4.  Most of
6978the time this does not matter; however, it is important to @emph{not}
6979close any of the files related to file descriptors 0, 1, and 2.
6980Doing so results in unpredictable behavior.
6981@end itemize
6982@c ENDOFRANGE gfn
6983
6984@node Close Files And Pipes
6985@section Closing Input and Output Redirections
6986@cindex files, output, See output files
6987@c STARTOFRANGE ifc
6988@cindex input files, closing
6989@c comma before closing is NOT start of tertiary
6990@c STARTOFRANGE ofc
6991@cindex output, files, closing
6992@c STARTOFRANGE pc
6993@cindex pipes, closing
6994@c STARTOFRANGE cc
6995@cindex coprocesses, closing
6996@c comma before using is NOT start of tertiary
6997@cindex @code{getline} command, coprocesses, using from
6998
6999If the same @value{FN} or the same shell command is used with @code{getline}
7000more than once during the execution of an @command{awk} program
7001(@pxref{Getline}),
7002the file is opened (or the command is executed) the first time only.
7003At that time, the first record of input is read from that file or command.
7004The next time the same file or command is used with @code{getline},
7005another record is read from it, and so on.
7006
7007Similarly, when a file or pipe is opened for output, the @value{FN} or
7008command associated with it is remembered by @command{awk}, and subsequent
7009writes to the same file or command are appended to the previous writes.
7010The file or pipe stays open until @command{awk} exits.
7011
7012@cindex @code{close} function
7013This implies that special steps are necessary in order to read the same
7014file again from the beginning, or to rerun a shell command (rather than
7015reading more output from the same command).  The @code{close} function
7016makes these things possible:
7017
@example
close(@var{filename})
@end example
7021
7022@noindent
7023or:
7024
@example
close(@var{command})
@end example
7028
7029The argument @var{filename} or @var{command} can be any expression.  Its
7030value must @emph{exactly} match the string that was used to open the file or
7031start the command (spaces and other ``irrelevant'' characters
7032included). For example, if you open a pipe with this:
7033
@example
"sort -r names" | getline foo
@end example
7037
7038@noindent
7039then you must close it with this:
7040
@example
close("sort -r names")
@end example
7044
7045Once this function call is executed, the next @code{getline} from that
7046file or command, or the next @code{print} or @code{printf} to that
7047file or command, reopens the file or reruns the command.
7048Because the expression that you use to close a file or pipeline must
7049exactly match the expression used to open the file or run the command,
7050it is good practice to use a variable to store the @value{FN} or command.
7051The previous example becomes the following:
7052
@example
sortcom = "sort -r names"
sortcom | getline foo
@dots{}
close(sortcom)
@end example
7059
7060@noindent
7061This helps avoid hard-to-find typographical errors in your @command{awk}
7062programs.  Here are some of the reasons for closing an output file:
7063
7064@itemize @bullet
7065@item
7066To write a file and read it back later on in the same @command{awk}
7067program.  Close the file after writing it, then
7068begin reading it with @code{getline}.
7069
7070@item
7071To write numerous files, successively, in the same @command{awk}
7072program.  If the files aren't closed, eventually @command{awk} may exceed a
7073system limit on the number of open files in one process.  It is best to
7074close each one when the program has finished writing it.
7075
7076@item
7077To make a command finish.  When output is redirected through a pipe,
7078the command reading the pipe normally continues to try to read input
7079as long as the pipe is open.  Often this means the command cannot
7080really do its work until the pipe is closed.  For example, if
7081output is redirected to the @command{mail} program, the message is not
7082actually sent until the pipe is closed.
7083
7084@item
7085To run the same program a second time, with the same arguments.
7086This is not the same thing as giving more input to the first run!
7087
7088For example, suppose a program pipes output to the @command{mail} program.
7089If it outputs several lines redirected to this pipe without closing
7090it, they make a single message of several lines.  By contrast, if the
7091program closes the pipe after each line of output, then each line makes
7092a separate message.
7093@end itemize
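
The first reason in the list, writing a file and reading it back, can
be sketched as follows.  The temporary file comes from the shell's
@command{mktemp} utility, and the variable names (@code{f}, @code{n},
@code{line}, @code{again}) are invented for the example.

```shell
tmp=$(mktemp)
readback=$(awk -v f="$tmp" 'BEGIN {
    print "first line"  > f
    print "second line" > f     # same open file, so this appends
    close(f)                    # finish writing before reading

    while ((getline line < f) > 0)
        n++                     # count the records just written
    close(f)                    # without this, the next read hits EOF

    getline again < f           # file is reopened: first record again
    print n, line, "/", again
}')
rm -f "$tmp"
printf '%s\n' "$readback"
```

Each call to @code{close} uses the same string that opened the file,
which is why the name is kept in a variable.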
7094
7095@cindex differences in @command{awk} and @command{gawk}, @code{close} function
7096@cindex portability, @code{close} function and
7097If you use more files than the system allows you to have open,
7098@command{gawk} attempts to multiplex the available open files among
7099your @value{DF}s.  @command{gawk}'s ability to do this depends upon the
7100facilities of your operating system, so it may not always work.  It is
7101therefore both good practice and good portability advice to always
7102use @code{close} on your files when you are done with them.
7103In fact, if you are using a lot of pipes, it is essential that
7104you close commands when done. For example, consider something like this:
7105
@example
@{
    @dots{}
    command = ("grep " $1 " /some/file | my_prog -q " $3)
    while ((command | getline) > 0) @{
        @var{process output of} command
    @}
    # need close(command) here
@}
@end example
7116
7117This example creates a new pipeline based on data in @emph{each} record.
7118Without the call to @code{close} indicated in the comment, @command{awk}
7119creates child processes to run the commands, until it eventually
7120runs out of file descriptors for more pipelines.
7121
7122Even though each command has finished (as indicated by the end-of-file
7123return status from @code{getline}), the child process is not
7124terminated;@footnote{The technical terminology is rather morbid.
7125The finished child is called a ``zombie,'' and cleaning up after
7126it is referred to as ``reaping.''}
7127@c Good old UNIX: give the marketing guys fits, that's the ticket
7128more importantly, the file descriptor for the pipe
7129is not closed and released until @code{close} is called or
7130@command{awk} exits.
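
A small runnable variant of the same pattern, with the call to
@code{close} in place, might look like the following.  The pipeline
built here (@command{echo} piped into @command{tr}) and the three input
words merely stand in for the @command{grep} pipeline shown earlier.

```shell
upper=$(printf 'alpha\nbeta\ngamma\n' | awk '{
    # A different command string is built for each record.
    command = "echo " $1 " | tr a-z A-Z"
    while ((command | getline result) > 0)
        print $1, "->", result
    close(command)    # release the pipe and reap the child process
}')
printf '%s\n' "$upper"
```

With only three records the missing @code{close} would go unnoticed;
the problem shows up when the input is long enough to exhaust the
available file descriptors.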
7131
@code{close} silently does nothing if given an argument that
does not represent a file, pipe, or coprocess that was opened with
a redirection.
7135
7136Note also that @samp{close(FILENAME)} has no
7137``magic'' effects on the implicit loop that reads through the
7138files named on the command line.  It is, more likely, a close
7139of a file that was never opened, so @command{awk} silently
7140does nothing.
7141
7142@c comma is part of tertiary
7143@cindex @code{|} (vertical bar), @code{|&} operator (I/O), pipes, closing
7144When using the @samp{|&} operator to communicate with a coprocess,
7145it is occasionally useful to be able to close one end of the two-way
7146pipe without closing the other.
7147This is done by supplying a second argument to @code{close}.
7148As in any other call to @code{close},
7149the first argument is the name of the command or special file used
7150to start the coprocess.
7151The second argument should be a string, with either of the values
7152@code{"to"} or @code{"from"}.  Case does not matter.
As this is an advanced feature, a more complete discussion, with
an example, is delayed until
@ref{Two-way I/O}.
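
In the meantime, a minimal sketch of the two-argument form is given
here.  It requires @command{gawk}, and it uses @command{sort} as the
coprocess because @command{sort} cannot produce any output until it has
seen the end of its input; the data and the variable names are
arbitrary.

```shell
# Requires gawk: the |& operator and two-argument close are extensions.
sorted=$(gawk 'BEGIN {
    cmd = "sort"
    print "pear"  |& cmd
    print "apple" |& cmd
    close(cmd, "to")        # sort now sees end-of-file and can finish

    while ((cmd |& getline line) > 0)
        out = out (out == "" ? "" : " ") line
    close(cmd)              # close the remaining (reading) end
    print out
}')
printf '%s\n' "$sorted"
```

Without the @samp{close(cmd, "to")} call, the program would wait for
output from @command{sort} while @command{sort} waited for more input.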
7157
7158@c fakenode --- for prepinfo
7159@subheading Advanced Notes: Using @code{close}'s Return Value
7160@cindex advanced features, @code{close} function
7161@cindex dark corner, @code{close} function
7162@cindex @code{close} function, return values
7163@c comma does NOT start secondary
7164@cindex return values, @code{close} function
7165@cindex differences in @command{awk} and @command{gawk}, @code{close} function
7166@cindex Unix @command{awk}, @code{close} function and
7167
7168In many versions of Unix @command{awk}, the @code{close} function
is actually a statement.  It is a syntax error to try to use the return
7170value from @code{close}:
7171@value{DARKCORNER}
7172
@example
command = "@dots{}"
command | getline info
retval = close(command)  # syntax error in most Unix awks
@end example
7178
7179@command{gawk} treats @code{close} as a function.
7180The return value is @minus{}1 if the argument names something
7181that was never opened with a redirection, or if there is
7182a system problem closing the file or process.
7183In these cases, @command{gawk} sets the built-in variable
7184@code{ERRNO} to a string describing the problem.
7185
7186In @command{gawk},
7187when closing a pipe or coprocess,
7188the return value is the exit status of the command.@footnote{
7189This is a full 16-bit value as returned by the @code{wait}
7190system call. See the system manual pages for information on
7191how to decode this value.}
7192Otherwise, it is the return value from the system's @code{close} or
7193@code{fclose} C functions when closing input or output
7194files, respectively.
7195This value is zero if the close succeeds, or @minus{}1 if
7196it fails.
7197
7198The POSIX standard is very vague; it says that @code{close}
7199returns zero on success and non-zero otherwise.  In general,
7200different implementations vary in what they report when closing
7201pipes; thus the return value cannot be used portably.
7202@value{DARKCORNER}
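
Given that caveat, about the only test that can be expected to carry
over between implementations is zero versus nonzero.  The following
sketch assumes an @command{awk} in which @code{close} is a function; the
two commands, @command{true} and @samp{exit 3}, are chosen only because
their exit statuses are known in advance.

```shell
status=$(awk 'BEGIN {
    ok  = "true"       # command that exits with status zero
    bad = "exit 3"     # command that exits with a nonzero status
    ok  | getline junk # neither command produces any output
    bad | getline junk

    r1 = close(ok)
    r2 = close(bad)
    print (r1 == 0 ? "zero" : "nonzero")
    print (r2 == 0 ? "zero" : "nonzero")
}')
printf '%s\n' "$status"
```

The sketch deliberately does not print the nonzero value itself, since
that is the part that differs from one implementation to another.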
7203
7204@ignore
7205@c 4/27/2003: Commenting this out for now, given the above
7206@c return of 16-bit value
7207The return value for closing a pipeline is particularly useful.
7208It allows you to get the output from a command as well as its
7209exit status.
7210@c 8/21/2002, FIXME: Maybe the code and this doc should be adjusted to
7211@c create values indicating death-by-signal?  Sigh.
7212
7213@cindex pipes, closing
7214@c comma does NOT start tertiary
7215@cindex POSIX @command{awk}, pipes, closing
7216For POSIX-compliant systems,
7217if the exit status is a number above 128, then the program
7218was terminated by a signal.  Subtract 128 to get the signal number:
7219
7220@example
7221exit_val = close(command)
7222if (exit_val > 128)
7223    print command, "died with signal", exit_val - 128
7224else
7225    print command, "exited with code", exit_val
7226@end example
7227
7228Currently, in @command{gawk}, this only works for commands
7229piping into @code{getline}.  For commands piped into
7230from @code{print} or @code{printf}, the
7231return value from @code{close} is that of the library's
7232@code{pclose} function.
7233@end ignore
7234@c ENDOFRANGE ifc
7235@c ENDOFRANGE ofc
7236@c ENDOFRANGE pc
7237@c ENDOFRANGE cc
7238@c ENDOFRANGE prnt
7239
7240@node Expressions
7241@chapter Expressions
7242@c STARTOFRANGE exps
7243@cindex expressions
7244
7245Expressions are the basic building blocks of @command{awk} patterns
7246and actions.  An expression evaluates to a value that you can print, test,
7247or pass to a function.  Additionally, an expression
7248can assign a new value to a variable or a field by using an assignment operator.
7249
7250An expression can serve as a pattern or action statement on its own.
7251Most other kinds of
7252statements contain one or more expressions that specify the data on which to
7253operate.  As in other languages, expressions in @command{awk} include
7254variables, array references, constants, and function calls, as well as
7255combinations of these with various operators.
7256
7257@menu
7258* Constants::                   String, numeric and regexp constants.
7259* Using Constant Regexps::      When and how to use a regexp constant.
7260* Variables::                   Variables give names to values for later use.
7261* Conversion::                  The conversion of strings to numbers and vice
7262                                versa.
7263* Arithmetic Ops::              Arithmetic operations (@samp{+}, @samp{-},
7264                                etc.)
7265* Concatenation::               Concatenating strings.
7266* Assignment Ops::              Changing the value of a variable or a field.
7267* Increment Ops::               Incrementing the numeric value of a variable.
7268* Truth Values::                What is ``true'' and what is ``false''.
7269* Typing and Comparison::       How variables acquire types and how this
7270                                affects comparison of numbers and strings with
7271                                @samp{<}, etc.
7272* Boolean Ops::                 Combining comparison expressions using boolean
7273                                operators @samp{||} (``or''), @samp{&&}
7274                                (``and'') and @samp{!} (``not'').
7275* Conditional Exp::             Conditional expressions select between two
7276                                subexpressions under control of a third
7277                                subexpression.
7278* Function Calls::              A function call is an expression.
7279* Precedence::                  How various operators nest.
7280@end menu
7281
7282@node Constants
7283@section Constant Expressions
7284@cindex constants, types of
7285
7286The simplest type of expression is the @dfn{constant}, which always has
7287the same value.  There are three types of constants: numeric,
7288string, and regular expression.
7289
7290Each is used in the appropriate context when you need a data
7291value that isn't going to change.  Numeric constants can
7292have different forms, but are stored identically internally.
7293
7294@menu
7295* Scalar Constants::            Numeric and string constants.
7296* Nondecimal-numbers::          What are octal and hex numbers.
7297* Regexp Constants::            Regular Expression constants.
7298@end menu
7299
7300@node Scalar Constants
7301@subsection Numeric and String Constants
7302
7303@cindex numeric, constants
7304A @dfn{numeric constant} stands for a number.  This number can be an
7305integer, a decimal fraction, or a number in scientific (exponential)
7306notation.@footnote{The internal representation of all numbers,
7307including integers, uses double-precision
7308floating-point numbers.
7309On most modern systems, these are in IEEE 754 standard format.}
7310Here are some examples of numeric constants that all
7311have the same value:
7312
@example
105
1.05e+2
1050e-1
@end example
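
A quick way to confirm that the three forms denote one value is to
compare them inside a program.  This is only a sketch; the variable
names @code{a} and @code{b} are arbitrary.

```shell
same=$(awk 'BEGIN {
    a = (105 == 1.05e+2)    # 1 if the two constants are equal
    b = (105 == 1050e-1)
    print a, b, 1.05e+2, 1050e-1
}')
printf '%s\n' "$same"
```

Both comparisons yield one, and the exponential forms print as
@samp{105}, because only the numeric value is kept.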
7318
7319@cindex string constants
7320A string constant consists of a sequence of characters enclosed in
7321double-quotation marks.  For example:
7322
@example
"parrot"
@end example
7326
7327@noindent
7328@cindex differences in @command{awk} and @command{gawk}, strings
7329@cindex strings, length of
7330represents the string whose contents are @samp{parrot}.  Strings in
@command{gawk} can be of any length, and they can contain any of the possible
eight-bit character values, including ASCII @sc{nul} (character code zero).
7333Other @command{awk}
7334implementations may have difficulty with some character codes.
7335
7336@node Nondecimal-numbers
7337@subsection Octal and Hexadecimal Numbers
7338@cindex octal numbers
7339@cindex hexadecimal numbers
7340@cindex numbers, octal
7341@cindex numbers, hexadecimal
7342
7343In @command{awk}, all numbers are in decimal; i.e., base 10.  Many other
7344programming languages allow you to specify numbers in other bases, often
7345octal (base 8) and hexadecimal (base 16).
7346In octal, the numbers go 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, etc.
7347Just as @samp{11}, in decimal, is 1 times 10 plus 1, so
7348@samp{11}, in octal, is 1 times 8, plus 1. This equals 9 in decimal.
7349In hexadecimal, there are 16 digits. Since the everyday decimal
7350number system only has ten digits (@samp{0}--@samp{9}), the letters
7351@samp{a} through @samp{f} are used to represent the rest.
7352(Case in the letters is usually irrelevant; hexadecimal @samp{a} and @samp{A}
7353have the same value.)
7354Thus, @samp{11}, in
7355hexadecimal, is 1 times 16 plus 1, which equals 17 in decimal.
7356
7357Just by looking at plain @samp{11}, you can't tell what base it's in.
7358So, in C, C++, and other languages derived from C,
7359@c such as PERL, but we won't mention that....
7360there is a special notation to help signify the base.
7361Octal numbers start with a leading @samp{0},
7362and hexadecimal numbers start with a leading @samp{0x} or @samp{0X}:
7363
7364@table @code
7365@item 11
7366Decimal value 11.
7367
7368@item 011
7369Octal 11, decimal value 9.
7370
7371@item 0x11
7372Hexadecimal 11, decimal value 17.
7373@end table
7374
7375This example shows the difference:
7376
@example
$ gawk 'BEGIN @{ printf "%d, %d, %d\n", 011, 11, 0x11 @}'
@print{} 9, 11, 17
@end example
7381
7382Being able to use octal and hexadecimal constants in your programs is most
7383useful when working with data that cannot be represented conveniently as
7384characters or as regular numbers, such as binary data of various sorts.
7385
7386@cindex @command{gawk}, octal numbers and
7387@cindex @command{gawk}, hexadecimal numbers and
7388@command{gawk} allows the use of octal and hexadecimal
7389constants in your program text.  However, such numbers in the input data
7390are not treated differently; doing so by default would break old
7391programs.
7392(If you really need to do this, use the @option{--non-decimal-data}
7393command-line option;
7394@pxref{Nondecimal Data}.)
7395If you have octal or hexadecimal data,
7396you can use the @code{strtonum} function
7397(@pxref{String Functions})
7398to convert the data into a number.
7399Most of the time, you will want to use octal or hexadecimal constants
7400when working with the built-in bit manipulation functions;
7401see @ref{Bitwise Functions},
7402for more information.
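
The difference between program text and input data can be seen in a
short sketch.  It requires @command{gawk}, run in its default mode
without @option{--non-decimal-data}, and the input line
@samp{0x11 011 11} is made up for the example.

```shell
# Requires gawk: strtonum() is a gawk extension.
conv=$(printf '0x11 011 11\n' | gawk '{
    # Ordinary conversion of input text treats the data as decimal.
    plain = ($1 + 0) " " ($2 + 0) " " ($3 + 0)
    # strtonum() honors the leading 0x and 0 prefixes.
    based = strtonum($1) " " strtonum($2) " " strtonum($3)
    print plain " | " based
}')
printf '%s\n' "$conv"
```

The values to the right of the bar are the hexadecimal, octal, and
decimal readings of the three fields; those to the left show what
ordinary string-to-number conversion makes of the same text.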
7403
7404Unlike some early C implementations, @samp{8} and @samp{9} are not valid
7405in octal constants; e.g., @command{gawk} treats @samp{018} as decimal 18:
7406
@example
$ gawk 'BEGIN @{ print "021 is", 021 ; print 018 @}'
@print{} 021 is 17
@print{} 18
@end example
7412
7413@cindex compatibility mode (@command{gawk}), octal numbers
7414@cindex compatibility mode (@command{gawk}), hexadecimal numbers
7415Octal and hexadecimal source code constants are a @command{gawk} extension.
7416If @command{gawk} is in compatibility mode
7417(@pxref{Options}),
7418they are not available.
7419
7420@c fakenode --- for prepinfo
7421@subheading Advanced Notes: A Constant's Base Does Not Affect Its Value
7422@c comma before values does NOT start tertiary
7423@cindex advanced features, constants, values of
7424
7425Once a numeric constant has
7426been converted internally into a number,
7427@command{gawk} no longer remembers
7428what the original form of the constant was; the internal value is
7429always used.  This has particular consequences for conversion of
7430numbers to strings:
7431
@example
$ gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'
@print{} 0x11 is <17>
@end example
7436
7437@node Regexp Constants
7438@subsection Regular Expression Constants
7439
7440@c STARTOFRANGE rec
7441@cindex regexp constants
7442@cindex @code{~} (tilde), @code{~} operator
7443@cindex tilde (@code{~}), @code{~} operator
7444@cindex @code{!} (exclamation point), @code{!~} operator
7445@cindex exclamation point (@code{!}), @code{!~} operator
7446A regexp constant is a regular expression description enclosed in
7447slashes, such as @code{@w{/^beginning and end$/}}.  Most regexps used in
7448@command{awk} programs are constant, but the @samp{~} and @samp{!~}
7449matching operators can also match computed or ``dynamic'' regexps
7450(which are just ordinary strings or variables that contain a regexp).
7451@c ENDOFRANGE cnst
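
The distinction between constant and dynamic regexps can be sketched in
a few lines.  The test string is the one used above; the variable names
@code{text} and @code{re} are arbitrary.

```shell
dyn=$(awk 'BEGIN {
    text = "beginning and end"
    re = "^beginning"               # an ordinary string used as a regexp

    if (text ~ /end$/)  print "constant regexp matches"
    if (text ~ re)      print "dynamic regexp matches"
    if (text !~ "^end") print "no match for ^end"
}')
printf '%s\n' "$dyn"
```

In the second and third tests the right-hand operand is a string, and
its contents are interpreted as a regexp when the match is performed.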
7452
7453@node Using Constant Regexps
7454@section Using Regular Expression Constants
7455
7456@cindex dark corner, regexp constants
When used on the right-hand side of the @samp{~} or @samp{!~}
7458operators, a regexp constant merely stands for the regexp that is to be
7459matched.
7460However, regexp constants (such as @code{/foo/}) may be used like simple expressions.
7461When a
7462regexp constant appears by itself, it has the same meaning as if it appeared
in a pattern, i.e., @samp{($0 ~ /foo/)}.
7464@value{DARKCORNER}
7465@xref{Expression Patterns}.
7466This means that the following two code segments:
7467
@example
if ($0 ~ /barfly/ || $0 ~ /camelot/)
    print "found"
@end example
7472
7473@noindent
7474and:
7475
@example
if (/barfly/ || /camelot/)
    print "found"
@end example
7480
7481@noindent
7482are exactly equivalent.
7483One rather bizarre consequence of this rule is that the following
7484Boolean expression is valid, but does not do what the user probably
7485intended:
7486
@example
# note that /foo/ is on the left of the ~
if (/foo/ ~ $1) print "found foo"
@end example
7491
7492@c @cindex automatic warnings
7493@c @cindex warnings, automatic
7494@cindex @command{gawk}, regexp constants and
7495@cindex regexp constants, in @command{gawk}
7496@noindent
7497This code is ``obviously'' testing @code{$1} for a match against the regexp
7498@code{/foo/}.  But in fact, the expression @samp{/foo/ ~ $1} actually means
7499@samp{($0 ~ /foo/) ~ $1}.  In other words, first match the input record
7500against the regexp @code{/foo/}.  The result is either zero or one,
7501depending upon the success or failure of the match.  That result
7502is then matched against the first field in the record.
7503Because it is unlikely that you would ever really want to make this kind of
7504test, @command{gawk} issues a warning when it sees this construct in
7505a program.
7506Another consequence of this rule is that the assignment statement:
7507
@example
matches = /foo/
@end example
7511
7512@noindent
7513assigns either zero or one to the variable @code{matches}, depending
7514upon the contents of the current input record.
This feature of the language was not well documented until the
POSIX specification.
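
The effect of such an assignment is easy to observe.  In this sketch
the two input records are invented so that one matches and one does
not.

```shell
flags=$(printf 'foo bar\nbaz\n' | awk '{
    matches = /foo/            # same as: matches = ($0 ~ /foo/)
    print $0 ": " matches
}')
printf '%s\n' "$flags"
```

The variable receives one for the record containing @samp{foo} and zero
for the other; it never holds the regexp itself.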
7517
7518@cindex differences in @command{awk} and @command{gawk}, regexp constants
7519@cindex dark corner, regexp constants, as arguments to user-defined functions
7520@cindex @code{gensub} function (@command{gawk})
7521@cindex @code{sub} function
7522@cindex @code{gsub} function
7523Constant regular expressions are also used as the first argument for
7524the @code{gensub}, @code{sub}, and @code{gsub} functions, and as the
7525second argument of the @code{match} function
7526(@pxref{String Functions}).
7527Modern implementations of @command{awk}, including @command{gawk}, allow
7528the third argument of @code{split} to be a regexp constant, but some
7529older implementations do not.
7530@value{DARKCORNER}
7531This can lead to confusion when attempting to use regexp constants
7532as arguments to user-defined functions
7533(@pxref{User-defined}).
7534For example:
7535
@example
function mysub(pat, repl, str, global)
@{
    if (global)
        gsub(pat, repl, str)
    else
        sub(pat, repl, str)
    return str
@}

@{
    @dots{}
    text = "hi! hi yourself!"
    mysub(/hi/, "howdy", text, 1)
    @dots{}
@}
@end example
7553
7554@c @cindex automatic warnings
7555@c @cindex warnings, automatic
7556In this example, the programmer wants to pass a regexp constant to the
7557user-defined function @code{mysub}, which in turn passes it on to
7558either @code{sub} or @code{gsub}.  However, what really happens is that
7559the @code{pat} parameter is either one or zero, depending upon whether
7560or not @code{$0} matches @code{/hi/}.
7561@command{gawk} issues a warning when it sees a regexp constant used as
7562a parameter to a user-defined function, since passing a truth value in
7563this way is probably not what was intended.
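
One way around the problem, offered here only as a sketch, is to pass
the regexp as a string so that it reaches @code{sub} or @code{gsub} as a
dynamic regexp.  The function is the @code{mysub} from the example
above; the @code{BEGIN} rule that calls it is added for the
illustration.

```shell
fixed=$(awk '
function mysub(pat, repl, str, global)
{
    if (global)
        gsub(pat, repl, str)
    else
        sub(pat, repl, str)
    return str
}

BEGIN {
    text = "hi! hi yourself!"
    # The regexp is passed as a string.  The result is taken from the
    # return value: str is a scalar parameter, so text is unchanged.
    print mysub("hi", "howdy", text, 1)
    print mysub("hi", "howdy", text, 0)
}')
printf '%s\n' "$fixed"
```

The first call replaces every occurrence and the second only the first
one, which shows that the pattern now arrives intact.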
7564@c ENDOFRANGE rec
7565
7566@node Variables
7567@section Variables
7568
7569@cindex variables, user-defined
7570@cindex user-defined, variables
7571Variables are ways of storing values at one point in your program for
7572use later in another part of your program.  They can be manipulated
7573entirely within the program text, and they can also be assigned values
7574on the @command{awk} command line.
7575
7576@menu
7577* Using Variables::             Using variables in your programs.
7578* Assignment Options::          Setting variables on the command-line and a
7579                                summary of command-line syntax. This is an
7580                                advanced method of input.
7581@end menu
7582
7583@node Using Variables
7584@subsection Using Variables in a Program
7585
7586Variables let you give names to values and refer to them later.  Variables
7587have already been used in many of the examples.  The name of a variable
7588must be a sequence of letters, digits, or underscores, and it may not begin
7589with a digit.  Case is significant in variable names; @code{a} and @code{A}
7590are distinct variables.
7591
7592A variable name is a valid expression by itself; it represents the
7593variable's current value.  Variables are given new values with
7594@dfn{assignment operators}, @dfn{increment operators}, and
7595@dfn{decrement operators}.
7596@xref{Assignment Ops}.
7597@c NEXT ED: Can also be changed by sub, gsub, split
7598
7599@cindex variables, built-in
7600@cindex variables, initializing
7601A few variables have special built-in meanings, such as @code{FS} (the
7602field separator), and @code{NF} (the number of fields in the current input
7603record).  @xref{Built-in Variables}, for a list of the built-in variables.
7604These built-in variables can be used and assigned just like all other
7605variables, but their values are also used or changed automatically by
7606@command{awk}.  All built-in variables' names are entirely uppercase.
7607
7608Variables in @command{awk} can be assigned either numeric or string values.
7609The kind of value a variable holds can change over the life of a program.
7610By default, variables are initialized to the empty string, which
7611is zero if converted to a number.  There is no need to
7612``initialize'' each variable explicitly in @command{awk},
7613which is what you would do in C and in most other traditional languages.
7614
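As a small illustration (the variable name @code{count} is arbitrary),
the following sketch uses a variable that has never been assigned:

@example
BEGIN @{
    if (count == "")
        print "count is the empty string"
    if (count == 0)
        print "count is also zero"
    count++               # the empty value acts as zero here
    print count
@}
@end example

@noindent
Both tests succeed, and the final @code{print} outputs @samp{1}.
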
7615@node Assignment Options
7616@subsection Assigning Variables on the Command Line
7617@cindex variables, assigning on command line
7618@c comma before assigning does NOT start tertiary
7619@cindex command line, variables, assigning on
7620
7621Any @command{awk} variable can be set by including a @dfn{variable assignment}
7622among the arguments on the command line when @command{awk} is invoked
7623(@pxref{Other Arguments}).
7624Such an assignment has the following form:
7625
7626@example
7627@var{variable}=@var{text}
7628@end example
7629
7630@c comma before assigning does NOT start tertiary
7631@cindex @code{-v} option, variables, assigning
7632@noindent
7633With it, a variable is set either at the beginning of the
7634@command{awk} run or in between input files.
7635When the assignment is preceded with the @option{-v} option,
7636as in the following:
7637
7638@example
7639-v @var{variable}=@var{text}
7640@end example
7641
7642@noindent
7643the variable is set at the very beginning, even before the
7644@code{BEGIN} rules are run.  The @option{-v} option and its assignment
7645must precede all the @value{FN} arguments, as well as the program text.
7646(@xref{Options}, for more information about
7647the @option{-v} option.)
7648Otherwise, the variable assignment is performed at a time determined by
7649its position among the input file arguments---after the processing of the
7650preceding input file argument.  For example:
7651
7652@example
7653awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
7654@end example
7655
7656@noindent
7657prints the value of field number @code{n} for all input records.  Before
7658the first file is read, the command line sets the variable @code{n}
7659equal to four.  This causes the fourth field to be printed in lines from
7660the file @file{inventory-shipped}.  After the first file has finished,
7661but before the second file is started, @code{n} is set to two, so that the
7662second field is printed in lines from @file{BBS-list}:
7663
7664@example
7665$ awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
7666@print{} 15
7667@print{} 24
7668@dots{}
7669@print{} 555-5553
7670@print{} 555-3412
7671@dots{}
7672@end example
7673
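The difference in timing between the two forms can be seen with a
@code{BEGIN} rule.  In this sketch, @code{n} and @code{m} are arbitrary names:

@example
$ awk -v n=1 'BEGIN @{ print "n is [" n "]" @}'
@print{} n is [1]
$ awk 'BEGIN @{ print "m is [" m "]" @}' m=1
@print{} m is []
@end example

@noindent
The assignment @samp{m=1} is an ordinary command-line argument, so it has not
yet been processed when the @code{BEGIN} rule runs.
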
7674@cindex dark corner, command-line arguments
7675Command-line arguments are made available for explicit examination by
7676the @command{awk} program in the @code{ARGV} array
7677(@pxref{ARGC and ARGV}).
7678@command{awk} processes the values of command-line assignments for escape
7679sequences
7680(@pxref{Escape Sequences}).
7681@value{DARKCORNER}
7682
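For instance, in the following sketch (the variable name @code{s} is arbitrary),
the two characters @samp{\t} in the assignment become a single TAB, so the
resulting string is three characters long:

@example
$ awk -v 's=a\tb' 'BEGIN @{ print length(s) @}'
@print{} 3
@end example
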
7683@node Conversion
7684@section Conversion of Strings and Numbers
7685
7686@cindex converting, strings to numbers
7687@cindex strings, converting
7688@cindex numbers, converting
7689@cindex converting, numbers
7690Strings are converted to numbers and numbers are converted to strings, if the context
7691of the @command{awk} program demands it.  For example, if the value of
7692either @code{foo} or @code{bar} in the expression @samp{foo + bar}
7693happens to be a string, it is converted to a number before the addition
7694is performed.  If numeric values appear in string concatenation, they
7695are converted to strings.  Consider the following:
7696
7697@example
7698two = 2; three = 3
7699print (two three) + 4
7700@end example
7701
7702@noindent
7703This prints the (numeric) value 27.  The numeric values of
7704the variables @code{two} and @code{three} are converted to strings and
7705concatenated together.  The resulting string is converted back to the
7706number 23, to which 4 is then added.
7707
7708@cindex null strings, converting numbers to strings
7709@cindex type conversion
7710If, for some reason, you need to force a number to be converted to a
7711string, concatenate the empty string, @code{""}, with that number.
7712To force a string to be converted to a number, add zero to that string.
7713A string is converted to a number by interpreting any numeric prefix
7714of the string as numerals:
7715@code{"2.5"} converts to 2.5, @code{"1e3"} converts to 1000, and @code{"25fix"}
7716has a numeric value of 25.
7717Strings that can't be interpreted as valid numbers convert to zero.
7718
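The following sketch exercises these rules; the string @code{"fix25"} is an
extra case, added here for illustration, that has no numeric prefix:

@example
BEGIN @{
    # adding zero forces each string to be converted to a number
    print "2.5" + 0, "1e3" + 0, "25fix" + 0, "fix25" + 0
    n = 12
    s = n ""              # concatenating "" forces conversion to a string
    print length(s)
@}
@end example

@noindent
This should print @samp{2.5 1000 25 0}, followed by @samp{2}.
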
7719@cindex @code{CONVFMT} variable
7720The exact manner in which numbers are converted into strings is controlled
7721by the @command{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}).
7722Numbers are converted using the @code{sprintf} function
7723with @code{CONVFMT} as the format
7724specifier
7725(@pxref{String Functions}).
7726
@code{CONVFMT}'s default value is @code{"%.6g"}, which produces a value with
at most six significant digits.  For some applications, you might want to
change it to specify more precision.
7730On most modern machines,
773117 digits is enough to capture a floating-point number's
7732value exactly,
7733most of the time.@footnote{Pathological cases can require up to
7734752 digits (!), but we doubt that you need to worry about this.}
7735
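As a brief sketch of the effect (the values and variable names are arbitrary),
the following program converts one number with the default format and another
after asking for nine significant digits:

@example
BEGIN @{
    x = 3.14159265
    y = 2.71828182
    s = x ""              # converted with the default "%.6g"
    CONVFMT = "%.9g"
    t = y ""              # converted with "%.9g"
    print s, t
@}
@end example

@noindent
This should print @samp{3.14159 2.71828182}.
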
7736@cindex dark corner, @code{CONVFMT} variable
7737Strange results can occur if you set @code{CONVFMT} to a string that doesn't
7738tell @code{sprintf} how to format floating-point numbers in a useful way.
7739For example, if you forget the @samp{%} in the format, @command{awk} converts
7740all numbers to the same constant string.
7741As a special case, if a number is an integer, then the result of converting
7742it to a string is @emph{always} an integer, no matter what the value of
7743@code{CONVFMT} may be.  Given the following code fragment:
7744
7745@example
7746CONVFMT = "%2.2f"
7747a = 12
7748b = a ""
7749@end example
7750
7751@noindent
7752@code{b} has the value @code{"12"}, not @code{"12.00"}.
7753@value{DARKCORNER}
7754
7755@cindex POSIX @command{awk}, @code{OFMT} variable and
7756@cindex @code{OFMT} variable
7757@cindex portability, new @command{awk} vs. old @command{awk}
7758@cindex @command{awk}, new vs. old, @code{OFMT} variable
7759Prior to the POSIX standard, @command{awk} used the value
7760of @code{OFMT} for converting numbers to strings.  @code{OFMT}
7761specifies the output format to use when printing numbers with @code{print}.
7762@code{CONVFMT} was introduced in order to separate the semantics of
7763conversion from the semantics of printing.  Both @code{CONVFMT} and
7764@code{OFMT} have the same default value: @code{"%.6g"}.  In the vast majority
7765of cases, old @command{awk} programs do not change their behavior.
7766However, these semantics for @code{OFMT} are something to keep in mind if you must
7767port your new style program to older implementations of @command{awk}.
Rather than changing your programs, we recommend
that you just port @command{gawk} itself.
7770@xref{Print},
7771for more information on the @code{print} statement.
7772
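The separation between the two variables can be observed with a sketch such
as the following, in which the format strings are chosen arbitrarily so that
the two conversions are easy to tell apart:

@example
BEGIN @{
    OFMT = "%.2f"
    CONVFMT = "%.4f"
    x = 3.14159265
    print x               # a number: formatted with OFMT
    print (x "")          # a string: converted with CONVFMT
@}
@end example

@noindent
In a POSIX-compliant @command{awk}, this should print @samp{3.14}
and then @samp{3.1416}.
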
7773Finally, once again, where you are can matter when it comes to
7774converting between numbers and strings.  In
7775@ref{Locales}, we mentioned that the
7776local character set and language (the locale) can affect how @command{gawk} matches
7777characters.  The locale also affects numeric formats.  In particular, for @command{awk}
7778programs, it affects the decimal point character.  The @code{"C"} locale, and most
7779English-language locales, use the period character (@samp{.}) as the decimal point.
7780However, many (if not most) European and non-English locales use the comma (@samp{,})
7781as the decimal point character.
7782
7783The POSIX standard says that @command{awk} always uses the period as the decimal
7784point when reading the @command{awk} program source code, and for command-line
7785variable assignments (@pxref{Other Arguments}).
7786However, when interpreting input data, for @code{print} and @code{printf} output,
7787and for number to string conversion, the local decimal point character is used.
7788As of @value{PVERSION} 3.1.3, @command{gawk} fully complies with this aspect
7789of the standard.  Here are some examples indicating the difference in behavior,
7790on a GNU/Linux system:
7791
7792@example
7793$ gawk 'BEGIN @{ printf "%g\n", 3.1415927 @}'
7794@print{} 3.14159
7795$  LC_ALL=en_DK gawk 'BEGIN @{ printf "%g\n", 3.1415927 @}'
7796@print{} 3,14159
7797$ echo 4,321 | gawk '@{ print $1 + 1 @}'
7798@print{} 5
7799$ echo 4,321 | LC_ALL=en_DK gawk '@{ print $1 + 1 @}'
7800@print{} 5,321
7801@end example
7802
7803@noindent
7804The @samp{en_DK} locale is for English in Denmark, where the comma acts as
7805the decimal point separator.  In the normal @code{"C"} locale, @command{gawk}
7806treats @samp{4,321} as @samp{4}, while in the Danish locale, it's treated
7807as the full number, @samp{4.321}.
7808
7809@node Arithmetic Ops
7810@section Arithmetic Operators
7811@cindex arithmetic operators
7812@cindex operators, arithmetic
7813@c @cindex addition
7814@c @cindex subtraction
7815@c @cindex multiplication
7816@c @cindex division
7817@c @cindex remainder
7818@c @cindex quotient
7819@c @cindex exponentiation
7820
7821The @command{awk} language uses the common arithmetic operators when
7822evaluating expressions.  All of these arithmetic operators follow normal
7823precedence rules and work as you would expect them to.
7824
7825The following example uses a file named @file{grades}, which contains
7826a list of student names as well as three test scores per student (it's
7827a small class):
7828
7829@example
7830Pat   100 97 58
7831Sandy  84 72 93
7832Chris  72 92 89
7833@end example
7834
7835@noindent
This program takes the file @file{grades} and prints the average
7837of the scores:
7838
7839@example
7840$ awk '@{ sum = $2 + $3 + $4 ; avg = sum / 3
7841>        print $1, avg @}' grades
7842@print{} Pat 85
7843@print{} Sandy 83
7844@print{} Chris 84.3333
7845@end example
7846
7847The following list provides the arithmetic operators in @command{awk}, in order from
7848the highest precedence to the lowest:
7849
7850@table @code
@cindex POSIX @command{awk}, arithmetic operators and
@item @var{x} ^ @var{y}
@itemx @var{x} ** @var{y}
Exponentiation; @var{x} raised to the @var{y} power.  @samp{2 ^ 3} has
the value eight; the character sequence @samp{**} is equivalent to
@samp{^}.

@item - @var{x}
Negation.

@item + @var{x}
Unary plus; the expression is converted to a number.
7863
7864@item @var{x} * @var{y}
7865Multiplication.
7866
7867@cindex troubleshooting, division
7868@cindex division
7869@item @var{x} / @var{y}
7870Division;  because all numbers in @command{awk} are floating-point
7871numbers, the result is @emph{not} rounded to an integer---@samp{3 / 4} has
7872the value 0.75.  (It is a common mistake, especially for C programmers,
7873to forget that @emph{all} numbers in @command{awk} are floating-point,
7874and that division of integer-looking constants produces a real number,
7875not an integer.)
7876
7877@item @var{x} % @var{y}
7878Remainder; further discussion is provided in the text, just
7879after this list.
7880
7881@item @var{x} + @var{y}
7882Addition.
7883
7884@item @var{x} - @var{y}
7885Subtraction.
7886@end table
7887
7888Unary plus and minus have the same precedence,
7889the multiplication operators all have the same precedence, and
7890addition and subtraction have the same precedence.
7891
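A few expressions, chosen here only for illustration, show how these
precedence rules play out in practice:

@example
BEGIN @{
    print 2 + 3 * 4       # 14: multiplication before addition
    print (2 + 3) * 4     # 20: parentheses override precedence
    print -2 ^ 2          # -4: parsed as -(2 ^ 2)
    print 3 / 4           # 0.75: division is not truncated
@}
@end example
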
7892@cindex differences in @command{awk} and @command{gawk}, trunc-mod operation
7893@cindex trunc-mod operation
7894When computing the remainder of @code{@var{x} % @var{y}},
7895the quotient is rounded toward zero to an integer and
7896multiplied by @var{y}. This result is subtracted from @var{x};
7897this operation is sometimes known as ``trunc-mod.''  The following
7898relation always holds:
7899
7900@example
7901b * int(a / b) + (a % b) == a
7902@end example
7903
7904One possibly undesirable effect of this definition of remainder is that
7905@code{@var{x} % @var{y}} is negative if @var{x} is negative.  Thus:
7906
7907@example
7908-17 % 8 = -1
7909@end example
7910
7911In other @command{awk} implementations, the signedness of the remainder
7912may be machine-dependent.
7913@c !!! what does posix say?
7914
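The following sketch checks the relation for the values used above and shows
one common idiom (an addition to the text, not a built-in feature) for
obtaining a nonnegative remainder when the divisor is positive:

@example
BEGIN @{
    a = -17; b = 8
    print a % b                       # -1
    print b * int(a / b) + (a % b)    # -17, the original value of a
    print ((a % b) + b) % b           # 7, always in the range 0 to b-1
@}
@end example
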
7915@cindex portability, @code{**} operator and
7916@cindex @code{*} (asterisk), @code{**} operator
7917@cindex asterisk (@code{*}), @code{**} operator
7918@strong{Note:}
7919The POSIX standard only specifies the use of @samp{^}
7920for exponentiation.
7921For maximum portability, do not use the @samp{**} operator.
7922
7923@node Concatenation
7924@section String Concatenation
7925@cindex Kernighan, Brian
7926@quotation
7927@i{It seemed like a good idea at the time.}@*
7928Brian Kernighan
7929@end quotation
7930
7931@cindex string operators
7932@cindex operators, string
7933@cindex concatenating
7934There is only one string operation: concatenation.  It does not have a
7935specific operator to represent it.  Instead, concatenation is performed by
7936writing expressions next to one another, with no operator.  For example:
7937
7938@example
7939$ awk '@{ print "Field number one: " $1 @}' BBS-list
7940@print{} Field number one: aardvark
7941@print{} Field number one: alpo-net
7942@dots{}
7943@end example
7944
7945Without the space in the string constant after the @samp{:}, the line
7946runs together.  For example:
7947
7948@example
7949$ awk '@{ print "Field number one:" $1 @}' BBS-list
7950@print{} Field number one:aardvark
7951@print{} Field number one:alpo-net
7952@dots{}
7953@end example
7954
7955@cindex troubleshooting, string concatenation
7956Because string concatenation does not have an explicit operator, it is
often necessary to ensure that it happens at the right time by using
7958parentheses to enclose the items to concatenate.  For example, the
7959following code fragment does not concatenate @code{file} and @code{name}
7960as you might expect:
7961
7962@example
7963file = "file"
7964name = "name"
7965print "something meaningful" > file name
7966@end example
7967
7968@noindent
7969It is necessary to use the following:
7970
7971@example
7972print "something meaningful" > (file name)
7973@end example
7974
7975@cindex order of evaluation, concatenation
7976@cindex evaluation order, concatenation
7977@cindex side effects
7978Parentheses should be used around concatenation in all but the
7979most common contexts, such as on the righthand side of @samp{=}.
7980Be careful about the kinds of expressions used in string concatenation.
7981In particular, the order of evaluation of expressions used for concatenation
7982is undefined in the @command{awk} language.  Consider this example:
7983
7984@example
7985BEGIN @{
7986    a = "don't"
7987    print (a " " (a = "panic"))
7988@}
7989@end example
7990
7991@noindent
7992It is not defined whether the assignment to @code{a} happens
7993before or after the value of @code{a} is retrieved for producing the
7994concatenated value.  The result could be either @samp{don't panic},
7995or @samp{panic panic}.
7996@c see test/nasty.awk for a worse example
7997The precedence of concatenation, when mixed with other operators, is often
7998counter-intuitive.  Consider this example:
7999
8000@ignore
8001> To: bug-gnu-utils@@gnu.org
8002> CC: arnold@gnu.org
8003> Subject: gawk 3.0.4 bug with {print -12 " " -24}
8004> From: Russell Schulz <Russell_Schulz@locutus.ofB.ORG>
8005> Date: Tue, 8 Feb 2000 19:56:08 -0700
8006>
8007> gawk 3.0.4 on NT gives me:
8008>
8009> prompt> cat bad.awk
8010> BEGIN { print -12 " " -24; }
8011>
8012> prompt> gawk -f bad.awk
8013> -12-24
8014>
8015> when I would expect
8016>
8017> -12 -24
8018>
8019> I have not investigated the source, or other implementations.  The
8020> bug is there on my NT and DOS versions 2.15.6 .
8021@end ignore
8022
8023@example
8024$ awk 'BEGIN @{ print -12 " " -24 @}'
8025@print{} -12-24
8026@end example
8027
8028This ``obviously'' is concatenating @minus{}12, a space, and @minus{}24.
8029But where did the space disappear to?
8030The answer lies in the combination of operator precedences and
8031@command{awk}'s automatic conversion rules.  To get the desired result,
8032write the program in the following manner:
8033
8034@example
8035$ awk 'BEGIN @{ print -12 " " (-24) @}'
8036@print{} -12 -24
8037@end example
8038
8039This forces @command{awk} to treat the @samp{-} on the @samp{-24} as unary.
8040Otherwise, it's parsed as follows:
8041
8042@display
8043    @minus{}12 (@code{"@ "} @minus{} 24)
8044@result{} @minus{}12 (0 @minus{} 24)
8045@result{} @minus{}12 (@minus{}24)
8046@result{} @minus{}12@minus{}24
8047@end display
8048
8049As mentioned earlier,
8050when doing concatenation, @emph{parenthesize}.  Otherwise,
8051you're never quite sure what you'll get.
8052
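The same care is needed when concatenation is mixed with addition, which
binds more tightly than concatenation.  A small sketch (the string and the
numbers are arbitrary):

@example
BEGIN @{
    print "total: " 1 + 2       # total: 3
    print ("total: " 1) + 2     # 2
@}
@end example

@noindent
In the second statement, the string @code{"total: 1"} has no numeric prefix,
so it converts to zero before two is added.
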
8053@node Assignment Ops
8054@section Assignment Expressions
8055@c STARTOFRANGE asop
8056@cindex assignment operators
8057@c STARTOFRANGE opas
8058@cindex operators, assignment
8059@c STARTOFRANGE exas
8060@cindex expressions, assignment
8061@cindex @code{=} (equals sign), @code{=} operator
8062@cindex equals sign (@code{=}), @code{=} operator
8063An @dfn{assignment} is an expression that stores a (usually different)
8064value into a variable.  For example, let's assign the value one to the variable
8065@code{z}:
8066
8067@example
8068z = 1
8069@end example
8070
8071After this expression is executed, the variable @code{z} has the value one.
8072Whatever old value @code{z} had before the assignment is forgotten.
8073
8074Assignments can also store string values.  For example, the
8075following stores
8076the value @code{"this food is good"} in the variable @code{message}:
8077
8078@example
8079thing = "food"
8080predicate = "good"
8081message = "this " thing " is " predicate
8082@end example
8083
8084@noindent
8085@cindex side effects, assignment expressions
8086This also illustrates string concatenation.
8087The @samp{=} sign is called an @dfn{assignment operator}.  It is the
8088simplest assignment operator because the value of the righthand
8089operand is stored unchanged.
8090Most operators (addition, concatenation, and so on) have no effect
8091except to compute a value.  If the value isn't used, there's no reason to
8092use the operator.  An assignment operator is different; it does
8093produce a value, but even if you ignore it, the assignment still
8094makes itself felt through the alteration of the variable.  We call this
8095a @dfn{side effect}.
8096
8097@cindex lvalues/rvalues
8098@cindex rvalues/lvalues
8099@cindex assignment operators, lvalues/rvalues
8100@cindex operators, assignment
8101The lefthand operand of an assignment need not be a variable
8102(@pxref{Variables}); it can also be a field
8103(@pxref{Changing Fields}) or
8104an array element (@pxref{Arrays}).
8105These are all called @dfn{lvalues},
8106which means they can appear on the lefthand side of an assignment operator.
8107The righthand operand may be any expression; it produces the new value
8108that the assignment stores in the specified variable, field, or array
8109element. (Such values are called @dfn{rvalues}.)
8110
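As a brief sketch, the following command assigns to a field and to an array
element (the array name @code{seen} and its index are arbitrary):

@example
$ echo 'a b c' | awk '@{ $2 = "X"; seen["k"] = NF; print $0, seen["k"] @}'
@print{} a X c 3
@end example
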
8111@cindex variables, types of
8112It is important to note that variables do @emph{not} have permanent types.
8113A variable's type is simply the type of whatever value it happens
8114to hold at the moment.  In the following program fragment, the variable
8115@code{foo} has a numeric value at first, and a string value later on:
8116
8117@example
8118foo = 1
8119print foo
8120foo = "bar"
8121print foo
8122@end example
8123
8124@noindent
8125When the second assignment gives @code{foo} a string value, the fact that
8126it previously had a numeric value is forgotten.
8127
String values that do not begin with a numeric prefix have a numeric value of
zero. After executing the following code, the value of @code{foo} is five:
8130
8131@example
8132foo = "a string"
8133foo = foo + 5
8134@end example
8135
8136@noindent
8137@strong{Note:} Using a variable as a number and then later as a string
8138can be confusing and is poor programming style.  The previous two examples
8139illustrate how @command{awk} works, @emph{not} how you should write your
8140programs!
8141
8142An assignment is an expression, so it has a value---the same value that
8143is assigned.  Thus, @samp{z = 1} is an expression with the value one.
8144One consequence of this is that you can write multiple assignments together,
8145such as:
8146
8147@example
8148x = y = z = 5
8149@end example
8150
8151@noindent
8152This example stores the value five in all three variables
8153(@code{x}, @code{y}, and @code{z}).
8154It does so because the
8155value of @samp{z = 5}, which is five, is stored into @code{y} and then
8156the value of @samp{y = z = 5}, which is five, is stored into @code{x}.
8157
8158Assignments may be used anywhere an expression is called for.  For
8159example, it is valid to write @samp{x != (y = 1)} to set @code{y} to one,
8160and then test whether @code{x} equals one.  But this style tends to make
8161programs hard to read; such nesting of assignments should be avoided,
8162except perhaps in a one-shot program.
8163
8164@cindex @code{+} (plus sign), @code{+=} operator
8165@cindex plus sign (@code{+}), @code{+=} operator
8166Aside from @samp{=}, there are several other assignment operators that
8167do arithmetic with the old value of the variable.  For example, the
8168operator @samp{+=} computes a new value by adding the righthand value
8169to the old value of the variable.  Thus, the following assignment adds
8170five to the value of @code{foo}:
8171
8172@example
8173foo += 5
8174@end example
8175
8176@noindent
8177This is equivalent to the following:
8178
8179@example
8180foo = foo + 5
8181@end example
8182
8183@noindent
8184Use whichever makes the meaning of your program clearer.
8185
8186There are situations where using @samp{+=} (or any assignment operator)
8187is @emph{not} the same as simply repeating the lefthand operand in the
8188righthand expression.  For example:
8189
8190@cindex Rankin, Pat
8191@example
8192# Thanks to Pat Rankin for this example
8193BEGIN  @{
8194    foo[rand()] += 5
8195    for (x in foo)
8196       print x, foo[x]
8197
8198    bar[rand()] = bar[rand()] + 5
8199    for (x in bar)
8200       print x, bar[x]
8201@}
8202@end example
8203
8204@cindex operators, assignment, evaluation order
8205@cindex assignment operators, evaluation order
8206@noindent
8207The indices of @code{bar} are practically guaranteed to be different, because
8208@code{rand} returns different values each time it is called.
8209(Arrays and the @code{rand} function haven't been covered yet.
8210@xref{Arrays},
8211and see @ref{Numeric Functions}, for more information).
8212This example illustrates an important fact about assignment
8213operators: the lefthand expression is only evaluated @emph{once}.
8214It is up to the implementation as to which expression is evaluated
8215first, the lefthand or the righthand.
8216Consider this example:
8217
8218@example
8219i = 1
8220a[i += 2] = i + 1
8221@end example
8222
8223@noindent
8224The value of @code{a[3]} could be either two or four.
8225
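To avoid depending on the implementation, the side effect can be moved into
its own statement.  This sketch assumes that the intended result is four;
to get two instead, save the old value of @code{i} before incrementing it:

@example
i = 1
i += 2            # i is now three
a[i] = i + 1      # a[3] is four
@end example
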
8226Here is a table of the arithmetic assignment operators.  In each
8227case, the righthand operand is an expression whose value is converted
8228to a number.
8229
8230@ignore
8231@table @code
8232@item @var{lvalue} += @var{increment}
8233Adds @var{increment} to the value of @var{lvalue}.
8234
8235@item @var{lvalue} -= @var{decrement}
8236Subtracts @var{decrement} from the value of @var{lvalue}.
8237
8238@item @var{lvalue} *= @var{coefficient}
8239Multiplies the value of @var{lvalue} by @var{coefficient}.
8240
8241@item @var{lvalue} /= @var{divisor}
8242Divides the value of @var{lvalue} by @var{divisor}.
8243
8244@item @var{lvalue} %= @var{modulus}
8245Sets @var{lvalue} to its remainder by @var{modulus}.
8246
8247@cindex @command{awk} language, POSIX version
8248@cindex POSIX @command{awk}
8249@item @var{lvalue} ^= @var{power}
8250@itemx @var{lvalue} **= @var{power}
8251Raises @var{lvalue} to the power @var{power}.
8252(Only the @samp{^=} operator is specified by POSIX.)
8253@end table
8254@end ignore
8255
8256@cindex @code{-} (hyphen), @code{-=} operator
8257@cindex hyphen (@code{-}), @code{-=} operator
8258@cindex @code{*} (asterisk), @code{*=} operator
8259@cindex asterisk (@code{*}), @code{*=} operator
8260@cindex @code{/} (forward slash), @code{/=} operator
8261@cindex forward slash (@code{/}), @code{/=} operator
8262@cindex @code{%} (percent sign), @code{%=} operator
8263@cindex percent sign (@code{%}), @code{%=} operator
8264@cindex @code{^} (caret), @code{^=} operator
8265@cindex caret (@code{^}), @code{^=} operator
8266@cindex @code{*} (asterisk), @code{**=} operator
8267@cindex asterisk (@code{*}), @code{**=} operator
8268@multitable {@var{lvalue} *= @var{coefficient}} {Subtracts @var{decrement} from the value of @var{lvalue}.}
8269@item @var{lvalue} @code{+=} @var{increment} @tab Adds @var{increment} to the value of @var{lvalue}.
8270
8271@item @var{lvalue} @code{-=} @var{decrement} @tab Subtracts @var{decrement} from the value of @var{lvalue}.
8272
8273@item @var{lvalue} @code{*=} @var{coefficient} @tab Multiplies the value of @var{lvalue} by @var{coefficient}.
8274
8275@item @var{lvalue} @code{/=} @var{divisor} @tab Divides the value of @var{lvalue} by @var{divisor}.
8276
8277@item @var{lvalue} @code{%=} @var{modulus} @tab Sets @var{lvalue} to its remainder by @var{modulus}.
8278
8279@cindex @command{awk} language, POSIX version
8280@cindex POSIX @command{awk}
8281@item @var{lvalue} @code{^=} @var{power} @tab
8282@item @var{lvalue} @code{**=} @var{power} @tab Raises @var{lvalue} to the power @var{power}.
8283@end multitable
8284
8285@cindex POSIX @command{awk}, @code{**=} operator and
8286@cindex portability, @code{**=} operator and
8287@strong{Note:}
8288Only the @samp{^=} operator is specified by POSIX.
8289For maximum portability, do not use the @samp{**=} operator.
8290
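The following sketch applies each of the portable operators in turn to a
single variable (the starting value is arbitrary):

@example
BEGIN @{
    x = 10
    x += 5; print x    # 15
    x -= 3; print x    # 12
    x *= 2; print x    # 24
    x /= 4; print x    # 6
    x %= 4; print x    # 2
    x ^= 3; print x    # 8
@}
@end example
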
8291@c fakenode --- for prepinfo
8292@subheading Advanced Notes: Syntactic Ambiguities Between @samp{/=} and Regular Expressions
8293@cindex advanced features, regexp constants
8294@cindex dark corner, regexp constants, @code{/=} operator and
8295@cindex @code{/} (forward slash), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant
8296@cindex forward slash (@code{/}), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant
8297@cindex regexp constants, @code{/=@dots{}/}, @code{/=} operator and
8298
8299@c derived from email from  "Nelson H. F. Beebe" <beebe@math.utah.edu>
8300@c Date: Mon, 1 Sep 1997 13:38:35 -0600 (MDT)
8301
8302@cindex dark corner
8303@cindex ambiguity, syntactic: @code{/=} operator vs. @code{/=@dots{}/} regexp constant
8304@cindex syntactic ambiguity: @code{/=} operator vs. @code{/=@dots{}/} regexp constant
8305@cindex @code{/=} operator vs. @code{/=@dots{}/} regexp constant
8306There is a syntactic ambiguity between the @samp{/=} assignment
8307operator and regexp constants whose first character is an @samp{=}.
8308@value{DARKCORNER}
8309This is most notable in commercial @command{awk} versions.
8310For example:
8311
8312@example
8313$ awk /==/ /dev/null
8314@error{} awk: syntax error at source line 1
8315@error{}  context is
8316@error{}         >>> /= <<<
8317@error{} awk: bailing out at source line 1
8318@end example
8319
8320@noindent
8321A workaround is:
8322
8323@example
8324awk '/[=]=/' /dev/null
8325@end example
8326
8327@command{gawk} does not have this problem,
8328nor do the other
8329freely available versions described in
8330@ref{Other Versions}.
8331@c ENDOFRANGE exas
8332@c ENDOFRANGE opas
8333@c ENDOFRANGE asop
8334
8335@node Increment Ops
8336@section Increment and Decrement Operators
8337
8338@c STARTOFRANGE inop
8339@cindex increment operators
8340@c STARTOFRANGE opde
8341@cindex operators, decrement/increment
8342@dfn{Increment} and @dfn{decrement operators} increase or decrease the value of
8343a variable by one.  An assignment operator can do the same thing, so
8344the increment operators add no power to the @command{awk} language; however, they
8345are convenient abbreviations for very common operations.
8346
8347@cindex side effects
8348@cindex @code{+} (plus sign), decrement/increment operators
8349@cindex plus sign (@code{+}), decrement/increment operators
8350@cindex side effects, decrement/increment operators
8351The operator used for adding one is written @samp{++}.  It can be used to increment
8352a variable either before or after taking its value.
8353To pre-increment a variable @code{v}, write @samp{++v}.  This adds
8354one to the value of @code{v}---that new value is also the value of the
8355expression. (The assignment expression @samp{v += 1} is completely
8356equivalent.)
8357Writing the @samp{++} after the variable specifies post-increment.  This
8358increments the variable value just the same; the difference is that the
8359value of the increment expression itself is the variable's @emph{old}
8360value.  Thus, if @code{foo} has the value four, then the expression @samp{foo++}
8361has the value four, but it changes the value of @code{foo} to five.
8362In other words, the operator returns the old value of the variable,
8363but with the side effect of incrementing it.
8364
8365The post-increment @samp{foo++} is nearly the same as writing @samp{(foo
8366+= 1) - 1}.  It is not perfectly equivalent because all numbers in
8367@command{awk} are floating-point---in floating-point, @samp{foo + 1 - 1} does
8368not necessarily equal @code{foo}.  But the difference is minute as
8369long as you stick to numbers that are fairly small (less than 10e12).
8370
8371@cindex @code{$} (dollar sign), incrementing fields and arrays
8372@cindex dollar sign (@code{$}), incrementing fields and arrays
8373Fields and array elements are incremented
8374just like variables.  (Use @samp{$(i++)} when you want to do a field reference
8375and a variable increment at the same time.  The parentheses are necessary
8376because of the precedence of the field reference operator @samp{$}.)
8377
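For example, this sketch prints each field of a record on its own line,
using @samp{$(i++)} to fetch a field and advance the counter @code{i}
in one step:

@example
$ echo 'a b c' | awk '@{ i = 1; while (i <= NF) print $(i++) @}'
@print{} a
@print{} b
@print{} c
@end example
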
8378@cindex decrement operators
8379The decrement operator @samp{--} works just like @samp{++}, except that
8380it subtracts one instead of adding it.  As with @samp{++}, it can be used before
8381the lvalue to pre-decrement or after it to post-decrement.
8382Following is a summary of increment and decrement expressions:
8383
8384@table @code
8385@cindex @code{+} (plus sign), @code{++} operator
8386@cindex plus sign (@code{+}), @code{++} operator
8387@item ++@var{lvalue}
8388This expression increments @var{lvalue}, and the new value becomes the
8389value of the expression.
8390
8391@item @var{lvalue}++
8392This expression increments @var{lvalue}, but
8393the value of the expression is the @emph{old} value of @var{lvalue}.
8394
8395@cindex @code{-} (hyphen), @code{--} operator
8396@cindex hyphen (@code{-}), @code{--} operator
8397@item --@var{lvalue}
8398This expression is
8399like @samp{++@var{lvalue}}, but instead of adding, it subtracts.  It
8400decrements @var{lvalue} and delivers the value that is the result.
8401
8402@item @var{lvalue}--
8403This expression is
8404like @samp{@var{lvalue}++}, but instead of adding, it subtracts.  It
8405decrements @var{lvalue}.  The value of the expression is the @emph{old}
8406value of @var{lvalue}.
8407@end table
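
As a minimal sketch of the four forms, assuming a hypothetical variable
@code{n} that starts at five:

@example
BEGIN @{
    n = 5
    print ++n    # prints 6; n is now 6
    print n++    # prints 6; n is now 7
    print --n    # prints 6; n is now 6
    print n--    # prints 6; n is now 5
    print n      # prints 5
@}
@end example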
8408
8409@c fakenode --- for prepinfo
8410@subheading Advanced Notes: Operator Evaluation Order
8411@c comma before precedence does NOT start tertiary
8412@cindex advanced features, operators, precedence
8413@cindex precedence
8414@cindex operators, precedence
8415@cindex portability, operators
8416@cindex evaluation order
8417@cindex Marx, Groucho
8418@quotation
8419@i{Doctor, doctor!  It hurts when I do this!@*
8420So don't do that!}@*
8421Groucho Marx
8422@end quotation
8423
8424@noindent
8425What happens for something like the following?
8426
@example
b = 6
print b += b++
@end example
8431
8432@noindent
8433Or something even stranger?
8434
@example
b = 6
b += ++b + b++
print b
@end example
8440
8441@cindex side effects
In other words, when do the various side effects prescribed by the
postfix operators (@samp{b++}) take effect?
When side effects happen is @dfn{implementation defined};
that is, it is up to the particular version of @command{awk}.
The result for the first example may be 12 or 13, and for the second, it
may be 22 or 23.
8448
8449In short, doing things like this is not recommended and definitely
8450not anything that you can rely upon for portability.
8451You should avoid such things in your own programs.
8452@c You'll sleep better at night and be able to look at yourself
8453@c in the mirror in the morning.
8454@c ENDOFRANGE inop
8455@c ENDOFRANGE opde
8456@c ENDOFRANGE deop
8457
8458@node Truth Values
8459@section True and False in @command{awk}
8460@cindex truth values
8461@cindex logical false/true
8462@cindex false, logical
8463@cindex true, logical
8464
8465@cindex null strings
8466Many programming languages have a special representation for the concepts
8467of ``true'' and ``false.''  Such languages usually use the special
8468constants @code{true} and @code{false}, or perhaps their uppercase
8469equivalents.
8470However, @command{awk} is different.
8471It borrows a very simple concept of true and
8472false from C.  In @command{awk}, any nonzero numeric value @emph{or} any
8473nonempty string value is true.  Any other value (zero or the null
8474string @code{""}) is false.  The following program prints @samp{A strange
8475truth value} three times:
8476
@example
BEGIN @{
   if (3.1415927)
       print "A strange truth value"
   if ("Four Score And Seven Years Ago")
       print "A strange truth value"
   if (j = 57)
       print "A strange truth value"
@}
@end example
8487
8488@cindex dark corner
8489There is a surprising consequence of the ``nonzero or non-null'' rule:
8490the string constant @code{"0"} is actually true, because it is non-null.
8491@value{DARKCORNER}
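
The following sketch illustrates the rule using constants only; just the
last test succeeds:

@example
BEGIN @{
    if (0)   print "numeric zero"        # false: not printed
    if ("")  print "null string"         # false: not printed
    if ("0") print "string constant 0"   # true: printed
@}
@end example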
8492
8493@node Typing and Comparison
8494@section Variable Typing and Comparison Expressions
8495@quotation
8496@i{The Guide is definitive. Reality is frequently inaccurate.}@*
8497The Hitchhiker's Guide to the Galaxy
8498@end quotation
8499
8500@c STARTOFRANGE comex
8501@cindex comparison expressions
8502@c STARTOFRANGE excom
8503@cindex expressions, comparison
8504@cindex expressions, matching, See comparison expressions
8505@cindex matching, expressions, See comparison expressions
8506@cindex relational operators, See comparison operators
8507@c comma is part of See
8508@cindex operators, relational, See operators, comparison
8509@c STARTOFRANGE varting
8510@cindex variable typing
8511@c STARTOFRANGE vartypc
8512@cindex variables, types of, comparison expressions and
Unlike variables in many other programming languages, @command{awk}
variables do not have a fixed type.  Instead, they can hold either a
number or a string, depending upon the value that is assigned to them.
8516
8517@cindex numeric, strings
8518@cindex strings, numeric
8519@cindex POSIX @command{awk}, numeric strings and
8520The 1992 POSIX standard introduced
8521the concept of a @dfn{numeric string}, which is simply a string that looks
8522like a number---for example, @code{@w{" +2"}}.  This concept is used
8523for determining the type of a variable.
8524The type of the variable is important because the types of two variables
8525determine how they are compared.
8526In @command{gawk}, variable typing follows these rules:
8527
8528@itemize @bullet
8529@item
8530A numeric constant or the result of a numeric operation has the @var{numeric}
8531attribute.
8532
8533@item
8534A string constant or the result of a string operation has the @var{string}
8535attribute.
8536
8537@item
8538Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements,
8539@code{ENVIRON} elements, and the
8540elements of an array created by @code{split} that are numeric strings
8541have the @var{strnum} attribute.  Otherwise, they have the @var{string}
8542attribute.
8543Uninitialized variables also have the @var{strnum} attribute.
8544
8545@item
8546Attributes propagate across assignments but are not changed by
8547any use.
8548@c (Although a use may cause the entity to acquire an additional
8549@c value such that it has both a numeric and string value, this leaves the
8550@c attribute unchanged.)
8551@c This is important but not relevant
8552@end itemize
8553
8554The last rule is particularly important. In the following program,
8555@code{a} has numeric type, even though it is later used in a string
8556operation:
8557
@example
BEGIN @{
    a = 12.345
    b = a " is a cute number"
    print b
@}
@end example
8565
8566When two operands are compared, either string comparison or numeric comparison
8567may be used. This depends upon the attributes of the operands, according to the
8568following symmetric matrix:
8569
8570@c thanks to Karl Berry, kb@cs.umb.edu, for major help with TeX tables
@tex
\centerline{
\vbox{\bigskip % space above the table (about 1 linespace)
% Because we have vertical rules, we can't let TeX insert interline space
% in its usual way.
\offinterlineskip
%
% Define the table template. & separates columns, and \cr ends the
% template (and each row). # is replaced by the text of that entry on
% each row. The template for the first column breaks down like this:
%   \strut -- a way to make each line have the height and depth
%             of a normal line of type, since we turned off interline spacing.
%   \hfil -- infinite glue; has the effect of right-justifying in this case.
%   #     -- replaced by the text (for instance, `STRNUM', in the last row).
%   \quad -- about the width of an `M'. Just separates the columns.
%
% The second column (\vrule#) is what generates the vertical rule that
% spans table rows.
%
% The doubled && before the next entry means `repeat the following
% template as many times as necessary on each line' -- in our case, twice.
%
% The template itself, \quad#\hfil, left-justifies with a little space before.
%
\halign{\strut\hfil#\quad&\vrule#&&\quad#\hfil\cr
        &&STRING        &NUMERIC        &STRNUM\cr
% The \omit tells TeX to skip inserting the template for this column on
% this particular row. In this case, we only want a little extra space
% to separate the heading row from the rule below it.  the depth 2pt --
% `\vrule depth 2pt' is that little space.
\omit   &depth 2pt\cr
% This is the horizontal rule below the heading. Since it has nothing to
% do with the columns of the table, we use \noalign to get it in there.
\noalign{\hrule}
% Like above, this time a little more space.
\omit   &depth 4pt\cr
% The remaining rows have nothing special about them.
STRING  &&string        &string         &string\cr
NUMERIC &&string        &numeric        &numeric\cr
STRNUM  &&string        &numeric        &numeric\cr
}}}
@end tex
@ifnottex
@display
        +----------------------------------------------
        |       STRING          NUMERIC         STRNUM
--------+----------------------------------------------
        |
STRING  |       string          string          string
        |
NUMERIC |       string          numeric         numeric
        |
STRNUM  |       string          numeric         numeric
--------+----------------------------------------------
@end display
@end ifnottex
8627
8628The basic idea is that user input that looks numeric---and @emph{only}
8629user input---should be treated as numeric, even though it is actually
8630made of characters and is therefore also a string.
Thus, for example, the string constant @w{@code{" +3.14"}}
is a string, even though it looks numeric,
and is @emph{never} treated as a number for comparison
purposes.
8635
8636In short, when one operand is a ``pure'' string, such as a string
8637constant, then a string comparison is performed.  Otherwise, a
8638numeric comparison is performed.@footnote{The POSIX standard is under
8639revision.  The revised standard's rules for typing and comparison are
8640the same as just described for @command{gawk}.}
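
As a sketch of this distinction (the input value is only an
illustration), the same characters compare numerically when they arrive
as input, but as a string when they are written as a constant:

@example
$ echo +3.14 | awk '@{ print ($1 == 3.14) ? "equal" : "not equal" @}'
@print{} equal
$ awk 'BEGIN @{ print ("+3.14" == 3.14) ? "equal" : "not equal" @}'
@print{} not equal
@end example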
8641
8642@dfn{Comparison expressions} compare strings or numbers for
8643relationships such as equality.  They are written using @dfn{relational
8644operators}, which are a superset of those in C.  Here is a table of
8645them:
8646
8647@cindex @code{<} (left angle bracket), @code{<} operator
8648@cindex left angle bracket (@code{<}), @code{<} operator
8649@cindex @code{<} (left angle bracket), @code{<=} operator
8650@cindex left angle bracket (@code{<}), @code{<=} operator
8651@cindex @code{>} (right angle bracket), @code{>=} operator
8652@cindex right angle bracket (@code{>}), @code{>=} operator
8653@cindex @code{>} (right angle bracket), @code{>} operator
8654@cindex right angle bracket (@code{>}), @code{>} operator
8655@cindex @code{=} (equals sign), @code{==} operator
8656@cindex equals sign (@code{=}), @code{==} operator
8657@cindex @code{!} (exclamation point), @code{!=} operator
8658@cindex exclamation point (@code{!}), @code{!=} operator
8659@cindex @code{~} (tilde), @code{~} operator
8660@cindex tilde (@code{~}), @code{~} operator
8661@cindex @code{!} (exclamation point), @code{!~} operator
8662@cindex exclamation point (@code{!}), @code{!~} operator
8663@cindex @code{in} operator
8664@table @code
8665@item @var{x} < @var{y}
8666True if @var{x} is less than @var{y}.
8667
8668@item @var{x} <= @var{y}
8669True if @var{x} is less than or equal to @var{y}.
8670
8671@item @var{x} > @var{y}
8672True if @var{x} is greater than @var{y}.
8673
8674@item @var{x} >= @var{y}
8675True if @var{x} is greater than or equal to @var{y}.
8676
8677@item @var{x} == @var{y}
8678True if @var{x} is equal to @var{y}.
8679
8680@item @var{x} != @var{y}
8681True if @var{x} is not equal to @var{y}.
8682
8683@item @var{x} ~ @var{y}
8684True if the string @var{x} matches the regexp denoted by @var{y}.
8685
8686@item @var{x} !~ @var{y}
8687True if the string @var{x} does not match the regexp denoted by @var{y}.
8688
8689@item @var{subscript} in @var{array}
8690True if the array @var{array} has an element with the subscript @var{subscript}.
8691@end table
8692
8693Comparison expressions have the value one if true and zero if false.
8694When comparing operands of mixed types, numeric operands are converted
8695to strings using the value of @code{CONVFMT}
8696(@pxref{Conversion}).
8697
8698Strings are compared
8699by comparing the first character of each, then the second character of each,
8700and so on.  Thus, @code{"10"} is less than @code{"9"}.  If there are two
8701strings where one is a prefix of the other, the shorter string is less than
8702the longer one.  Thus, @code{"abc"} is less than @code{"abcd"}.
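
A brief sketch of these rules, using only constants so that the kind of
comparison is unambiguous:

@example
BEGIN @{
    print ("10" < "9")       # prints 1: string comparison, "1" sorts before "9"
    print (10 < 9)           # prints 0: numeric comparison
    print ("abc" < "abcd")   # prints 1: a prefix is less than the longer string
@}
@end example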
8703
8704@cindex troubleshooting, @code{==} operator
8705It is very easy to accidentally mistype the @samp{==} operator and
8706leave off one of the @samp{=} characters.  The result is still valid @command{awk}
8707code, but the program does not do what is intended:
8708
@example
if (a = b)   # oops! should be a == b
   @dots{}
else
   @dots{}
@end example
8715
8716@noindent
8717Unless @code{b} happens to be zero or the null string, the @code{if}
8718part of the test always succeeds.  Because the operators are
8719so similar, this kind of error is very difficult to spot when
8720scanning the source code.
8721
8722@cindex @command{gawk}, comparison operators and
8723The following table of expressions illustrates the kind of comparison
8724@command{gawk} performs, as well as what the result of the comparison is:
8725
8726@table @code
8727@item 1.5 <= 2.0
8728numeric comparison (true)
8729
8730@item "abc" >= "xyz"
8731string comparison (false)
8732
8733@item 1.5 != " +2"
8734string comparison (true)
8735
8736@item "1e2" < "3"
8737string comparison (true)
8738
@item a = 2; b = "2"
@itemx a == b
string comparison (true)

@item a = 2; b = " +2"
@itemx a == b
string comparison (false)
8746@end table
8747
8748In the next example:
8749
@example
$ echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}'
@print{} false
@end example
8754
8755@cindex comparison expressions, string vs. regexp
8756@c @cindex string comparison vs. regexp comparison
8757@c @cindex regexp comparison vs. string comparison
8758@noindent
8759the result is @samp{false} because both @code{$1} and @code{$2}
8760are user input.  They are numeric strings---therefore both have
8761the @var{strnum} attribute, dictating a numeric comparison.
8762The purpose of the comparison rules and the use of numeric strings is
8763to attempt to produce the behavior that is ``least surprising,'' while
8764still ``doing the right thing.''
8765String comparisons and regular expression comparisons are very different.
8766For example:
8767
@example
x == "foo"
@end example

@noindent
has the value one (that is, it is true) if the variable @code{x}
is precisely @samp{foo}.  By contrast:

@example
x ~ /foo/
@end example

@noindent
has the value one if @code{x} contains @samp{foo}, such as
@code{"Oh, what a fool am I!"}.
8783
8784@cindex @code{~} (tilde), @code{~} operator
8785@cindex tilde (@code{~}), @code{~} operator
8786@cindex @code{!} (exclamation point), @code{!~} operator
8787@cindex exclamation point (@code{!}), @code{!~} operator
8788The righthand operand of the @samp{~} and @samp{!~} operators may be
8789either a regexp constant (@code{/@dots{}/}) or an ordinary
8790expression. In the latter case, the value of the expression as a string is used as a
8791dynamic regexp (@pxref{Regexp Usage}; also
8792@pxref{Computed Regexps}).
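
For instance, in the following sketch the regexp is held in a variable
(@code{pattern} is a hypothetical name chosen for illustration) and used
as a dynamic regexp:

@example
BEGIN @{
    pattern = "^[0-9]+$"                 # one or more digits, nothing else
    if ("2400" ~ pattern)  print "2400 is all digits"
    if ("24a0" !~ pattern) print "24a0 is not all digits"
@}
@end example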
8793
8794@cindex @command{awk}, regexp constants and
8795@cindex regexp constants
8796In modern implementations of @command{awk}, a constant regular
8797expression in slashes by itself is also an expression.  The regexp
8798@code{/@var{regexp}/} is an abbreviation for the following comparison expression:
8799
@example
$0 ~ /@var{regexp}/
@end example
8803
8804One special place where @code{/foo/} is @emph{not} an abbreviation for
8805@samp{$0 ~ /foo/} is when it is the righthand operand of @samp{~} or
8806@samp{!~}.
8807@xref{Using Constant Regexps},
8808where this is discussed in more detail.
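
A small sketch of the abbreviation: assigning a bare regexp constant
stores the result of matching it against @code{$0} (the variable name
@code{matched} is only illustrative):

@example
$ echo 'some food' | awk '@{ matched = /foo/; print matched @}'
@print{} 1
$ echo 'some bar' | awk '@{ matched = /foo/; print matched @}'
@print{} 0
@end example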
8809@c ENDOFRANGE comex
8810@c ENDOFRANGE excom
8811@c ENDOFRANGE vartypc
8812@c ENDOFRANGE varting
8813
8814@node Boolean Ops
8815@section Boolean Expressions
8816@cindex and Boolean-logic operator
8817@cindex or Boolean-logic operator
8818@cindex not Boolean-logic operator
8819@c STARTOFRANGE exbo
8820@cindex expressions, Boolean
8821@c STARTOFRANGE boex
8822@cindex Boolean expressions
8823@cindex operators, Boolean, See Boolean expressions
8824@cindex Boolean operators, See Boolean expressions
8825@cindex logical operators, See Boolean expressions
8826@cindex operators, logical, See Boolean expressions
8827
8828A @dfn{Boolean expression} is a combination of comparison expressions or
8829matching expressions, using the Boolean operators ``or''
8830(@samp{||}), ``and'' (@samp{&&}), and ``not'' (@samp{!}), along with
8831parentheses to control nesting.  The truth value of the Boolean expression is
8832computed by combining the truth values of the component expressions.
8833Boolean expressions are also referred to as @dfn{logical expressions}.
8834The terms are equivalent.
8835
8836Boolean expressions can be used wherever comparison and matching
8837expressions can be used.  They can be used in @code{if}, @code{while},
8838@code{do}, and @code{for} statements
8839(@pxref{Statements}).
8840They have numeric values (one if true, zero if false) that come into play
8841if the result of the Boolean expression is stored in a variable or
8842used in arithmetic.
8843
8844In addition, every Boolean expression is also a valid pattern, so
8845you can use one as a pattern to control the execution of rules.
8846The Boolean operators are:
8847
8848@table @code
8849@item @var{boolean1} && @var{boolean2}
8850True if both @var{boolean1} and @var{boolean2} are true.  For example,
8851the following statement prints the current input record if it contains
8852both @samp{2400} and @samp{foo}:
8853
@example
if ($0 ~ /2400/ && $0 ~ /foo/) print
@end example
8857
8858@cindex side effects, Boolean operators
8859The subexpression @var{boolean2} is evaluated only if @var{boolean1}
8860is true.  This can make a difference when @var{boolean2} contains
8861expressions that have side effects. In the case of @samp{$0 ~ /foo/ &&
8862($2 == bar++)}, the variable @code{bar} is not incremented if there is
8863no substring @samp{foo} in the record.
8864
8865@item @var{boolean1} || @var{boolean2}
8866True if at least one of @var{boolean1} or @var{boolean2} is true.
8867For example, the following statement prints all records in the input
8868that contain @emph{either} @samp{2400} or
8869@samp{foo} or both:
8870
@example
if ($0 ~ /2400/ || $0 ~ /foo/) print
@end example
8874
8875The subexpression @var{boolean2} is evaluated only if @var{boolean1}
8876is false.  This can make a difference when @var{boolean2} contains
8877expressions that have side effects.
8878
8879@item ! @var{boolean}
8880True if @var{boolean} is false.  For example,
8881the following program prints @samp{no home!} in
8882the unusual event that the @env{HOME} environment
8883variable is not defined:
8884
@example
BEGIN @{ if (! ("HOME" in ENVIRON))
               print "no home!" @}
@end example
8889
8890(The @code{in} operator is described in
8891@ref{Reference to Elements}.)
8892@end table
8893
8894@cindex short-circuit operators
8895@cindex operators, short-circuit
8896@cindex @code{&} (ampersand), @code{&&} operator
8897@cindex ampersand (@code{&}), @code{&&} operator
8898@cindex @code{|} (vertical bar), @code{||} operator
8899@cindex vertical bar (@code{|}), @code{||} operator
8900The @samp{&&} and @samp{||} operators are called @dfn{short-circuit}
8901operators because of the way they work.  Evaluation of the full expression
8902is ``short-circuited'' if the result can be determined part way through
8903its evaluation.
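
The consequence for side effects can be sketched as follows, using a
hypothetical counter @code{n}:

@example
BEGIN @{
    n = 0
    if (0 && n++) print "not reached"
    print n      # prints 0: n++ was skipped because the left side is false
    if (1 || n++) print "left side true"
    print n      # prints 0: n++ was skipped because the left side is true
    if (1 && n++) print "not reached"
    print n      # prints 1: n++ was evaluated (its old value, 0, is false)
@}
@end example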
8904
8905@cindex line continuations
8906Statements that use @samp{&&} or @samp{||} can be continued simply
8907by putting a newline after them.  But you cannot put a newline in front
8908of either of these operators without using backslash continuation
8909(@pxref{Statements/Lines}).
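
For example, inside an action, both of the following fragments are valid
ways to split a test across two lines (a sketch; the layout is a matter
of taste):

@example
if ($0 ~ /2400/ &&
    $0 ~ /foo/)
    print

if ($0 ~ /2400/ \
    && $0 ~ /foo/)
    print
@end example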
8910
8911@cindex @code{!} (exclamation point), @code{!}  operator
8912@cindex exclamation point (@code{!}), @code{!} operator
8913@cindex newlines
8914@cindex variables, flag
8915@cindex flag variables
8916The actual value of an expression using the @samp{!} operator is
8917either one or zero, depending upon the truth value of the expression it
8918is applied to.
8919The @samp{!} operator is often useful for changing the sense of a flag
8920variable from false to true and back again. For example, the following
8921program is one way to print lines in between special bracketing lines:
8922
8923@example
8924$1 == "START"   @{ interested = ! interested; next @}
8925interested == 1 @{ print @}
8926$1 == "END"     @{ interested = ! interested; next @}
8927@end example
8928
8929@noindent
8930The variable @code{interested}, as with all @command{awk} variables, starts
8931out initialized to zero, which is also false.  When a line is seen whose
8932first field is @samp{START}, the value of @code{interested} is toggled
8933to true, using @samp{!}. The next rule prints lines as long as
8934@code{interested} is true.  When a line is seen whose first field is
8935@samp{END}, @code{interested} is toggled back to false.
8936
8937@ignore
8938Scott Deifik points out that this program isn't robust against
8939bogus input data, but the point is to illustrate the use of `!',
8940so we'll leave well enough alone.
8941@end ignore
8942
8943@cindex @code{next} statement
8944@strong{Note:} The @code{next} statement is discussed in
8945@ref{Next Statement}.
8946@code{next} tells @command{awk} to skip the rest of the rules, get the
8947next record, and start processing the rules over again at the top.
8948The reason it's there is to avoid printing the bracketing
8949@samp{START} and @samp{END} lines.
8950@c ENDOFRANGE exbo
8951@c ENDOFRANGE boex
8952
8953@node Conditional Exp
8954@section Conditional Expressions
8955@cindex conditional expressions
8956@cindex expressions, conditional
8957@cindex expressions, selecting
8958
8959A @dfn{conditional expression} is a special kind of expression that has
8960three operands.  It allows you to use one expression's value to select
8961one of two other expressions.
8962The conditional expression is the same as in the C language,
8963as shown here:
8964
@example
@var{selector} ? @var{if-true-exp} : @var{if-false-exp}
@end example

@noindent
There are three subexpressions.  The first, @var{selector}, is always
computed first.  If it is ``true'' (nonzero or non-null), then
@var{if-true-exp} is computed next and its value becomes the value of
the whole expression.  Otherwise, @var{if-false-exp} is computed next
and its value becomes the value of the whole expression.
For example, the following expression produces the absolute value of @code{x}:

@example
x >= 0 ? x : -x
@end example
8980
8981@cindex side effects, conditional expressions
8982Each time the conditional expression is computed, only one of
8983@var{if-true-exp} and @var{if-false-exp} is used; the other is ignored.
8984This is important when the expressions have side effects.  For example,
8985this conditional expression examines element @code{i} of either array
8986@code{a} or array @code{b}, and increments @code{i}:
8987
@example
x == y ? a[i++] : b[i++]
@end example
8991
8992@noindent
8993This is guaranteed to increment @code{i} exactly once, because each time
8994only one of the two increment expressions is executed
8995and the other is not.
8996@xref{Arrays},
8997for more information about arrays.
8998
8999@cindex differences in @command{awk} and @command{gawk}, line continuations
9000@cindex line continuations, @command{gawk}
9001@cindex @command{gawk}, line continuation in
9002As a minor @command{gawk} extension,
9003a statement that uses @samp{?:} can be continued simply
9004by putting a newline after either character.
9005However, putting a newline in front
9006of either character does not work without using backslash continuation
9007(@pxref{Statements/Lines}).
9008If @option{--posix} is specified
9009(@pxref{Options}), then this extension is disabled.
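
A sketch of such a continuation, assuming @command{gawk} running without
@option{--posix} (the variable names are illustrative):

@example
BEGIN @{
    x = -3
    result = x >= 0 ?
                 x :
                 -x
    print result    # prints 3
@}
@end example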
9010
9011@node Function Calls
9012@section Function Calls
9013@cindex function calls
9014
9015A @dfn{function} is a name for a particular calculation.
9016This enables you to
9017ask for it by name at any point in the program.  For
9018example, the function @code{sqrt} computes the square root of a number.
9019
9020@cindex functions, built-in
9021A fixed set of functions are @dfn{built-in}, which means they are
9022available in every @command{awk} program.  The @code{sqrt} function is one
9023of these.  @xref{Built-in}, for a list of built-in
9024functions and their descriptions.  In addition, you can define
9025functions for use in your program.
9026@xref{User-defined},
9027for instructions on how to do this.
9028
9029@cindex arguments, in function calls
9030The way to use a function is with a @dfn{function call} expression,
9031which consists of the function name followed immediately by a list of
9032@dfn{arguments} in parentheses.  The arguments are expressions that
9033provide the raw materials for the function's calculations.
9034When there is more than one argument, they are separated by commas.  If
9035there are no arguments, just write @samp{()} after the function name.
9036The following examples show function calls with and without arguments:
9037
@example
sqrt(x^2 + y^2)        @i{one argument}
atan2(y, x)            @i{two arguments}
rand()                 @i{no arguments}
@end example
9043
9044@cindex troubleshooting, function call syntax
9045@strong{Caution:}
9046Do not put any space between the function name and the open-parenthesis!
9047A user-defined function name looks just like the name of a
9048variable---a space would make the expression look like concatenation of
9049a variable with an expression inside parentheses.
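
As an illustration, consider a hypothetical user-defined function
@code{double}:

@example
function double(x) @{ return 2 * x @}

BEGIN @{
    print double(4)       # prints 8
    # print double (4)    # avoid: the space makes this look like
                          # concatenation, not a function call
@}
@end example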
9050
With built-in functions, space before the parenthesis is harmless, but
to avoid mistakes with user-defined functions, it is best not to get
into the habit of using space.

Each function expects a particular number
of arguments.  For example, the @code{sqrt} function must be called with
a single argument, the number whose square root is to be taken:

@example
sqrt(@var{argument})
@end example
9060
9061Some of the built-in functions have one or
9062more optional arguments.
9063If those arguments are not supplied, the functions
9064use a reasonable default value.
9065@xref{Built-in}, for full details.  If arguments
9066are omitted in calls to user-defined functions, then those arguments are
9067treated as local variables and initialized to the empty string
9068(@pxref{User-defined}).
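
Both cases are sketched below.  @code{substr} is called without its
optional third argument, and the hypothetical user-defined function
@code{sum_to} is called with only one of its three parameters, so that
@code{i} and @code{total} act as local variables:

@example
function sum_to(limit,    i, total) @{
    for (i = 1; i <= limit; i++)
        total += i        # total starts out empty, which acts as zero
    return total
@}

BEGIN @{
    print substr("hello", 2)   # prints "ello"
    print sum_to(4)            # prints 10
@}
@end example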
9069
9070@cindex side effects, function calls
9071Like every other expression, the function call has a value, which is
9072computed by the function based on the arguments you give it.  In this
9073example, the value of @samp{sqrt(@var{argument})} is the square root of
9074@var{argument}.  A function can also have side effects, such as assigning
9075values to certain variables or doing I/O.
9076The following program reads numbers, one number per line, and prints the
9077square root of each one:
9078
@example
$ awk '@{ print "The square root of", $1, "is", sqrt($1) @}'
1
@print{} The square root of 1 is 1
3
@print{} The square root of 3 is 1.73205
5
@print{} The square root of 5 is 2.23607
@kbd{@value{CTL}-d}
@end example
9089
9090@node Precedence
9091@section Operator Precedence (How Operators Nest)
9092@c STARTOFRANGE prec
9093@cindex precedence
9094@c STARTOFRANGE oppr
9095@cindex operators, precedence
9096
9097@dfn{Operator precedence} determines how operators are grouped when
9098different operators appear close by in one expression.  For example,
9099@samp{*} has higher precedence than @samp{+}; thus, @samp{a + b * c}
9100means to multiply @code{b} and @code{c}, and then add @code{a} to the
9101product (i.e., @samp{a + (b * c)}).
9102
9103The normal precedence of the operators can be overruled by using parentheses.
9104Think of the precedence rules as saying where the
9105parentheses are assumed to be.  In
9106fact, it is wise to always use parentheses whenever there is an unusual
9107combination of operators, because other people who read the program may
9108not remember what the precedence is in this case.
9109Even experienced programmers occasionally forget the exact rules,
9110which leads to mistakes.
9111Explicit parentheses help prevent
9112any such mistakes.
9113
9114When operators of equal precedence are used together, the leftmost
9115operator groups first, except for the assignment, conditional, and
9116exponentiation operators, which group in the opposite order.
9117Thus, @samp{a - b + c} groups as @samp{(a - b) + c} and
9118@samp{a = b = c} groups as @samp{a = (b = c)}.
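
These grouping rules can be checked with a short sketch:

@example
BEGIN @{
    print 10 - 4 + 1    # prints 7: groups as (10 - 4) + 1
    print 2 ^ 3 ^ 2     # prints 512: groups as 2 ^ (3 ^ 2)
    a = b = 5           # groups as a = (b = 5)
    print a, b          # prints 5 5
@}
@end example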
9119
9120The precedence of prefix unary operators does not matter as long as only
9121unary operators are involved, because there is only one way to interpret
9122them: innermost first.  Thus, @samp{$++i} means @samp{$(++i)} and
9123@samp{++$x} means @samp{++($x)}.  However, when another operator follows
9124the operand, then the precedence of the unary operators can matter.
9125@samp{$x^2} means @samp{($x)^2}, but @samp{-x^2} means
9126@samp{-(x^2)}, because @samp{-} has lower precedence than @samp{^},
9127whereas @samp{$} has higher precedence.
9128This table presents @command{awk}'s operators, in order of highest
9129to lowest precedence:
9130
9131@c use @code in the items, looks better in TeX w/o all the quotes
9132@table @code
9133@item (@dots{})
9134Grouping.
9135
9136@cindex @code{$} (dollar sign), @code{$} field operator
9137@cindex dollar sign (@code{$}), @code{$} field operator
9138@item $
9139Field.
9140
9141@cindex @code{+} (plus sign), @code{++} operator
9142@cindex plus sign (@code{+}), @code{++} operator
9143@cindex @code{-} (hyphen), @code{--} (decrement/increment) operator
9144@cindex hyphen (@code{-}), @code{--} (decrement/increment) operators
9145@item ++ --
9146Increment, decrement.
9147
9148@cindex @code{^} (caret), @code{^} operator
9149@cindex caret (@code{^}), @code{^} operator
9150@cindex @code{*} (asterisk), @code{**} operator
9151@cindex asterisk (@code{*}), @code{**} operator
9152@item ^ **
9153Exponentiation.  These operators group right-to-left.
9154
9155@cindex @code{+} (plus sign), @code{+} operator
9156@cindex plus sign (@code{+}), @code{+} operator
9157@cindex @code{-} (hyphen), @code{-} operator
9158@cindex hyphen (@code{-}), @code{-} operator
9159@cindex @code{!} (exclamation point), @code{!} operator
9160@cindex exclamation point (@code{!}), @code{!} operator
9161@item + - !
9162Unary plus, minus, logical ``not.''
9163
9164@cindex @code{*} (asterisk), @code{*} operator, as multiplication operator
9165@cindex asterisk (@code{*}), @code{*} operator, as multiplication operator
9166@cindex @code{/} (forward slash), @code{/} operator
9167@cindex forward slash (@code{/}), @code{/} operator
9168@cindex @code{%} (percent sign), @code{%} operator
9169@cindex percent sign (@code{%}), @code{%} operator
9170@item * / %
9171Multiplication, division, modulus.
9172
9173@cindex @code{+} (plus sign), @code{+} operator
9174@cindex plus sign (@code{+}), @code{+} operator
9175@cindex @code{-} (hyphen), @code{-} operator
9176@cindex hyphen (@code{-}), @code{-} operator
9177@item + -
9178Addition, subtraction.
9179
9180@item @r{String Concatenation}
9181No special symbol is used to indicate concatenation.
9182The operands are simply written side by side
9183(@pxref{Concatenation}).
9184
9185@cindex @code{<} (left angle bracket), @code{<} operator
9186@cindex left angle bracket (@code{<}), @code{<} operator
9187@cindex @code{<} (left angle bracket), @code{<=} operator
9188@cindex left angle bracket (@code{<}), @code{<=} operator
9189@cindex @code{>} (right angle bracket), @code{>=} operator
9190@cindex right angle bracket (@code{>}), @code{>=} operator
9191@cindex @code{>} (right angle bracket), @code{>} operator
9192@cindex right angle bracket (@code{>}), @code{>} operator
9193@cindex @code{=} (equals sign), @code{==} operator
9194@cindex equals sign (@code{=}), @code{==} operator
9195@cindex @code{!} (exclamation point), @code{!=} operator
9196@cindex exclamation point (@code{!}), @code{!=} operator
9197@cindex @code{>} (right angle bracket), @code{>>} operator (I/O)
9198@cindex right angle bracket (@code{>}), @code{>>} operator (I/O)
9199@cindex operators, input/output
9200@cindex @code{|} (vertical bar), @code{|} operator (I/O)
9201@cindex vertical bar (@code{|}), @code{|} operator (I/O)
9202@cindex operators, input/output
9203@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
9204@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
9205@cindex operators, input/output
9206@item < <= == !=
9207@itemx > >= >> | |&
9208Relational and redirection.
9209The relational operators and the redirections have the same precedence
9210level.  Characters such as @samp{>} serve both as relationals and as
9211redirections; the context distinguishes between the two meanings.
9212
9213@cindex @code{print} statement, I/O operators in
9214@cindex @code{printf} statement, I/O operators in
9215Note that the I/O redirection operators in @code{print} and @code{printf}
9216statements belong to the statement level, not to expressions.  The
9217redirection does not produce an expression that could be the operand of
9218another operator.  As a result, it does not make sense to use a
9219redirection operator near another operator of lower precedence without
parentheses.  Such combinations (for example, @samp{print foo > a ? b : c})
result in syntax errors.
9222The correct way to write this statement is @samp{print foo > (a ? b : c)}.
9223
9224@cindex @code{~} (tilde), @code{~} operator
9225@cindex tilde (@code{~}), @code{~} operator
9226@cindex @code{!} (exclamation point), @code{!~} operator
9227@cindex exclamation point (@code{!}), @code{!~} operator
9228@item ~ !~
9229Matching, nonmatching.
9230
9231@cindex @code{in} operator
9232@item in
9233Array membership.
9234
9235@cindex @code{&} (ampersand), @code{&&} operator
@cindex ampersand (@code{&}), @code{&&} operator
9237@item &&
9238Logical ``and''.
9239
9240@cindex @code{|} (vertical bar), @code{||} operator
9241@cindex vertical bar (@code{|}), @code{||} operator
9242@item ||
9243Logical ``or''.
9244
9245@cindex @code{?} (question mark), @code{?:} operator
9246@cindex question mark (@code{?}), @code{?:} operator
9247@item ?:
9248Conditional.  This operator groups right-to-left.
9249
9250@cindex @code{+} (plus sign), @code{+=} operator
9251@cindex plus sign (@code{+}), @code{+=} operator
9252@cindex @code{-} (hyphen), @code{-=} operator
9253@cindex hyphen (@code{-}), @code{-=} operator
9254@cindex @code{*} (asterisk), @code{*=} operator
9255@cindex asterisk (@code{*}), @code{*=} operator
9256@cindex @code{*} (asterisk), @code{**=} operator
9257@cindex asterisk (@code{*}), @code{**=} operator
9258@cindex @code{/} (forward slash), @code{/=} operator
9259@cindex forward slash (@code{/}), @code{/=} operator
9260@cindex @code{%} (percent sign), @code{%=} operator
9261@cindex percent sign (@code{%}), @code{%=} operator
9262@cindex @code{^} (caret), @code{^=} operator
9263@cindex caret (@code{^}), @code{^=} operator
9264@item = += -= *=
9265@itemx /= %= ^= **=
Assignment.  These operators group right-to-left.
9267@end table
9268
9269@cindex portability, operators, not in POSIX @command{awk}
9270@strong{Note:}
9271The @samp{|&}, @samp{**}, and @samp{**=} operators are not specified by POSIX.
9272For maximum portability, do not use them.
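
As a quick check of the grouping rules in the table, the following
commands (a minimal sketch; the operands are arbitrary illustrative
values) show that exponentiation groups right-to-left, that it binds
more tightly than unary minus, and that addition binds more tightly
than concatenation:

@example
$ awk 'BEGIN @{ print 2 ^ 3 ^ 2 @}'
@print{} 512
$ awk 'BEGIN @{ print -2 ^ 2 @}'
@print{} -4
$ awk 'BEGIN @{ print 1 + 2 " " 3 + 4 @}'
@print{} 3 7
@end example

@noindent
The first result is @samp{2 ^ (3 ^ 2)}, not @samp{(2 ^ 3) ^ 2}, which
would be 64.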
9273@c ENDOFRANGE prec
9274@c ENDOFRANGE oppr
9275@c ENDOFRANGE exps
9276
9277@node Patterns and Actions
9278@chapter Patterns, Actions, and Variables
9279@c STARTOFRANGE pat
9280@cindex patterns
9281
As you have already seen, each @command{awk} rule consists of
a pattern with an associated action.  This @value{CHAPTER} describes how
you build patterns and actions, what kinds of things you can do within
actions, and @command{awk}'s built-in variables.
9286
9287The pattern-action rules and the statements available for use
9288within actions form the core of @command{awk} programming.
In a sense, everything covered
up to here has been the foundation
on which programs are built.  Now it's time to start
building something useful.
9293
9294@menu
9295* Pattern Overview::            What goes into a pattern.
9296* Using Shell Variables::       How to use shell variables with @command{awk}.
9297* Action Overview::             What goes into an action.
9298* Statements::                  Describes the various control statements in
9299                                detail.
9300* Built-in Variables::          Summarizes the built-in variables.
9301@end menu
9302
9303@node Pattern Overview
9304@section Pattern Elements
9305
9306@menu
9307* Regexp Patterns::             Using regexps as patterns.
9308* Expression Patterns::         Any expression can be used as a pattern.
9309* Ranges::                      Pairs of patterns specify record ranges.
9310* BEGIN/END::                   Specifying initialization and cleanup rules.
9311* Empty::                       The empty pattern, which matches every record.
9312@end menu
9313
9314@cindex patterns, types of
9315Patterns in @command{awk} control the execution of rules---a rule is
9316executed when its pattern matches the current input record.
9317The following is a summary of the types of @command{awk} patterns:
9318
9319@table @code
9320@item /@var{regular expression}/
9321A regular expression. It matches when the text of the
9322input record fits the regular expression.
9323(@xref{Regexp}.)
9324
9325@item @var{expression}
9326A single expression.  It matches when its value
9327is nonzero (if a number) or non-null (if a string).
9328(@xref{Expression Patterns}.)
9329
9330@item @var{pat1}, @var{pat2}
9331A pair of patterns separated by a comma, specifying a range of records.
9332The range includes both the initial record that matches @var{pat1} and
9333the final record that matches @var{pat2}.
9334(@xref{Ranges}.)
9335
9336@item BEGIN
9337@itemx END
9338Special patterns for you to supply startup or cleanup actions for your
9339@command{awk} program.
9340(@xref{BEGIN/END}.)
9341
9342@item @var{empty}
9343The empty pattern matches every input record.
9344(@xref{Empty}.)
9345@end table
9346
9347@node Regexp Patterns
9348@subsection Regular Expressions as Patterns
9349@cindex patterns, expressions as
9350@cindex regular expressions, as patterns
9351
9352Regular expressions are one of the first kinds of patterns presented
9353in this book.
This kind of pattern is simply a regexp constant in the pattern part of
a rule.  Its meaning is @samp{$0 ~ /@var{pattern}/}.
9356The pattern matches when the input record matches the regexp.
9357For example:
9358
9359@example
/foo|bar|baz/  @{ buzzwords++ @}
END            @{ print buzzwords, "buzzwords seen" @}
9362@end example
9363
9364@node Expression Patterns
9365@subsection Expressions as Patterns
9366@cindex expressions, as patterns
9367
9368Any @command{awk} expression is valid as an @command{awk} pattern.
9369The pattern matches if the expression's value is nonzero (if a
9370number) or non-null (if a string).
9371The expression is reevaluated each time the rule is tested against a new
9372input record.  If the expression uses fields such as @code{$1}, the
9373value depends directly on the new input record's text; otherwise, it
9374depends on only what has happened so far in the execution of the
9375@command{awk} program.
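
For instance, a pattern need not look at the record's text at all.
In this small sketch (the four input lines are made up for
illustration), the pattern tests only the record number @code{NR}, so
the rule matches every second record regardless of its contents:

@example
$ printf 'a\nb\nc\nd\n' | awk 'NR % 2 == 0'
@print{} b
@print{} d
@end example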
9376
9377@cindex comparison expressions, as patterns
9378@cindex patterns, comparison expressions as
9379Comparison expressions, using the comparison operators described in
9380@ref{Typing and Comparison},
9381are a very common kind of pattern.
9382Regexp matching and nonmatching are also very common expressions.
9383The left operand of the @samp{~} and @samp{!~} operators is a string.
9384The right operand is either a constant regular expression enclosed in
9385slashes (@code{/@var{regexp}/}), or any expression whose string value
9386is used as a dynamic regular expression
9387(@pxref{Computed Regexps}).
9388The following example prints the second field of each input record
9389whose first field is precisely @samp{foo}:
9390
9391@cindex @code{/} (forward slash), patterns and
9392@cindex forward slash (@code{/}), patterns and
9393@cindex @code{~} (tilde), @code{~} operator
9394@cindex tilde (@code{~}), @code{~} operator
9395@cindex @code{!} (exclamation point), @code{!~} operator
9396@cindex exclamation point (@code{!}), @code{!~} operator
9397@example
$ awk '$1 == "foo" @{ print $2 @}' BBS-list
9399@end example
9400
9401@noindent
9402(There is no output, because there is no BBS site with the exact name @samp{foo}.)
9403Contrast this with the following regular expression match, which
9404accepts any record with a first field that contains @samp{foo}:
9405
9406@example
$ awk '$1 ~ /foo/ @{ print $2 @}' BBS-list
@print{} 555-1234
@print{} 555-6699
@print{} 555-6480
@print{} 555-2127
9412@end example
9413
9414@cindex regexp constants, as patterns
9415@cindex patterns, regexp constants as
9416A regexp constant as a pattern is also a special case of an expression
9417pattern.  The expression @code{/foo/} has the value one if @samp{foo}
9418appears in the current input record. Thus, as a pattern, @code{/foo/}
9419matches any record containing @samp{foo}.
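
To see the value directly, the result of a regexp constant can be
stored in a variable and printed.  This is only a sketch; the variable
name @code{matched} and the two input lines are illustrative:

@example
$ printf 'foobar\nbar\n' | awk '@{ matched = /foo/; print matched @}'
@print{} 1
@print{} 0
@end example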
9420
9421@cindex Boolean expressions, as patterns
9422Boolean expressions are also commonly used as patterns.
9423Whether the pattern
9424matches an input record depends on whether its subexpressions match.
9425For example, the following command prints all the records in
9426@file{BBS-list} that contain both @samp{2400} and @samp{foo}:
9427
9428@example
$ awk '/2400/ && /foo/' BBS-list
@print{} fooey        555-1234     2400/1200/300     B
9431@end example
9432
9433The following command prints all records in
9434@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}
9435(or both, of course):
9436
9437@example
$ awk '/2400/ || /foo/' BBS-list
@print{} alpo-net     555-3412     2400/1200/300     A
@print{} bites        555-1675     2400/1200/300     A
@print{} fooey        555-1234     2400/1200/300     B
@print{} foot         555-6699     1200/300          B
@print{} macfoo       555-6480     1200/300          A
@print{} sdace        555-3430     2400/1200/300     A
@print{} sabafoo      555-2127     1200/300          C
9446@end example
9447
9448The following command prints all records in
9449@file{BBS-list} that do @emph{not} contain the string @samp{foo}:
9450
9451@example
$ awk '! /foo/' BBS-list
@print{} aardvark     555-5553     1200/300          B
@print{} alpo-net     555-3412     2400/1200/300     A
@print{} barfly       555-7685     1200/300          A
@print{} bites        555-1675     2400/1200/300     A
@print{} camelot      555-0542     300               C
@print{} core         555-2912     1200/300          C
@print{} sdace        555-3430     2400/1200/300     A
9460@end example
9461
9462@cindex @code{BEGIN} pattern, Boolean patterns and
9463@cindex @code{END} pattern, Boolean patterns and
9464The subexpressions of a Boolean operator in a pattern can be constant regular
9465expressions, comparisons, or any other @command{awk} expressions.  Range
9466patterns are not expressions, so they cannot appear inside Boolean
9467patterns.  Likewise, the special patterns @code{BEGIN} and @code{END},
9468which never match any input record, are not expressions and cannot
9469appear inside Boolean patterns.
9470
9471@node Ranges
9472@subsection Specifying Record Ranges with Patterns
9473
9474@cindex range patterns
9475@cindex patterns, ranges in
9476@cindex lines, matching ranges of
9477@cindex @code{,} (comma), in range patterns
9478@cindex comma (@code{,}), in range patterns
9479A @dfn{range pattern} is made of two patterns separated by a comma, in
9480the form @samp{@var{begpat}, @var{endpat}}.  It is used to match ranges of
consecutive input records.  The first pattern, @var{begpat}, controls
where the range begins, while @var{endpat} controls where
the range ends.  For example, the following:
9484
9485@example
awk '$1 == "on", $1 == "off"' myfile
9487@end example
9488
9489@noindent
9490prints every record in @file{myfile} between @samp{on}/@samp{off} pairs, inclusive.
9491
9492A range pattern starts out by matching @var{begpat} against every
9493input record.  When a record matches @var{begpat}, the range pattern is
9494@dfn{turned on} and the range pattern matches this record as well.  As long as
9495the range pattern stays turned on, it automatically matches every input
9496record read.  The range pattern also matches @var{endpat} against every
9497input record; when this succeeds, the range pattern is turned off again
9498for the following record.  Then the range pattern goes back to checking
9499@var{begpat} against each record.
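
The following short trace illustrates these steps, using made-up
input supplied by @command{printf} in place of @file{myfile}:

@example
$ printf 'a\non\nb\noff\nc\non\nd\n' | awk '$1 == "on", $1 == "off"'
@print{} on
@print{} b
@print{} off
@print{} on
@print{} d
@end example

@noindent
The records @samp{a} and @samp{c} fall outside any range and are not
printed.  The second range is never turned off, so it simply continues
to the end of the input.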
9500
9501@c last comma does NOT start a tertiary
9502@cindex @code{if} statement, actions, changing
9503The record that turns on the range pattern and the one that turns it
9504off both match the range pattern.  If you don't want to operate on
9505these records, you can write @code{if} statements in the rule's action
9506to distinguish them from the records you are interested in.
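
One possible way to do this, assuming @samp{on} and @samp{off}
markers and made-up input, is to test for the markers inside the
action:

@example
$ printf 'a\non\nb\noff\nc\n' | awk '$1 == "on", $1 == "off" @{
>     if ($1 != "on" && $1 != "off")
>         print
> @}'
@print{} b
@end example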
9507
It is possible for a range pattern to be turned on and off by the same
record.  If the record satisfies both conditions, then the action is
executed for just that record.
9511For example, suppose there is text between two identical markers (e.g.,
9512the @samp{%} symbol), each on its own line, that should be ignored.
9513A first attempt would be to
9514combine a range pattern that describes the delimited text with the
9515@code{next} statement
9516(not discussed yet, @pxref{Next Statement}).
9517This causes @command{awk} to skip any further processing of the current
9518record and start over again with the next input record. Such a program
9519looks like this:
9520
9521@example
/^%$/,/^%$/    @{ next @}
               @{ print @}
9524@end example
9525
9526@noindent
9527@cindex lines, skipping between markers
9528@c @cindex flag variables
9529This program fails because the range pattern is both turned on and turned off
9530by the first line, which just has a @samp{%} on it.  To accomplish this task,
9531write the program in the following manner, using a flag:
9532
9533@cindex @code{!} operator
9534@example
/^%$/     @{ skip = ! skip; next @}
skip == 1 @{ next @} # skip lines with `skip' set
9537@end example
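
These two rules only decide which records to skip.  A complete
version needs one more rule to print the remaining records.  The
following sketch adds it and runs the program on made-up input
(@samp{%%} in the @command{printf} format produces a single @samp{%}):

@example
$ printf 'a\n%%\nhidden\n%%\nb\n' | awk '
> /^%$/     @{ skip = ! skip; next @}
> skip == 1 @{ next @}
>           @{ print @}'
@print{} a
@print{} b
@end example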
9538
9539In a range pattern, the comma (@samp{,}) has the lowest precedence of
9540all the operators (i.e., it is evaluated last).  Thus, the following
9541program attempts to combine a range pattern with another, simpler test:
9542
9543@example
echo Yes | awk '/1/,/2/ || /Yes/'
9545@end example
9546
9547The intent of this program is @samp{(/1/,/2/) || /Yes/}.
9548However, @command{awk} interprets this as @samp{/1/, (/2/ || /Yes/)}.
9549This cannot be changed or worked around; range patterns do not combine
9550with other patterns:
9551
9552@example
$ echo Yes | gawk '(/1/,/2/) || /Yes/'
@error{} gawk: cmd. line:1: (/1/,/2/) || /Yes/
@error{} gawk: cmd. line:1:           ^ parse error
@error{} gawk: cmd. line:2: (/1/,/2/) || /Yes/
@error{} gawk: cmd. line:2:                   ^ unexpected newline
9558@end example
9559
9560@node BEGIN/END
9561@subsection The @code{BEGIN} and @code{END} Special Patterns
9562
9563@c STARTOFRANGE beg
9564@cindex @code{BEGIN} pattern
9565@c STARTOFRANGE end
9566@cindex @code{END} pattern
9567All the patterns described so far are for matching input records.
9568The @code{BEGIN} and @code{END} special patterns are different.
9569They supply startup and cleanup actions for @command{awk} programs.
9570@code{BEGIN} and @code{END} rules must have actions; there is no default
9571action for these rules because there is no current record when they run.
9572@code{BEGIN} and @code{END} rules are often referred to as
9573``@code{BEGIN} and @code{END} blocks'' by long-time @command{awk}
9574programmers.
9575
9576@menu
9577* Using BEGIN/END::             How and why to use BEGIN/END rules.
9578* I/O And BEGIN/END::           I/O issues in BEGIN/END rules.
9579@end menu
9580
9581@node Using BEGIN/END
9582@subsubsection Startup and Cleanup Actions
9583
9584A @code{BEGIN} rule is executed once only, before the first input record
9585is read. Likewise, an @code{END} rule is executed once only, after all the
9586input is read.  For example:
9587
9588@example
$ awk '
> BEGIN @{ print "Analysis of \"foo\"" @}
> /foo/ @{ ++n @}
> END   @{ print "\"foo\" appears", n, "times." @}' BBS-list
@print{} Analysis of "foo"
@print{} "foo" appears 4 times.
9595@end example
9596
9597@cindex @code{BEGIN} pattern, operators and
9598@cindex @code{END} pattern, operators and
9599This program finds the number of records in the input file @file{BBS-list}
9600that contain the string @samp{foo}.  The @code{BEGIN} rule prints a title
9601for the report.  There is no need to use the @code{BEGIN} rule to
9602initialize the counter @code{n} to zero, since @command{awk} does this
9603automatically (@pxref{Variables}).
9604The second rule increments the variable @code{n} every time a
9605record containing the pattern @samp{foo} is read.  The @code{END} rule
9606prints the value of @code{n} at the end of the run.
9607
9608The special patterns @code{BEGIN} and @code{END} cannot be used in ranges
9609or with Boolean operators (indeed, they cannot be used with any operators).
9610An @command{awk} program may have multiple @code{BEGIN} and/or @code{END}
9611rules.  They are executed in the order in which they appear: all the @code{BEGIN}
9612rules at startup and all the @code{END} rules at termination.
9613@code{BEGIN} and @code{END} rules may be intermixed with other rules.
9614This feature was added in the 1987 version of @command{awk} and is included
9615in the POSIX standard.
9616The original (1978) version of @command{awk}
9617required the @code{BEGIN} rule to be placed at the beginning of the
9618program, the @code{END} rule to be placed at the end, and only allowed one of
9619each.
9620This is no longer required, but it is a good idea to follow this template
9621in terms of program organization and readability.
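
The following sketch shows the ordering.  It reads @file{/dev/null}
so that there are no input records, and the strings printed are
arbitrary:

@example
$ awk 'BEGIN @{ print "first BEGIN" @}
> END   @{ print "the END" @}
> BEGIN @{ print "second BEGIN" @}' /dev/null
@print{} first BEGIN
@print{} second BEGIN
@print{} the END
@end example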
9622
9623Multiple @code{BEGIN} and @code{END} rules are useful for writing
9624library functions, because each library file can have its own @code{BEGIN} and/or
9625@code{END} rule to do its own initialization and/or cleanup.
The order in which library files are named on the command line
9627controls the order in which their @code{BEGIN} and @code{END} rules are
9628executed.  Therefore, you have to be careful when writing such rules in
9629library files so that the order in which they are executed doesn't matter.
9630@xref{Options}, for more information on
9631using library functions.
9632@xref{Library Functions},
9633for a number of useful library functions.
9634
9635If an @command{awk} program has only a @code{BEGIN} rule and no
9636other rules, then the program exits after the @code{BEGIN} rule is
9637run.@footnote{The original version of @command{awk} used to keep
9638reading and ignoring input until the end of the file was seen.}  However, if an
9639@code{END} rule exists, then the input is read, even if there are
9640no other rules in the program.  This is necessary in case the @code{END}
9641rule checks the @code{FNR} and @code{NR} variables.
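
For example, in the following sketch the first command prints its
message and exits without waiting for any input, while the second
has to read all of its (made-up) input before the @code{END} rule can
report the record count:

@example
$ awk 'BEGIN @{ print "no input needed" @}'
@print{} no input needed
$ printf 'a\nb\nc\n' | awk 'END @{ print NR @}'
@print{} 3
@end example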
9642
9643@node I/O And BEGIN/END
9644@subsubsection Input/Output from @code{BEGIN} and @code{END} Rules
9645
9646@cindex input/output, from @code{BEGIN} and @code{END}
9647There are several (sometimes subtle) points to remember when doing I/O
9648from a @code{BEGIN} or @code{END} rule.
9649The first has to do with the value of @code{$0} in a @code{BEGIN}
9650rule.  Because @code{BEGIN} rules are executed before any input is read,
9651there simply is no input record, and therefore no fields, when
9652executing @code{BEGIN} rules.  References to @code{$0} and the fields
9653yield a null string or zero, depending upon the context.  One way
9654to give @code{$0} a real value is to execute a @code{getline} command
9655without a variable (@pxref{Getline}).
9656Another way is simply to assign a value to @code{$0}.
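
As a small illustration of the second approach (the record text here
is arbitrary), assigning to @code{$0} in a @code{BEGIN} rule causes
the record to be split into fields in the usual way:

@example
$ awk 'BEGIN @{ $0 = "a b c"; print NF, $2 @}'
@print{} 3 b
@end example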
9657
9658@cindex differences in @command{awk} and @command{gawk}, @code{BEGIN}/@code{END} patterns
9659@cindex POSIX @command{awk}, @code{BEGIN}/@code{END} patterns
9660@cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and
9661@cindex @code{BEGIN} pattern, @code{print} statement and
9662@cindex @code{END} pattern, @code{print} statement and
9663The second point is similar to the first but from the other direction.
9664Traditionally, due largely to implementation issues, @code{$0} and
9665@code{NF} were @emph{undefined} inside an @code{END} rule.
9666The POSIX standard specifies that @code{NF} is available in an @code{END}
9667rule. It contains the number of fields from the last input record.
9668Most probably due to an oversight, the standard does not say that @code{$0}
9669is also preserved, although logically one would think that it should be.
9670In fact, @command{gawk} does preserve the value of @code{$0} for use in
9671@code{END} rules.  Be aware, however, that Unix @command{awk}, and possibly
9672other implementations, do not.
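
A program that needs the last record in its @code{END} rule and must
run under other @command{awk} implementations can save each record
as it is read.  In this sketch, the variable name @code{last} and the
input are illustrative:

@example
$ printf 'one\ntwo three\n' | awk '@{ last = $0 @}
> END @{ print "last record:", last @}'
@print{} last record: two three
@end example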
9673
9674The third point follows from the first two.  The meaning of @samp{print}
9675inside a @code{BEGIN} or @code{END} rule is the same as always:
9676@samp{print $0}.  If @code{$0} is the null string, then this prints an
empty line.  Many long-time @command{awk} programmers use an unadorned
9678@samp{print} in @code{BEGIN} and @code{END} rules, to mean @samp{@w{print ""}},
9679relying on @code{$0} being null.  Although one might generally get away with
9680this in @code{BEGIN} rules, it is a very bad idea in @code{END} rules,
9681at least in @command{gawk}.  It is also poor style, since if an empty
9682line is needed in the output, the program should print one explicitly.
9683
9684@cindex @code{next} statement, @code{BEGIN}/@code{END} patterns and
9685@cindex @code{nextfile} statement, @code{BEGIN}/@code{END} patterns and
9686@cindex @code{BEGIN} pattern, @code{next}/@code{nextfile} statements and
9687@cindex @code{END} pattern, @code{next}/@code{nextfile} statements and
9688Finally, the @code{next} and @code{nextfile} statements are not allowed
9689in a @code{BEGIN} rule, because the implicit
9690read-a-record-and-match-against-the-rules loop has not started yet.  Similarly, those statements
9691are not valid in an @code{END} rule, since all the input has been read.
9692(@xref{Next Statement}, and see
9693@ref{Nextfile Statement}.)
9694@c ENDOFRANGE beg
9695@c ENDOFRANGE end
9696
9697@node Empty
9698@subsection The Empty Pattern
9699
9700@cindex empty pattern
9701@cindex patterns, empty
9702An empty (i.e., nonexistent) pattern is considered to match @emph{every}
9703input record.  For example, the program:
9704
9705@example
awk '@{ print $1 @}' BBS-list
9707@end example
9708
9709@noindent
9710prints the first field of every record.
9711@c ENDOFRANGE pat
9712
9713@node Using Shell Variables
9714@section Using Shell Variables in Programs
9715@cindex shells, variables
9716@cindex @command{awk} programs, shell variables in
9717@c @cindex shell and @command{awk} interaction
9718
9719@command{awk} programs are often used as components in larger
9720programs written in shell.
9721For example, it is very common to use a shell variable to
9722hold a pattern that the @command{awk} program searches for.
9723There are two ways to get the value of the shell variable
9724into the body of the @command{awk} program.
9725
9726@cindex shells, quoting
9727The most common method is to use shell quoting to substitute
9728the variable's value into the program inside the script.
9729For example, in the following program:
9730
9731@example
echo -n "Enter search pattern: "
read pattern
awk "/$pattern/ "'@{ nmatches++ @}
     END @{ print nmatches, "found" @}' /path/to/data
9736@end example
9737
9738@noindent
9739the @command{awk} program consists of two pieces of quoted text
9740that are concatenated together to form the program.
9741The first part is double-quoted, which allows substitution of
9742the @code{pattern} variable inside the quotes.
9743The second part is single-quoted.
9744
Variable substitution via quoting works, but can be
messy.  It requires a good understanding of the shell's quoting rules
9747(@pxref{Quoting}),
9748and it's often difficult to correctly
9749match up the quotes when reading the program.
9750
9751A better method is to use @command{awk}'s variable assignment feature
9752(@pxref{Assignment Options})
9753to assign the shell variable's value to an @command{awk} variable's
9754value.  Then use dynamic regexps to match the pattern
9755(@pxref{Computed Regexps}).
9756The following shows how to redo the
9757previous example using this technique:
9758
9759@example
echo -n "Enter search pattern: "
read pattern
awk -v pat="$pattern" '$0 ~ pat @{ nmatches++ @}
       END @{ print nmatches, "found" @}' /path/to/data
9764@end example
9765
9766@noindent
9767Now, the @command{awk} program is just one single-quoted string.
9768The assignment @samp{-v pat="$pattern"} still requires double quotes,
9769in case there is whitespace in the value of @code{$pattern}.
9770The @command{awk} variable @code{pat} could be named @code{pattern}
9771too, but that would be more confusing.  Using a variable also
9772provides more flexibility, since the variable can be used anywhere inside
9773the program---for printing, as an array subscript, or for any other
9774use---without requiring the quoting tricks at every point in the program.
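
For example, in the following sketch (the shell variable's value and
the input lines are made up), @code{pat} is used both in the pattern
and in the report printed by the @code{END} rule:

@example
$ pattern=foo
$ printf 'foo\nbar\nfoot\n' | awk -v pat="$pattern" '
> $0 ~ pat @{ nmatches++ @}
> END      @{ print nmatches, "records match", pat @}'
@print{} 2 records match foo
@end example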
9775
9776@node Action Overview
9777@section Actions
9778@c @cindex action, definition of
9779@c @cindex curly braces
9780@c @cindex action, curly braces
9781@c @cindex action, separating statements
9782@cindex actions
9783
9784An @command{awk} program or script consists of a series of
9785rules and function definitions interspersed.  (Functions are
9786described later.  @xref{User-defined}.)
9787A rule contains a pattern and an action, either of which (but not
9788both) may be omitted.  The purpose of the @dfn{action} is to tell
9789@command{awk} what to do once a match for the pattern is found.  Thus,
9790in outline, an @command{awk} program generally looks like this:
9791
9792@example
@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
@dots{}
function @var{name}(@var{args}) @{ @dots{} @}
@dots{}
9798@end example
9799
9800@cindex @code{@{@}} (braces), actions and
9801@cindex braces (@code{@{@}}), actions and
9802@cindex separators, for statements in actions
9803@cindex newlines, separating statements in actions
9804@cindex @code{;} (semicolon), separating statements in actions
9805@cindex semicolon (@code{;}), separating statements in actions
9806An action consists of one or more @command{awk} @dfn{statements}, enclosed
9807in curly braces (@samp{@{@dots{}@}}).  Each statement specifies one
9808thing to do.  The statements are separated by newlines or semicolons.
9809The curly braces around an action must be used even if the action
9810contains only one statement, or if it contains no statements at
9811all.  However, if you omit the action entirely, omit the curly braces as
9812well.  An omitted action is equivalent to @samp{@{ print $0 @}}:
9813
9814@example
/foo/  @{ @}     @i{match @code{foo}, do nothing --- empty action}
/foo/          @i{match @code{foo}, print the record --- omitted action}
9817@end example
9818
9819The following types of statements are supported in @command{awk}:
9820
9821@table @asis
9822@cindex side effects, statements
9823@item Expressions
9824Call functions or assign values to variables
9825(@pxref{Expressions}).  Executing
9826this kind of statement simply computes the value of the expression.
9827This is useful when the expression has side effects
9828(@pxref{Assignment Ops}).
9829
9830@item Control statements
9831Specify the control flow of @command{awk}
9832programs.  The @command{awk} language gives you C-like constructs
9833(@code{if}, @code{for}, @code{while}, and @code{do}) as well as a few
9834special ones (@pxref{Statements}).
9835
9836@item Compound statements
9837Consist of one or more statements enclosed in
9838curly braces.  A compound statement is used in order to put several
9839statements together in the body of an @code{if}, @code{while}, @code{do},
9840or @code{for} statement.
9841
9842@item Input statements
9843Use the @code{getline} command
9844(@pxref{Getline}).
9845Also supplied in @command{awk} are the @code{next}
9846statement (@pxref{Next Statement}),
9847and the @code{nextfile} statement
9848(@pxref{Nextfile Statement}).
9849
9850@item Output statements
9851Such as @code{print} and @code{printf}.
9852@xref{Printing}.
9853
9854@item Deletion statements
9855For deleting array elements.
9856@xref{Delete}.
9857@end table
9858
9859@node Statements
9860@section Control Statements in Actions
9861@c STARTOFRANGE csta
9862@cindex control statements
9863@c STARTOFRANGE acs
9864@cindex statements, control, in actions
9865@c STARTOFRANGE accs
9866@cindex actions, control statements in
9867
9868@dfn{Control statements}, such as @code{if}, @code{while}, and so on,
9869control the flow of execution in @command{awk} programs.  Most of the
9870control statements in @command{awk} are patterned on similar statements in C.
9871
9872@c the comma here does NOT start a secondary
9873@cindex compound statements, control statements and
9874@c the second comma here does NOT start a tertiary
9875@cindex statements, compound, control statements and
9876@cindex body, in actions
9877@cindex @code{@{@}} (braces), statements, grouping
9878@cindex braces (@code{@{@}}), statements, grouping
9879@cindex newlines, separating statements in actions
9880@cindex @code{;} (semicolon), separating statements in actions
9881@cindex semicolon (@code{;}), separating statements in actions
9882All the control statements start with special keywords, such as @code{if}
9883and @code{while}, to distinguish them from simple expressions.
9884Many control statements contain other statements.  For example, the
9885@code{if} statement contains another statement that may or may not be
9886executed.  The contained statement is called the @dfn{body}.
9887To include more than one statement in the body, group them into a
9888single @dfn{compound statement} with curly braces, separating them with
9889newlines or semicolons.
9890
9891@menu
9892* If Statement::                Conditionally execute some @command{awk}
9893                                statements.
9894* While Statement::             Loop until some condition is satisfied.
9895* Do Statement::                Do specified action while looping until some
9896                                condition is satisfied.
9897* For Statement::               Another looping statement, that provides
9898                                initialization and increment clauses.
9899* Switch Statement::            Switch/case evaluation for conditional
9900                                execution of statements based on a value.
9901* Break Statement::             Immediately exit the innermost enclosing loop.
9902* Continue Statement::          Skip to the end of the innermost enclosing
9903                                loop.
9904* Next Statement::              Stop processing the current input record.
9905* Nextfile Statement::          Stop processing the current file.
9906* Exit Statement::              Stop execution of @command{awk}.
9907@end menu
9908
9909@node If Statement
9910@subsection The @code{if}-@code{else} Statement
9911
9912@cindex @code{if} statement
9913The @code{if}-@code{else} statement is @command{awk}'s decision-making
9914statement.  It looks like this:
9915
@example
if (@var{condition}) @var{then-body} @r{[}else @var{else-body}@r{]}
@end example
9919
9920@noindent
9921The @var{condition} is an expression that controls what the rest of the
9922statement does.  If the @var{condition} is true, @var{then-body} is
9923executed; otherwise, @var{else-body} is executed.
9924The @code{else} part of the statement is
9925optional.  The condition is considered false if its value is zero or
9926the null string; otherwise, the condition is true.
9927Refer to the following:
9928
@example
if (x % 2 == 0)
    print "x is even"
else
    print "x is odd"
@end example
9935
9936In this example, if the expression @samp{x % 2 == 0} is true (that is,
9937if the value of @code{x} is evenly divisible by two), then the first
9938@code{print} statement is executed; otherwise, the second @code{print}
9939statement is executed.
9940If the @code{else} keyword appears on the same line as @var{then-body} and
9941@var{then-body} is not a compound statement (i.e., not surrounded by
9942curly braces), then a semicolon must separate @var{then-body} from
9943the @code{else}.
9944To illustrate this, the previous example can be rewritten as:
9945
@example
if (x % 2 == 0) print "x is even"; else
        print "x is odd"
@end example
9950
9951@noindent
9952If the @samp{;} is left out, @command{awk} can't interpret the statement and
9953it produces a syntax error.  Don't actually write programs this way,
9954because a human reader might fail to see the @code{else} if it is not
9955the first thing on its line.
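
The rule for what counts as true can be checked with a short sketch.
The function name @code{truthy} is hypothetical; the program tests the
first field of each record and reports how the condition was judged:

```shell
# truthy: hypothetical helper; shows how awk judges the first field as a condition.
truthy() {
    awk '{
        if ($1)
            printf "[%s] is true\n", $1
        else
            printf "[%s] is false\n", $1    # zero or the null string
    }'
}

printf '0\n1\nabc\n\n' | truthy
```

A field that looks like the number zero and the empty field of a blank line
are both false; a nonzero number and a non-null string are true.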
9956
9957@node While Statement
9958@subsection The @code{while} Statement
9959@cindex @code{while} statement
9960@cindex loops
9961@cindex loops, See Also @code{while} statement
9962
9963In programming, a @dfn{loop} is a part of a program that can
9964be executed two or more times in succession.
9965The @code{while} statement is the simplest looping statement in
9966@command{awk}.  It repeatedly executes a statement as long as a condition is
true.  It looks like this:
9968
@example
while (@var{condition})
  @var{body}
@end example
9973
9974@cindex body, in loops
9975@noindent
9976@var{body} is a statement called the @dfn{body} of the loop,
9977and @var{condition} is an expression that controls how long the loop
9978keeps running.
9979The first thing the @code{while} statement does is test the @var{condition}.
9980If the @var{condition} is true, it executes the statement @var{body}.
9981@ifinfo
9982(The @var{condition} is true when the value
9983is not zero and not a null string.)
9984@end ifinfo
9985After @var{body} has been executed,
9986@var{condition} is tested again, and if it is still true, @var{body} is
9987executed again.  This process repeats until the @var{condition} is no longer
9988true.  If the @var{condition} is initially false, the body of the loop is
9989never executed and @command{awk} continues with the statement following
9990the loop.
9991This example prints the first three fields of each record, one per line:
9992
@example
awk '@{ i = 1
       while (i <= 3) @{
           print $i
           i++
       @}
@}' inventory-shipped
@end example
10001
10002@noindent
10003The body of this loop is a compound statement enclosed in braces,
10004containing two statements.
10005The loop works in the following manner: first, the value of @code{i} is set to one.
10006Then, the @code{while} statement tests whether @code{i} is less than or equal to
10007three.  This is true when @code{i} equals one, so the @code{i}-th
10008field is printed.  Then the @samp{i++} increments the value of @code{i}
10009and the loop repeats.  The loop terminates when @code{i} reaches four.
10010
10011A newline is not required between the condition and the
10012body; however using one makes the program clearer unless the body is a
10013compound statement or else is very simple.  The newline after the open-brace
10014that begins the compound statement is not required either, but the
10015program is harder to read without it.
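
The same pattern extends to records of any width.  The following sketch
(the function name @code{each_field} is an assumption for illustration)
replaces the fixed limit of three with @code{NF}, the number of fields in
the current record:

```shell
# each_field: hypothetical helper; prints every field of every record, one per line.
each_field() {
    awk '{
        i = 1
        while (i <= NF) {   # NF is the number of fields in the current record
            print $i
            i++
        }
    }'
}

printf 'a b c\nd e\n' | each_field
```

For a record with no fields, the condition is false on the first test and
the body is never executed.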
10016
10017@node Do Statement
10018@subsection The @code{do}-@code{while} Statement
10019@cindex @code{do}-@code{while} statement
10020
10021The @code{do} loop is a variation of the @code{while} looping statement.
10022The @code{do} loop executes the @var{body} once and then repeats the
10023@var{body} as long as the @var{condition} is true.  It looks like this:
10024
@example
do
  @var{body}
while (@var{condition})
@end example
10030
10031Even if the @var{condition} is false at the start, the @var{body} is
10032executed at least once (and only once, unless executing @var{body}
10033makes @var{condition} true).  Contrast this with the corresponding
10034@code{while} statement:
10035
@example
while (@var{condition})
  @var{body}
@end example
10040
10041@noindent
10042This statement does not execute @var{body} even once if the @var{condition}
10043is false to begin with.
10044The following is an example of a @code{do} statement:
10045
@example
@{      i = 1
       do @{
          print $0
          i++
       @} while (i <= 10)
@}
@end example
10054
10055@noindent
10056This program prints each input record 10 times.  However, it isn't a very
10057realistic example, since in this case an ordinary @code{while} would do
10058just as well.  This situation reflects actual experience; only
10059occasionally is there a real use for a @code{do} statement.
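
The difference between the two loops shows up only when the condition is
false from the start.  The following sketch (the function name
@code{do_vs_while} is hypothetical) runs both forms with the same
initially false condition:

```shell
# do_vs_while: hypothetical helper; contrasts the two loops when the test fails at once.
do_vs_while() {
    awk 'BEGIN {
        i = 5
        while (i < 3) {        # condition is false immediately: body never runs
            print "while body, i =", i
            i++
        }
        i = 5
        do {                   # body runs once before the condition is tested
            print "do body, i =", i
            i++
        } while (i < 3)
    }'
}

do_vs_while
```

Only the @code{do} loop produces output, and it does so exactly once.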
10060
10061@node For Statement
10062@subsection The @code{for} Statement
10063@cindex @code{for} statement
10064
10065The @code{for} statement makes it more convenient to count iterations of a
10066loop.  The general form of the @code{for} statement looks like this:
10067
@example
for (@var{initialization}; @var{condition}; @var{increment})
  @var{body}
@end example
10072
10073@noindent
10074The @var{initialization}, @var{condition}, and @var{increment} parts are
10075arbitrary @command{awk} expressions, and @var{body} stands for any
10076@command{awk} statement.
10077
10078The @code{for} statement starts by executing @var{initialization}.
10079Then, as long
10080as the @var{condition} is true, it repeatedly executes @var{body} and then
10081@var{increment}.  Typically, @var{initialization} sets a variable to
10082either zero or one, @var{increment} adds one to it, and @var{condition}
10083compares it against the desired number of iterations.
10084For example:
10085
@example
awk '@{ for (i = 1; i <= 3; i++)
          print $i
@}' inventory-shipped
@end example
10091
10092@noindent
10093This prints the first three fields of each input record, with one field per
10094line.
10095
10096It isn't possible to
10097set more than one variable in the
10098@var{initialization} part without using a multiple assignment statement
10099such as @samp{x = y = 0}. This makes sense only if all the initial values
10100are equal.  (But it is possible to initialize additional variables by writing
10101their assignments as separate statements preceding the @code{for} loop.)
10102
10103@c @cindex comma operator, not supported
10104The same is true of the @var{increment} part. Incrementing additional
10105variables requires separate statements at the end of the loop.
10106The C compound expression, using C's comma operator, is useful in
10107this context but it is not supported in @command{awk}.
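
As a sketch of the workaround, the following program drives two counters
from one loop.  In C this might be written with comma operators in the
@code{for} header; here the second variable is initialized before the loop
and updated at the end of the body.  The function name
@code{two_counters} is an assumption for illustration:

```shell
# two_counters: hypothetical helper; counts i upward and j downward together.
two_counters() {
    awk 'BEGIN {
        j = 3                       # second variable: set before the loop
        for (i = 1; i <= 3; i++) {
            print i, j
            j--                     # second update: done at the end of the body
        }
    }'
}

two_counters
```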
10108
10109Most often, @var{increment} is an increment expression, as in the previous
10110example.  But this is not required; it can be any expression
10111whatsoever.  For example, the following statement prints all the powers of two
10112between 1 and 100:
10113
@example
for (i = 1; i <= 100; i *= 2)
  print i
@end example
10118
10119If there is nothing to be done, any of the three expressions in the
10120parentheses following the @code{for} keyword may be omitted.  Thus,
10121@w{@samp{for (; x > 0;)}} is equivalent to @w{@samp{while (x > 0)}}.  If the
10122@var{condition} is omitted, it is treated as true, effectively
10123yielding an @dfn{infinite loop} (i.e., a loop that never terminates).
10124
10125In most cases, a @code{for} loop is an abbreviation for a @code{while}
10126loop, as shown here:
10127
@example
@var{initialization}
while (@var{condition}) @{
  @var{body}
  @var{increment}
@}
@end example
10135
10136@cindex loops, @code{continue} statements and
10137@noindent
10138The only exception is when the @code{continue} statement
10139(@pxref{Continue Statement}) is used
10140inside the loop. Changing a @code{for} statement to a @code{while}
10141statement in this way can change the effect of the @code{continue}
10142statement inside the loop.
10143
10144The @command{awk} language has a @code{for} statement in addition to a
10145@code{while} statement because a @code{for} loop is often both less work to
10146type and more natural to think of.  Counting the number of iterations is
10147very common in loops.  It can be easier to think of this counting as part
10148of looping rather than as something to do inside the loop.
10149
10150@ifinfo
10151@cindex @code{in} operator
10152There is an alternate version of the @code{for} loop, for iterating over
10153all the indices of an array:
10154
@example
for (i in array)
    @var{do something with} array[i]
@end example
10159
10160@noindent
10161@xref{Scanning an Array},
10162for more information on this version of the @code{for} loop.
10163@end ifinfo
10164
10165@node Switch Statement
10166@subsection The @code{switch} Statement
10167@cindex @code{switch} statement
10168@cindex @code{case} keyword
10169@cindex @code{default} keyword
10170
10171@strong{NOTE:} This @value{SUBSECTION} describes an experimental feature
10172added in @command{gawk} 3.1.3.  It is @emph{not} enabled by default. To
10173enable it, use the @option{--enable-switch} option to @command{configure}
10174when @command{gawk} is being configured and built.
10175@xref{Additional Configuration Options},
10176for more information.
10177
10178The @code{switch} statement allows the evaluation of an expression and
10179the execution of statements based on a @code{case} match. Case statements
10180are checked for a match in the order they are defined.  If no suitable
10181@code{case} is found, the @code{default} section is executed, if supplied. The
10182general form of the @code{switch} statement looks like this:
10183
@example
switch (@var{expression}) @{
case @var{value or regular expression}:
    @var{case-body}
default:
    @var{default-body}
@}
@end example
10192
The @code{switch} statement works as it does in C. Once a match to a given
case is made, case statement bodies are executed until a @code{break},
@code{continue}, @code{next}, @code{nextfile}, or @code{exit} is encountered,
or until the end of the @code{switch} statement itself is reached. For example:
10197
@example
switch (NR * 2 + 1) @{
case 3:
case "11":
    print NR - 1
    break

case /2[[:digit:]]+/:
    print NR

default:
    print NR + 1

case -1:
    print NR * -1
@}
@end example
10215
10216Note that if none of the statements specified above halt execution
10217of a matched @code{case} statement, execution falls through to the
10218next @code{case} until execution halts. In the above example, for
10219any case value starting with @samp{2} followed by one or more digits,
10220the @code{print} statement is executed and then falls through into the
10221@code{default} section, executing its @code{print} statement. In turn,
10222the @minus{}1 case will also be executed since the @code{default} does
10223not halt execution.
10224
10225@node Break Statement
10226@subsection The @code{break} Statement
10227@cindex @code{break} statement
10228@cindex loops, exiting
10229
10230The @code{break} statement jumps out of the innermost @code{for},
10231@code{while}, or @code{do} loop that encloses it.  The following example
10232finds the smallest divisor of any integer, and also identifies prime
10233numbers:
10234
@example
# find smallest divisor of num
@{
   num = $1
   for (div = 2; div*div <= num; div++)
     if (num % div == 0)
       break
   if (num % div == 0)
     printf "Smallest divisor of %d is %d\n", num, div
   else
     printf "%d is prime\n", num
@}
@end example
10248
10249When the remainder is zero in the first @code{if} statement, @command{awk}
10250immediately @dfn{breaks out} of the containing @code{for} loop.  This means
10251that @command{awk} proceeds immediately to the statement following the loop
10252and continues processing.  (This is very different from the @code{exit}
10253statement, which stops the entire @command{awk} program.
10254@xref{Exit Statement}.)
10255
The following program illustrates how the @var{condition} of a @code{for}
or @code{while} statement could be replaced with a @code{break} inside
an @code{if}:
10259
@example
# find smallest divisor of num
@{
  num = $1
  for (div = 2; ; div++) @{
    if (num % div == 0) @{
      printf "Smallest divisor of %d is %d\n", num, div
      break
    @}
    if (div*div > num) @{
      printf "%d is prime\n", num
      break
    @}
  @}
@}
@end example
10276
10277@c @cindex @code{break}, outside of loops
10278@c @cindex historical features
10279@c @cindex @command{awk} language, POSIX version
10280@cindex POSIX @command{awk}, @code{break} statement and
10281@cindex dark corner, @code{break} statement
10282@cindex @command{gawk}, @code{break} statement in
10283The @code{break} statement has no meaning when
10284used outside the body of a loop.  However, although it was never documented,
10285historical implementations of @command{awk} treated the @code{break}
10286statement outside of a loop as if it were a @code{next} statement
10287(@pxref{Next Statement}).
10288Recent versions of Unix @command{awk} no longer allow this usage.
10289@command{gawk} supports this use of @code{break} only
10290if @option{--traditional}
10291has been specified on the command line
10292(@pxref{Options}).
10293Otherwise, it is treated as an error, since the POSIX standard
10294specifies that @code{break} should only be used inside the body of a
10295loop.
10296@value{DARKCORNER}
10297
10298@node Continue Statement
10299@subsection The @code{continue} Statement
10300
10301@cindex @code{continue} statement
10302As with @code{break}, the @code{continue} statement is used only inside
10303@code{for}, @code{while}, and @code{do} loops.  It skips
10304over the rest of the loop body, causing the next cycle around the loop
10305to begin immediately.  Contrast this with @code{break}, which jumps out
10306of the loop altogether.
10307
10308The @code{continue} statement in a @code{for} loop directs @command{awk} to
10309skip the rest of the body of the loop and resume execution with the
10310increment-expression of the @code{for} statement.  The following program
10311illustrates this fact:
10312
@example
BEGIN @{
     for (x = 0; x <= 20; x++) @{
         if (x == 5)
             continue
         printf "%d ", x
     @}
     print ""
@}
@end example
10323
10324@noindent
10325This program prints all the numbers from 0 to 20---except for 5, for
10326which the @code{printf} is skipped.  Because the increment @samp{x++}
10327is not skipped, @code{x} does not remain stuck at 5.  Contrast the
10328@code{for} loop from the previous example with the following @code{while} loop:
10329
@example
BEGIN @{
     x = 0
     while (x <= 20) @{
         if (x == 5)
             continue
         printf "%d ", x
         x++
     @}
     print ""
@}
@end example
10342
10343@noindent
10344This program loops forever once @code{x} reaches 5.
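
One way to repair the @code{while} version, shown here only as a sketch, is
to advance the counter before executing @code{continue}.  The function
name @code{skip_five} is hypothetical:

```shell
# skip_five: hypothetical helper; while-loop version that terminates.
skip_five() {
    awk 'BEGIN {
        x = 0
        while (x <= 20) {
            if (x == 5) {
                x++          # advance the counter before skipping the rest
                continue
            }
            printf "%d ", x
            x++
        }
        print ""
    }'
}

skip_five
```

This prints the same numbers as the @code{for} loop, at the cost of writing
the increment twice.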
10345
10346@c @cindex @code{continue}, outside of loops
10347@c @cindex historical features
10348@c @cindex @command{awk} language, POSIX version
10349@cindex POSIX @command{awk}, @code{continue} statement and
10350@cindex dark corner, @code{continue} statement
10351@cindex @command{gawk}, @code{continue} statement in
10352The @code{continue} statement has no meaning when used outside the body of
10353a loop.  Historical versions of @command{awk} treated a @code{continue}
10354statement outside a loop the same way they treated a @code{break}
10355statement outside a loop: as if it were a @code{next}
10356statement
10357(@pxref{Next Statement}).
10358Recent versions of Unix @command{awk} no longer work this way, and
10359@command{gawk} allows it only if @option{--traditional} is specified on
10360the command line (@pxref{Options}).  Just like the
10361@code{break} statement, the POSIX standard specifies that @code{continue}
10362should only be used inside the body of a loop.
10363@value{DARKCORNER}
10364
10365@node Next Statement
10366@subsection The @code{next} Statement
10367@cindex @code{next} statement
10368
10369The @code{next} statement forces @command{awk} to immediately stop processing
10370the current record and go on to the next record.  This means that no
10371further rules are executed for the current record, and the rest of the
10372current rule's action isn't executed.
10373
Contrast this with the effect of the @code{getline} command
(@pxref{Getline}).  That also causes
@command{awk} to read the next record immediately, but it does not alter the
flow of control in any way (i.e., the rest of the current action executes
with a new input record).
10379
10380@cindex @command{awk} programs, execution of
10381At the highest level, @command{awk} program execution is a loop that reads
10382an input record and then tests each rule's pattern against it.  If you
10383think of this loop as a @code{for} statement whose body contains the
10384rules, then the @code{next} statement is analogous to a @code{continue}
10385statement. It skips to the end of the body of this implicit loop and
10386executes the increment (which reads another record).
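
The analogy can be seen in a two-rule sketch.  The function name
@code{strip_comments} and the convention that comment lines start with
@samp{#} are assumptions made for this illustration:

```shell
# strip_comments: hypothetical helper; numbers the lines that are not comments.
strip_comments() {
    awk '
    /^#/ { next }          # comment line: skip every remaining rule
    { print NR ": " $0 }   # reached only for records that were not skipped
    '
}

printf '# header\nalpha\n# note\nbeta\n' | strip_comments
```

The second rule never sees the comment lines, although @code{NR} still
counts them.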
10387
10388For example, suppose an @command{awk} program works only on records
10389with four fields, and it shouldn't fail when given bad input.  To avoid
10390complicating the rest of the program, write a ``weed out'' rule near
10391the beginning, in the following manner:
10392
@example
NF != 4 @{
  err = sprintf("%s:%d: skipped: NF != 4", FILENAME, FNR)
  print err > "/dev/stderr"
  next
@}
@end example
10400
10401@noindent
10402Because of the @code{next} statement,
10403the program's subsequent rules won't see the bad record.  The error
10404message is redirected to the standard error output stream, as error
10405messages should be.
10406For more detail see
10407@ref{Special Files}.
10408
10409@c @cindex @command{awk} language, POSIX version
10410@c @cindex @code{next}, inside a user-defined function
10411@cindex @code{BEGIN} pattern, @code{next}/@code{nextfile} statements and
10412@cindex @code{END} pattern, @code{next}/@code{nextfile} statements and
10413@cindex POSIX @command{awk}, @code{next}/@code{nextfile} statements and
10414@cindex @code{next} statement, user-defined functions and
10415@cindex functions, user-defined, @code{next}/@code{nextfile} statements and
10416According to the POSIX standard, the behavior is undefined if
10417the @code{next} statement is used in a @code{BEGIN} or @code{END} rule.
10418@command{gawk} treats it as a syntax error.
10419Although POSIX permits it,
10420some other @command{awk} implementations don't allow the @code{next}
10421statement inside function bodies
10422(@pxref{User-defined}).
10423Just as with any other @code{next} statement, a @code{next} statement inside a
10424function body reads the next record and starts processing it with the
10425first rule in the program.
10426If the @code{next} statement causes the end of the input to be reached,
10427then the code in any @code{END} rules is executed.
10428@xref{BEGIN/END}.
10429
10430@node Nextfile Statement
10431@subsection Using @command{gawk}'s @code{nextfile} Statement
10432@cindex @code{nextfile} statement
10433@cindex differences in @command{awk} and @command{gawk}, @code{next}/@code{nextfile} statements
10434
10435@command{gawk} provides the @code{nextfile} statement,
10436which is similar to the @code{next} statement.
10437However, instead of abandoning processing of the current record, the
10438@code{nextfile} statement instructs @command{gawk} to stop processing the
10439current @value{DF}.
10440
10441The @code{nextfile} statement is a @command{gawk} extension.
10442In most other @command{awk} implementations,
10443or if @command{gawk} is in compatibility mode
10444(@pxref{Options}),
10445@code{nextfile} is not special.
10446
10447Upon execution of the @code{nextfile} statement, @code{FILENAME} is
10448updated to the name of the next @value{DF} listed on the command line,
10449@code{FNR} is reset to one, @code{ARGIND} is incremented, and processing
10450starts over with the first rule in the program.
10451(@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.)
10452If the @code{nextfile} statement causes the end of the input to be reached,
10453then the code in any @code{END} rules is executed.
10454@xref{BEGIN/END}.
10455
10456The @code{nextfile} statement is useful when there are many @value{DF}s
10457to process but it isn't necessary to process every record in every file.
10458Normally, in order to move on to the next @value{DF}, a program
10459has to continue scanning the unwanted records.  The @code{nextfile}
10460statement accomplishes this much more efficiently.
10461
10462While one might think that @samp{close(FILENAME)} would accomplish
10463the same as @code{nextfile}, this isn't true.  @code{close} is
10464reserved for closing files, pipes, and coprocesses that are
10465opened with redirections.  It is not related to the main processing that
10466@command{awk} does with the files listed in @code{ARGV}.
10467
10468If it's necessary to use an @command{awk} version that doesn't support
10469@code{nextfile}, see
10470@ref{Nextfile Function},
10471for a user-defined function that simulates the @code{nextfile}
10472statement.
10473
10474@cindex functions, user-defined, @code{next}/@code{nextfile} statements and
10475@cindex @code{nextfile} statement, user-defined functions and
10476The current version of the Bell Laboratories @command{awk}
10477(@pxref{Other Versions})
10478also supports @code{nextfile}.  However, it doesn't allow the @code{nextfile}
10479statement inside function bodies
10480(@pxref{User-defined}).
@command{gawk} does; a @code{nextfile} inside a
function body abandons the current @value{DF}, reads the first record of the
next one, and starts processing it with the
first rule in the program, just as any other @code{nextfile} statement.
10484
10485@cindex @code{next file} statement, in @command{gawk}
10486@cindex @command{gawk}, @code{next file} statement in
10487@cindex @code{nextfile} statement, in @command{gawk}
10488@cindex @command{gawk}, @code{nextfile} statement in
10489@strong{Caution:}  Versions of @command{gawk} prior to 3.0 used two
10490words (@samp{next file}) for the @code{nextfile} statement.
10491In @value{PVERSION} 3.0, this was changed
10492to one word, because the treatment of @samp{file} was
10493inconsistent. When it appeared after @code{next}, @samp{file} was a keyword;
10494otherwise, it was a regular identifier.  The old usage is no longer
10495accepted; @samp{next file} generates a syntax error.
10496
10497@node Exit Statement
10498@subsection The @code{exit} Statement
10499
10500@cindex @code{exit} statement
10501The @code{exit} statement causes @command{awk} to immediately stop
10502executing the current rule and to stop processing input; any remaining input
10503is ignored.  The @code{exit} statement is written as follows:
10504
@example
exit @r{[}@var{return code}@r{]}
@end example
10508
10509@cindex @code{BEGIN} pattern, @code{exit} statement and
10510@cindex @code{END} pattern, @code{exit} statement and
10511When an @code{exit} statement is executed from a @code{BEGIN} rule, the
10512program stops processing everything immediately.  No input records are
10513read.  However, if an @code{END} rule is present,
10514as part of executing the @code{exit} statement,
10515the @code{END} rule is executed
10516(@pxref{BEGIN/END}).
10517If @code{exit} is used as part of an @code{END} rule, it causes
10518the program to stop immediately.
10519
10520An @code{exit} statement that is not part of a @code{BEGIN} or @code{END}
10521rule stops the execution of any further automatic rules for the current
10522record, skips reading any remaining input records, and executes the
10523@code{END} rule if there is one.
10524
10525In such a case,
10526if you don't want the @code{END} rule to do its job, set a variable
10527to nonzero before the @code{exit} statement and check that variable in
10528the @code{END} rule.
10529@xref{Assert Function},
10530for an example that does this.
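
A minimal sketch of this technique follows.  The function name
@code{check_fields}, the variable name @code{failed}, and the two-field
input format are all assumptions chosen for illustration:

```shell
# check_fields: hypothetical helper; totals field 2 unless a bad record is seen.
check_fields() {
    awk '
    NF != 2 {
        print "bad record at line " NR > "/dev/stderr"
        failed = 1      # tell the END rule not to print the summary
        exit 2
    }
    { total += $2 }
    END {
        if (!failed)
            print "total:", total
    }'
}

printf 'a 1\nb 2\n' | check_fields
printf 'a 1\nbroken\n' | check_fields || echo "exit status $?"
```

In the second run the @code{END} rule still executes, but it prints nothing
because @code{failed} is set, and the exit status given to @code{exit} is
preserved.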
10531
10532@cindex dark corner, @code{exit} statement
10533If an argument is supplied to @code{exit}, its value is used as the exit
10534status code for the @command{awk} process.  If no argument is supplied,
10535@code{exit} returns status zero (success).  In the case where an argument
10536is supplied to a first @code{exit} statement, and then @code{exit} is
10537called a second time from an @code{END} rule with no argument,
10538@command{awk} uses the previously supplied exit value.
10539@value{DARKCORNER}
10540
10541@cindex programming conventions, @code{exit} statement
10542For example, suppose an error condition occurs that is difficult or
10543impossible to handle.  Conventionally, programs report this by
10544exiting with a nonzero status.  An @command{awk} program can do this
10545using an @code{exit} statement with a nonzero argument, as shown
10546in the following example:
10547
@example
BEGIN @{
       if (("date" | getline date_now) <= 0) @{
         print "Can't get system date" > "/dev/stderr"
         exit 1
       @}
       print "current date is", date_now
       close("date")
@}
@end example
10558@c ENDOFRANGE csta
10559@c ENDOFRANGE acs
10560@c ENDOFRANGE accs
10561
10562@node Built-in Variables
10563@section Built-in Variables
10564@c STARTOFRANGE bvar
10565@cindex built-in variables
10566@c STARTOFRANGE varb
10567@cindex variables, built-in
10568
10569Most @command{awk} variables are available to use for your own
10570purposes; they never change unless your program assigns values to
10571them, and they never affect anything unless your program examines them.
10572However, a few variables in @command{awk} have special built-in meanings.
10573@command{awk} examines some of these automatically, so that they enable you
10574to tell @command{awk} how to do certain things.  Others are set
10575automatically by @command{awk}, so that they carry information from the
10576internal workings of @command{awk} to your program.
10577
10578@cindex @command{gawk}, built-in variables and
10579This @value{SECTION} documents all the built-in variables of
10580@command{gawk}, most of which are also documented in the chapters
10581describing their areas of activity.
10582
10583@menu
10584* User-modified::               Built-in variables that you change to control
10585                                @command{awk}.
10586* Auto-set::                    Built-in variables where @command{awk} gives
10587                                you information.
10588* ARGC and ARGV::               Ways to use @code{ARGC} and @code{ARGV}.
10589@end menu
10590
10591@node User-modified
10592@subsection Built-in Variables That Control @command{awk}
10593@c STARTOFRANGE bvaru
10594@cindex built-in variables, user-modifiable
10595@c STARTOFRANGE nmbv
10596@cindex user-modifiable variables
10597
10598The following is an alphabetical list of variables that you can change to
10599control how @command{awk} does certain things. The variables that are
10600specific to @command{gawk} are marked with a pound sign@w{ (@samp{#}).}
10601
10602@table @code
10603@cindex @code{BINMODE} variable
10604@cindex binary input/output
10605@cindex input/output, binary
10606@item BINMODE #
10607On non-POSIX systems, this variable specifies use of binary mode for all I/O.
10608Numeric values of one, two, or three specify that input files, output files, or
10609all files, respectively, should use binary I/O.
10610Alternatively,
10611string values of @code{"r"} or @code{"w"} specify that input files and
10612output files, respectively, should use binary I/O.
10613A string value of @code{"rw"} or @code{"wr"} indicates that all
10614files should use binary I/O.
10615Any other string value is equivalent to @code{"rw"}, but @command{gawk}
10616generates a warning message.
10617@code{BINMODE} is described in more detail in
10618@ref{PC Using}.
10619
10620@cindex differences in @command{awk} and @command{gawk}, @code{BINMODE} variable
10621This variable is a @command{gawk} extension.
10622In other @command{awk} implementations
10623(except @command{mawk},
10624@pxref{Other Versions}),
10625or if @command{gawk} is in compatibility mode
10626(@pxref{Options}),
10627it is not special.
10628
10629@cindex @code{CONVFMT} variable
10630@cindex POSIX @command{awk}, @code{CONVFMT} variable and
10631@cindex numbers, converting, to strings
10632@cindex strings, converting, numbers to
10633@item CONVFMT
10634This string controls conversion of numbers to
10635strings (@pxref{Conversion}).
10636It works by being passed, in effect, as the first argument to the
10637@code{sprintf} function
10638(@pxref{String Functions}).
10639Its default value is @code{"%.6g"}.
10640@code{CONVFMT} was introduced by the POSIX standard.
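
The effect can be observed with a short sketch.  The function name
@code{convfmt_demo} is hypothetical; the program converts the same
non-integer value to a string under two settings of @code{CONVFMT}:

```shell
# convfmt_demo: hypothetical helper; converts one number under two CONVFMT values.
convfmt_demo() {
    awk 'BEGIN {
        x = 22 / 7
        CONVFMT = "%.2f"
        s = x ""            # concatenation forces number-to-string conversion
        print s
        CONVFMT = "%.6g"    # the default value
        s = x ""
        print s
    }'
}

convfmt_demo
```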
10641
10642@cindex @code{FIELDWIDTHS} variable
10643@cindex differences in @command{awk} and @command{gawk}, @code{FIELDWIDTHS} variable
10644@cindex field separators, @code{FIELDWIDTHS} variable and
10645@cindex separators, field, @code{FIELDWIDTHS} variable and
10646@item FIELDWIDTHS #
10647This is a space-separated list of columns that tells @command{gawk}
10648how to split input with fixed columnar boundaries.
Assigning a value to @code{FIELDWIDTHS}
overrides the use of @code{FS} for field splitting.
@xref{Constant Size}, for more information.

@cindex @command{gawk}, @code{FIELDWIDTHS} variable in
If @command{gawk} is in compatibility mode
(@pxref{Options}), then @code{FIELDWIDTHS}
has no special meaning, and field-splitting operations occur based
exclusively on the value of @code{FS}.

@cindex @code{FS} variable
@cindex separators, field
@cindex field separators
@item FS
This is the input field separator
(@pxref{Field Separators}).
The value is a single-character string or a multi-character regular
expression that matches the separations between fields in an input
record.  If the value is the null string (@code{""}), then each
character in the record becomes a separate field.
(This behavior is a @command{gawk} extension. POSIX @command{awk} does not
specify the behavior when @code{FS} is the null string.)
@c NEXT ED: Mark as common extension

@cindex POSIX @command{awk}, @code{FS} variable and
The default value is @w{@code{" "}}, a string consisting of a single
space.  As a special exception, this value means that any
sequence of spaces, tabs, and/or newlines is a single separator.@footnote{In
POSIX @command{awk}, newline does not count as whitespace.}  It also causes
spaces, tabs, and newlines at the beginning and end of a record to be ignored.

You can set the value of @code{FS} on the command line using the
@option{-F} option:

@example
awk -F, '@var{program}' @var{input-files}
@end example

@cindex @command{gawk}, field separators and
If @command{gawk} is using @code{FIELDWIDTHS} for field splitting,
assigning a value to @code{FS} causes @command{gawk} to return to
the normal, @code{FS}-based field splitting. An easy way to do this
is to simply say @samp{FS = FS}, perhaps with an explanatory comment.

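
The difference between the default value of @code{FS} and a
single-character separator can be sketched as follows.  The sample
input lines are made up for illustration:

```shell
# Default FS (a single space): runs of spaces and tabs separate fields,
# and leading and trailing whitespace is ignored.
printf '  alpha   beta\tgamma \n' | awk '{ print NF, $2 }'    # expected: 3 beta

# A single-character FS set with -F: every comma separates fields,
# so the empty third field is counted.
printf 'a,b,,d\n' | awk -F, '{ print NF }'                    # expected: 4
```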
@cindex @code{IGNORECASE} variable
@cindex differences in @command{awk} and @command{gawk}, @code{IGNORECASE} variable
@cindex case sensitivity, string comparisons and
@cindex case sensitivity, regexps and
@cindex regular expressions, case sensitivity
@item IGNORECASE #
If @code{IGNORECASE} is nonzero or non-null, then all string comparisons
and all regular expression matching are case independent.  Thus, regexp
matching with @samp{~} and @samp{!~}, as well as the @code{gensub},
@code{gsub}, @code{index}, @code{match}, @code{split}, and @code{sub}
functions, record termination with @code{RS}, and field splitting with
@code{FS}, all ignore case when doing their particular regexp operations.
However, the value of @code{IGNORECASE} does @emph{not} affect array subscripting
and it does not affect field splitting when using a single-character
field separator.
@xref{Case-sensitivity}.

@cindex @command{gawk}, @code{IGNORECASE} variable in
If @command{gawk} is in compatibility mode
(@pxref{Options}),
then @code{IGNORECASE} has no special meaning.  Thus, string
and regexp operations are always case-sensitive.

@cindex @code{LINT} variable
@cindex differences in @command{awk} and @command{gawk}, @code{LINT} variable
@cindex lint checking
@item LINT #
When this variable is true (nonzero or non-null), @command{gawk}
behaves as if the @option{--lint} command-line option is in effect
(@pxref{Options}).
With a value of @code{"fatal"}, lint warnings become fatal errors.
With a value of @code{"invalid"}, only warnings about things that are
actually invalid are issued. (This is not fully implemented yet.)
Any other true value prints nonfatal warnings.
Assigning a false value to @code{LINT} turns off the lint warnings.

@cindex @command{gawk}, @code{LINT} variable in
This variable is a @command{gawk} extension.  It is not special
in other @command{awk} implementations.  Unlike the other special variables,
changing @code{LINT} does affect the production of lint warnings,
even if @command{gawk} is in compatibility mode.  Much as
the @option{--lint} and @option{--traditional} options independently
control different aspects of @command{gawk}'s behavior, the control
of lint warnings during program execution is independent of the flavor
of @command{awk} being executed.

@cindex @code{OFMT} variable
@cindex numbers, converting, to strings
@cindex strings, converting, numbers to
@item OFMT
This string controls conversion of numbers to
strings (@pxref{Conversion}) for
printing with the @code{print} statement.  It works by being passed
as the first argument to the @code{sprintf} function
(@pxref{String Functions}).
Its default value is @code{"%.6g"}.  Earlier versions of @command{awk}
also used @code{OFMT} to specify the format for converting numbers to
strings in general expressions; this is now done by @code{CONVFMT}.

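
The division of labor between @code{OFMT} and @code{CONVFMT} can be
sketched like this, assuming a POSIX-conforming @command{awk}.  The
variable name @code{s} is only illustrative:

```shell
awk 'BEGIN {
    OFMT = "%.2f"
    print 3.14159        # a number printed directly uses OFMT; expected: 3.14
    s = 3.14159 ""       # conversion inside an expression uses CONVFMT instead
    print s              # expected: 3.14159
}'
```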
@cindex @code{sprintf} function, @code{OFMT} variable and
@cindex @code{print} statement, @code{OFMT} variable and
@cindex @code{OFS} variable
@cindex separators, field
@cindex field separators
@item OFS
This is the output field separator (@pxref{Output Separators}).  It is
output between the fields printed by a @code{print} statement.  Its
default value is @w{@code{" "}}, a string consisting of a single space.

@cindex @code{ORS} variable
@item ORS
This is the output record separator.  It is output at the end of every
@code{print} statement.  Its default value is @code{"\n"}, the newline
character.  (@xref{Output Separators}.)

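
A short sketch of @code{OFS} and @code{ORS} working together follows.
The separator values and the input are arbitrary choices made for
illustration:

```shell
printf 'a b c\nd e f\n' |
awk 'BEGIN { OFS = "-"; ORS = ";\n" } { print $1, $2, $3 }'
# expected output:
# a-b-c;
# d-e-f;
```

Note that @code{OFS} appears only where the @code{print} statement has
commas between its arguments.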
@cindex @code{RS} variable
@cindex separators, record
@cindex record separators
@item RS
This is @command{awk}'s input record separator.  Its default value is a string
containing a single newline character, which means that an input record
consists of a single line of text.
It can also be the null string, in which case records are separated by
runs of blank lines.
If it is a regexp, records are separated by
matches of the regexp in the input text.
(@xref{Records}.)

The ability for @code{RS} to be a regular expression
is a @command{gawk} extension.
In most other @command{awk} implementations,
or if @command{gawk} is in compatibility mode
(@pxref{Options}),
just the first character of @code{RS}'s value is used.

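
The two portable settings of @code{RS} described above can be sketched
as follows.  The input text is made up, and the commands avoid the
regexp form so that they do not depend on @command{gawk}:

```shell
# Null RS: runs of blank lines separate records.  In this mode a newline
# inside a record also separates fields.
printf 'a\nb\n\n\nc\nd\n' | awk 'BEGIN { RS = "" } { print NR ": " $1 " " $2 }'
# expected output:
# 1: a b
# 2: c d

# Single-character RS: each semicolon ends a record.
printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print }'
# expected output: x, y, and z, each on its own line
```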
@cindex @code{SUBSEP} variable
@cindex separators, subscript
@cindex subscript separators
@item SUBSEP
This is the subscript separator.  It has the default value of
@code{"\034"} and is used to separate the parts of the indices of a
multidimensional array.  Thus, the expression @code{@w{foo["A", "B"]}}
really accesses @code{foo["A\034B"]}
(@pxref{Multi-dimensional}).

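
The following sketch shows that a two-part subscript is stored as one
string joined by @code{SUBSEP}, and that @code{split} can recover the
parts.  The names @code{k}, @code{n}, and @code{parts} are only
illustrative:

```shell
awk 'BEGIN {
    foo["A", "B"] = 1
    # The two-part subscript is stored as the single string "A" SUBSEP "B".
    print (("A" SUBSEP "B") in foo)       # expected: 1
    for (k in foo) {
        n = split(k, parts, SUBSEP)       # recover the parts of the index
        print n, parts[1], parts[2]       # expected: 2 A B
    }
}'
```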
@cindex @code{TEXTDOMAIN} variable
@cindex differences in @command{awk} and @command{gawk}, @code{TEXTDOMAIN} variable
@cindex internationalization, localization
@item TEXTDOMAIN #
This variable is used for internationalization of programs at the
@command{awk} level.  It sets the default text domain for specially
marked string constants in the source text, as well as for the
@code{dcgettext}, @code{dcngettext} and @code{bindtextdomain} functions
(@pxref{Internationalization}).
The default value of @code{TEXTDOMAIN} is @code{"messages"}.

This variable is a @command{gawk} extension.
In other @command{awk} implementations,
or if @command{gawk} is in compatibility mode
(@pxref{Options}),
it is not special.
@end table
@c ENDOFRANGE bvar
@c ENDOFRANGE varb
@c ENDOFRANGE bvaru
@c ENDOFRANGE nmbv

@node Auto-set
@subsection Built-in Variables That Convey Information

@c STARTOFRANGE bvconi
@cindex built-in variables, conveying information
@c STARTOFRANGE vbconi
@cindex variables, built-in, conveying information
The following is an alphabetical list of variables that @command{awk}
sets automatically on certain occasions in order to provide
information to your program.  The variables that are specific to
@command{gawk} are marked with a pound sign@w{ (@samp{#}).}

@table @code
@cindex @code{ARGC}/@code{ARGV} variables
@cindex arguments, command-line
@cindex command line, arguments
@item ARGC@r{,} ARGV
The command-line arguments available to @command{awk} programs are stored in
an array called @code{ARGV}.  @code{ARGC} is the number of command-line
arguments present.  @xref{Other Arguments}.
Unlike most @command{awk} arrays,
@code{ARGV} is indexed from 0 to @code{ARGC} @minus{} 1.
In the following example:

@example
$ awk 'BEGIN @{
>         for (i = 0; i < ARGC; i++)
>             print ARGV[i]
>      @}' inventory-shipped BBS-list
@print{} awk
@print{} inventory-shipped
@print{} BBS-list
@end example

@noindent
@code{ARGV[0]} contains @code{"awk"}, @code{ARGV[1]}
contains @code{"inventory-shipped"}, and @code{ARGV[2]} contains
@code{"BBS-list"}.  The value of @code{ARGC} is three, one more than the
index of the last element in @code{ARGV}, because the elements are numbered
from zero.

@cindex programming conventions, @code{ARGC}/@code{ARGV} variables
The names @code{ARGC} and @code{ARGV}, as well as the convention of indexing
the array from 0 to @code{ARGC} @minus{} 1, are derived from the C language's
method of accessing command-line arguments.

The value of @code{ARGV[0]} can vary from system to system.
Also, you should note that the program text is @emph{not} included in
@code{ARGV}, nor are any of @command{awk}'s command-line options.
@xref{ARGC and ARGV}, for information
about how @command{awk} uses these variables.

@cindex @code{ARGIND} variable
@cindex differences in @command{awk} and @command{gawk}, @code{ARGIND} variable
@item ARGIND #
The index in @code{ARGV} of the current file being processed.
Every time @command{gawk} opens a new @value{DF} for processing, it sets
@code{ARGIND} to the index in @code{ARGV} of the @value{FN}.
When @command{gawk} is processing the input files,
@samp{FILENAME == ARGV[ARGIND]} is always true.

@c comma before ARGIND does NOT mark a tertiary
@cindex files, processing, @code{ARGIND} variable and
This variable is useful in file processing; it allows you to tell how far
along you are in the list of @value{DF}s as well as to distinguish between
successive instances of the same @value{FN} on the command line.

@cindex @value{FN}s, distinguishing
While you can change the value of @code{ARGIND} within your @command{awk}
program, @command{gawk} automatically sets it to a new value when the
next file is opened.

This variable is a @command{gawk} extension.
In other @command{awk} implementations,
or if @command{gawk} is in compatibility mode
(@pxref{Options}),
it is not special.

@cindex @code{ENVIRON} variable
@cindex environment variables
@item ENVIRON
An associative array that contains the values of the environment.  The array
indices are the environment variable names; the elements are the values of
the particular environment variables.  For example,
@code{ENVIRON["HOME"]} might be @file{/home/arnold}.  Changing this array
does not affect the environment passed on to any programs that
@command{awk} may spawn via redirection or the @code{system} function.
@c (In a future version of @command{gawk}, it may do so.)

Some operating systems may not have environment variables.
On such systems, the @code{ENVIRON} array is empty (except for
@w{@code{ENVIRON["AWKPATH"]}},
@pxref{AWKPATH Variable}).

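
Reading the environment through @code{ENVIRON} can be sketched as
follows.  The variable names @code{GREETING} and
@code{NO_SUCH_VARIABLE_DEMO} are hypothetical; the sketch assumes that
the second one is not set in the calling environment:

```shell
GREETING=hello awk 'BEGIN {
    print ENVIRON["GREETING"]                     # expected: hello
    # Use "in" to test for a variable without creating an array element.
    if (!("NO_SUCH_VARIABLE_DEMO" in ENVIRON))
        print "not set"
}'
```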
@cindex @code{ERRNO} variable
@cindex differences in @command{awk} and @command{gawk}, @code{ERRNO} variable
@cindex error handling, @code{ERRNO} variable and
@item ERRNO #
If a system error occurs during a redirection for @code{getline},
during a read for @code{getline}, or during a @code{close} operation,
then @code{ERRNO} contains a string describing the error.

This variable is a @command{gawk} extension.
In other @command{awk} implementations,
or if @command{gawk} is in compatibility mode
(@pxref{Options}),
it is not special.

@cindex @code{FILENAME} variable
@cindex dark corner, @code{FILENAME} variable
@item FILENAME
The name of the file that @command{awk} is currently reading.
When no @value{DF}s are listed on the command line, @command{awk} reads
from the standard input and @code{FILENAME} is set to @code{"-"}.
@code{FILENAME} is changed each time a new file is read
(@pxref{Reading Files}).
Inside a @code{BEGIN} rule, the value of @code{FILENAME} is
@code{""}, since there are no input files being processed
yet.@footnote{Some early implementations of Unix @command{awk} initialized
@code{FILENAME} to @code{"-"}, even if there were @value{DF}s to be
processed. This behavior was incorrect and should not be relied
upon in your programs.}
@value{DARKCORNER}
Note, though, that using @code{getline}
(@pxref{Getline})
inside a @code{BEGIN} rule can give
@code{FILENAME} a value.

@cindex @code{FNR} variable
@item FNR
The current record number in the current file.  @code{FNR} is
incremented each time a new record is read
(@pxref{Getline}).  It is reinitialized
to zero each time a new input file is started.

@cindex @code{NF} variable
@item NF
The number of fields in the current input record.
@code{NF} is set each time a new record is read, when a new field is
created or when @code{$0} changes (@pxref{Fields}).

Unlike most of the variables described in this
@ifnotinfo
section,
@end ifnotinfo
@ifinfo
node,
@end ifinfo
assigning a value to @code{NF} has the potential to affect
@command{awk}'s internal workings.  In particular, assignments
to @code{NF} can be used to create fields in or remove fields from the
current record.  @xref{Changing Fields}.

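
A minimal sketch of assigning to @code{NF}, assuming an @command{awk}
that rebuilds the record when @code{NF} changes, as POSIX-conforming
versions are expected to do:

```shell
# Decreasing NF removes fields from the end of the record.
echo 'a b c d' | awk '{ NF = 2; print $0; print NF }'
# expected output:
# a b
# 2

# Increasing NF creates empty fields; the record is rebuilt with OFS.
echo 'a b' | awk 'BEGIN { OFS = "-" } { NF = 4; print $0 }'
# expected: a-b--
```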
@cindex @code{NR} variable
@item NR
The number of input records @command{awk} has processed since
the beginning of the program's execution
(@pxref{Records}).
@code{NR} is incremented each time a new record is read.

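
The difference between @code{FNR} and @code{NR} shows up as soon as
more than one file is read.  The following sketch creates two small
files, hypothetically named @file{one} and @file{two}, in a temporary
directory; it assumes that the @command{mktemp} utility is available:

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\n' > "$dir/two"
(cd "$dir" && awk '{ print FILENAME, FNR, NR }' one two)
# expected output:
# one 1 1
# one 2 2
# two 1 3
rm -r "$dir"
```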
@cindex @code{PROCINFO} array
@cindex differences in @command{awk} and @command{gawk}, @code{PROCINFO} array
@item PROCINFO #
The elements of this array provide access to information about the
running @command{awk} program.
The following elements (listed alphabetically)
are guaranteed to be available:

@table @code
@item PROCINFO["egid"]
The value of the @code{getegid} system call.

@item PROCINFO["euid"]
The value of the @code{geteuid} system call.

@item PROCINFO["FS"]
This is
@code{"FS"} if field splitting with @code{FS} is in effect, or it is
@code{"FIELDWIDTHS"} if field splitting with @code{FIELDWIDTHS} is in effect.

@item PROCINFO["gid"]
The value of the @code{getgid} system call.

@item PROCINFO["pgrpid"]
The process group ID of the current process.

@item PROCINFO["pid"]
The process ID of the current process.

@item PROCINFO["ppid"]
The parent process ID of the current process.

@item PROCINFO["uid"]
The value of the @code{getuid} system call.
@end table

On some systems, there may be elements in the array, @code{"group1"}
through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of
supplementary groups that the process has.  Use the @code{in} operator
to test for these elements
(@pxref{Reference to Elements}).

This array is a @command{gawk} extension.
In other @command{awk} implementations,
or if @command{gawk} is in compatibility mode
(@pxref{Options}),
it is not special.

@cindex @code{RLENGTH} variable
@item RLENGTH
The length of the substring matched by the
@code{match} function
(@pxref{String Functions}).
@code{RLENGTH} is set by invoking the @code{match} function.  Its value
is the length of the matched string, or @minus{}1 if no match is found.

@cindex @code{RSTART} variable
@item RSTART
The start-index in characters of the substring that is matched by the
@code{match} function
(@pxref{String Functions}).
@code{RSTART} is set by invoking the @code{match} function.  Its value
is the position of the string where the matched substring starts, or zero
if no match was found.

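
@code{RSTART} and @code{RLENGTH} are typically used together with
@code{substr} to extract the matched text.  A small sketch, using a
made-up string and regexp:

```shell
awk 'BEGIN {
    s = "foobarbaz"
    if (match(s, /ba[rz]/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)   # expected: 4 3 bar
    match(s, /xyz/)                                         # no match this time
    print RSTART, RLENGTH                                   # expected: 0 -1
}'
```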
@cindex @code{RT} variable
@cindex differences in @command{awk} and @command{gawk}, @code{RT} variable
@item RT #
This is set each time a record is read. It contains the input text
that matched the text denoted by @code{RS}, the record separator.

This variable is a @command{gawk} extension.
In other @command{awk} implementations,
or if @command{gawk} is in compatibility mode
(@pxref{Options}),
it is not special.
@end table
@c ENDOFRANGE bvconi
@c ENDOFRANGE vbconi

@c fakenode --- for prepinfo
@subheading Advanced Notes: Changing @code{NR} and @code{FNR}
@cindex @code{NR} variable, changing
@cindex @code{FNR} variable, changing
@cindex advanced features, @code{FNR}/@code{NR} variables
@cindex dark corner, @code{FNR}/@code{NR} variables
@command{awk} increments @code{NR} and @code{FNR}
each time it reads a record, instead of setting them to the absolute
value of the number of records read.  This means that a program can
change these variables and their new values are incremented for
each record.
@value{DARKCORNER}
This is demonstrated in the following example:

@example
$ echo '1
> 2
> 3
> 4' | awk 'NR == 2 @{ NR = 17 @}
> @{ print NR @}'
@print{} 1
@print{} 17
@print{} 18
@print{} 19
@end example

@noindent
Before @code{FNR} was added to the @command{awk} language
(@pxref{V7/SVR3.1}),
many @command{awk} programs used this feature to track the number of
records in a file by resetting @code{NR} to zero when @code{FILENAME}
changed.

@node ARGC and ARGV
@subsection Using @code{ARGC} and @code{ARGV}
@cindex @code{ARGC}/@code{ARGV} variables
@cindex arguments, command-line
@cindex command line, arguments

@ref{Auto-set},
presented the following program describing the information contained in @code{ARGC}
and @code{ARGV}:

@example
$ awk 'BEGIN @{
>        for (i = 0; i < ARGC; i++)
>            print ARGV[i]
>      @}' inventory-shipped BBS-list
@print{} awk
@print{} inventory-shipped
@print{} BBS-list
@end example

@noindent
In this example, @code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]}
contains @samp{inventory-shipped}, and @code{ARGV[2]} contains
@samp{BBS-list}.
Notice that the @command{awk} program is not entered in @code{ARGV}.  The
other special command-line options, with their arguments, are also not
entered.  This includes variable assignments done with the @option{-v}
option (@pxref{Options}).
Normal variable assignments on the command line @emph{are}
treated as arguments and do show up in the @code{ARGV} array:

@example
$ cat showargs.awk
@print{} BEGIN @{
@print{}     printf "A=%d, B=%d\n", A, B
@print{}     for (i = 0; i < ARGC; i++)
@print{}         printf "\tARGV[%d] = %s\n", i, ARGV[i]
@print{} @}
@print{} END   @{ printf "A=%d, B=%d\n", A, B @}
$ awk -v A=1 -f showargs.awk B=2 /dev/null
@print{} A=1, B=0
@print{}        ARGV[0] = awk
@print{}        ARGV[1] = B=2
@print{}        ARGV[2] = /dev/null
@print{} A=1, B=2
@end example

A program can alter @code{ARGC} and the elements of @code{ARGV}.
Each time @command{awk} reaches the end of an input file, it uses the next
element of @code{ARGV} as the name of the next input file.  By storing a
different string there, a program can change which files are read.
Use @code{"-"} to represent the standard input.  Storing
additional elements and incrementing @code{ARGC} causes
additional files to be read.

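
Adding an input file from within the program can be sketched as
follows.  The file name @file{extra} and the variable of the same name
are hypothetical, and the sketch assumes that @command{mktemp} is
available:

```shell
dir=$(mktemp -d)
printf 'from the added file\n' > "$dir/extra"
awk -v extra="$dir/extra" 'BEGIN {
    ARGV[ARGC] = extra    # store the name in the next free slot
    ARGC++                # and record that the list has grown
}
{ print }' < /dev/null
# expected: from the added file
rm -r "$dir"
```

Because no files are named on the command line, @code{ARGC} starts out
as one, so the new name lands in @code{ARGV[1]} and is read instead of
the standard input.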
If the value of @code{ARGC} is decreased, that eliminates input files
from the end of the list.  By recording the old value of @code{ARGC}
elsewhere, a program can treat the eliminated arguments as
something other than @value{FN}s.

To eliminate a file from the middle of the list, store the null string
(@code{""}) into @code{ARGV} in place of the file's name.  As a
special feature, @command{awk} ignores @value{FN}s that have been
replaced with the null string.
Another option is to
use the @code{delete} statement to remove elements from
@code{ARGV} (@pxref{Delete}).

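
The three ways of dropping input files described above can be compared
with a small sketch.  The files @file{a}, @file{b}, and @file{c} are
hypothetical and are created in a temporary directory (this assumes
that @command{mktemp} is available):

```shell
dir=$(mktemp -d)
printf 'one\n' > "$dir/a"; printf 'two\n' > "$dir/b"; printf 'three\n' > "$dir/c"

# Replace ARGV[2] with the null string so that file b is skipped.
(cd "$dir" && awk 'BEGIN { ARGV[2] = "" } { print FILENAME ": " $0 }' a b c)
# expected output:
# a: one
# c: three

# Deleting the element has the same effect.
(cd "$dir" && awk 'BEGIN { delete ARGV[2] } { print FILENAME ": " $0 }' a b c)

# Decreasing ARGC drops files from the end of the list; only a is read.
(cd "$dir" && awk 'BEGIN { ARGC = 2 } { print FILENAME ": " $0 }' a b c)
# expected: a: one

rm -r "$dir"
```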
All of these actions are typically done in the @code{BEGIN} rule,
before actual processing of the input begins.
@xref{Split Program}, and see
@ref{Tee Program}, for examples
of each way of removing elements from @code{ARGV}.
The following fragment processes @code{ARGV} in order to examine, and
then remove, command-line options:
@c NEXT ED: Add xref to rewind() function

@example
BEGIN @{
    for (i = 1; i < ARGC; i++) @{
        if (ARGV[i] == "-v")
            verbose = 1
        else if (ARGV[i] == "-d")
            debug = 1
        else if (ARGV[i] ~ /^-./) @{    # anything else that looks like an option
            e = sprintf("%s: unrecognized option -- %c",
                    ARGV[0], substr(ARGV[i], 2, 1))
            print e > "/dev/stderr"
        @} else
            break                       # first nonoption argument: stop scanning
        delete ARGV[i]                  # so the option is not treated as a file
    @}
@}
@end example

To actually get the options into the @command{awk} program,
end the @command{awk} options with @option{--} and then supply
the @command{awk} program's options, in the following manner:

@example
awk -f myprog -- -v -d file1 file2 @dots{}
@end example

@cindex differences in @command{awk} and @command{gawk}, @code{ARGC}/@code{ARGV} variables
This is not necessary in @command{gawk}. Unless @option{--posix} has
been specified, @command{gawk} silently puts any unrecognized options
into @code{ARGV} for the @command{awk} program to deal with.  As soon
as it sees an unknown option, @command{gawk} stops looking for other
options that it might otherwise recognize.  The previous example with
@command{gawk} would be:

@example
gawk -f myprog -d -v file1 file2 @dots{}
@end example

@noindent
Because @option{-d} is not a valid @command{gawk} option,
it and the following @option{-v}
are passed on to the @command{awk} program.

@node Arrays
@chapter Arrays in @command{awk}
@c STARTOFRANGE arrs
@cindex arrays

An @dfn{array} is a table of values called @dfn{elements}.  The
elements of an array are distinguished by their indices.  @dfn{Indices}
may be either numbers or strings.

This @value{CHAPTER} describes how arrays work in @command{awk},
how to use array elements, how to scan through every element in an array,
and how to remove array elements.
It also describes how @command{awk} simulates multidimensional
arrays, as well as some of the less obvious points about array usage.
The @value{CHAPTER} finishes with a discussion of @command{gawk}'s facility
for sorting an array based on its indices.

@cindex variables, names of
@cindex functions, names of
@cindex arrays, names of
@cindex names, arrays/variables
@cindex namespace issues
@command{awk} maintains a single set
of names that may be used for naming variables, arrays, and functions
(@pxref{User-defined}).
Thus, you cannot have a variable and an array with the same name in the
same @command{awk} program.

@menu
* Array Intro::                 Introduction to Arrays
* Reference to Elements::       How to examine one element of an array.
* Assigning Elements::          How to change an element of an array.
* Array Example::               Basic Example of an Array
* Scanning an Array::           A variation of the @code{for} statement. It
                                loops through the indices of an array's
                                existing elements.
* Delete::                      The @code{delete} statement removes an element
                                from an array.
* Numeric Array Subscripts::    How to use numbers as subscripts in
                                @command{awk}.
* Uninitialized Subscripts::    Using Uninitialized variables as subscripts.
* Multi-dimensional::           Emulating multidimensional arrays in
                                @command{awk}.
* Multi-scanning::              Scanning multidimensional arrays.
* Array Sorting::               Sorting array values and indices.
@end menu

@node Array Intro
@section Introduction to Arrays

The @command{awk} language provides one-dimensional arrays
for storing groups of related strings or numbers.
Every @command{awk} array must have a name.  Array names have the same
syntax as variable names; any valid variable name would also be a valid
array name.  But one name cannot be used in both ways (as an array and
as a variable) in the same @command{awk} program.

Arrays in @command{awk} superficially resemble arrays in other programming
languages, but there are fundamental differences.  In @command{awk}, it
isn't necessary to specify the size of an array before starting to use it.
Additionally, any number or string in @command{awk}, not just consecutive integers,
may be used as an array index.

In most other languages, arrays must be @dfn{declared} before use,
including a specification of
how many elements or components they contain.  In such languages, the
declaration causes a contiguous block of memory to be allocated for that
many elements.  Usually, an index in the array must be a nonnegative integer.
For example, the index zero specifies the first element in the array, which is
actually stored at the beginning of the block of memory.  Index one
specifies the second element, which is stored in memory right after the
first element, and so on.  It is impossible to add more elements to the
array, because it has room only for as many elements as given in
the declaration.
(Some languages allow arbitrary starting and ending
indices---e.g., @samp{15 .. 27}---but the size of the array is still fixed when
the array is declared.)

A contiguous array of four elements might look like the following example,
conceptually, if the element values are 8, @code{"foo"},
@code{""}, and 30:

@c NEXT ED: Use real images here
@iftex
@c from Karl Berry, much thanks for the help.
@tex
\bigskip % space above the table (about 1 linespace)
\offinterlineskip
\newdimen\width \width = 1.5cm
\newdimen\hwidth \hwidth = 4\width \advance\hwidth by 2pt % 5 * 0.4pt
\centerline{\vbox{
\halign{\strut\hfil\ignorespaces#&&\vrule#&\hbox to\width{\hfil#\unskip\hfil}\cr
\noalign{\hrule width\hwidth}
	&&{\tt 8} &&{\tt "foo"} &&{\tt ""} &&{\tt 30} &&\quad Value\cr
\noalign{\hrule width\hwidth}
\noalign{\smallskip}
	&\omit&0&\omit &1   &\omit&2 &\omit&3 &\omit&\quad Index\cr
}
}}
@end tex
@end iftex
@ifinfo
@example
+---------+---------+--------+---------+
|    8    |  "foo"  |   ""   |    30   |    @r{Value}
+---------+---------+--------+---------+
     0         1         2         3        @r{Index}
@end example
@end ifinfo
@ifxml
@example
+---------+---------+--------+---------+
|    8    |  "foo"  |   ""   |    30   |    @r{Value}
+---------+---------+--------+---------+
     0         1         2         3        @r{Index}
@end example
@end ifxml

@noindent
Only the values are stored; the indices are implicit from the order of
the values. Here, 8 is the value at index zero, because 8 appears in the
position with zero elements before it.

@c STARTOFRANGE arrin
@cindex arrays, indexing
@c STARTOFRANGE inarr
@cindex indexing arrays
@cindex associative arrays
@cindex arrays, associative
Arrays in @command{awk} are different---they are @dfn{associative}.  This means
that each array is a collection of pairs: an index and its corresponding
array element value:

@example
@r{Element} 3     @r{Value} 30
@r{Element} 1     @r{Value} "foo"
@r{Element} 0     @r{Value} 8
@r{Element} 2     @r{Value} ""
@end example

@noindent
The pairs are shown in jumbled order because their order is irrelevant.

One advantage of associative arrays is that new pairs can be added
at any time.  For example, suppose a tenth element is added to the array
whose value is @w{@code{"number ten"}}.  The result is:

@example
@r{Element} 10    @r{Value} "number ten"
@r{Element} 3     @r{Value} 30
@r{Element} 1     @r{Value} "foo"
@r{Element} 0     @r{Value} 8
@r{Element} 2     @r{Value} ""
@end example

@noindent
@cindex sparse arrays
@cindex arrays, sparse
Now the array is @dfn{sparse}, which just means some indices are missing.
It has elements 0--3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or 9.

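
The sparse array described above can be built and inspected with a
short program.  This is only a sketch; the array name @code{a} and the
helper variables are illustrative:

```shell
awk 'BEGIN {
    a[0] = 8; a[1] = "foo"; a[2] = ""; a[3] = 30
    a[10] = "number ten"          # no need to create elements 4 through 9
    n = 0
    for (i in a)                  # visits only the elements that exist
        n++
    print n                       # expected: 5
    has5 = (5 in a)
    has10 = (10 in a)
    print has5, has10             # expected: 0 1
}'
```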
Another consequence of associative arrays is that the indices don't
have to be nonnegative integers.  Any number, or even a string, can be
an index.  For example, the following is an array that translates words from
English to French:

@example
@r{Element} "dog" @r{Value} "chien"
@r{Element} "cat" @r{Value} "chat"
@r{Element} "one" @r{Value} "un"
@r{Element} 1     @r{Value} "un"
@end example

@noindent
Here we decided to translate the number one in both spelled-out and
numeric form---thus illustrating that a single array can have both
numbers and strings as indices.
In fact, array subscripts are always strings; this is discussed
in more detail in
@ref{Numeric Array Subscripts}.
Here, the number @code{1} isn't double-quoted, since @command{awk}
automatically converts it to a string.

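
The translation table above might be written as the following sketch.
The array name @code{french} is an assumption made for illustration:

```shell
awk 'BEGIN {
    french["dog"] = "chien"
    french["cat"] = "chat"
    french["one"] = "un"
    french[1] = "un"
    print french["cat"]     # expected: chat
    print french["1"]       # the string "1" names the same element as 1; expected: un
}'
```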
@cindex case sensitivity, array indices and
@cindex arrays, @code{IGNORECASE} variable and
@cindex @code{IGNORECASE} variable, array subscripts and
The value of @code{IGNORECASE} has no effect upon array subscripting.
The identical string value used to store an array element must be used
to retrieve it.
When @command{awk} creates an array (e.g., with the @code{split}
built-in function),
that array's indices are consecutive integers starting at one.
(@xref{String Functions}.)

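
For instance, the indices produced by @code{split} can be walked with
an ordinary counting loop.  The string and the array name
@code{colors} in this sketch are made up:

```shell
awk 'BEGIN {
    n = split("red green blue", colors)
    for (i = 1; i <= n; i++)
        print i, colors[i]
}'
# expected output:
# 1 red
# 2 green
# 3 blue
```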
@command{awk}'s arrays are efficient---the time to access an element
is independent of the number of elements in the array.
@c ENDOFRANGE arrin
@c ENDOFRANGE inarr

@node Reference to Elements
@section Referring to an Array Element
@cindex arrays, elements, referencing
@cindex elements in arrays

The principal way to use an array is to refer to one of its elements.
An array reference is an expression as follows:

@example
@var{array}[@var{index}]
@end example

@noindent
Here, @var{array} is the name of an array.  The expression @var{index} is
the index of the desired element of the array.

The value of the array reference is the current value of that array
element.  For example, @code{foo[4.3]} is an expression for the element
of array @code{foo} at index @samp{4.3}.

A reference to an array element that has no recorded value yields a value of
@code{""}, the null string.  This includes elements
that have not been assigned any value as well as elements that have been
deleted (@pxref{Delete}).  Such a reference
automatically creates that array element, with the null string as its value.
(In some cases, this is unfortunate, because it might waste memory inside
@command{awk}.)

11440@c @cindex arrays, @code{in} operator and
11441@cindex @code{in} operator, arrays and
11442To determine whether an element exists in an array at a certain index, use
11443the following expression:
11444
11445@example
11446@var{index} in @var{array}
11447@end example
11448
11449@cindex side effects, array indexing
11450@noindent
11451This expression tests whether the particular index exists,
11452without the side effect of creating that element if it is not present.
11453The expression has the value one (true) if @code{@var{array}[@var{index}]}
11454exists and zero (false) if it does not exist.
11455For example, this statement tests whether the array @code{frequencies}
11456contains the index @samp{2}:
11457
11458@example
11459if (2 in frequencies)
11460    print "Subscript 2 is present."
11461@end example
11462
11463Note that this is @emph{not} a test of whether the array
11464@code{frequencies} contains an element whose @emph{value} is two.
11465There is no way to do that except to scan all the elements.  Also, this
11466@emph{does not} create @code{frequencies[2]}, while the following
11467(incorrect) alternative does:
11468
11469@example
11470if (frequencies[2] != "")
11471    print "Subscript 2 is present."
11472@end example
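
The difference between the two tests can be observed directly.
The following sketch (the messages are illustrative only) checks for the
index with @code{in} both before and after the comparison-style test:

@example
BEGIN @{
    if (2 in frequencies)
        print "present before the reference"
    if (frequencies[2] != "")     # this reference creates frequencies[2]
        print "nonempty"
    if (2 in frequencies)
        print "present after the reference"
@}
@end example

@noindent
Only the last message is printed, because the reference in the second
@code{if} statement created the element with the null string as its value.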
11473
11474@node Assigning Elements
11475@section Assigning Array Elements
11476@cindex arrays, elements, assigning
11477@cindex elements in arrays, assigning
11478
11479Array elements can be assigned values just like
11480@command{awk} variables:
11481
11482@example
11483@var{array}[@var{subscript}] = @var{value}
11484@end example
11485
11486@noindent
11487@var{array} is the name of an array.  The expression
11488@var{subscript} is the index of the element of the array that is
11489assigned a value.  The expression @var{value} is the value to
11490assign to that element of the array.
11491
11492@node Array Example
11493@section Basic Array Example
11494
11495The following program takes a list of lines, each beginning with a line
11496number, and prints them out in order of line number.  The line numbers
11497are not in order when they are first read---instead they
11498are scrambled.  This program sorts the lines by making an array using
11499the line numbers as subscripts.  The program then prints out the lines
11500in sorted order of their numbers.  It is a very simple program and gets
11501confused upon encountering repeated numbers, gaps, or lines that don't
11502begin with a number:
11503
11504@example
11505@c file eg/misc/arraymax.awk
11506@{
11507  if ($1 > max)
11508    max = $1
11509  arr[$1] = $0
11510@}
11511
11512END @{
11513  for (x = 1; x <= max; x++)
11514    print arr[x]
11515@}
11516@c endfile
11517@end example
11518
11519The first rule keeps track of the largest line number seen so far;
11520it also stores each line into the array @code{arr}, at an index that
11521is the line's number.
11522The second rule runs after all the input has been read, to print out
11523all the lines.
11524When this program is run with the following input:
11525
11526@example
11527@c file eg/misc/arraymax.data
5  I am the Five man
2  Who are you?  The new number two!
4  . . . And four on the floor
1  Who is number one?
3  I three you.
11533@c endfile
11534@end example
11535
11536@noindent
11537Its output is:
11538
11539@example
1  Who is number one?
2  Who are you?  The new number two!
3  I three you.
4  . . . And four on the floor
5  I am the Five man
11545@end example
11546
11547If a line number is repeated, the last line with a given number overrides
11548the others.
11549Gaps in the line numbers can be handled with an easy improvement to the
11550program's @code{END} rule, as follows:
11551
11552@example
11553END @{
11554  for (x = 1; x <= max; x++)
11555    if (x in arr)
11556      print arr[x]
11557@}
11558@end example
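
Lines that don't begin with a number can be skipped in a similar spirit.
The following variant is only a sketch: it assumes that a valid line
number is a first field made up entirely of digits, and it ignores every
other line.  The variable @code{num} is introduced just for this example:

@example
$1 ~ /^[0-9]+$/ @{
  num = $1 + 0          # force a numeric value, so "007" becomes 7
  if (num > max)
    max = num
  arr[num] = $0
@}

END @{
  for (x = 1; x <= max; x++)
    if (x in arr)
      print arr[x]
@}
@end example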
11559
11560@node Scanning an Array
11561@section Scanning All Elements of an Array
11562@cindex elements in arrays, scanning
11563@cindex arrays, scanning
11564
11565In programs that use arrays, it is often necessary to use a loop that
11566executes once for each element of an array.  In other languages, where
11567arrays are contiguous and indices are limited to positive integers,
11568this is easy: all the valid indices can be found by counting from
11569the lowest index up to the highest.  This technique won't do the job
11570in @command{awk}, because any number or string can be an array index.
11571So @command{awk} has a special kind of @code{for} statement for scanning
11572an array:
11573
11574@example
11575for (@var{var} in @var{array})
11576  @var{body}
11577@end example
11578
11579@noindent
11580@cindex @code{in} operator, arrays and
11581This loop executes @var{body} once for each index in @var{array} that the
11582program has previously used, with the variable @var{var} set to that index.
11583
11584@cindex arrays, @code{for} statement and
11585@cindex @code{for} statement, in arrays
11586The following program uses this form of the @code{for} statement.  The
11587first rule scans the input records and notes which words appear (at
11588least once) in the input, by storing a one into the array @code{used} with
11589the word as index.  The second rule scans the elements of @code{used} to
11590find all the distinct words that appear in the input.  It prints each
11591word that is more than 10 characters long and also prints the number of
11592such words.
11593@xref{String Functions},
11594for more information on the built-in function @code{length}.
11595
11596@example
11597# Record a 1 for each word that is used at least once
11598@{
11599    for (i = 1; i <= NF; i++)
11600        used[$i] = 1
11601@}
11602
11603# Find number of distinct words more than 10 characters long
11604END @{
11605    for (x in used)
11606        if (length(x) > 10) @{
11607            ++num_long_words
11608            print x
11609        @}
11610    print num_long_words, "words longer than 10 characters"
11611@}
11612@end example
11613
11614@noindent
11615@xref{Word Sorting},
11616for a more detailed example of this type.
11617
11618@cindex arrays, elements, order of
11619@cindex elements in arrays, order of
11620The order in which elements of the array are accessed by this statement
11621is determined by the internal arrangement of the array elements within
11622@command{awk} and cannot be controlled or changed.  This can lead to
11623problems if new elements are added to @var{array} by statements in
11624the loop body; it is not predictable whether the @code{for} loop will
11625reach them.  Similarly, changing @var{var} inside the loop may produce
11626strange results.  It is best to avoid such things.
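
One way to stay clear of these problems is to collect the indices in a
first loop and to modify the array in a second one.  The following is a
minimal sketch of that idea; the names @code{keys}, @code{n}, and
@code{count}, and the @samp{_copy} suffix, are made up for the example:

@example
BEGIN @{
    arr["a"] = 1
    arr["b"] = 2
    n = 0
    for (k in arr)                # first pass: only read the indices
        keys[++n] = k
    for (i = 1; i <= n; i++)      # second pass: adding elements is safe
        arr[keys[i] "_copy"] = arr[keys[i]]
    count = 0
    for (k in arr)
        count++
    print count
@}
@end example

@noindent
This prints @samp{4}: the two original elements plus one copy of each.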
11627
11628@node Delete
11629@section The @code{delete} Statement
11630@cindex @code{delete} statement
11631@cindex deleting elements in arrays
11632@cindex arrays, elements, deleting
11633@cindex elements in arrays, deleting
11634
11635To remove an individual element of an array, use the @code{delete}
11636statement:
11637
11638@example
11639delete @var{array}[@var{index}]
11640@end example
11641
Once an array element has been deleted, any value the element once
had is no longer available. It is as if the element had never
been referred to or been given a value.
11645The following is an example of deleting elements in an array:
11646
11647@example
11648for (i in frequencies)
11649  delete frequencies[i]
11650@end example
11651
11652@noindent
11653This example removes all the elements from the array @code{frequencies}.
11654Once an element is deleted, a subsequent @code{for} statement to scan the array
11655does not report that element and the @code{in} operator to check for
11656the presence of that element returns zero (i.e., false):
11657
11658@example
11659delete foo[4]
11660if (4 in foo)
11661    print "This will never be printed"
11662@end example
11663
11664@cindex null strings, array elements and
11665It is important to note that deleting an element is @emph{not} the
11666same as assigning it a null value (the empty string, @code{""}).
11667For example:
11668
11669@example
11670foo[4] = ""
11671if (4 in foo)
11672  print "This is printed, even though foo[4] is empty"
11673@end example
11674
11675@cindex lint checking, array elements
11676It is not an error to delete an element that does not exist.
11677If @option{--lint} is provided on the command line
11678(@pxref{Options}),
11679@command{gawk} issues a warning message when an element that
11680is not in the array is deleted.
11681
11682@cindex arrays, deleting entire contents
11683@cindex deleting entire arrays
11684@cindex differences in @command{awk} and @command{gawk}, array elements, deleting
11685All the elements of an array may be deleted with a single statement
11686by leaving off the subscript in the @code{delete} statement,
11687as follows:
11688
11689@example
11690delete @var{array}
11691@end example
11692
11693This ability is a @command{gawk} extension; it is not available in
11694compatibility mode (@pxref{Options}).
11695
11696Using this version of the @code{delete} statement is about three times
11697more efficient than the equivalent loop that deletes each element one
11698at a time.
11699
11700@cindex portability, deleting array elements
11701@cindex Brennan, Michael
11702The following statement provides a portable but nonobvious way to clear
11703out an array:@footnote{Thanks to Michael Brennan for pointing this out.}
11704
11705@example
11706split("", array)
11707@end example
11708
11709@c comma before deleting does NOT start a tertiary
11710@cindex @code{split} function, array elements, deleting
11711The @code{split} function
11712(@pxref{String Functions})
11713clears out the target array first. This call asks it to split
11714apart the null string. Because there is no data to split out, the
11715function simply clears the array and then returns.
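
The effect can be checked with a short sketch (the array contents here
are arbitrary):

@example
BEGIN @{
    a["x"] = 1
    a["y"] = 2
    split("", a)          # clears a
    n = 0
    for (k in a)
        n++
    print n
@}
@end example

@noindent
This prints @samp{0}, since no elements remain to be scanned.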
11716
11717@strong{Caution:} Deleting an array does not change its type; you cannot
11718delete an array and then use the array's name as a scalar
11719(i.e., a regular variable). For example, the following does not work:
11720
11721@example
11722a[1] = 3; delete a; a = 3
11723@end example
11724
11725@node Numeric Array Subscripts
11726@section Using Numbers to Subscript Arrays
11727
11728@cindex numbers, as array subscripts
11729@cindex arrays, subscripts
11730@cindex subscripts in arrays, numbers as
11731@cindex @code{CONVFMT} variable, array subscripts and
11732An important aspect about arrays to remember is that @emph{array subscripts
11733are always strings}.  When a numeric value is used as a subscript,
11734it is converted to a string value before being used for subscripting
11735(@pxref{Conversion}).
11736This means that the value of the built-in variable @code{CONVFMT} can
11737affect how your program accesses elements of an array.  For example:
11738
11739@example
11740xyz = 12.153
11741data[xyz] = 1
11742CONVFMT = "%2.2f"
11743if (xyz in data)
11744    printf "%s is in data\n", xyz
11745else
11746    printf "%s is not in data\n", xyz
11747@end example
11748
11749@noindent
11750This prints @samp{12.15 is not in data}.  The first statement gives
11751@code{xyz} a numeric value.  Assigning to
11752@code{data[xyz]} subscripts @code{data} with the string value @code{"12.153"}
11753(using the default conversion value of @code{CONVFMT}, @code{"%.6g"}).
11754Thus, the array element @code{data["12.153"]} is assigned the value one.
11755The program then changes
11756the value of @code{CONVFMT}.  The test @samp{(xyz in data)} generates a new
11757string value from @code{xyz}---this time @code{"12.15"}---because the value of
@code{CONVFMT} allows only two digits after the decimal point.  This test fails,
11759since @code{"12.15"} is a different string from @code{"12.153"}.
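
One way to avoid this surprise is to convert the number to a string once
and to use that string as the subscript from then on.  The following
sketch does so with a helper variable, @code{key}, which is introduced
here purely for illustration:

@example
BEGIN @{
    xyz = 12.153
    key = xyz ""          # converted once, with the default CONVFMT
    data[key] = 1
    CONVFMT = "%2.2f"
    if (key in data)
        printf "%s is in data\n", key
    else
        printf "%s is not in data\n", key
@}
@end example

@noindent
This prints @samp{12.153 is in data}, because @code{key} already holds a
string and is not converted again.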
11760
11761@cindex converting, during subscripting
11762According to the rules for conversions
11763(@pxref{Conversion}), integer
11764values are always converted to strings as integers, no matter what the
value of @code{CONVFMT} may happen to be.  So the usual case,
such as the following loop, works as expected:
11767
11768@example
11769for (i = 1; i <= maxsub; i++)
11770    @i{do something with} array[i]
11771@end example
11772
11773The ``integer values always convert to strings as integers'' rule
11774has an additional consequence for array indexing.
11775Octal and hexadecimal constants
11776(@pxref{Nondecimal-numbers})
11777are converted internally into numbers, and their original form
11778is forgotten.
11779This means, for example, that
11780@code{array[17]},
11781@code{array[021]},
11782and
11783@code{array[0x11]}
11784all refer to the same element!
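
A short sketch makes this visible.  It assumes @command{gawk} running
without compatibility mode, so that octal and hexadecimal constants are
recognized in the program text:

@example
BEGIN @{
    array[17] = "seventeen"
    print array[021]
    print array[0x11]
@}
@end example

@noindent
Under that assumption, both @code{print} statements print @samp{seventeen}.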
11785
11786As with many things in @command{awk}, the majority of the time
11787things work as one would expect them to.  But it is useful to have a precise
knowledge of the actual rules, since they can sometimes have a subtle
effect on your programs.
11790
11791@node Uninitialized Subscripts
11792@section Using Uninitialized Variables as Subscripts
11793
11794@c last comma does NOT start a tertiary
11795@cindex variables, uninitialized, as array subscripts
11796@cindex uninitialized variables, as array subscripts
11797@cindex subscripts in arrays, uninitialized variables as
11798@cindex arrays, subscripts, uninitialized variables as
11799Suppose it's necessary to write a program
11800to print the input data in reverse order.
11801A reasonable attempt to do so (with some test
11802data) might look like this:
11803
11804@example
11805$ echo 'line 1
11806> line 2
11807> line 3' | awk '@{ l[lines] = $0; ++lines @}
11808> END @{
11809>     for (i = lines-1; i >= 0; --i)
11810>        print l[i]
11811> @}'
11812@print{} line 3
11813@print{} line 2
11814@end example
11815
11816Unfortunately, the very first line of input data did not come out in the
11817output!
11818
11819At first glance, this program should have worked.  The variable @code{lines}
11820is uninitialized, and uninitialized variables have the numeric value zero.
11821So, @command{awk} should have printed the value of @code{l[0]}.
11822
11823The issue here is that subscripts for @command{awk} arrays are @emph{always}
11824strings. Uninitialized variables, when used as strings, have the
11825value @code{""}, not zero.  Thus, @samp{line 1} ends up stored in
11826@code{l[""]}.
11827The following version of the program works correctly:
11828
11829@example
11830@{ l[lines++] = $0 @}
11831END @{
11832    for (i = lines - 1; i >= 0; --i)
11833       print l[i]
11834@}
11835@end example
11836
11837Here, the @samp{++} forces @code{lines} to be numeric, thus making
11838the ``old value'' numeric zero. This is then converted to @code{"0"}
11839as the array subscript.
11840
11841@cindex null strings, as array subscripts
11842@cindex dark corner, array subscripts
11843@cindex lint checking, array subscripts
11844Even though it is somewhat unusual, the null string
11845(@code{""}) is a valid array subscript.
11846@value{DARKCORNER}
11847@command{gawk} warns about the use of the null string as a subscript
11848if @option{--lint} is provided
11849on the command line (@pxref{Options}).
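
The line that went missing from the first program can be located with
the @code{in} operator.  As a sketch, reusing the names from that program
(the message text is made up):

@example
$ echo 'line 1' | awk '@{ l[lines] = $0 @}
> END @{ if ("" in l) print "null index holds:", l[""] @}'
@print{} null index holds: line 1
@end example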
11850
11851@node Multi-dimensional
11852@section Multidimensional Arrays
11853
11854@cindex subscripts in arrays, multidimensional
11855@cindex arrays, multidimensional
11856A multidimensional array is an array in which an element is identified
11857by a sequence of indices instead of a single index.  For example, a
11858two-dimensional array requires two indices.  The usual way (in most
11859languages, including @command{awk}) to refer to an element of a
11860two-dimensional array named @code{grid} is with
11861@code{grid[@var{x},@var{y}]}.
11862
11863@cindex @code{SUBSEP} variable, multidimensional arrays
11864Multidimensional arrays are supported in @command{awk} through
11865concatenation of indices into one string.
11866@command{awk} converts the indices into strings
11867(@pxref{Conversion}) and
11868concatenates them together, with a separator between them.  This creates
11869a single string that describes the values of the separate indices.  The
11870combined string is used as a single index into an ordinary,
11871one-dimensional array.  The separator used is the value of the built-in
11872variable @code{SUBSEP}.
11873
11874For example, suppose we evaluate the expression @samp{foo[5,12] = "value"}
11875when the value of @code{SUBSEP} is @code{"@@"}.  The numbers 5 and 12 are
11876converted to strings and
11877concatenated with an @samp{@@} between them, yielding @code{"5@@12"}; thus,
11878the array element @code{foo["5@@12"]} is set to @code{"value"}.
11879
11880Once the element's value is stored, @command{awk} has no record of whether
11881it was stored with a single index or a sequence of indices.  The two
11882expressions @samp{foo[5,12]} and @w{@samp{foo[5 SUBSEP 12]}} are always
11883equivalent.
11884
11885The default value of @code{SUBSEP} is the string @code{"\034"},
11886which contains a nonprinting character that is unlikely to appear in an
11887@command{awk} program or in most input data.
11888The usefulness of choosing an unlikely character comes from the fact
11889that index values that contain a string matching @code{SUBSEP} can lead to
11890combined strings that are ambiguous.  Suppose that @code{SUBSEP} is
11891@code{"@@"}; then @w{@samp{foo["a@@b", "c"]}} and @w{@samp{foo["a",
11892"b@@c"]}} are indistinguishable because both are actually
11893stored as @samp{foo["a@@b@@c"]}.
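
The ambiguity is easy to reproduce.  In the following sketch, an element
is stored with one pair of indices and then retrieved with the other pair:

@example
BEGIN @{
    SUBSEP = "@@"
    foo["a@@b", "c"] = 1
    if (foo["a", "b@@c"] == 1)
        print "both index pairs name the same element"
@}
@end example

@noindent
The message is printed, because both pairs produce the combined index
@code{"a@@b@@c"}.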
11894
11895To test whether a particular index sequence exists in a
11896multidimensional array, use the same operator (@samp{in}) that is
used for one-dimensional arrays.  Write the whole sequence of indices
11898in parentheses, separated by commas, as the left operand:
11899
11900@example
11901(@var{subscript1}, @var{subscript2}, @dots{}) in @var{array}
11902@end example
11903
11904The following example treats its input as a two-dimensional array of
11905fields; it rotates this array 90 degrees clockwise and prints the
11906result.  It assumes that all lines have the same number of
11907elements:
11908
11909@example
11910@{
11911     if (max_nf < NF)
11912          max_nf = NF
11913     max_nr = NR
11914     for (x = 1; x <= NF; x++)
11915          vector[x, NR] = $x
11916@}
11917
11918END @{
11919     for (x = 1; x <= max_nf; x++) @{
11920          for (y = max_nr; y >= 1; --y)
11921               printf("%s ", vector[x, y])
11922          printf("\n")
11923     @}
11924@}
11925@end example
11926
11927@noindent
11928When given the input:
11929
11930@example
1 2 3 4 5 6
2 3 4 5 6 1
3 4 5 6 1 2
4 5 6 1 2 3
11935@end example
11936
11937@noindent
11938the program produces the following output:
11939
11940@example
4 3 2 1
5 4 3 2
6 5 4 3
1 6 5 4
2 1 6 5
3 2 1 6
11947@end example
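
If the lines may have differing numbers of fields, the @code{in} test
shown earlier can fill the gaps.  The following replacement for the
@code{END} rule is only a sketch; the placeholder @samp{-} is an
arbitrary choice:

@example
END @{
     for (x = 1; x <= max_nf; x++) @{
          for (y = max_nr; y >= 1; --y)
               if ((x, y) in vector)
                    printf("%s ", vector[x, y])
               else
                    printf("- ")
          printf("\n")
     @}
@}
@end example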
11948
11949@node Multi-scanning
11950@section Scanning Multidimensional Arrays
11951
11952There is no special @code{for} statement for scanning a
11953``multidimensional'' array. There cannot be one, because, in truth, there
11954are no multidimensional arrays or elements---there is only a
11955multidimensional @emph{way of accessing} an array.
11956
11957@cindex subscripts in arrays, multidimensional, scanning
11958@cindex arrays, multidimensional, scanning
11959However, if your program has an array that is always accessed as
11960multidimensional, you can get the effect of scanning it by combining
11961the scanning @code{for} statement
11962(@pxref{Scanning an Array}) with the
11963built-in @code{split} function
11964(@pxref{String Functions}).
11965It works in the following manner:
11966
11967@example
11968for (combined in array) @{
11969    split(combined, separate, SUBSEP)
11970    @dots{}
11971@}
11972@end example
11973
11974@noindent
11975This sets the variable @code{combined} to
11976each concatenated combined index in the array, and splits it
11977into the individual indices by breaking it apart where the value of
11978@code{SUBSEP} appears.  The individual indices then become the elements of
11979the array @code{separate}.
11980
Thus, if a value is previously stored in @code{array[1, "foo"]}, then
11982an element with index @code{"1\034foo"} exists in @code{array}.  (Recall
11983that the default value of @code{SUBSEP} is the character with code 034.)
11984Sooner or later, the @code{for} statement finds that index and does an
11985iteration with the variable @code{combined} set to @code{"1\034foo"}.
11986Then the @code{split} function is called as follows:
11987
11988@example
11989split("1\034foo", separate, "\034")
11990@end example
11991
11992@noindent
11993The result is to set @code{separate[1]} to @code{"1"} and
11994@code{separate[2]} to @code{"foo"}.  Presto! The original sequence of
11995separate indices is recovered.
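
Putting the pieces together, a complete sketch (the array name
@code{grid} and its contents are invented for the example) looks like this:

@example
BEGIN @{
    grid[1, "foo"] = "a"
    for (combined in grid) @{
        split(combined, separate, SUBSEP)
        print separate[1], separate[2], grid[combined]
    @}
@}
@end example

@noindent
This prints @samp{1 foo a}.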
11996
11997@node Array Sorting
11998@section Sorting Array Values and Indices with @command{gawk}
11999
12000@cindex arrays, sorting
12001@cindex @code{asort} function (@command{gawk})
12002@c last comma does NOT start a tertiary
12003@cindex @code{asort} function (@command{gawk}), arrays, sorting
12004@cindex sort function, arrays, sorting
12005The order in which an array is scanned with a @samp{for (i in array)}
12006loop is essentially arbitrary.
12007In most @command{awk} implementations, sorting an array requires
12008writing a @code{sort} function.
12009While this can be educational for exploring different sorting algorithms,
12010usually that's not the point of the program.
12011@command{gawk} provides the built-in @code{asort}
12012and @code{asorti} functions
12013(@pxref{String Functions})
12014for sorting arrays.  For example:
12015
12016@example
12017@var{populate the array} data
12018n = asort(data)
12019for (i = 1; i <= n; i++)
12020    @var{do something with} data[i]
12021@end example
12022
12023After the call to @code{asort}, the array @code{data} is indexed from 1
12024to some number @var{n}, the total number of elements in @code{data}.
12025(This count is @code{asort}'s return value.)
12026@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
12027The comparison of array elements is done
12028using @command{gawk}'s usual comparison rules
12029(@pxref{Typing and Comparison}).
12030
12031@cindex side effects, @code{asort} function
12032An important side effect of calling @code{asort} is that
12033@emph{the array's original indices are irrevocably lost}.
12034As this isn't always desirable, @code{asort} accepts a
12035second argument:
12036
12037@example
12038@var{populate the array} source
12039n = asort(source, dest)
12040for (i = 1; i <= n; i++)
12041    @var{do something with} dest[i]
12042@end example
12043
12044In this case, @command{gawk} copies the @code{source} array into the
12045@code{dest} array and then sorts @code{dest}, destroying its indices.
12046However, the @code{source} array is not affected.
12047
12048Often, what's needed is to sort on the values of the @emph{indices}
12049instead of the values of the elements.
12050To do that, starting with @command{gawk} 3.1.2, use the
12051@code{asorti} function.  The interface is identical to that of
12052@code{asort}, except that the index values are used for sorting, and
12053become the values of the result array:
12054
12055@example
12056@{ source[$0] = some_func($0) @}
12057
12058END @{
12059    n = asorti(source, dest)
12060    for (i = 1; i <= n; i++)
12061        @var{do something with} dest[i]
12062@}
12063@end example
12064
12065If your version of @command{gawk} is 3.1.0 or 3.1.1, you don't
12066have @code{asorti}. Instead, use a helper array
12067to hold the sorted index values, and then access the original array's
12068elements.  It works in the following way:
12069
12070@example
12071@var{populate the array} data
12072# copy indices
12073j = 1
12074for (i in data) @{
12075    ind[j] = i    # index value becomes element value
12076    j++
12077@}
12078n = asort(ind)    # index values are now sorted
12079for (i = 1; i <= n; i++)
12080    @var{do something with} data[ind[i]]
12081@end example
12082
12083Sorting the array by replacing the indices provides maximal flexibility.
12084To traverse the elements in decreasing order, use a loop that goes from
12085@var{n} down to 1, either over the elements or over the indices.
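
For instance, the following sketch walks the sorted indices from @var{n}
down to 1.  It assumes a version of @command{gawk} that provides
@code{asorti}; the sample data are invented for the example:

@example
BEGIN @{
    source["pear"] = 3
    source["apple"] = 7
    source["fig"] = 5
    n = asorti(source, dest)
    for (i = n; i >= 1; i--)
        print dest[i], source[dest[i]]
@}
@end example

@noindent
With @code{IGNORECASE} unset, the output is:

@example
@print{} pear 3
@print{} fig 5
@print{} apple 7
@end example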
12086
12087@cindex reference counting, sorting arrays
12088Copying array indices and elements isn't expensive in terms of memory.
12089Internally, @command{gawk} maintains @dfn{reference counts} to data.
12090For example, when @code{asort} copies the first array to the second one,
12091there is only one copy of the original array elements' data, even though
12092both arrays use the values.  Similarly, when copying the indices from
12093@code{data} to @code{ind}, there is only one copy of the actual index
12094strings.
12095
12096@c Document It And Call It A Feature. Sigh.
12097@cindex arrays, sorting, @code{IGNORECASE} variable and
12098@cindex @code{IGNORECASE} variable, array sorting and
12099We said previously that comparisons are done using @command{gawk}'s
12100``usual comparison rules.''  Because @code{IGNORECASE} affects
12101string comparisons, the value of @code{IGNORECASE} also
12102affects sorting for both @code{asort} and @code{asorti}.
12103Caveat Emptor.
12104@c ENDOFRANGE arrs
12105
12106@node Functions
12107@chapter Functions
12108
12109@c STARTOFRANGE funcbi
12110@cindex functions, built-in
12111@c STARTOFRANGE bifunc
12112@cindex built-in functions
12113This @value{CHAPTER} describes @command{awk}'s built-in functions,
12114which fall into three categories: numeric, string, and I/O.
12115@command{gawk} provides additional groups of functions
12116to work with values that represent time, do
12117bit manipulation, and internationalize and localize programs.
12118
12119Besides the built-in functions, @command{awk} has provisions for
12120writing new functions that the rest of a program can use.
12121The second half of this @value{CHAPTER} describes these
12122@dfn{user-defined} functions.
12123
12124@menu
12125* Built-in::                    Summarizes the built-in functions.
12126* User-defined::                Describes User-defined functions in detail.
12127@end menu
12128
12129@node Built-in
12130@section Built-in Functions
12131
12132@c 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!!
12133@dfn{Built-in} functions are always available for
12134your @command{awk} program to call.  This @value{SECTION} defines all
12135the built-in
12136functions in @command{awk}; some of these are mentioned in other sections
12137but are summarized here for your convenience.
12138
12139@menu
12140* Calling Built-in::            How to call built-in functions.
12141* Numeric Functions::           Functions that work with numbers, including
12142                                @code{int}, @code{sin} and @code{rand}.
12143* String Functions::            Functions for string manipulation, such as
12144                                @code{split}, @code{match} and @code{sprintf}.
12145* I/O Functions::               Functions for files and shell commands.
12146* Time Functions::              Functions for dealing with timestamps.
12147* Bitwise Functions::           Functions for bitwise operations.
12148* I18N Functions::              Functions for string translation.
12149@end menu
12150
12151@node Calling Built-in
12152@subsection Calling Built-in Functions
12153
12154To call one of @command{awk}'s built-in functions, write the name of
12155the function followed
12156by arguments in parentheses.  For example, @samp{atan2(y + z, 1)}
12157is a call to the function @code{atan2} and has two arguments.
12158
12159@cindex programming conventions, functions, calling
12160@c last comma does NOT start a tertiary
12161@cindex whitespace, functions, calling
12162Whitespace is ignored between the built-in function name and the
open parenthesis, but it is good practice to avoid using whitespace
12164there.  User-defined functions do not permit whitespace in this way, and
12165it is easier to avoid mistakes by following a simple
12166convention that always works---no whitespace after a function name.
12167
12168@c last comma is part of tertiary
12169@cindex troubleshooting, @command{gawk}, fatal errors, function arguments
12170@cindex @command{gawk}, function arguments and
12171@cindex differences in @command{awk} and @command{gawk}, function arguments (@command{gawk})
12172Each built-in function accepts a certain number of arguments.
12173In some cases, arguments can be omitted. The defaults for omitted
12174arguments vary from function to function and are described under the
12175individual functions.  In some @command{awk} implementations, extra
12176arguments given to built-in functions are ignored.  However, in @command{gawk},
12177it is a fatal error to give extra arguments to a built-in function.
12178
12179When a function is called, expressions that create the function's actual
12180parameters are evaluated completely before the call is performed.
12181For example, in the following code fragment:
12182
12183@example
12184i = 4
12185j = sqrt(i++)
12186@end example
12187
12188@cindex evaluation order, functions
12189@cindex functions, built-in, evaluation order
12190@cindex built-in functions, evaluation order
12191@noindent
12192the variable @code{i} is incremented to the value five before @code{sqrt}
12193is called with a value of four for its actual parameter.
12194The order of evaluation of the expressions used for the function's
12195parameters is undefined.  Thus, avoid writing programs that
12196assume that parameters are evaluated from left to right or from
12197right to left.  For example:
12198
12199@example
12200i = 5
12201j = atan2(i++, i *= 2)
12202@end example
12203
If the order of evaluation is left to right, then @code{i} first becomes
6, and then 12, and @code{atan2} is called with the two arguments 5
and 12, because @samp{i++} yields the value @code{i} had before the
increment.  But if the order of evaluation is right to left, @code{i}
first becomes 10, then 11, and @code{atan2} is called with the
two arguments 10 and 10.
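
When the order matters, one remedy is to evaluate the expressions in
separate statements, so that the order is explicit.  A minimal sketch,
with the illustrative variable names @code{y} and @code{x}:

@example
BEGIN @{
    i = 5
    y = i++           # y is 5; i is now 6
    x = (i *= 2)      # i and x are now 12
    j = atan2(y, x)
    print y, x
@}
@end example

@noindent
This prints @samp{5 12}, whatever order the implementation uses for
evaluating function arguments.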
12209
12210@node Numeric Functions
12211@subsection Numeric Functions
12212
12213The following list describes all of
12214the built-in functions that work with numbers.
12215Optional parameters are enclosed in square brackets@w{ ([ ]):}
12216
12217@table @code
12218@item int(@var{x})
12219@cindex @code{int} function
12220This returns the nearest integer to @var{x}, located between @var{x} and zero and
12221truncated toward zero.
12222
12223For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)}
12224is @minus{}3, and @code{int(-3)} is @minus{}3 as well.
12225
12226@item sqrt(@var{x})
12227@cindex @code{sqrt} function
12228This returns the positive square root of @var{x}.
12229@command{gawk} reports an error
if @var{x} is negative.  For example, @code{sqrt(4)} is 2.
12231
12232@item exp(@var{x})
12233@cindex @code{exp} function
12234This returns the exponential of @var{x} (@code{e ^ @var{x}}) or reports
12235an error if @var{x} is out of range.  The range of values @var{x} can have
12236depends on your machine's floating-point representation.
12237
12238@item log(@var{x})
12239@cindex @code{log} function
12240This returns the natural logarithm of @var{x}, if @var{x} is positive;
12241otherwise, it reports an error.
12242
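Logarithms to other bases can be derived from the natural logarithm.
As a sketch, the following user-defined function (the name
@code{log10} is an assumption, not a built-in) computes a base-10
logarithm:

@example
function log10(x)
@{
     return log(x) / log(10)
@}

BEGIN @{ printf "%.4f\n", log10(1000) @}
@end example

@noindent
This prints @samp{3.0000}.  The quotient is subject to floating-point
rounding, so take care before truncating it with @code{int}.
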
12243@item sin(@var{x})
12244@cindex @code{sin} function
12245This returns the sine of @var{x}, with @var{x} in radians.
12246
12247@item cos(@var{x})
12248@cindex @code{cos} function
12249This returns the cosine of @var{x}, with @var{x} in radians.
12250
12251@item atan2(@var{y}, @var{x})
12252@cindex @code{atan2} function
12253This returns the arctangent of @code{@var{y} / @var{x}} in radians.
12254
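A frequently used idiom obtains the value of pi from @code{atan2},
which in turn makes it possible to convert degrees to radians for
@code{sin} and @code{cos}.  The variable name @code{pi} in this sketch
is illustrative:

@example
$ awk 'BEGIN @{
>         pi = atan2(0, -1)
>         printf "%.4f\n", pi
>         printf "%.4f\n", sin(30 * pi / 180)
> @}'
@print{} 3.1416
@print{} 0.5000
@end example
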
12255@item rand()
12256@cindex @code{rand} function
12257@cindex random numbers, @code{rand}/@code{srand} functions
12258This returns a random number.  The values of @code{rand} are
12259uniformly distributed between zero and one.
12260The value could be zero but is never one.@footnote{The C version of @code{rand}
12261is known to produce fairly poor sequences of random numbers.
12262However, nothing requires that an @command{awk} implementation use the C
12263@code{rand} to implement the @command{awk} version of @code{rand}.
12264In fact, @command{gawk} uses the BSD @code{random} function, which is
12265considerably better than @code{rand}, to produce random numbers.}
12266
12267Often random integers are needed instead.  Following is a user-defined function
12268that can be used to obtain a random non-negative integer less than @var{n}:
12269
12270@example
12271function randint(n) @{
12272     return int(n * rand())
12273@}
12274@end example
12275
12276@noindent
The multiplication produces a random number greater than or equal to zero
and less than @code{n}.  Using @code{int}, this result is made into
12279an integer between zero and @code{n} @minus{} 1, inclusive.
12280
12281The following example uses a similar function to produce random integers
12282between one and @var{n}.  This program prints a new random number for
12283each input record:
12284
12285@example
12286# Function to roll a simulated die.
12287function roll(n) @{ return 1 + int(rand() * n) @}
12288
12289# Roll 3 six-sided dice and
12290# print total number of points.
12291@{
12292      printf("%d points\n",
12293             roll(6)+roll(6)+roll(6))
12294@}
12295@end example
12296
12297@cindex numbers, random
12298@cindex random numbers, seed of
12299@c MAWK uses a different seed each time.
12300@strong{Caution:} In most @command{awk} implementations, including @command{gawk},
12301@code{rand} starts generating numbers from the same
12302starting number, or @dfn{seed}, each time you run @command{awk}.  Thus,
12303a program generates the same results each time you run it.
12304The numbers are random within one @command{awk} run but predictable
12305from run to run.  This is convenient for debugging, but if you want
12306a program to do different things each time it is used, you must change
12307the seed to a value that is different in each run.  To do this,
12308use @code{srand}.
12309
12310@item srand(@r{[}@var{x}@r{]})
12311@cindex @code{srand} function
12312The function @code{srand} sets the starting point, or seed,
12313for generating random numbers to the value @var{x}.
12314
12315Each seed value leads to a particular sequence of random
12316numbers.@footnote{Computer-generated random numbers really are not truly
12317random.  They are technically known as ``pseudorandom.''  This means
12318that while the numbers in a sequence appear to be random, you can in
12319fact generate the same sequence of random numbers over and over again.}
12320Thus, if the seed is set to the same value a second time,
12321the same sequence of random numbers is produced again.
12322
12323Different @command{awk} implementations use different random-number
12324generators internally.  Don't expect the same @command{awk} program
12325to produce the same series of random numbers when executed by
12326different versions of @command{awk}.
12327
If the argument @var{x} is omitted, as in @samp{srand()}, then the current
date and time of day are used for a seed.  This is the way to get random
numbers that differ from one run to the next.
12331
12332The return value of @code{srand} is the previous seed.  This makes it
12333easy to keep track of the seeds in case you need to consistently reproduce
12334sequences of random numbers.
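
The following sketch illustrates both points: reusing a seed reproduces
the same sequence, and @code{srand} returns the previous seed.  The
seed value 42 is arbitrary:

@example
BEGIN @{
     srand(42); a = rand()
     prev = srand(42); b = rand()
     print (a == b ? "same" : "different")
     print prev
@}
@end example

@noindent
This prints @samp{same}, followed by @samp{42}.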
12335@end table
12336
12337@node String Functions
12338@subsection String-Manipulation Functions
12339
12340The functions in this @value{SECTION} look at or change the text of one or more
12341strings.
12342Optional parameters are enclosed in square brackets@w{ ([ ]).}
12343Those functions that are
12344specific to @command{gawk} are marked with a pound sign@w{ (@samp{#}):}
12345
12346@menu
12347* Gory Details::                More than you want to know about @samp{\} and
12348                                @samp{&} with @code{sub}, @code{gsub}, and
12349                                @code{gensub}.
12350@end menu
12351
12352@table @code
12353@item asort(@var{source} @r{[}, @var{dest}@r{]}) #
12354@cindex arrays, elements, retrieving number of
12355@cindex @code{asort} function (@command{gawk})
12356@code{asort} is a @command{gawk}-specific extension, returning the number of
12357elements in the array @var{source}.  The contents of @var{source} are
12358sorted using @command{gawk}'s normal rules for comparing values
12359(in particular, @code{IGNORECASE} affects the sorting)
12360and the indices
12361of the sorted values of @var{source} are replaced with sequential
12362integers starting with one. If the optional array @var{dest} is specified,
12363then @var{source} is duplicated into @var{dest}.  @var{dest} is then
12364sorted, leaving the indices of @var{source} unchanged.
12365For example, if the contents of @code{a} are as follows:
12366
12367@example
12368a["last"] = "de"
12369a["first"] = "sac"
12370a["middle"] = "cul"
12371@end example
12372
12373@noindent
12374A call to @code{asort}:
12375
12376@example
12377asort(a)
12378@end example
12379
12380@noindent
12381results in the following contents of @code{a}:
12382
12383@example
12384a[1] = "cul"
12385a[2] = "de"
12386a[3] = "sac"
12387@end example
12388
12389The @code{asort} function is described in more detail in
12390@ref{Array Sorting}.
12391@code{asort} is a @command{gawk} extension; it is not available
12392in compatibility mode (@pxref{Options}).
12393
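When the original indices must be preserved, supply the second
argument.  The following sketch reuses the data shown earlier; the
array name @code{sorted} is illustrative:

@example
$ gawk 'BEGIN @{
>         a["last"] = "de"; a["first"] = "sac"; a["middle"] = "cul"
>         n = asort(a, sorted)
>         for (i = 1; i <= n; i++)
>             print i, sorted[i]
>         print a["first"]
> @}'
@print{} 1 cul
@print{} 2 de
@print{} 3 sac
@print{} sac
@end example
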
12394@item asorti(@var{source} @r{[}, @var{dest}@r{]}) #
12395@cindex @code{asorti} function (@command{gawk})
12396@code{asorti} is a @command{gawk}-specific extension, returning the number of
12397elements in the array @var{source}.
It works similarly to @code{asort}; however, the @emph{indices}
12399are sorted, instead of the values.  As array indices are always strings,
12400the comparison performed is always a string comparison.  (Here too,
12401@code{IGNORECASE} affects the sorting.)
12402
12403The @code{asorti} function is described in more detail in
12404@ref{Array Sorting}.
12405It was added in @command{gawk} 3.1.2.
12406@code{asorti} is a @command{gawk} extension; it is not available
12407in compatibility mode (@pxref{Options}).
12408
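As a sketch, the sorted indices can be used to walk an array in index
order.  The array name @code{idx} is illustrative:

@example
$ gawk 'BEGIN @{
>         a["last"] = "de"; a["first"] = "sac"; a["middle"] = "cul"
>         n = asorti(a, idx)
>         for (i = 1; i <= n; i++)
>             print idx[i], a[idx[i]]
> @}'
@print{} first sac
@print{} last de
@print{} middle cul
@end example
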
12409@item index(@var{in}, @var{find})
12410@cindex @code{index} function
12411@cindex searching
12412This searches the string @var{in} for the first occurrence of the string
12413@var{find}, and returns the position in characters where that occurrence
12414begins in the string @var{in}.  Consider the following example:
12415
12416@example
12417$ awk 'BEGIN @{ print index("peanut", "an") @}'
12418@print{} 3
12419@end example
12420
12421@noindent
12422If @var{find} is not found, @code{index} returns zero.
12423(Remember that string indices in @command{awk} start at one.)
12424
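Because @code{index} searches for a literal string rather than a
regexp, characters such as @samp{.} have no special meaning in
@var{find}.  A short illustration:

@example
$ awk 'BEGIN @{ print index("a.b", "."), index("axb", ".") @}'
@print{} 2 0
@end example
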
12425@item length(@r{[}@var{string}@r{]})
12426@cindex @code{length} function
12427This returns the number of characters in @var{string}.  If
12428@var{string} is a number, the length of the digit string representing
12429that number is returned.  For example, @code{length("abcde")} is 5.  By
12430contrast, @code{length(15 * 35)} works out to 3. In this example, 15 * 35 =
12431525, and 525 is then converted to the string @code{"525"}, which has
12432three characters.
12433
12434If no argument is supplied, @code{length} returns the length of @code{$0}.
12435
12436@c @cindex historical features
12437@cindex portability, @code{length} function
12438@cindex POSIX @command{awk}, functions and, @code{length}
12439@strong{Note:}
12440In older versions of @command{awk}, the @code{length} function could
12441be called
12442without any parentheses.  Doing so is marked as ``deprecated'' in the
12443POSIX standard.  This means that while a program can do this,
12444it is a feature that can eventually be removed from a future
12445version of the standard.  Therefore, for programs to be maximally portable,
12446always supply the parentheses.
12447
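As an illustration, the following one-line program prints every input
line longer than ten characters, calling @code{length} with explicit
parentheses.  The limit and the sample input are arbitrary:

@example
$ printf 'short\na considerably longer line\n' |
> awk 'length($0) > 10'
@print{} a considerably longer line
@end example
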
12448@item match(@var{string}, @var{regexp} @r{[}, @var{array}@r{]})
12449@cindex @code{match} function
12450The @code{match} function searches @var{string} for the
12451longest, leftmost substring matched by the regular expression,
12452@var{regexp}.  It returns the character position, or @dfn{index},
12453at which that substring begins (one, if it starts at the beginning of
12454@var{string}).  If no match is found, it returns zero.
12455
12456The @var{regexp} argument may be either a regexp constant
(@samp{/@dots{}/}) or a string constant (@code{"@dots{}"}).
In the latter case, the string is treated as a regexp to be matched.
@xref{Computed Regexps}, for a
12460discussion of the difference between the two forms, and the
12461implications for writing your program correctly.
12462
12463The order of the first two arguments is backwards from most other string
12464functions that work with regular expressions, such as
12465@code{sub} and @code{gsub}.  It might help to remember that
12466for @code{match}, the order is the same as for the @samp{~} operator:
12467@samp{@var{string} ~ @var{regexp}}.
12468
12469@cindex @code{RSTART} variable, @code{match} function and
12470@cindex @code{RLENGTH} variable, @code{match} function and
12471@cindex @code{match} function, @code{RSTART}/@code{RLENGTH} variables
12472The @code{match} function sets the built-in variable @code{RSTART} to
12473the index.  It also sets the built-in variable @code{RLENGTH} to the
12474length in characters of the matched substring.  If no match is found,
12475@code{RSTART} is set to zero, and @code{RLENGTH} to @minus{}1.
12476
12477For example:
12478
12479@example
12480@c file eg/misc/findpat.awk
12481@{
12482       if ($1 == "FIND")
12483         regex = $2
12484       else @{
12485         where = match($0, regex)
12486         if (where != 0)
12487           print "Match of", regex, "found at",
12488                     where, "in", $0
12489       @}
12490@}
12491@c endfile
12492@end example
12493
12494@noindent
12495This program looks for lines that match the regular expression stored in
12496the variable @code{regex}.  This regular expression can be changed.  If the
12497first word on a line is @samp{FIND}, @code{regex} is changed to be the
12498second word on that line.  Therefore, if given:
12499
12500@example
12501@c file eg/misc/findpat.data
12502FIND ru+n
12503My program runs
12504but not very quickly
12505FIND Melvin
12506JF+KM
12507This line is property of Reality Engineering Co.
12508Melvin was here.
12509@c endfile
12510@end example
12511
12512@noindent
12513@command{awk} prints:
12514
12515@example
12516Match of ru+n found at 12 in My program runs
12517Match of Melvin found at 1 in Melvin was here.
12518@end example
12519
12520@cindex differences in @command{awk} and @command{gawk}, @code{match} function
12521If @var{array} is present, it is cleared, and then the 0th element
12522of @var{array} is set to the entire portion of @var{string}
12523matched by @var{regexp}.  If @var{regexp} contains parentheses,
12524the integer-indexed elements of @var{array} are set to contain the
12525portion of @var{string} matching the corresponding parenthesized
12526subexpression.
12527For example:
12528
12529@example
12530$ echo foooobazbarrrrr |
12531> gawk '@{ match($0, /(fo+).+(bar*)/, arr)
12532>           print arr[1], arr[2] @}'
12533@print{} foooo barrrrr
12534@end example
12535
12536In addition,
12537beginning with @command{gawk} 3.1.2,
12538multidimensional subscripts are available providing
12539the start index and length of each matched subexpression:
12540
12541@example
12542$ echo foooobazbarrrrr |
12543> gawk '@{ match($0, /(fo+).+(bar*)/, arr)
12544>           print arr[1], arr[2]
12545>           print arr[1, "start"], arr[1, "length"]
12546>           print arr[2, "start"], arr[2, "length"]
12547> @}'
12548@print{} foooo barrrrr
12549@print{} 1 5
12550@print{} 9 7
12551@end example
12552
There may not be subscripts for the start and length of every parenthesized
subexpression, because they may not all have matched text; thus they
12555should be tested for with the @code{in} operator
12556(@pxref{Reference to Elements}).
12557
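The following sketch shows one way to perform such a test.  The regexp
and the data are illustrative only; here the second parenthesized
subexpression takes no part in the match, so its subscripts are not
expected to be present:

@example
$ echo foooo |
> gawk '@{ match($0, /(fo+)|(bar)/, arr)
>           for (i = 1; i <= 2; i++)
>               if ((i, "start") in arr)
>                   print i, arr[i, "start"], arr[i, "length"]
>               else
>                   print i, "did not match"
> @}'
@end example
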
12558@cindex troubleshooting, @code{match} function
12559The @var{array} argument to @code{match} is a
12560@command{gawk} extension.  In compatibility mode
12561(@pxref{Options}),
12562using a third argument is a fatal error.
12563
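In programs that must also run under other versions of @command{awk},
the matched text can be extracted with @code{RSTART}, @code{RLENGTH},
and @code{substr} instead of the third argument.  For example:

@example
$ echo "order 12345 shipped" |
> awk '@{ if (match($0, /[0-9]+/))
>            print substr($0, RSTART, RLENGTH) @}'
@print{} 12345
@end example
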
12564@item split(@var{string}, @var{array} @r{[}, @var{fieldsep}@r{]})
12565@cindex @code{split} function
12566This function divides @var{string} into pieces separated by @var{fieldsep}
12567and stores the pieces in @var{array}.  The first piece is stored in
12568@code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so
12569forth.  The string value of the third argument, @var{fieldsep}, is
12570a regexp describing where to split @var{string} (much as @code{FS} can
12571be a regexp describing where to split input records).  If
12572@var{fieldsep} is omitted, the value of @code{FS} is used.
12573@code{split} returns the number of elements created.
12574
12575The @code{split} function splits strings into pieces in a
12576manner similar to the way input lines are split into fields.  For example:
12577
12578@example
12579split("cul-de-sac", a, "-")
12580@end example
12581
12582@noindent
12583@cindex strings, splitting
12584splits the string @samp{cul-de-sac} into three fields using @samp{-} as the
12585separator.  It sets the contents of the array @code{a} as follows:
12586
12587@example
12588a[1] = "cul"
12589a[2] = "de"
12590a[3] = "sac"
12591@end example
12592
12593@noindent
12594The value returned by this call to @code{split} is three.
12595
12596@cindex differences in @command{awk} and @command{gawk}, @code{split} function
12597As with input field-splitting, when the value of @var{fieldsep} is
12598@w{@code{" "}}, leading and trailing whitespace is ignored, and the elements
12599are separated by runs of whitespace.
12600Also as with input field-splitting, if @var{fieldsep} is the null string, each
12601individual character in the string is split into its own array element.
12602(This is a @command{gawk}-specific extension.)
12603
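The following sketch demonstrates both special cases with
@command{gawk}.  The array names @code{a} and @code{c} are
illustrative:

@example
$ gawk 'BEGIN @{
>         n = split("  cul  de sac ", a, " ")
>         print n, a[1], a[3]
>         m = split("abc", c, "")
>         print m, c[1], c[3]
> @}'
@print{} 3 cul sac
@print{} 3 a c
@end example
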
12604Note, however, that @code{RS} has no effect on the way @code{split}
12605works. Even though @samp{RS = ""} causes newline to also be an input
12606field separator, this does not affect how @code{split} splits strings.
12607
12608@cindex dark corner, @code{split} function
12609Modern implementations of @command{awk}, including @command{gawk}, allow
12610the third argument to be a regexp constant (@code{/abc/}) as well as a
12611string.
12612@value{DARKCORNER}
12613The POSIX standard allows this as well.
@xref{Computed Regexps}, for a
12615discussion of the difference between using a string constant or a regexp constant,
12616and the implications for writing your program correctly.
12617
12618Before splitting the string, @code{split} deletes any previously existing
12619elements in the array @var{array}.
12620
12621If @var{string} is null, the array has no elements. (So this is a portable
12622way to delete an entire array with one statement.
12623@xref{Delete}.)
12624
12625If @var{string} does not match @var{fieldsep} at all (but is not null),
12626@var{array} has one element only. The value of that element is the original
12627@var{string}.
12628
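To illustrate, the first call in the following sketch splits on a
regexp constant, and the second shows the single-element result when
the separator never matches.  The strings are arbitrary:

@example
$ awk 'BEGIN @{
>         n = split("a1b22c333d", parts, /[0-9]+/)
>         print n, parts[1], parts[4]
>         n = split("no-separator-here", parts, /[0-9]+/)
>         print n, parts[1]
> @}'
@print{} 4 a d
@print{} 1 no-separator-here
@end example
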
12629@item sprintf(@var{format}, @var{expression1}, @dots{})
12630@cindex @code{sprintf} function
12631This returns (without printing) the string that @code{printf} would
12632have printed out with the same arguments
12633(@pxref{Printf}).
12634For example:
12635
12636@example
12637pival = sprintf("pi = %.2f (approx.)", 22/7)
12638@end example
12639
12640@noindent
12641assigns the string @w{@code{"pi = 3.14 (approx.)"}} to the variable @code{pival}.
12642
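Because the result is an ordinary string, @code{sprintf} is convenient
for building padded or aligned values before they are used elsewhere.
A brief sketch, with illustrative variable names:

@example
$ awk 'BEGIN @{
>         id = sprintf("%03d", 7)
>         cell = sprintf("[%-5s]", "ab")
>         print id, cell
> @}'
@print{} 007 [ab   ]
@end example
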
12643@cindex differences in @command{awk} and @command{gawk}, @code{strtonum} function (@command{gawk})
12644@cindex @code{strtonum} function (@command{gawk})
12645@item strtonum(@var{str}) #
12646Examines @var{str} and returns its numeric value.  If @var{str}
12647begins with a leading @samp{0}, @code{strtonum} assumes that @var{str}
12648is an octal number.  If @var{str} begins with a leading @samp{0x} or
12649@samp{0X}, @code{strtonum} assumes that @var{str} is a hexadecimal number.
12650For example:
12651
12652@example
12653$ echo 0x11 |
12654> gawk '@{ printf "%d\n", strtonum($1) @}'
12655@print{} 17
12656@end example
12657
12658Using the @code{strtonum} function is @emph{not} the same as adding zero
12659to a string value; the automatic coercion of strings to numbers
12660works only for decimal data, not for octal or hexadecimal.@footnote{Unless
12661you use the @option{--non-decimal-data} option, which isn't recommended.
12662@xref{Nondecimal Data}, for more information.}
12663
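The difference can be seen in the following sketch, which assumes
@command{gawk} running in its default mode (without
@option{--non-decimal-data}):

@example
$ gawk 'BEGIN @{
>         print strtonum("0x11"), strtonum("011"), strtonum("11")
>         print "0x11" + 0
> @}'
@print{} 17 9 11
@print{} 0
@end example
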
12664@cindex differences in @command{awk} and @command{gawk}, @code{strtonum} function (@command{gawk})
12665@code{strtonum} is a @command{gawk} extension; it is not available
12666in compatibility mode (@pxref{Options}).
12667
12668@item sub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]})
12669@cindex @code{sub} function
12670The @code{sub} function alters the value of @var{target}.
12671It searches this value, which is treated as a string, for the
12672leftmost, longest substring matched by the regular expression @var{regexp}.
12673Then the entire string is
12674changed by replacing the matched text with @var{replacement}.
12675The modified string becomes the new value of @var{target}.
12676
12677The @var{regexp} argument may be either a regexp constant
(@samp{/@dots{}/}) or a string constant (@code{"@dots{}"}).
In the latter case, the string is treated as a regexp to be matched.
@xref{Computed Regexps}, for a
12681discussion of the difference between the two forms, and the
12682implications for writing your program correctly.
12683
12684This function is peculiar because @var{target} is not simply
12685used to compute a value, and not just any expression will do---it
12686must be a variable, field, or array element so that @code{sub} can
12687store a modified value there.  If this argument is omitted, then the
12688default is to use and alter @code{$0}.@footnote{Note that this means
12689that the record will first be regenerated using the value of @code{OFS} if
12690any fields have been changed, and that the fields will be updated
after the substitution, even if the operation is a ``no-op'' such
12692as @samp{sub(/^/, "")}.}
12693For example:
12694
12695@example
12696str = "water, water, everywhere"
12697sub(/at/, "ith", str)
12698@end example
12699
12700@noindent
12701sets @code{str} to @w{@code{"wither, water, everywhere"}}, by replacing the
12702leftmost longest occurrence of @samp{at} with @samp{ith}.
12703
12704The @code{sub} function returns the number of substitutions made (either
12705one or zero).
12706
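The return value makes it easy to count how many records were changed.
The following is a small sketch; the variable name @code{n} and the
input are illustrative:

@example
$ printf 'foo one\nnothing\nfoo two foo\n' |
> awk '@{ n += sub(/foo/, "bar"); print @}
>      END @{ print n, "lines changed" @}'
@print{} bar one
@print{} nothing
@print{} bar two foo
@print{} 2 lines changed
@end example
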
12707If the special character @samp{&} appears in @var{replacement}, it
12708stands for the precise substring that was matched by @var{regexp}.  (If
12709the regexp can match more than one string, then this precise substring
12710may vary.)  For example:
12711
12712@example
12713@{ sub(/candidate/, "& and his wife"); print @}
12714@end example
12715
12716@noindent
12717changes the first occurrence of @samp{candidate} to @samp{candidate
12718and his wife} on each input line.
12719Here is another example:
12720
12721@example
12722$ awk 'BEGIN @{
12723>         str = "daabaaa"
12724>         sub(/a+/, "C&C", str)
12725>         print str
12726> @}'
12727@print{} dCaaCbaaa
12728@end example
12729
12730@noindent
12731This shows how @samp{&} can represent a nonconstant string and also
12732illustrates the ``leftmost, longest'' rule in regexp matching
12733(@pxref{Leftmost Longest}).
12734
12735The effect of this special character (@samp{&}) can be turned off by putting a
12736backslash before it in the string.  As usual, to insert one backslash in
12737the string, you must write two backslashes.  Therefore, write @samp{\\&}
12738in a string constant to include a literal @samp{&} in the replacement.
12739For example, the following shows how to replace the first @samp{|} on each line with
12740an @samp{&}:
12741
12742@example
12743@{ sub(/\|/, "\\&"); print @}
12744@end example
12745
12746@cindex @code{sub} function, arguments of
12747@cindex @code{gsub} function, arguments of
12748As mentioned, the third argument to @code{sub} must
12749be a variable, field or array reference.
12750Some versions of @command{awk} allow the third argument to
12751be an expression that is not an lvalue.  In such a case, @code{sub}
12752still searches for the pattern and returns zero or one, but the result of
12753the substitution (if any) is thrown away because there is no place
12754to put it.  Such versions of @command{awk} accept expressions
12755such as the following:
12756
12757@example
12758sub(/USA/, "United States", "the USA and Canada")
12759@end example
12760
12761@noindent
12762@cindex troubleshooting, @code{gsub}/@code{sub} functions
12763For historical compatibility, @command{gawk} accepts erroneous code,
12764such as in the previous example. However, using any other nonchangeable
12765object as the third parameter causes a fatal error and your program
12766will not run.
12767
12768Finally, if the @var{regexp} is not a regexp constant, it is converted into a
12769string, and then the value of that string is treated as the regexp to match.
12770
12771@item gsub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]})
12772@cindex @code{gsub} function
12773This is similar to the @code{sub} function, except @code{gsub} replaces
12774@emph{all} of the longest, leftmost, @emph{nonoverlapping} matching
12775substrings it can find.  The @samp{g} in @code{gsub} stands for
12776``global,'' which means replace everywhere.  For example:
12777
12778@example
12779@{ gsub(/Britain/, "United Kingdom"); print @}
12780@end example
12781
12782@noindent
12783replaces all occurrences of the string @samp{Britain} with @samp{United
12784Kingdom} for all input records.
12785
12786The @code{gsub} function returns the number of substitutions made.  If
12787the variable to search and alter (@var{target}) is
12788omitted, then the entire input record (@code{$0}) is used.
12789As in @code{sub}, the characters @samp{&} and @samp{\} are special,
12790and the third argument must be assignable.
12791
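For instance, the return value can be used to count the occurrences
that were replaced.  In this sketch, no target is given, so @code{$0}
is modified:

@example
$ echo a b c a b c |
> awk '@{ n = gsub(/a/, "X"); print n, $0 @}'
@print{} 2 X b c X b c
@end example
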
12792@item gensub(@var{regexp}, @var{replacement}, @var{how} @r{[}, @var{target}@r{]}) #
12793@cindex @code{gensub} function (@command{gawk})
12794@code{gensub} is a general substitution function.  Like @code{sub} and
12795@code{gsub}, it searches the target string @var{target} for matches of
12796the regular expression @var{regexp}.  Unlike @code{sub} and @code{gsub},
12797the modified string is returned as the result of the function and the
12798original target string is @emph{not} changed.  If @var{how} is a string
12799beginning with @samp{g} or @samp{G}, then it replaces all matches of
12800@var{regexp} with @var{replacement}.  Otherwise, @var{how} is treated
12801as a number that indicates which match of @var{regexp} to replace. If
12802no @var{target} is supplied, @code{$0} is used.
12803
12804@code{gensub} provides an additional feature that is not available
12805in @code{sub} or @code{gsub}: the ability to specify components of a
12806regexp in the replacement text.  This is done by using parentheses in
12807the regexp to mark the components and then specifying @samp{\@var{N}}
12808in the replacement text, where @var{N} is a digit from 1 to 9.
12809For example:
12810
12811@example
12812$ gawk '
12813> BEGIN @{
12814>      a = "abc def"
12815>      b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a)
12816>      print b
12817> @}'
12818@print{} def abc
12819@end example
12820
12821@noindent
12822As with @code{sub}, you must type two backslashes in order
12823to get one into the string.
12824In the replacement text, the sequence @samp{\0} represents the entire
12825matched text, as does the character @samp{&}.
12826
12827The following example shows how you can use the third argument to control
12828which match of the regexp should be changed:
12829
12830@example
12831$ echo a b c a b c |
12832> gawk '@{ print gensub(/a/, "AA", 2) @}'
12833@print{} a b c AA b c
12834@end example
12835
12836In this case, @code{$0} is used as the default target string.
12837@code{gensub} returns the new string as its result, which is
12838passed directly to @code{print} for printing.
12839
12840@c @cindex automatic warnings
12841@c @cindex warnings, automatic
12842If the @var{how} argument is a string that does not begin with @samp{g} or
12843@samp{G}, or if it is a number that is less than or equal to zero, only one
12844substitution is performed.  If @var{how} is zero, @command{gawk} issues
12845a warning message.
12846
12847If @var{regexp} does not match @var{target}, @code{gensub}'s return value
12848is the original unchanged value of @var{target}.
12849
12850@code{gensub} is a @command{gawk} extension; it is not available
12851in compatibility mode (@pxref{Options}).
12852
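As a further sketch, the following rearranges the parts of a date and
then prints @code{$0} to show that the target is left unchanged.  The
date format is only an example:

@example
$ echo 2003-06-15 |
> gawk '@{ print gensub(/([0-9]+)-([0-9]+)-([0-9]+)/, "\\3/\\2/\\1", 1)
>           print $0 @}'
@print{} 15/06/2003
@print{} 2003-06-15
@end example
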
12853@item substr(@var{string}, @var{start} @r{[}, @var{length}@r{]})
12854@cindex @code{substr} function
12855This returns a @var{length}-character-long substring of @var{string},
12856starting at character number @var{start}.  The first character of a
12857string is character number one.@footnote{This is different from
12858C and C++, in which the first character is number zero.}
12859For example, @code{substr("washington", 5, 3)} returns @code{"ing"}.
12860
12861If @var{length} is not present, this function returns the whole suffix of
12862@var{string} that begins at character number @var{start}.  For example,
12863@code{substr("washington", 5)} returns @code{"ington"}.  The whole
12864suffix is also returned
12865if @var{length} is greater than the number of characters remaining
12866in the string, counting from character @var{start}.
12867
12868If @var{start} is less than one, @code{substr} treats it as
if it were one. (POSIX doesn't specify what to do in this case:
12870Unix @command{awk} acts this way, and therefore @command{gawk}
12871does too.)
12872If @var{start} is greater than the number of characters
12873in the string, @code{substr} returns the null string.
12874Similarly, if @var{length} is present but less than or equal to zero,
12875the null string is returned.
12876
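The following sketch exercises these boundary cases; the brackets only
make empty results visible:

@example
$ awk 'BEGIN @{
>         s = "washington"
>         print "[" substr(s, 8, 10) "]"
>         print "[" substr(s, 11) "]"
>         print "[" substr(s, 5, 0) "]"
> @}'
@print{} [ton]
@print{} []
@print{} []
@end example
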
12877@cindex troubleshooting, @code{substr} function
12878The string returned by @code{substr} @emph{cannot} be
12879assigned.  Thus, it is a mistake to attempt to change a portion of
12880a string, as shown in the following example:
12881
12882@example
12883string = "abcdef"
12884# try to get "abCDEf", won't work
12885substr(string, 3, 3) = "CDE"
12886@end example
12887
12888@noindent
12889It is also a mistake to use @code{substr} as the third argument
12890of @code{sub} or @code{gsub}:
12891
12892@example
12893gsub(/xyz/, "pdq", substr($0, 5, 20))  # WRONG
12894@end example
12895
12896@cindex portability, @code{substr} function
12897(Some commercial versions of @command{awk} do in fact let you use
12898@code{substr} this way, but doing so is not portable.)
12899
12900If you need to replace bits and pieces of a string, combine @code{substr}
12901with string concatenation, in the following manner:
12902
12903@example
12904string = "abcdef"
12905@dots{}
12906string = substr(string, 1, 2) "CDE" substr(string, 6)
12907@end example
12908
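The same technique can be wrapped in a function.  The following is a
sketch only: the name @code{splice} and its parameters are
hypothetical, and it assumes that @code{start} is at least one:

@example
# splice: return str with len characters, beginning at
# position start, replaced by new
function splice(str, start, len, new)
@{
     return substr(str, 1, start - 1) new substr(str, start + len)
@}

BEGIN @{ print splice("abcdef", 3, 3, "CDE") @}
@end example

@noindent
This prints @samp{abCDEf}.
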
12909@cindex case sensitivity, converting case
12910@cindex converting, case
12911@item tolower(@var{string})
12912@cindex @code{tolower} function
12913This returns a copy of @var{string}, with each uppercase character
12914in the string replaced with its corresponding lowercase character.
12915Nonalphabetic characters are left unchanged.  For example,
12916@code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}.
12917
12918@item toupper(@var{string})
12919@cindex @code{toupper} function
12920This returns a copy of @var{string}, with each lowercase character
12921in the string replaced with its corresponding uppercase character.
12922Nonalphabetic characters are left unchanged.  For example,
12923@code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}.
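
A typical use of these functions is to compare strings without regard
to case, in a way that also works in other versions of @command{awk}.
A minimal sketch:

@example
$ echo YeS |
> awk '@{ if (tolower($1) == "yes") print "affirmative" @}'
@print{} affirmative
@end example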
12924@end table
12925
12926@node Gory Details
12927@subsubsection More About @samp{\} and @samp{&} with @code{sub}, @code{gsub}, and @code{gensub}
12928
12929@cindex escape processing, @code{gsub}/@code{gensub}/@code{sub} functions
12930@cindex @code{sub} function, escape processing
12931@cindex @code{gsub} function, escape processing
12932@cindex @code{gensub} function (@command{gawk}), escape processing
12933@cindex @code{\} (backslash), @code{gsub}/@code{gensub}/@code{sub} functions and
12934@cindex backslash (@code{\}), @code{gsub}/@code{gensub}/@code{sub} functions and
12935@cindex @code{&} (ampersand), @code{gsub}/@code{gensub}/@code{sub} functions and
12936@cindex ampersand (@code{&}), @code{gsub}/@code{gensub}/@code{sub} functions and
12937When using @code{sub}, @code{gsub}, or @code{gensub}, and trying to get literal
12938backslashes and ampersands into the replacement text, you need to remember
12939that there are several levels of @dfn{escape processing} going on.
12940
12941First, there is the @dfn{lexical} level, which is when @command{awk} reads
12942your program
12943and builds an internal copy of it that can be executed.
12944Then there is the runtime level, which is when @command{awk} actually scans the
12945replacement string to determine what to generate.
12946
12947At both levels, @command{awk} looks for a defined set of characters that
12948can come after a backslash.  At the lexical level, it looks for the
12949escape sequences listed in @ref{Escape Sequences}.
12950Thus, for every @samp{\} that @command{awk} processes at the runtime
12951level, type two backslashes at the lexical level.
12952When a character that is not valid for an escape sequence follows the
12953@samp{\}, Unix @command{awk} and @command{gawk} both simply remove the initial
12954@samp{\} and put the next character into the string. Thus, for
12955example, @code{"a\qb"} is treated as @code{"aqb"}.
12956
12957At the runtime level, the various functions handle sequences of
12958@samp{\} and @samp{&} differently.  The situation is (sadly) somewhat complex.
12959Historically, the @code{sub} and @code{gsub} functions treated the two
12960character sequence @samp{\&} specially; this sequence was replaced in
12961the generated text with a single @samp{&}.  Any other @samp{\} within
12962the @var{replacement} string that did not precede an @samp{&} was passed
12963through unchanged.  To illustrate with a table:
12964
@c Thanks to Karl Berry for help with the TeX stuff.
12966@tex
12967\vbox{\bigskip
12968% This table has lots of &'s and \'s, so unspecialize them.
12969\catcode`\& = \other \catcode`\\ = \other
12970% But then we need character for escape and tab.
12971@catcode`! = 4
12972@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
12973    You type!@code{sub} sees!@code{sub} generates@cr
12974@hrulefill!@hrulefill!@hrulefill@cr
12975   @code{\&}!       @code{&}!the matched text@cr
12976  @code{\\&}!      @code{\&}!a literal @samp{&}@cr
12977 @code{\\\&}!      @code{\&}!a literal @samp{&}@cr
12978@code{\\\\&}!     @code{\\&}!a literal @samp{\&}@cr
12979@code{\\\\\&}!     @code{\\&}!a literal @samp{\&}@cr
12980@code{\\\\\\&}!     @code{\\\&}!a literal @samp{\\&}@cr
12981  @code{\\q}!      @code{\q}!a literal @samp{\q}@cr
12982}
12983@bigskip}
12984@end tex
12985@ifnottex
12986@display
12987 You type         @code{sub} sees          @code{sub} generates
12988 --------         ----------          ---------------
12989     @code{\&}              @code{&}            the matched text
12990    @code{\\&}             @code{\&}            a literal @samp{&}
12991   @code{\\\&}             @code{\&}            a literal @samp{&}
12992  @code{\\\\&}            @code{\\&}            a literal @samp{\&}
12993 @code{\\\\\&}            @code{\\&}            a literal @samp{\&}
12994@code{\\\\\\&}           @code{\\\&}            a literal @samp{\\&}
12995    @code{\\q}             @code{\q}            a literal @samp{\q}
12996@end display
12997@end ifnottex
12998
12999@noindent
13000This table shows both the lexical-level processing, where
13001an odd number of backslashes becomes an even number at the runtime level,
13002as well as the runtime processing done by @code{sub}.
13003(For the sake of simplicity, the rest of the following tables only show the
13004case of even numbers of backslashes entered at the lexical level.)
13005
13006The problem with the historical approach is that there is no way to get
13007a literal @samp{\} followed by the matched text.
13008
13009@c @cindex @command{awk} language, POSIX version
13010@cindex POSIX @command{awk}, functions and, @code{gsub}/@code{sub}
13011The 1992 POSIX standard attempted to fix this problem. The standard
13012says that @code{sub} and @code{gsub} look for either a @samp{\} or an @samp{&}
13013after the @samp{\}. If either one follows a @samp{\}, that character is
13014output literally.  The interpretation of @samp{\} and @samp{&} then becomes:
13015
13016@c thanks to Karl Berry for formatting this table
13017@tex
13018\vbox{\bigskip
13019% This table has lots of &'s and \'s, so unspecialize them.
13020\catcode`\& = \other \catcode`\\ = \other
13021% But then we need character for escape and tab.
13022@catcode`! = 4
13023@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
13024    You type!@code{sub} sees!@code{sub} generates@cr
13025@hrulefill!@hrulefill!@hrulefill@cr
13026    @code{&}!       @code{&}!the matched text@cr
13027  @code{\\&}!      @code{\&}!a literal @samp{&}@cr
13028@code{\\\\&}!     @code{\\&}!a literal @samp{\}, then the matched text@cr
13029@code{\\\\\\&}!  @code{\\\&}!a literal @samp{\&}@cr
13030}
13031@bigskip}
13032@end tex
13033@ifnottex
13034@display
13035 You type         @code{sub} sees          @code{sub} generates
13036 --------         ----------          ---------------
13037      @code{&}              @code{&}            the matched text
13038    @code{\\&}             @code{\&}            a literal @samp{&}
13039  @code{\\\\&}            @code{\\&}            a literal @samp{\}, then the matched text
13040@code{\\\\\\&}           @code{\\\&}            a literal @samp{\&}
13041@end display
13042@end ifnottex
13043
13044@noindent
13045This appears to solve the problem.
13046Unfortunately, the phrasing of the standard is unusual. It
13047says, in effect, that @samp{\} turns off the special meaning of any
13048following character, but for anything other than @samp{\} and @samp{&},
13049such special meaning is undefined.  This wording leads to two problems:
13050
13051@itemize @bullet
13052@item
13053Backslashes must now be doubled in the @var{replacement} string, breaking
13054historical @command{awk} programs.
13055
13056@item
13057To make sure that an @command{awk} program is portable, @emph{every} character
13058in the @var{replacement} string must be preceded with a
13059backslash.@footnote{This consequence was certainly unintended.}
13060@c I can say that, 'cause I was involved in making this change
13061@end itemize
13062
13063The POSIX standard is under revision.
13064Because of the problems just listed, proposed text for the revised standard
13065reverts to rules that correspond more closely to the original existing
13066practice. The proposed rules have special cases that make it possible
13067to produce a @samp{\} preceding the matched text:
13068
13069@tex
13070\vbox{\bigskip
13071% This table has lots of &'s and \'s, so unspecialize them.
13072\catcode`\& = \other \catcode`\\ = \other
13073% But then we need character for escape and tab.
13074@catcode`! = 4
13075@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
13076    You type!@code{sub} sees!@code{sub} generates@cr
13077@hrulefill!@hrulefill!@hrulefill@cr
13078@code{\\\\\\&}!     @code{\\\&}!a literal @samp{\&}@cr
13079@code{\\\\&}!     @code{\\&}!a literal @samp{\}, followed by the matched text@cr
13080  @code{\\&}!      @code{\&}!a literal @samp{&}@cr
13081  @code{\\q}!      @code{\q}!a literal @samp{\q}@cr
13082}
13083@bigskip}
13084@end tex
@ifnottex
@display
 You type         @code{sub} sees         @code{sub} generates
 --------         ----------         ---------------
@code{\\\\\\&}           @code{\\\&}            a literal @samp{\&}
  @code{\\\\&}            @code{\\&}            a literal @samp{\}, followed by the matched text
    @code{\\&}             @code{\&}            a literal @samp{&}
    @code{\\q}             @code{\q}            a literal @samp{\q}
@end display
@end ifnottex
13095
13096In a nutshell, at the runtime level, there are now three special sequences
13097of characters (@samp{\\\&}, @samp{\\&} and @samp{\&}) whereas historically
13098there was only one.  However, as in the historical case, any @samp{\} that
13099is not part of one of these three sequences is not special and appears
13100in the output literally.
13101
13102@command{gawk} 3.0 and 3.1 follow these proposed POSIX rules for @code{sub} and
13103@code{gsub}.
13104@c As much as we think it's a lousy idea. You win some, you lose some. Sigh.
13105Whether these proposed rules will actually become codified into the
13106standard is unknown at this point. Subsequent @command{gawk} releases will
13107track the standard and implement whatever the final version specifies;
13108this @value{DOCUMENT} will be updated as
13109well.@footnote{As this @value{DOCUMENT} was being finalized,
13110we learned that the POSIX standard will not use these rules.
13111However, it was too late to change @command{gawk} for the 3.1 release.
13112@command{gawk} behaves as described here.}
13113
13114The rules for @code{gensub} are considerably simpler. At the runtime
13115level, whenever @command{gawk} sees a @samp{\}, if the following character
13116is a digit, then the text that matched the corresponding parenthesized
13117subexpression is placed in the generated output.  Otherwise,
13118no matter what character follows the @samp{\}, it
13119appears in the generated text and the @samp{\} does not:
13120
13121@tex
13122\vbox{\bigskip
13123% This table has lots of &'s and \'s, so unspecialize them.
13124\catcode`\& = \other \catcode`\\ = \other
13125% But then we need character for escape and tab.
13126@catcode`! = 4
13127@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
13128    You type!@code{gensub} sees!@code{gensub} generates@cr
13129@hrulefill!@hrulefill!@hrulefill@cr
13130      @code{&}!           @code{&}!the matched text@cr
13131    @code{\\&}!          @code{\&}!a literal @samp{&}@cr
13132   @code{\\\\}!          @code{\\}!a literal @samp{\}@cr
13133  @code{\\\\&}!         @code{\\&}!a literal @samp{\}, then the matched text@cr
13134@code{\\\\\\&}!        @code{\\\&}!a literal @samp{\&}@cr
13135    @code{\\q}!          @code{\q}!a literal @samp{q}@cr
13136}
13137@bigskip}
13138@end tex
13139@ifnottex
13140@display
13141  You type          @code{gensub} sees         @code{gensub} generates
13142  --------          -------------         ------------------
13143      @code{&}                    @code{&}            the matched text
13144    @code{\\&}                   @code{\&}            a literal @samp{&}
13145   @code{\\\\}                   @code{\\}            a literal @samp{\}
13146  @code{\\\\&}                  @code{\\&}            a literal @samp{\}, then the matched text
13147@code{\\\\\\&}                 @code{\\\&}            a literal @samp{\&}
13148    @code{\\q}                   @code{\q}            a literal @samp{q}
13149@end display
13150@end ifnottex
13151
13152Because of the complexity of the lexical and runtime level processing
13153and the special cases for @code{sub} and @code{gsub},
13154we recommend the use of @command{gawk} and @code{gensub} when you have
13155to do substitutions.
13156
13157@c fakenode --- for prepinfo
13158@subheading Advanced Notes: Matching the Null String
13159@c last comma does NOT start tertiary
13160@cindex advanced features, null strings, matching
13161@cindex matching, null strings
13162@cindex null strings, matching
13163@c last comma in next two is part of tertiary
13164@cindex @code{*} (asterisk), @code{*} operator, null strings, matching
13165@cindex asterisk (@code{*}), @code{*} operator, null strings, matching
13166
13167In @command{awk}, the @samp{*} operator can match the null string.
13168This is particularly important for the @code{sub}, @code{gsub},
13169and @code{gensub} functions.  For example:
13170
@example
$ echo abc | awk '@{ gsub(/m*/, "X"); print @}'
@print{} XaXbXcX
@end example
13175
13176@noindent
13177Although this makes a certain amount of sense, it can be surprising.
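
One way to avoid the surprise is to use @samp{+}, which requires at least
one occurrence and therefore cannot match the null string.  The following
sketch contrasts the two operators; the variable names @code{star} and
@code{plus} are illustrative:

```shell
# m* matches the null string before each character and at the end;
# m+ needs at least one "m", so nothing in "abc" matches.
star=$(echo abc | awk '{ gsub(/m*/, "X"); print }')
plus=$(echo abc | awk '{ gsub(/m+/, "X"); print }')
printf '%s\n%s\n' "$star" "$plus"
```

The second command leaves the input line untouched.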
13178
13179@node I/O Functions
13180@subsection Input/Output Functions
13181
13182The following functions relate to input/output (I/O).
13183Optional parameters are enclosed in square brackets ([ ]):
13184
13185@table @code
13186@item close(@var{filename} @r{[}, @var{how}@r{]})
13187@cindex @code{close} function
13188@cindex files, closing
13189Close the file @var{filename} for input or output. Alternatively, the
13190argument may be a shell command that was used for creating a coprocess, or
13191for redirecting to or from a pipe; then the coprocess or pipe is closed.
13192@xref{Close Files And Pipes},
13193for more information.
13194
13195When closing a coprocess, it is occasionally useful to first close
13196one end of the two-way pipe and then to close the other.  This is done
13197by providing a second argument to @code{close}.  This second argument
13198should be one of the two string values @code{"to"} or @code{"from"},
13199indicating which end of the pipe to close.  Case in the string does
13200not matter.
13201@xref{Two-way I/O},
13202which discusses this feature in more detail and gives an example.
13203
13204@item fflush(@r{[}@var{filename}@r{]})
13205@cindex @code{fflush} function
13206Flush any buffered output associated with @var{filename}, which is either a
13207file opened for writing or a shell command for redirecting output to
13208a pipe or coprocess.
13209
13210@cindex portability, @code{fflush} function and
13211@cindex buffers, flushing
13212@cindex output, buffering
Many utility programs @dfn{buffer} their output; i.e., they hold
information destined for a disk file or terminal in memory until there is
enough of it to be worth sending to the output device.
13216This is often more efficient than writing
13217every little bit of information as soon as it is ready.  However, sometimes
13218it is necessary to force a program to @dfn{flush} its buffers; that is,
13219write the information to its destination, even if a buffer is not full.
13220This is the purpose of the @code{fflush} function---@command{gawk} also
13221buffers its output and the @code{fflush} function forces
13222@command{gawk} to flush its buffers.
13223
13224@code{fflush} was added to the Bell Laboratories research
13225version of @command{awk} in 1994; it is not part of the POSIX standard and is
13226not available if @option{--posix} has been specified on the
13227command line (@pxref{Options}).
13228
13229@cindex @command{gawk}, @code{fflush} function in
13230@command{gawk} extends the @code{fflush} function in two ways.  The first
13231is to allow no argument at all. In this case, the buffer for the
13232standard output is flushed.  The second is to allow the null string
13233(@w{@code{""}}) as the argument. In this case, the buffers for
13234@emph{all} open output files and pipes are flushed.
13235
13236@c @cindex automatic warnings
13237@c @cindex warnings, automatic
13238@cindex troubleshooting, @code{fflush} function
13239@code{fflush} returns zero if the buffer is successfully flushed;
13240otherwise, it returns @minus{}1.
In the case where all buffers are flushed, the return value is zero
only if all buffers were flushed successfully.  Otherwise, it is
@minus{}1, and @command{gawk} warns about the @var{filename} that caused the problem.
13244
13245@command{gawk} also issues a warning message if you attempt to flush
13246a file or pipe that was opened for reading (such as with @code{getline}),
13247or if @var{filename} is not an open file, pipe, or coprocess.
13248In such a case, @code{fflush} returns @minus{}1, as well.
13249
13250@item system(@var{command})
13251@cindex @code{system} function
13252@cindex interacting with other programs
Execute the operating-system command given by the string @var{command},
and then return to the @command{awk} program.  The return value of
@code{system} is the exit status of the command that was executed.
13258
13259For example, if the following fragment of code is put in your @command{awk}
13260program:
13261
@example
END @{
     system("date | mail -s 'awk run done' root")
@}
@end example
13267
13268@noindent
13269the system administrator is sent mail when the @command{awk} program
13270finishes processing input and begins its end-of-input processing.
13271
13272Note that redirecting @code{print} or @code{printf} into a pipe is often
13273enough to accomplish your task.  If you need to run many commands, it
13274is more efficient to simply print them down a pipeline to the shell:
13275
@example
while (@var{more stuff to do})
    print @var{command} | "/bin/sh"
close("/bin/sh")
@end example
13281
13282@noindent
13283@cindex troubleshooting, @code{system} function
13284However, if your @command{awk}
13285program is interactive, @code{system} is useful for cranking up large
13286self-contained programs, such as a shell or an editor.
13287Some operating systems cannot implement the @code{system} function.
13288@code{system} causes a fatal error if it is not supported.
13289@end table
13290
13291@c fakenode --- for prepinfo
13292@subheading Advanced Notes: Interactive Versus Noninteractive Buffering
13293@cindex advanced features, buffering
13294@cindex buffering, interactive vs. noninteractive
13295
13296As a side point, buffering issues can be even more confusing, depending
13297upon whether your program is @dfn{interactive}, i.e., communicating
13298with a user sitting at a keyboard.@footnote{A program is interactive
13299if the standard output is connected
13300to a terminal device.}
13301
13302@c Thanks to Walter.Mecky@dresdnerbank.de for this example, and for
13303@c motivating me to write this section.
Interactive programs generally @dfn{line buffer} their output; i.e., they
write out each line as soon as it is complete.  Noninteractive programs wait
until they have a full buffer, which may be many lines of output.
13307Here is an example of the difference:
13308
@example
$ awk '@{ print $1 + $2 @}'
1 1
@print{} 2
2 3
@print{} 5
@kbd{@value{CTL}-d}
@end example
13317
13318@noindent
13319Each line of output is printed immediately. Compare that behavior
13320with this example:
13321
@example
$ awk '@{ print $1 + $2 @}' | cat
1 1
2 3
@kbd{@value{CTL}-d}
@print{} 2
@print{} 5
@end example
13330
13331@noindent
13332Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because
13333it is all buffered and sent down the pipe to @command{cat} in one shot.
13334
13335@c fakenode --- for prepinfo
13336@subheading Advanced Notes: Controlling Output Buffering with @code{system}
13337@cindex advanced features, buffering
13338@cindex buffers, flushing
13339@cindex buffering, input/output
13340@cindex output, buffering
13341
13342The @code{fflush} function provides explicit control over output buffering for
13343individual files and pipes.  However, its use is not portable to many other
13344@command{awk} implementations.  An alternative method to flush output
13345buffers is to call @code{system} with a null string as its argument:
13346
@example
system("")   # flush output
@end example
13350
13351@noindent
13352@command{gawk} treats this use of the @code{system} function as a special
13353case and is smart enough not to run a shell (or other command
13354interpreter) with the empty command.  Therefore, with @command{gawk}, this
13355idiom is not only useful, it is also efficient.  While this method should work
13356with other @command{awk} implementations, it does not necessarily avoid
13357starting an unnecessary shell.  (Other implementations may only
13358flush the buffer associated with the standard output and not necessarily
13359all buffered output.)
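
The idiom can be used inside a pipeline in the same way as @code{fflush}.
The following is a sketch, assuming only that the installed @command{awk}
flushes pending output when @code{system} is called; @code{out} is an
illustrative variable:

```shell
# system("") runs no useful command; it is called here only for its
# side effect of flushing the output written by print.
out=$(printf '1 1\n2 3\n' | awk '{ print $1 + $2; system("") }' | cat)
printf '%s\n' "$out"
```

Implementations that start a shell for the empty command pay that cost once
per input line, so this form is best kept for low-volume output.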
13360
13361If you think about what a programmer expects, it makes sense that
13362@code{system} should flush any pending output.  The following program:
13363
@example
BEGIN @{
     print "first print"
     system("echo system echo")
     print "second print"
@}
@end example
13371
13372@noindent
13373must print:
13374
@example
first print
system echo
second print
@end example
13380
13381@noindent
13382and not:
13383
@example
system echo
first print
second print
@end example
13389
13390If @command{awk} did not flush its buffers before calling @code{system},
13391you would see the latter (undesirable) output.
13392
13393@node Time Functions
13394@subsection Using @command{gawk}'s Timestamp Functions
13395
13396@c STARTOFRANGE tst
13397@cindex timestamps
13398@c STARTOFRANGE logftst
13399@cindex log files, timestamps in
13400@c last comma does NOT start tertiary
13401@c STARTOFRANGE filogtst
13402@cindex files, log, timestamps in
13403@c STARTOFRANGE gawtst
13404@cindex @command{gawk}, timestamps
13405@cindex POSIX @command{awk}, timestamps and
@command{awk} programs are commonly used to process log files
13407containing timestamp information, indicating when a
13408particular log record was written.  Many programs log their timestamp
13409in the form returned by the @code{time} system call, which is the
13410number of seconds since a particular epoch.  On POSIX-compliant systems,
13411it is the number of seconds since
134121970-01-01 00:00:00 UTC, not counting leap seconds.@footnote{@xref{Glossary},
13413especially the entries ``Epoch'' and ``UTC.''}
13414All known POSIX-compliant systems support timestamps from 0 through
13415@math{2^31 - 1}, which is sufficient to represent times through
134162038-01-19 03:14:07 UTC.  Many systems support a wider range of timestamps,
13417including negative timestamps that represent times before the
13418epoch.
13419
13420@cindex @command{date} utility, GNU
13421@cindex time, retrieving
13422In order to make it easier to process such log files and to produce
13423useful reports, @command{gawk} provides the following functions for
13424working with timestamps.  They are @command{gawk} extensions; they are
13425not specified in the POSIX standard, nor are they in any other known
13426version of @command{awk}.@footnote{The GNU @command{date} utility can
13427also do many of the things described here.  Its use may be preferable
13428for simple time-related operations in shell scripts.}
13429Optional parameters are enclosed in square brackets ([ ]):
13430
13431@table @code
13432@item systime()
13433@cindex @code{systime} function (@command{gawk})
13434@cindex timestamps
13435This function returns the current time as the number of seconds since
13436the system epoch.  On POSIX systems, this is the number of seconds
13437since 1970-01-01 00:00:00 UTC, not counting leap seconds.
13438It may be a different number on
13439other systems.
13440
13441@item mktime(@var{datespec})
13442@cindex @code{mktime} function (@command{gawk})
13443This function turns @var{datespec} into a timestamp in the same form
13444as is returned by @code{systime}.  It is similar to the function of the
13445same name in ISO C.  The argument, @var{datespec}, is a string of the form
13446@w{@code{"@var{YYYY} @var{MM} @var{DD} @var{HH} @var{MM} @var{SS} [@var{DST}]"}}.
13447The string consists of six or seven numbers representing, respectively,
13448the full year including century, the month from 1 to 12, the day of the month
13449from 1 to 31, the hour of the day from 0 to 23, the minute from 0 to
1345059, the second from 0 to 60,@footnote{Occasionally there are
13451minutes in a year with a leap second, which is why the
13452seconds can go up to 60.}
13453and an optional daylight-savings flag.
13454
13455The values of these numbers need not be within the ranges specified;
13456for example, an hour of @minus{}1 means 1 hour before midnight.
13457The origin-zero Gregorian calendar is assumed, with year 0 preceding
13458year 1 and year @minus{}1 preceding year 0.
13459The time is assumed to be in the local timezone.
13460If the daylight-savings flag is positive, the time is assumed to be
13461daylight savings time; if zero, the time is assumed to be standard
13462time; and if negative (the default), @code{mktime} attempts to determine
13463whether daylight savings time is in effect for the specified time.
13464
13465If @var{datespec} does not contain enough elements or if the resulting time
13466is out of range, @code{mktime} returns @minus{}1.
13467
13468@item strftime(@r{[}@var{format} @r{[}, @var{timestamp}@r{]]})
13469@c STARTOFRANGE strf
13470@cindex @code{strftime} function (@command{gawk})
13471This function returns a string.  It is similar to the function of the
13472same name in ISO C.  The time specified by @var{timestamp} is used to
13473produce a string, based on the contents of the @var{format} string.
13474The @var{timestamp} is in the same format as the value returned by the
13475@code{systime} function.  If no @var{timestamp} argument is supplied,
13476@command{gawk} uses the current time of day as the timestamp.
13477If no @var{format} argument is supplied, @code{strftime} uses
13478@code{@w{"%a %b %d %H:%M:%S %Z %Y"}}.  This format string produces
13479output that is (almost) equivalent to that of the @command{date} utility.
13480(Versions of @command{gawk} prior to 3.0 require the @var{format} argument.)
13481@end table
13482
13483The @code{systime} function allows you to compare a timestamp from a
13484log file with the current time of day.  In particular, it is easy to
13485determine how long ago a particular record was logged.  It also allows
13486you to produce log records using the ``seconds since the epoch'' format.
13487
13488@cindex converting, dates to timestamps
13489@cindex dates, converting to timestamps
13490@cindex timestamps, converting dates to
13491The @code{mktime} function allows you to convert a textual representation
13492of a date and time into a timestamp.   This makes it easy to do before/after
13493comparisons of dates and times, particularly when dealing with date and
13494time data coming from an external source, such as a log file.
13495
13496The @code{strftime} function allows you to easily turn a timestamp
13497into human-readable information.  It is similar in nature to the @code{sprintf}
13498function
13499(@pxref{String Functions}),
13500in that it copies nonformat specification characters verbatim to the
13501returned string, while substituting date and time values for format
13502specifications in the @var{format} string.
13503
13504@cindex format specifiers, @code{strftime} function (@command{gawk})
13505@code{strftime} is guaranteed by the 1999 ISO C standard@footnote{As this
13506is a recent standard, not every system's @code{strftime} necessarily
13507supports all of the conversions listed here.}
13508to support the following date format specifications:
13509
13510@table @code
13511@item %a
13512The locale's abbreviated weekday name.
13513
13514@item %A
13515The locale's full weekday name.
13516
13517@item %b
13518The locale's abbreviated month name.
13519
13520@item %B
13521The locale's full month name.
13522
13523@item %c
13524The locale's ``appropriate'' date and time representation.
(This is @samp{%a %b %e %T %Y} in the @code{"C"} locale.)
13526
13527@item %C
13528The century.  This is the year divided by 100 and truncated to the next
13529lower integer.
13530
13531@item %d
13532The day of the month as a decimal number (01--31).
13533
13534@item %D
13535Equivalent to specifying @samp{%m/%d/%y}.
13536
13537@item %e
13538The day of the month, padded with a space if it is only one digit.
13539
13540@item %F
13541Equivalent to specifying @samp{%Y-%m-%d}.
13542This is the ISO 8601 date format.
13543
13544@item %g
13545The year modulo 100 of the ISO week number, as a decimal number (00--99).
13546For example, January 1, 1993 is in week 53 of 1992. Thus, the year
13547of its ISO week number is 1992, even though its year is 1993.
13548Similarly, December 31, 1973 is in week 1 of 1974. Thus, the year
13549of its ISO week number is 1974, even though its year is 1973.
13550
13551@item %G
13552The full year of the ISO week number, as a decimal number.
13553
13554@item %h
13555Equivalent to @samp{%b}.
13556
13557@item %H
13558The hour (24-hour clock) as a decimal number (00--23).
13559
13560@item %I
13561The hour (12-hour clock) as a decimal number (01--12).
13562
13563@item %j
13564The day of the year as a decimal number (001--366).
13565
13566@item %m
13567The month as a decimal number (01--12).
13568
13569@item %M
13570The minute as a decimal number (00--59).
13571
13572@item %n
13573A newline character (ASCII LF).
13574
13575@item %p
13576The locale's equivalent of the AM/PM designations associated
13577with a 12-hour clock.
13578
13579@item %r
13580The locale's 12-hour clock time.
13581(This is @samp{%I:%M:%S %p} in the @code{"C"} locale.)
13582
13583@item %R
13584Equivalent to specifying @samp{%H:%M}.
13585
13586@item %S
13587The second as a decimal number (00--60).
13588
13589@item %t
13590A TAB character.
13591
13592@item %T
13593Equivalent to specifying @samp{%H:%M:%S}.
13594
13595@item %u
13596The weekday as a decimal number (1--7).  Monday is day one.
13597
13598@item %U
13599The week number of the year (the first Sunday as the first day of week one)
13600as a decimal number (00--53).
13601
13602@c @cindex ISO 8601
13603@item %V
13604The week number of the year (the first Monday as the first
13605day of week one) as a decimal number (01--53).
13606The method for determining the week number is as specified by ISO 8601.
13607(To wit: if the week containing January 1 has four or more days in the
13608new year, then it is week one; otherwise it is week 53 of the previous year
13609and the next week is week one.)
13610
13611@item %w
13612The weekday as a decimal number (0--6).  Sunday is day zero.
13613
13614@item %W
13615The week number of the year (the first Monday as the first day of week one)
13616as a decimal number (00--53).
13617
13618@item %x
13619The locale's ``appropriate'' date representation.
(This is @samp{%m/%d/%y} in the @code{"C"} locale.)
13621
13622@item %X
13623The locale's ``appropriate'' time representation.
13624(This is @samp{%T} in the @code{"C"} locale.)
13625
13626@item %y
13627The year modulo 100 as a decimal number (00--99).
13628
13629@item %Y
13630The full year as a decimal number (e.g., 1995).
13631
13632@c @cindex RFC 822
13633@c @cindex RFC 1036
13634@item %z
13635The timezone offset in a +HHMM format (e.g., the format necessary to
13636produce RFC 822/RFC 1036 date headers).
13637
13638@item %Z
13639The time zone name or abbreviation; no characters if
13640no time zone is determinable.
13641
13642@item %Ec %EC %Ex %EX %Ey %EY %Od %Oe %OH
13643@itemx %OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy
13644``Alternate representations'' for the specifications
13645that use only the second letter (@samp{%c}, @samp{%C},
13646and so on).@footnote{If you don't understand any of this, don't worry about
13647it; these facilities are meant to make it easier to ``internationalize''
13648programs.
13649Other internationalization features are described in
13650@ref{Internationalization}.}
13651(These facilitate compliance with the POSIX @command{date} utility.)
13652
13653@item %%
13654A literal @samp{%}.
13655@end table
13656
13657If a conversion specifier is not one of the above, the behavior is
13658undefined.@footnote{This is because ISO C leaves the
13659behavior of the C version of @code{strftime} undefined and @command{gawk}
13660uses the system's version of @code{strftime} if it's there.
13661Typically, the conversion specifier either does not appear in the
13662returned string or appears literally.}
13663
13664@c @cindex locale, definition of
13665Informally, a @dfn{locale} is the geographic place in which a program
13666is meant to run.  For example, a common way to abbreviate the date
13667September 4, 1991 in the United States is ``9/4/91.''
13668In many countries in Europe, however, it is abbreviated ``4.9.91.''
13669Thus, the @samp{%x} specification in a @code{"US"} locale might produce
13670@samp{9/4/91}, while in a @code{"EUROPE"} locale, it might produce
13671@samp{4.9.91}.  The ISO C standard defines a default @code{"C"}
13672locale, which is an environment that is typical of what most C programmers
13673are used to.
13674
13675A public-domain C version of @code{strftime} is supplied with @command{gawk}
13676for systems that are not yet fully standards-compliant.
13677It supports all of the just listed format specifications.
13678If that version is
13679used to compile @command{gawk} (@pxref{Installation}),
13680then the following additional format specifications are available:
13681
13682@table @code
13683@item %k
13684The hour (24-hour clock) as a decimal number (0--23).
13685Single-digit numbers are padded with a space.
13686
13687@item %l
13688The hour (12-hour clock) as a decimal number (1--12).
13689Single-digit numbers are padded with a space.
13690
13691@item %N
13692The ``Emperor/Era'' name.
13693Equivalent to @code{%C}.
13694
13695@item %o
13696The ``Emperor/Era'' year.
13697Equivalent to @code{%y}.
13698
13699@item %s
13700The time as a decimal timestamp in seconds since the epoch.
13701
13702@item %v
13703The date in VMS format (e.g., @samp{20-JUN-1991}).
13704@end table
13705@c ENDOFRANGE strf
13706
13707Additionally, the alternate representations are recognized but their
13708normal representations are used.
13709
13710@cindex @code{date} utility, POSIX
13711@cindex POSIX @command{awk}, @code{date} utility and
13712This example is an @command{awk} implementation of the POSIX
13713@command{date} utility.  Normally, the @command{date} utility prints the
13714current date and time of day in a well-known format.  However, if you
13715provide an argument to it that begins with a @samp{+}, @command{date}
13716copies nonformat specifier characters to the standard output and
13717interprets the current time according to the format specifiers in
13718the string.  For example:
13719
@example
$ date '+Today is %A, %B %d, %Y.'
@print{} Today is Thursday, September 14, 2000.
@end example
13724
13725Here is the @command{gawk} version of the @command{date} utility.
13726It has a shell ``wrapper'' to handle the @option{-u} option,
13727which requires that @command{date} run as if the time zone
13728is set to UTC:
13729
13730@example
13731#! /bin/sh
13732#
13733# date --- approximate the P1003.2 'date' command
13734
13735case $1 in
13736-u)  TZ=UTC0     # use UTC
13737     export TZ
13738     shift ;;
13739esac
13740
13741@c FIXME: One day, change %d to %e, when C 99 is common.
13742gawk 'BEGIN  @{
13743    format = "%a %b %d %H:%M:%S %Z %Y"
13744    exitval = 0
13745
13746    if (ARGC > 2)
13747        exitval = 1
13748    else if (ARGC == 2) @{
13749        format = ARGV[1]
13750        if (format ~ /^\+/)
13751            format = substr(format, 2)   # remove leading +
13752    @}
13753    print strftime(format)
13754    exit exitval
13755@}' "$@@"
13756@end example
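
The effect of setting @env{TZ} to @samp{UTC0}, as the @option{-u} branch of the wrapper does, can be checked with a fixed timestamp instead of the current time. The following is only a sketch: it assumes that @command{gawk} is installed, the variable name @code{epoch} is made up for the illustration, and @env{LC_ALL} is set to @samp{C} so that the day and month names come out in English.

```shell
#!/bin/sh
# Sketch only: format the epoch itself (timestamp 0) as UTC.
# Assumes gawk is installed; strftime() is a gawk extension.
command -v gawk >/dev/null 2>&1 || { echo "gawk not found; skipping" >&2; exit 0; }

# TZ=UTC0 selects UTC, as the -u branch of the wrapper does.
# LC_ALL=C keeps the day and month names in English.
epoch=$(TZ=UTC0 LC_ALL=C gawk 'BEGIN {
    # Same default format as the date script, applied to timestamp 0.
    print strftime("%a %b %d %H:%M:%S %Z %Y", 0)
}')
echo "$epoch"
```

With these settings the script should print @samp{Thu Jan 01 00:00:00 UTC 1970}, which is the start of the epoch on POSIX systems.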
13757@c ENDOFRANGE tst
13758@c ENDOFRANGE logftst
13759@c ENDOFRANGE filogtst
13760@c ENDOFRANGE gawtst
13761
13762@node Bitwise Functions
13763@subsection Bit-Manipulation Functions of @command{gawk}
13764@c STARTOFRANGE bit
13765@cindex bitwise, operations
13766@c STARTOFRANGE and
13767@cindex AND bitwise operation
13768@c STARTOFRANGE oro
13769@cindex OR bitwise operation
13770@c STARTOFRANGE xor
13771@cindex XOR bitwise operation
13772@c STARTOFRANGE opbit
13773@cindex operations, bitwise
13774@quotation
13775@i{I can explain it for you, but I can't understand it for you.}@*
13776Anonymous
13777@end quotation
13778
13779Many languages provide the ability to perform @dfn{bitwise} operations
13780on two integer numbers.  In other words, the operation is performed on
13781each successive pair of bits in the operands.
13782Three common operations are bitwise AND, OR, and XOR.
13783The operations are described by the following table:
13784
13785@ifnottex
13786@display
13787                Bit Operator
13788          |  AND  |   OR  |  XOR
13789          |---+---+---+---+---+---
13790Operands  | 0 | 1 | 0 | 1 | 0 | 1
13791----------+---+---+---+---+---+---
13792    0     | 0   0 | 0   1 | 0   1
13793    1     | 0   1 | 1   1 | 1   0
13794@end display
13795@end ifnottex
13796@tex
13797\centerline{
13798\vbox{\bigskip % space above the table (about 1 linespace)
13799% Because we have vertical rules, we can't let TeX insert interline space
13800% in its usual way.
13801\offinterlineskip
13802\halign{\strut\hfil#\quad\hfil  % operands
13803        &\vrule#&\quad#\quad    % rule, 0 (of and)
13804        &\vrule#&\quad#\quad    % rule, 1 (of and)
13805        &\vrule#                % rule between and and or
13806        &\quad#\quad            % 0 (of or)
        &\vrule#&\quad#\quad    % rule, 1 (of or)
13808        &\vrule#                % rule between or and xor
13809        &\quad#\quad            % 0 of xor
13810        &\vrule#&\quad#\quad    % rule, 1 of xor
13811        \cr
13812&\omit&\multispan{11}\hfil\bf Bit operator\hfil\cr
13813\noalign{\smallskip}
13814&     &\multispan3\hfil AND\hfil&&\multispan3\hfil  OR\hfil
13815                           &&\multispan3\hfil XOR\hfil\cr
13816\bf Operands&&0&&1&&0&&1&&0&&1\cr
13817\noalign{\hrule}
13818\omit&height 2pt&&\omit&&&&\omit&&&&\omit\cr
13819\noalign{\hrule height0pt}% without this the rule does not extend; why?
0&&0&\omit&0&&0&\omit&1&&0&\omit&1\cr
1&&0&\omit&1&&1&\omit&1&&1&\omit&0\cr
13822}}}
13823@end tex
13824
13825@cindex bitwise, complement
13826@cindex complement, bitwise
13827As you can see, the result of an AND operation is 1 only when @emph{both}
13828bits are 1.
13829The result of an OR operation is 1 if @emph{either} bit is 1.
13830The result of an XOR operation is 1 if either bit is 1,
13831but not both.
13832The next operation is the @dfn{complement}; the complement of 1 is 0 and
13833the complement of 0 is 1. Thus, this operation ``flips'' all the bits
13834of a given value.
13835
13836@cindex bitwise, shift
13837@cindex left shift, bitwise
13838@cindex right shift, bitwise
13839@cindex shift, bitwise
13840Finally, two other common operations are to shift the bits left or right.
13841For example, if you have a bit string @samp{10111001} and you shift it
13842right by three bits, you end up with @samp{00010111}.@footnote{This example
13843shows that 0's come in on the left side. For @command{gawk}, this is
13844always true, but in some languages, it's possible to have the left side
13845fill with 1's. Caveat emptor.}
13846@c Purposely decided to use   0's   and   1's   here.  2/2001.
13847If you start over
13848again with @samp{10111001} and shift it left by three bits, you end up
13849with @samp{11001000}.
13850@command{gawk} provides built-in functions that implement the
13851bitwise operations just described. They are:
13852
13853@ignore
13854@table @code
13855@cindex @code{and} function (@command{gawk})
13856@item and(@var{v1}, @var{v2})
13857Return the bitwise AND of the values provided by @var{v1} and @var{v2}.
13858
13859@cindex @code{or} function (@command{gawk})
13860@item or(@var{v1}, @var{v2})
13861Return the bitwise OR of the values provided by @var{v1} and @var{v2}.
13862
13863@cindex @code{xor} function (@command{gawk})
13864@item xor(@var{v1}, @var{v2})
13865Return the bitwise XOR of the values provided by @var{v1} and @var{v2}.
13866
13867@cindex @code{compl} function (@command{gawk})
13868@item compl(@var{val})
13869Return the bitwise complement of @var{val}.
13870
13871@cindex @code{lshift} function (@command{gawk})
13872@item lshift(@var{val}, @var{count})
13873Return the value of @var{val}, shifted left by @var{count} bits.
13874
13875@cindex @code{rshift} function (@command{gawk})
13876@item rshift(@var{val}, @var{count})
13877Return the value of @var{val}, shifted right by @var{count} bits.
13878@end table
13879@end ignore
13880
13881@cindex @command{gawk}, bitwise operations in
13882@multitable {@code{rshift(@var{val}, @var{count})}} {Return the value of @var{val}, shifted right by @var{count} bits.}
13883@cindex @code{and} function (@command{gawk})
13884@item @code{and(@var{v1}, @var{v2})}
13885@tab Returns the bitwise AND of the values provided by @var{v1} and @var{v2}.
13886
13887@cindex @code{or} function (@command{gawk})
13888@item @code{or(@var{v1}, @var{v2})}
13889@tab Returns the bitwise OR of the values provided by @var{v1} and @var{v2}.
13890
13891@cindex @code{xor} function (@command{gawk})
13892@item @code{xor(@var{v1}, @var{v2})}
13893@tab Returns the bitwise XOR of the values provided by @var{v1} and @var{v2}.
13894
13895@cindex @code{compl} function (@command{gawk})
13896@item @code{compl(@var{val})}
13897@tab Returns the bitwise complement of @var{val}.
13898
13899@cindex @code{lshift} function (@command{gawk})
13900@item @code{lshift(@var{val}, @var{count})}
13901@tab Returns the value of @var{val}, shifted left by @var{count} bits.
13902
13903@cindex @code{rshift} function (@command{gawk})
13904@item @code{rshift(@var{val}, @var{count})}
13905@tab Returns the value of @var{val}, shifted right by @var{count} bits.
13906@end multitable
13907
13908For all of these functions, first the double-precision floating-point value is
13909converted to the widest C unsigned integer type, then the bitwise operation is
13910performed and then the result is converted back into a C @code{double}. (If
13911you don't understand this paragraph, don't worry about it.)
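
To make the table of functions concrete, here is a small sketch that applies them to two small nonnegative integers. It assumes that @command{gawk} is installed, because these functions are @command{gawk} extensions. The values 12 and 10 and the variable name @code{bitops} are chosen only for the illustration.

```shell
#!/bin/sh
# Sketch only: exercise the gawk bit-manipulation functions.
# Assumes gawk is installed; these functions are gawk extensions.
command -v gawk >/dev/null 2>&1 || { echo "gawk not found; skipping" >&2; exit 0; }

bitops=$(gawk 'BEGIN {
    a = 12      # binary 1100
    b = 10      # binary 1010
    # 1100 AND 1010 = 1000, 1100 OR 1010 = 1110, 1100 XOR 1010 = 0110
    printf "and=%d or=%d xor=%d ", and(a, b), or(a, b), xor(a, b)
    # Shifting left by 2 multiplies by 4; shifting right by 2 divides by 4.
    printf "lshift=%d rshift=%d\n", lshift(a, 2), rshift(a, 2)
}')
echo "$bitops"
```

The script prints @samp{and=8 or=14 xor=6 lshift=48 rshift=3}. @code{compl} is left out on purpose, because its result depends on the width of the widest C unsigned integer type on the system.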
13912
13913Here is a user-defined function
13914(@pxref{User-defined})
13915that illustrates the use of these functions:
13916
13917@cindex @code{bits2str} user-defined function
13918@cindex @code{testbits.awk} program
13919@smallexample
13920@group
13921@c file eg/lib/bits2str.awk
13922# bits2str --- turn a byte into readable 1's and 0's
13923
13924function bits2str(bits,        data, mask)
13925@{
13926    if (bits == 0)
13927        return "0"
13928
13929    mask = 1
13930    for (; bits != 0; bits = rshift(bits, 1))
13931        data = (and(bits, mask) ? "1" : "0") data
13932
13933    while ((length(data) % 8) != 0)
13934        data = "0" data
13935
13936    return data
13937@}
13938@c endfile
13939@end group
13940
13941@c this is a hack to make testbits.awk self-contained
13942@ignore
13943@c file eg/prog/testbits.awk
13944# bits2str --- turn a byte into readable 1's and 0's
13945
13946function bits2str(bits,        data, mask)
13947@{
13948    if (bits == 0)
13949        return "0"
13950
13951    mask = 1
13952    for (; bits != 0; bits = rshift(bits, 1))
13953        data = (and(bits, mask) ? "1" : "0") data
13954
13955    while ((length(data) % 8) != 0)
13956        data = "0" data
13957
13958    return data
13959@}
13960@c endfile
13961@end ignore
13962@c file eg/prog/testbits.awk
13963BEGIN @{
13964    printf "123 = %s\n", bits2str(123)
13965    printf "0123 = %s\n", bits2str(0123)
13966    printf "0x99 = %s\n", bits2str(0x99)
13967    comp = compl(0x99)
13968    printf "compl(0x99) = %#x = %s\n", comp, bits2str(comp)
13969    shift = lshift(0x99, 2)
13970    printf "lshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
13971    shift = rshift(0x99, 2)
13972    printf "rshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
13973@}
13974@c endfile
13975@end smallexample
13976
13977@noindent
13978This program produces the following output when run:
13979
13980@smallexample
13981$ gawk -f testbits.awk
13982@print{} 123 = 01111011
13983@print{} 0123 = 01010011
13984@print{} 0x99 = 10011001
13985@print{} compl(0x99) = 0xffffff66 = 11111111111111111111111101100110
13986@print{} lshift(0x99, 2) = 0x264 = 0000001001100100
13987@print{} rshift(0x99, 2) = 0x26 = 00100110
13988@end smallexample
13989
13990@cindex numbers, converting, to strings
13991@cindex strings, converting, numbers to
13992@cindex converting, numbers, to strings
13993The @code{bits2str} function turns a binary number into a string.
13994The number @code{1} represents a binary value where the rightmost bit
13995is set to 1.  Using this mask,
13996the function repeatedly checks the rightmost bit.
13997ANDing the mask with the value indicates whether the
13998rightmost bit is 1 or not. If so, a @code{"1"} is concatenated onto the front
13999of the string.
14000Otherwise, a @code{"0"} is added.
14001The value is then shifted right by one bit and the loop continues
14002until there are no more 1 bits.
14003
If the initial value is zero, @code{bits2str} simply returns @code{"0"}.
Otherwise, once the loop finishes, it pads the string on the left with zeros
until its length is a multiple of eight, since modern computers
typically work with 8-bit bytes.
14007
14008The main code in the @code{BEGIN} rule shows the difference between the
14009decimal and octal values for the same numbers
14010(@pxref{Nondecimal-numbers}),
14011and then demonstrates the
14012results of the @code{compl}, @code{lshift}, and @code{rshift} functions.
14013@c ENDOFRANGE bit
14014@c ENDOFRANGE and
14015@c ENDOFRANGE oro
14016@c ENDOFRANGE xor
14017@c ENDOFRANGE opbit
14018
14019@node I18N Functions
14020@subsection Using @command{gawk}'s String-Translation Functions
14021@cindex @command{gawk}, string-translation functions
14022@cindex functions, string-translation
14023@cindex internationalization
14024@cindex @command{awk} programs, internationalizing
14025
14026@command{gawk} provides facilities for internationalizing @command{awk} programs.
14027These include the functions described in the following list.
14028The descriptions here are purposely brief.
14029@xref{Internationalization},
14030for the full story.
14031Optional parameters are enclosed in square brackets ([ ]):
14032
14033@table @code
14034@cindex @code{dcgettext} function (@command{gawk})
14035@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
14036This function returns the translation of @var{string} in
14037text domain @var{domain} for locale category @var{category}.
14038The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
14039The default value for @var{category} is @code{"LC_MESSAGES"}.
14040
14041@cindex @code{dcngettext} function (@command{gawk})
14042@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
14043This function returns the plural form used for @var{number} of the
14044translation of @var{string1} and @var{string2} in text domain
14045@var{domain} for locale category @var{category}. @var{string1} is the
14046English singular variant of a message, and @var{string2} the English plural
14047variant of the same message.
14048The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
14049The default value for @var{category} is @code{"LC_MESSAGES"}.
14050
14051@cindex @code{bindtextdomain} function (@command{gawk})
14052@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]})
14053This function allows you to specify the directory in which
14054@command{gawk} will look for message translation files, in case they
14055will not or cannot be placed in the ``standard'' locations
14056(e.g., during testing).
14057It returns the directory in which @var{domain} is ``bound.''
14058
14059The default @var{domain} is the value of @code{TEXTDOMAIN}.
14060If @var{directory} is the null string (@code{""}), then
14061@code{bindtextdomain} returns the current binding for the
14062given @var{domain}.
14063@end table
14064@c ENDOFRANGE funcbi
14065@c ENDOFRANGE bifunc
14066
14067@node User-defined
14068@section User-Defined Functions
14069
14070@c STARTOFRANGE udfunc
14071@cindex user-defined, functions
14072@c STARTOFRANGE funcud
14073@cindex functions, user-defined
14074Complicated @command{awk} programs can often be simplified by defining
14075your own functions.  User-defined functions can be called just like
14076built-in ones (@pxref{Function Calls}), but it is up to you to define
14077them, i.e., to tell @command{awk} what they should do.
14078
14079@menu
14080* Definition Syntax::           How to write definitions and what they mean.
14081* Function Example::            An example function definition and what it
14082                                does.
14083* Function Caveats::            Things to watch out for.
14084* Return Statement::            Specifying the value a function returns.
14085* Dynamic Typing::              How variable types can change at runtime.
14086@end menu
14087
14088@node Definition Syntax
14089@subsection Function Definition Syntax
14090
14091@c STARTOFRANGE fdef
14092@cindex functions, defining
14093Definitions of functions can appear anywhere between the rules of an
14094@command{awk} program.  Thus, the general form of an @command{awk} program is
14095extended to include sequences of rules @emph{and} user-defined function
14096definitions.
14097There is no need to put the definition of a function
14098before all uses of the function.  This is because @command{awk} reads the
14099entire program before starting to execute any of it.
14100
14101The definition of a function named @var{name} looks like this:
14102@c NEXT ED: put [ ] around parameter list
14103
14104@example
function @var{name}(@r{[}@var{parameter-list}@r{]})
14106@{
14107     @var{body-of-function}
14108@}
14109@end example
14110
14111@cindex names, functions
14112@cindex functions, names of
14113@cindex namespace issues, functions
14114@noindent
14115@var{name} is the name of the function to define.  A valid function
14116name is like a valid variable name: a sequence of letters, digits, and
14117underscores that doesn't start with a digit.
Within a single @command{awk} program, any particular name can be used
in only one way: as a variable, as an array, or as a function.
14120
@var{parameter-list} is an optional list of the function's arguments and local
14123variable names, separated by commas.  When the function is called,
14124the argument names are used to hold the argument values given in
14125the call.  The local variables are initialized to the empty string.
14126A function cannot have two parameters with the same name, nor may it
14127have a parameter with the same name as the function itself.
14128
14129The @var{body-of-function} consists of @command{awk} statements.  It is the
14130most important part of the definition, because it says what the function
14131should actually @emph{do}.  The argument names exist to give the body a
14132way to talk about the arguments; local variables exist to give the body
14133places to keep temporary values.
14134
14135Argument names are not distinguished syntactically from local variable
14136names. Instead, the number of arguments supplied when the function is
14137called determines how many argument variables there are.  Thus, if three
14138argument values are given, the first three names in @var{parameter-list}
14139are arguments and the rest are local variables.
14140
14141It follows that if the number of arguments is not the same in all calls
14142to the function, some of the names in @var{parameter-list} may be
14143arguments on some occasions and local variables on others.  Another
14144way to think of this is that omitted arguments default to the
14145null string.
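
A short sketch may make the last point clearer. The function @code{greet} and the shell variable @code{omitted} are hypothetical names used only here. The function is called once with one argument and once with two. In the first call, @code{greeting} starts out as the null string, so the function supplies a default.

```shell
#!/bin/sh
# Sketch only: an omitted argument behaves like a local variable
# that starts out as the null string.  greet() is a made-up example.
omitted=$(awk '
function greet(name, greeting)
{
    # When the caller omits the second argument, greeting is "".
    if (greeting == "")
        greeting = "Hello"
    return greeting ", " name
}

BEGIN {
    print greet("world")              # one argument
    print greet("world", "Goodbye")   # two arguments
}')
echo "$omitted"
```

The first call prints @samp{Hello, world} and the second prints @samp{Goodbye, world}.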
14146
14147@cindex programming conventions, functions, writing
14148Usually when you write a function, you know how many names you intend to
14149use for arguments and how many you intend to use as local variables.  It is
14150conventional to place some extra space between the arguments and
14151the local variables, in order to document how your function is supposed to be used.
14152
14153@cindex variables, shadowing
14154During execution of the function body, the arguments and local variable
14155values hide, or @dfn{shadow}, any variables of the same names used in the
14156rest of the program.  The shadowed variables are not accessible in the
14157function definition, because there is no way to name them while their
14158names have been taken away for the local variables.  All other variables
14159used in the @command{awk} program can be referenced or set normally in the
14160function's body.
14161
14162The arguments and local variables last only as long as the function body
14163is executing.  Once the body finishes, you can once again access the
14164variables that were shadowed while the function was running.
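
The following sketch illustrates shadowing. The function @code{count_up} and the shell variable @code{shadowed} are hypothetical names used only here. The function declares @code{i} and @code{total} as local variables, and the @code{BEGIN} rule uses global variables with the same names.

```shell
#!/bin/sh
# Sketch only: local variables shadow globals of the same name.
# count_up() is a made-up example function.
shadowed=$(awk '
function count_up(n,    i, total)
{
    # i and total are locals here; they shadow the globals of the same name.
    for (i = 1; i <= n; i++)
        total += i
    return total
}

BEGIN {
    i = "global i"
    total = "global total"
    print count_up(4)    # 1 + 2 + 3 + 4
    print i              # unchanged by the call
    print total          # unchanged by the call
}')
echo "$shadowed"
```

The output is @samp{10} followed by @samp{global i} and @samp{global total}: the assignments inside the function touched only its own copies.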
14165
14166@cindex recursive functions
14167@cindex functions, recursive
14168The function body can contain expressions that call functions.  They
14169can even call this function, either directly or by way of another
14170function.  When this happens, we say the function is @dfn{recursive}.
14171The act of a function calling itself is called @dfn{recursion}.
14172
14173@c @cindex @command{awk} language, POSIX version
14174@c @cindex POSIX @command{awk}
14175@cindex POSIX @command{awk}, @code{function} keyword in
14176In many @command{awk} implementations, including @command{gawk},
14177the keyword @code{function} may be
14178abbreviated @code{func}.  However, POSIX only specifies the use of
14179the keyword @code{function}.  This actually has some practical implications.
14180If @command{gawk} is in POSIX-compatibility mode
14181(@pxref{Options}), then the following
14182statement does @emph{not} define a function:
14183
14184@example
14185func foo() @{ a = sqrt($1) ; print a @}
14186@end example
14187
14188@noindent
14189Instead it defines a rule that, for each record, concatenates the value
14190of the variable @samp{func} with the return value of the function @samp{foo}.
14191If the resulting string is non-null, the action is executed.
This is probably not what is desired.  (@command{awk} accepts this input as
syntactically valid, because functions may be used before they are defined
in @command{awk} programs.  In practice the program still fails when run,
because @code{foo} is never defined.)
14196
14197@c last comma does NOT start tertiary
14198@cindex portability, functions, defining
14199To ensure that your @command{awk} programs are portable, always use the
14200keyword @code{function} when defining a function.
14201
14202@node Function Example
14203@subsection Function Definition Examples
14204
14205Here is an example of a user-defined function, called @code{myprint}, that
14206takes a number and prints it in a specific format:
14207
14208@example
14209function myprint(num)
14210@{
14211     printf "%6.3g\n", num
14212@}
14213@end example
14214
14215@noindent
14216To illustrate, here is an @command{awk} rule that uses our @code{myprint}
14217function:
14218
14219@example
14220$3 > 0     @{ myprint($3) @}
14221@end example
14222
14223@noindent
14224This program prints, in our special format, all the third fields that
14225contain a positive number in our input.  Therefore, when given the following:
14226
14227@example
 1.2   3.4    5.6   7.8
 9.10 11.12 -13.14 15.16
17.18 19.20  21.22 23.24
14231@end example
14232
14233@noindent
14234this program, using our function to format the results, prints:
14235
14236@example
14237   5.6
14238  21.2
14239@end example
14240
14241This function deletes all the elements in an array:
14242
14243@example
14244function delarray(a,    i)
14245@{
14246    for (i in a)
14247       delete a[i]
14248@}
14249@end example
14250
14251When working with arrays, it is often necessary to delete all the elements
14252in an array and start over with a new list of elements
14253(@pxref{Delete}).
14254Instead of having
14255to repeat this loop everywhere that you need to clear out
14256an array, your program can just call @code{delarray}.
14257(This guarantees portability.  The use of @samp{delete @var{array}} to delete
14258the contents of an entire array is a nonstandard extension.)
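
As a quick check, the sketch below fills an array, calls @code{delarray}, and counts the elements before and after. The helper @code{count}, the array @code{seen}, and the shell variable @code{cleared} are hypothetical names introduced only for this illustration. A counting loop is used because it works in any POSIX @command{awk}.

```shell
#!/bin/sh
# Sketch only: show that delarray() empties the array passed to it.
cleared=$(awk '
function delarray(a,    i)
{
    for (i in a)
        delete a[i]
}

# count() is a made-up helper that returns the number of elements.
function count(a,    i, n)
{
    n = 0
    for (i in a)
        n++
    return n
}

BEGIN {
    seen["x"] = 1; seen["y"] = 1; seen["z"] = 1
    before = count(seen)
    delarray(seen)
    print before, count(seen)
}')
echo "$cleared"
```

The script prints @samp{3 0}: three elements before the call and none afterward.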
14259
14260The following is an example of a recursive function.  It takes a string
14261as an input parameter and returns the string in backwards order.
14262Recursive functions must always have a test that stops the recursion.
14263In this case, the recursion terminates when the starting position
14264is zero, i.e., when there are no more characters left in the string.
14265
14266@cindex @code{rev} user-defined function
14267@example
14268function rev(str, start)
14269@{
14270    if (start == 0)
14271        return ""
14272
14273    return (substr(str, start, 1) rev(str, start - 1))
14274@}
14275@end example
14276
14277If this function is in a file named @file{rev.awk}, it can be tested
14278this way:
14279
14280@example
14281$ echo "Don't Panic!" |
14282> gawk --source '@{ print rev($0, length($0)) @}' -f rev.awk
14283@print{} !cinaP t'noD
14284@end example
14285
14286The C @code{ctime} function takes a timestamp and returns it in a string,
14287formatted in a well-known fashion.
14288The following example uses the built-in @code{strftime} function
14289(@pxref{Time Functions})
14290to create an @command{awk} version of @code{ctime}:
14291
14292@cindex @code{ctime} user-defined function
14293@c FIXME: One day, change %d to %e, when C 99 is common.
14294@example
14295@c file eg/lib/ctime.awk
14296# ctime.awk
14297#
14298# awk version of C ctime(3) function
14299
14300function ctime(ts,    format)
14301@{
14302    format = "%a %b %d %H:%M:%S %Z %Y"
14303    if (ts == 0)
14304        ts = systime()       # use current time as default
14305    return strftime(format, ts)
14306@}
14307@c endfile
14308@end example
14309@c ENDOFRANGE fdef
14310
14311@node Function Caveats
14312@subsection Calling User-Defined Functions
14313
14314@c STARTOFRANGE fudc
14315@cindex functions, user-defined, calling
14316@dfn{Calling a function} means causing the function to run and do its job.
14317A function call is an expression and its value is the value returned by
14318the function.
14319
14320A function call consists of the function name followed by the arguments
14321in parentheses.  @command{awk} expressions are what you write in the
14322call for the arguments.  Each time the call is executed, these
14323expressions are evaluated, and the values are the actual arguments.  For
14324example, here is a call to @code{foo} with three arguments (the first
14325being a string concatenation):
14326
14327@example
14328foo(x y, "lose", 4 * z)
14329@end example
14330
14331@strong{Caution:} Whitespace characters (spaces and tabs) are not allowed
14332between the function name and the open-parenthesis of the argument list.
14333If you write whitespace by mistake, @command{awk} might think that you mean
14334to concatenate a variable with an expression in parentheses.  However, it
14335notices that you used a function name and not a variable name, and reports
14336an error.
14337
14338@cindex call by value
14339When a function is called, it is given a @emph{copy} of the values of
14340its arguments.  This is known as @dfn{call by value}.  The caller may use
14341a variable as the expression for the argument, but the called function
14342does not know this---it only knows what value the argument had.  For
14343example, if you write the following code:
14344
14345@example
14346foo = "bar"
14347z = myfunc(foo)
14348@end example
14349
14350@noindent
14351then you should not think of the argument to @code{myfunc} as being
14352``the variable @code{foo}.''  Instead, think of the argument as the
14353string value @code{"bar"}.
14354If the function @code{myfunc} alters the values of its local variables,
14355this has no effect on any other variables.  Thus, if @code{myfunc}
14356does this:
14357
14358@example
14359function myfunc(str)
14360@{
14361  print str
14362  str = "zzz"
14363  print str
14364@}
14365@end example
14366
14367@noindent
14368to change its first argument variable @code{str}, it does @emph{not}
14369change the value of @code{foo} in the caller.  The role of @code{foo} in
14370calling @code{myfunc} ended when its value (@code{"bar"}) was computed.
14371If @code{str} also exists outside of @code{myfunc}, the function body
14372cannot alter this outer value, because it is shadowed during the
14373execution of @code{myfunc} and cannot be seen or changed from there.
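
The fragments above can be combined into a complete program, sketched here. The only additions are a @code{return} statement, so that the caller can see the changed copy, and the shell variable @code{byvalue}, which is a made-up name used to capture the output.

```shell
#!/bin/sh
# Sketch only: scalar arguments are passed by value.
byvalue=$(awk '
function myfunc(str)
{
    str = "zzz"      # changes only the private copy inside the function
    return str
}

BEGIN {
    foo = "bar"
    z = myfunc(foo)
    print foo, z     # foo still holds its original value
}')
echo "$byvalue"
```

The script prints @samp{bar zzz}, which shows that assigning to @code{str} did not affect @code{foo}.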
14374
14375@cindex call by reference
14376@cindex arrays, as parameters to functions
14377@cindex functions, arrays as parameters to
14378However, when arrays are the parameters to functions, they are @emph{not}
14379copied.  Instead, the array itself is made available for direct manipulation
14380by the function.  This is usually called @dfn{call by reference}.
14381Changes made to an array parameter inside the body of a function @emph{are}
14382visible outside that function.
14383
14384@strong{Note:} Changing an array parameter inside a function
14385can be very dangerous if you do not watch what you are doing.
For example, the following program:
14387
14388@example
14389function changeit(array, ind, nvalue)
14390@{
14391     array[ind] = nvalue
14392@}
14393
14394BEGIN @{
14395    a[1] = 1; a[2] = 2; a[3] = 3
14396    changeit(a, 2, "two")
14397    printf "a[1] = %s, a[2] = %s, a[3] = %s\n",
14398            a[1], a[2], a[3]
14399@}
14400@end example
14401
14402@noindent
14403prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because
14404@code{changeit} stores @code{"two"} in the second element of @code{a}.
14405
14406@cindex undefined functions
14407@cindex functions, undefined
14408Some @command{awk} implementations allow you to call a function that
14409has not been defined. They only report a problem at runtime when the
14410program actually tries to call the function. For example:
14411
14412@example
14413BEGIN @{
14414    if (0)
14415        foo()
14416    else
14417        bar()
14418@}
14419function bar() @{ @dots{} @}
14420# note that `foo' is not defined
14421@end example
14422
14423@noindent
Because the condition of the @samp{if} statement is never true, @code{foo}
is never called, so it is not really a problem that it has not been defined.
Usually, though, it is a problem if a program calls an undefined function.
14427
14428@cindex lint checking, undefined functions
14429If @option{--lint} is specified
14430(@pxref{Options}),
14431@command{gawk} reports calls to undefined functions.
14432
14433@cindex portability, @code{next} statement in user-defined functions
14434Some @command{awk} implementations generate a runtime
14435error if you use the @code{next} statement
14436(@pxref{Next Statement})
14437inside a user-defined function.
14438@command{gawk} does not have this limitation.
14439@c ENDOFRANGE fudc
14440
14441@node Return Statement
14442@subsection The @code{return} Statement
14443@c comma does NOT start a secondary
14444@cindex @code{return} statement, user-defined functions
14445
14446The body of a user-defined function can contain a @code{return} statement.
14447This statement returns control to the calling part of the @command{awk} program.  It
14448can also be used to return a value for use in the rest of the @command{awk}
14449program.  It looks like this:
14450
14451@example
14452return @r{[}@var{expression}@r{]}
14453@end example
14454
14455The @var{expression} part is optional.  If it is omitted, then the returned
14456value is undefined, and therefore, unpredictable.
14457
14458A @code{return} statement with no value expression is assumed at the end of
14459every function definition.  So if control reaches the end of the function
14460body, then the function returns an unpredictable value.  @command{awk}
14461does @emph{not} warn you if you use the return value of such a function.
14462
14463Sometimes, you want to write a function for what it does, not for
14464what it returns.  Such a function corresponds to a @code{void} function
14465in C or to a @code{procedure} in Pascal.  Thus, it may be appropriate to not
14466return any value; simply bear in mind that if you use the return
14467value of such a function, you do so at your own risk.
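
Here is a minimal sketch of such a function. The function @code{report} and the shell variable @code{printed} are hypothetical names used only for this illustration. The function is called purely for its output, and the caller never uses its return value.

```shell
#!/bin/sh
# Sketch only: a function written for what it does, not for what it returns.
# report() is a made-up example function.
printed=$(awk '
function report(label, value)
{
    print label "=" value
    # No return statement: callers must not rely on a return value.
}

BEGIN {
    report("apples", 12)   # called as a statement; the result is ignored
    report("pears", 7)
}')
echo "$printed"
```

The script prints @samp{apples=12} and then @samp{pears=7}.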
14468
14469The following is an example of a user-defined function that returns a value
14470for the largest number among the elements of an array:
14471
14472@example
14473function maxelt(vec,   i, ret)
14474@{
14475     for (i in vec) @{
14476          if (ret == "" || vec[i] > ret)
14477               ret = vec[i]
14478     @}
14479     return ret
14480@}
14481@end example
14482
14483@cindex programming conventions, function parameters
14484@noindent
14485You call @code{maxelt} with one argument, which is an array name.  The local
14486variables @code{i} and @code{ret} are not intended to be arguments;
14487while there is nothing to stop you from passing more than one argument
14488to @code{maxelt}, the results would be strange.  The extra space before
14489@code{i} in the function parameter list indicates that @code{i} and
14490@code{ret} are not supposed to be arguments.
14491You should follow this convention when defining functions.
14492
14493The following program uses the @code{maxelt} function.  It loads an
14494array, calls @code{maxelt}, and then reports the maximum number in that
14495array:
14496
14497@example
14498function maxelt(vec,   i, ret)
14499@{
14500     for (i in vec) @{
14501          if (ret == "" || vec[i] > ret)
14502               ret = vec[i]
14503     @}
14504     return ret
14505@}
14506
14507# Load all fields of each record into nums.
14508@{
14509     for(i = 1; i <= NF; i++)
14510          nums[NR, i] = $i
14511@}
14512
14513END @{
14514     print maxelt(nums)
14515@}
14516@end example
14517
14518Given the following input:
14519
14520@example
 1 5 23 8 16
44 3 5 2 8 26
256 291 1396 2962 100
-6 467 998 1101
99385 11 0 225
14526@end example
14527
14528@noindent
14529the program reports (predictably) that @code{99385} is the largest number
14530in the array.
14531
14532@node Dynamic Typing
14533@subsection Functions and Their Effects on Variable Typing
14534
14535@command{awk} is a very fluid language.
14536It is possible that @command{awk} can't tell if an identifier
14537represents a regular variable or an array until runtime.
14538Here is an annotated sample program:
14539
14540@example
14541function foo(a)
14542@{
14543    a[1] = 1   # parameter is an array
14544@}
14545
14546BEGIN @{
14547    b = 1
14548    foo(b)  # invalid: fatal type mismatch
14549
14550    foo(x)  # x uninitialized, becomes an array dynamically
14551    x = 1   # now not allowed, runtime error
14552@}
14553@end example
14554
14555Usually, such things aren't a big issue, but it's worth
14556being aware of them.
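
As a minimal sketch of the valid case (the names @code{fill} and
@code{squares} are made up for this illustration), an uninitialized
variable can be passed to a function that treats its parameter as an
array, and the caller can then use it as an array:

@example
function fill(arr,   i)
@{
    for (i = 1; i <= 3; i++)
        arr[i] = i * i
@}

BEGIN @{
    fill(squares)    # squares was untyped; it is an array from here on
    for (i = 1; i <= 3; i++)
        print i, squares[i]
@}
@end example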
14557@c ENDOFRANGE udfunc
14558@c ENDOFRANGE funcud
14559
14560@node Internationalization
14561@chapter Internationalization with @command{gawk}
14562
14563Once upon a time, computer makers
14564wrote software that worked only in English.
14565Eventually, hardware and software vendors noticed that if their
14566systems worked in the native languages of non-English-speaking
14567countries, they were able to sell more systems.
14568As a result, internationalization and localization
14569of programs and software systems became a common practice.
14570
14571@c STARTOFRANGE inloc
14572@cindex internationalization, localization
14573@cindex @command{gawk}, internationalization and, See internationalization
14574@cindex internationalization, localization, @command{gawk} and
14575Until recently, the ability to provide internationalization
14576was largely restricted to programs written in C and C++.
14577This @value{CHAPTER} describes the underlying library @command{gawk}
14578uses for internationalization, as well as how
14579@command{gawk} makes internationalization
14580features available at the @command{awk} program level.
14581Having internationalization available at the @command{awk} level
14582gives software developers additional flexibility---they are no
14583longer required to write in C when internationalization is
14584a requirement.
14585
14586@menu
14587* I18N and L10N::               Internationalization and Localization.
14588* Explaining gettext::          How GNU @code{gettext} works.
14589* Programmer i18n::             Features for the programmer.
14590* Translator i18n::             Features for the translator.
14591* I18N Example::                A simple i18n example.
14592* Gawk I18N::                   @command{gawk} is also internationalized.
14593@end menu
14594
14595@node I18N and L10N
14596@section Internationalization and Localization
14597
14598@cindex internationalization
14599@c comma is part of see
14600@cindex localization, See internationalization, localization
14601@cindex localization
14602@dfn{Internationalization} means writing (or modifying) a program once,
14603in such a way that it can use multiple languages without requiring
14604further source-code changes.
14605@dfn{Localization} means providing the data necessary for an
14606internationalized program to work in a particular language.
14607Most typically, these terms refer to features such as the language
14608used for printing error messages, the language used to read
14609responses, and information related to how numerical and
14610monetary values are printed and read.
14611
14612@node Explaining gettext
14613@section GNU @code{gettext}
14614
14615@cindex internationalizing a program
14616@c STARTOFRANGE gettex
14617@cindex @code{gettext} library
The facilities in GNU @code{gettext} focus on messages: strings printed
14619by a program, either directly or via formatting with @code{printf} or
14620@code{sprintf}.@footnote{For some operating systems, the @command{gawk}
14621port doesn't support GNU @code{gettext}.  This applies most notably to
14622the PC operating systems.  As such, these features are not available
14623if you are using one of those operating systems.  Sorry.}
14624
14625@cindex portability, @code{gettext} library and
14626When using GNU @code{gettext}, each application has its own
14627@dfn{text domain}.  This is a unique name, such as @samp{kpilot} or @samp{gawk},
14628that identifies the application.
14629A complete application may have multiple components---programs written
14630in C or C++, as well as scripts written in @command{sh} or @command{awk}.
14631All of the components use the same text domain.
14632
14633To make the discussion concrete, assume we're writing an application
14634named @command{guide}.  Internationalization consists of the
14635following steps, in this order:
14636
14637@enumerate
14638@item
14639The programmer goes
14640through the source for all of @command{guide}'s components
14641and marks each string that is a candidate for translation.
14642For example, @code{"`-F': option required"} is a good candidate for translation.
14643A table with strings of option names is not (e.g., @command{gawk}'s
14644@option{--profile} option should remain the same, no matter what the local
14645language).
14646
14647@cindex @code{textdomain} function (C library)
14648@item
14649The programmer indicates the application's text domain
14650(@code{"guide"}) to the @code{gettext} library,
14651by calling the @code{textdomain} function.
14652
14653@item
14654Messages from the application are extracted from the source code and
14655collected into a portable object file (@file{guide.po}),
14656which lists the strings and their translations.
14657The translations are initially empty.
14658The original (usually English) messages serve as the key for
14659lookup of the translations.
14660
14661@cindex @code{.po} files
14662@cindex files, @code{.po}
14663@cindex portable object files
14664@cindex files, portable object
14665@item
14666For each language with a translator, @file{guide.po}
14667is copied and translations are created and shipped with the application.
14668
14669@cindex @code{.mo} files
14670@cindex files, @code{.mo}
14671@cindex message object files
14672@cindex files, message object
14673@item
14674Each language's @file{.po} file is converted into a binary
14675message object (@file{.mo}) file.
14676A message object file contains the original messages and their
14677translations in a binary format that allows fast lookup of translations
14678at runtime.
14679
14680@item
14681When @command{guide} is built and installed, the binary translation files
14682are installed in a standard place.
14683
14684@cindex @code{bindtextdomain} function (C library)
14685@item
14686For testing and development, it is possible to tell @code{gettext}
14687to use @file{.mo} files in a different directory than the standard
14688one by using the @code{bindtextdomain} function.
14689
14690@cindex @code{.mo} files, specifying directory of
14691@cindex files, @code{.mo}, specifying directory of
14692@cindex message object files, specifying directory of
14693@cindex files, message object, specifying directory of
14694@item
14695At runtime, @command{guide} looks up each string via a call
14696to @code{gettext}.  The returned string is the translated string
14697if available, or the original string if not.
14698
14699@item
14700If necessary, it is possible to access messages from a different
14701text domain than the one belonging to the application, without
14702having to switch the application's default text domain back
14703and forth.
14704@end enumerate
14705
14706@cindex @code{gettext} function (C library)
14707In C (or C++), the string marking and dynamic translation lookup
14708are accomplished by wrapping each string in a call to @code{gettext}:
14709
@example
printf(gettext("Don't Panic!\n"));
@end example
14713
14714The tools that extract messages from source code pull out all
14715strings enclosed in calls to @code{gettext}.
14716
14717@cindex @code{_} (underscore), @code{_} C macro
14718@cindex underscore (@code{_}), @code{_} C macro
14719The GNU @code{gettext} developers, recognizing that typing
14720@samp{gettext} over and over again is both painful and ugly to look
14721at, use the macro @samp{_} (an underscore) to make things easier:
14722
@example
/* In the standard header file: */
#define _(str) gettext(str)

/* In the program text: */
printf(_("Don't Panic!\n"));
@end example
14730
14731@cindex internationalization, localization, locale categories
14732@cindex @code{gettext} library, locale categories
14733@cindex locale categories
14734@noindent
14735This reduces the typing overhead to just three extra characters per string
14736and is considerably easier to read as well.
14737There are locale @dfn{categories}
14738for different types of locale-related information.
14739The defined locale categories that @code{gettext} knows about are:
14740
14741@table @code
14742@cindex @code{LC_MESSAGES} locale category
14743@item LC_MESSAGES
14744Text messages.  This is the default category for @code{gettext}
14745operations, but it is possible to supply a different one explicitly,
14746if necessary.  (It is almost never necessary to supply a different category.)
14747
14748@cindex sorting characters in different languages
14749@cindex @code{LC_COLLATE} locale category
14750@item LC_COLLATE
14751Text-collation information; i.e., how different characters
14752and/or groups of characters sort in a given language.
14753
14754@cindex @code{LC_CTYPE} locale category
14755@item LC_CTYPE
14756Character-type information (alphabetic, digit, upper- or lowercase, and
14757so on).
14758This information is accessed via the
14759POSIX character classes in regular expressions,
14760such as @code{/[[:alnum:]]/}
14761(@pxref{Regexp Operators}).
14762
14763@cindex monetary information, localization
14764@cindex currency symbols, localization
14765@cindex @code{LC_MONETARY} locale category
14766@item LC_MONETARY
14767Monetary information, such as the currency symbol, and whether the
14768symbol goes before or after a number.
14769
14770@cindex @code{LC_NUMERIC} locale category
14771@item LC_NUMERIC
14772Numeric information, such as which characters to use for the decimal
14773point and the thousands separator.@footnote{Americans
14774use a comma every three decimal places and a period for the decimal
14775point, while many Europeans do exactly the opposite:
14776@code{1,234.56} versus @code{1.234,56}.}
14777
14778@cindex @code{LC_RESPONSE} locale category
14779@item LC_RESPONSE
14780Response information, such as how ``yes'' and ``no'' appear in the
14781local language, and possibly other information as well.
14782
14783@cindex time, localization and
14784@c last comma does NOT start a tertiary
14785@cindex dates, information related to, localization
14786@cindex @code{LC_TIME} locale category
14787@item LC_TIME
14788Time- and date-related information, such as 12- or 24-hour clock, month printed
14789before or after day in a date, local month abbreviations, and so on.
14790
14791@cindex @code{LC_ALL} locale category
14792@item LC_ALL
14793All of the above.  (Not too useful in the context of @code{gettext}.)
14794@end table
14795@c ENDOFRANGE gettex
14796
14797@node Programmer i18n
14798@section Internationalizing @command{awk} Programs
14799@c STARTOFRANGE inap
14800@cindex @command{awk} programs, internationalizing
14801
14802@command{gawk} provides the following variables and functions for
14803internationalization:
14804
14805@table @code
14806@cindex @code{TEXTDOMAIN} variable
14807@item TEXTDOMAIN
14808This variable indicates the application's text domain.
14809For compatibility with GNU @code{gettext}, the default
14810value is @code{"messages"}.
14811
14812@cindex internationalization, localization, marked strings
14813@cindex strings, for localization
14814@item _"your message here"
14815String constants marked with a leading underscore
14816are candidates for translation at runtime.
14817String constants without a leading underscore are not translated.
14818
14819@cindex @code{dcgettext} function (@command{gawk})
14820@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
14821This built-in function returns the translation of @var{string} in
14822text domain @var{domain} for locale category @var{category}.
14823The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
14824The default value for @var{category} is @code{"LC_MESSAGES"}.
14825
14826If you supply a value for @var{category}, it must be a string equal to
14827one of the known locale categories described in
14828@ifnotinfo
14829the previous @value{SECTION}.
14830@end ifnotinfo
14831@ifinfo
14832@ref{Explaining gettext}.
14833@end ifinfo
14834You must also supply a text domain.  Use @code{TEXTDOMAIN} if
14835you want to use the current domain.
14836
14837@strong{Caution:} The order of arguments to the @command{awk} version
14838of the @code{dcgettext} function is purposely different from the order for
14839the C version.  The @command{awk} version's order was
14840chosen to be simple and to allow for reasonable @command{awk}-style
14841default arguments.
14842
14843@cindex @code{dcngettext} function (@command{gawk})
14844@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
14845This built-in function returns the plural form used for @var{number} of the
14846translation of @var{string1} and @var{string2} in text domain
14847@var{domain} for locale category @var{category}. @var{string1} is the
14848English singular variant of a message, and @var{string2} the English plural
14849variant of the same message.
14850The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
14851The default value for @var{category} is @code{"LC_MESSAGES"}.
14852
14853The same remarks as for the @code{dcgettext} function apply.
14854
14855@cindex @code{.mo} files, specifying directory of
14856@cindex files, @code{.mo}, specifying directory of
14857@cindex message object files, specifying directory of
14858@cindex files, message object, specifying directory of
14859@cindex @code{bindtextdomain} function (@command{gawk})
14860@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]})
14861This built-in function allows you to specify the directory in which
14862@code{gettext} looks for @file{.mo} files, in case they
14863will not or cannot be placed in the standard locations
14864(e.g., during testing).
14865It returns the directory in which @var{domain} is ``bound.''
14866
14867The default @var{domain} is the value of @code{TEXTDOMAIN}.
14868If @var{directory} is the null string (@code{""}), then
14869@code{bindtextdomain} returns the current binding for the
14870given @var{domain}.
14871@end table
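
The following sketch shows how these functions might be called together.
The text domain @code{"guide"} and the message strings are assumptions
made for illustration only.  When no translation is installed, the
original English strings should simply come back unchanged:

@example
BEGIN @{
    TEXTDOMAIN = "guide"

    # Plural-aware lookup; the result is then used as a printf format.
    nfiles = 3
    printf(dcngettext("%d file removed\n", "%d files removed\n", nfiles),
           nfiles)

    # An explicit category requires an explicit text domain as well.
    print dcgettext("Goodbye", TEXTDOMAIN, "LC_MESSAGES")

    # A null directory only queries the current binding.
    print bindtextdomain("")
@}
@end example

@noindent
The value printed by the last statement depends on how and where
@code{gettext} is installed on the system.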
14872
14873To use these facilities in your @command{awk} program, follow the steps
14874outlined in
14875@ifnotinfo
14876the previous @value{SECTION},
14877@end ifnotinfo
14878@ifinfo
14879@ref{Explaining gettext},
14880@end ifinfo
14881like so:
14882
14883@enumerate
14884@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
14885@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and
14886@item
14887Set the variable @code{TEXTDOMAIN} to the text domain of
14888your program.  This is best done in a @code{BEGIN} rule
14889(@pxref{BEGIN/END}),
14890or it can also be done via the @option{-v} command-line
14891option (@pxref{Options}):
14892
@example
BEGIN @{
    TEXTDOMAIN = "guide"
    @dots{}
@}
@end example
14899
14900@cindex @code{_} (underscore), translatable string
14901@cindex underscore (@code{_}), translatable string
14902@item
14903Mark all translatable strings with a leading underscore (@samp{_})
14904character.  It @emph{must} be adjacent to the opening
14905quote of the string.  For example:
14906
@example
print _"hello, world"
x = _"you goofed"
printf(_"Number of users is %d\n", nusers)
@end example
14912
14913@item
14914If you are creating strings dynamically, you can
14915still translate them, using the @code{dcgettext}
14916built-in function:
14917
@example
message = nusers " users logged in"
message = dcgettext(message, "adminprog")
print message
@end example
14923
14924Here, the call to @code{dcgettext} supplies a different
14925text domain (@code{"adminprog"}) in which to find the
14926message, but it uses the default @code{"LC_MESSAGES"} category.
14927
14928@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain} function (@command{gawk})
14929@item
14930During development, you might want to put the @file{.mo}
14931file in a private directory for testing.  This is done
14932with the @code{bindtextdomain} built-in function:
14933
@example
BEGIN @{
   TEXTDOMAIN = "guide"   # our text domain
   if (Testing) @{
       # where to find our files
       bindtextdomain("testdir")
       # joe is in charge of adminprog
       bindtextdomain("../joe/testdir", "adminprog")
   @}
   @dots{}
@}
@end example
14946
14947@end enumerate
14948
14949@xref{I18N Example},
14950for an example program showing the steps to create
14951and use translations from @command{awk}.
14952
14953@node Translator i18n
14954@section Translating @command{awk} Programs
14955
14956@cindex @code{.po} files
14957@cindex files, @code{.po}
14958@cindex portable object files
14959@cindex files, portable object
14960Once a program's translatable strings have been marked, they must
14961be extracted to create the initial @file{.po} file.
14962As part of translation, it is often helpful to rearrange the order
14963in which arguments to @code{printf} are output.
14964
14965@command{gawk}'s @option{--gen-po} command-line option extracts
14966the messages and is discussed next.
14967After that, @code{printf}'s ability to
14968rearrange the order for @code{printf} arguments at runtime
14969is covered.
14970
14971@menu
14972* String Extraction::           Extracting marked strings.
14973* Printf Ordering::             Rearranging @code{printf} arguments.
14974* I18N Portability::            @command{awk}-level portability issues.
14975@end menu
14976
14977@node String Extraction
14978@subsection Extracting Marked Strings
14979@cindex strings, extracting
14980@c comma does NOT start secondary
14981@cindex marked strings, extracting
14982@cindex @code{--gen-po} option
14983@cindex command-line options, string extraction
14984@cindex string extraction (internationalization)
14985@cindex marked string extraction (internationalization)
14986@cindex extraction, of marked strings (internationalization)
14987
14988@cindex @code{--gen-po} option
14989Once your @command{awk} program is working, and all the strings have
14990been marked and you've set (and perhaps bound) the text domain,
14991it is time to produce translations.
14992First, use the @option{--gen-po} command-line option to create
14993the initial @file{.po} file:
14994
@example
$ gawk --gen-po -f guide.awk > guide.po
@end example
14998
14999@cindex @code{xgettext} utility
15000When run with @option{--gen-po}, @command{gawk} does not execute your
15001program.  Instead, it parses it as usual and prints all marked strings
15002to standard output in the format of a GNU @code{gettext} Portable Object
15003file.  Also included in the output are any constant strings that
15004appear as the first argument to @code{dcgettext} or as the first and
15005second argument to @code{dcngettext}.@footnote{Starting with @code{gettext}
15006version 0.11.5, the @command{xgettext} utility that comes with GNU
15007@code{gettext} can handle @file{.awk} files.}
15008@xref{I18N Example},
15009for the full list of steps to go through to create and test
15010translations for @command{guide}.
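
As a rough illustration of what is and is not picked up, consider the
following fragment.  The message strings and variable names are
hypothetical, and the comments simply apply the rules just described:

@example
BEGIN @{
    TEXTDOMAIN = "guide"
    print _"Welcome aboard"             # marked: extracted
    print dcgettext("Mostly harmless")  # constant first argument: extracted
    # both constant strings below are extracted
    printf(dcngettext("%d towel\n", "%d towels\n", n), n)
    msg = "Share and enjoy"             # unmarked: not extracted
    print dcgettext(msg)                # not a constant: not extracted
@}
@end example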
15011
15012@node Printf Ordering
15013@subsection Rearranging @code{printf} Arguments
15014
15015@cindex @code{printf} statement, positional specifiers
15016@c comma does NOT start secondary
15017@cindex positional specifiers, @code{printf} statement
15018Format strings for @code{printf} and @code{sprintf}
15019(@pxref{Printf})
15020present a special problem for translation.
15021Consider the following:@footnote{This example is borrowed
15022from the GNU @code{gettext} manual.}
15023
15024@c line broken here only for smallbook format
@example
printf(_"String `%s' has %d characters\n",
          string, length(string))
@end example
15029
15030A possible German translation for this might be:
15031
@example
"%d Zeichen lang ist die Zeichenkette `%s'\n"
@end example
15035
15036The problem should be obvious: the order of the format
15037specifications is different from the original!
15038Even though @code{gettext} can return the translated string
15039at runtime,
15040it cannot change the argument order in the call to @code{printf}.
15041
To solve this problem, @code{printf} format specifiers may have
15043an additional optional element, which we call a @dfn{positional specifier}.
15044For example:
15045
@example
"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
@end example
15049
15050Here, the positional specifier consists of an integer count, which indicates which
15051argument to use, and a @samp{$}. Counts are one-based, and the
15052format string itself is @emph{not} included.  Thus, in the following
15053example, @samp{string} is the first argument and @samp{length(string)} is the second:
15054
@example
$ gawk 'BEGIN @{
>     string = "Dont Panic"
>     printf _"%2$d characters live in \"%1$s\"\n",
>                         string, length(string)
> @}'
@print{} 10 characters live in "Dont Panic"
@end example
15063
15064If present, positional specifiers come first in the format specification,
15065before the flags, the field width, and/or the precision.
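
For instance, in the following sketch (the values are arbitrary), the
@samp{-} flag, the field width, and the precision all follow the
positional specifier:

@example
BEGIN @{
    # %2$-10s: second argument, left-justified in 10 columns
    # %1$5.2f: first argument, width 5, two decimal places
    printf "%2$-10s|%1$5.2f\n", 3.14159, "pi"
@}
@end example

@noindent
With @command{gawk}, this should print @samp{pi} padded to ten columns,
a vertical bar, and then the number rounded to two decimal places.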
15066
15067Positional specifiers can be used with the dynamic field width and
15068precision capability:
15069
@example
$ gawk 'BEGIN @{
>    printf("%*.*s\n", 10, 20, "hello")
>    printf("%3$*2$.*1$s\n", 20, 10, "hello")
> @}'
@print{}      hello
@print{}      hello
@end example
15078
15079@noindent
15080@strong{Note:} When using @samp{*} with a positional specifier, the @samp{*}
15081comes first, then the integer position, and then the @samp{$}.
This is somewhat counterintuitive.
15083
15084@cindex @code{printf} statement, positional specifiers, mixing with regular formats
@c first comma is part of primary
15086@cindex positional specifiers, @code{printf} statement, mixing with regular formats
15087@cindex format specifiers, mixing regular with positional specifiers
15088@command{gawk} does not allow you to mix regular format specifiers
15089and those with positional specifiers in the same string:
15090
@smallexample
$ gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'
@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none
@end smallexample
15095
15096@strong{Note:} There are some pathological cases that @command{gawk} may fail to
15097diagnose.  In such cases, the output may not be what you expect.
15098It's still a bad idea to try mixing them, even if @command{gawk}
15099doesn't detect it.
15100
15101Although positional specifiers can be used directly in @command{awk} programs,
15102their primary purpose is to help in producing correct translations of
15103format strings into languages different from the one in which the program
15104is first written.
15105
15106@node I18N Portability
15107@subsection @command{awk} Portability Issues
15108
15109@cindex portability, internationalization and
15110@cindex internationalization, localization, portability and
15111@command{gawk}'s internationalization features were purposely chosen to
15112have as little impact as possible on the portability of @command{awk}
15113programs that use them to other versions of @command{awk}.
15114Consider this program:
15115
@example
BEGIN @{
    TEXTDOMAIN = "guide"
    if (Test_Guide)   # set with -v
        bindtextdomain("/test/guide/messages")
    print _"don't panic!"
@}
@end example
15124
15125@noindent
15126As written, it won't work on other versions of @command{awk}.
15127However, it is actually almost portable, requiring very little
15128change:
15129
15130@itemize @bullet
15131@cindex @code{TEXTDOMAIN} variable, portability and
15132@item
15133Assignments to @code{TEXTDOMAIN} won't have any effect,
15134since @code{TEXTDOMAIN} is not special in other @command{awk} implementations.
15135
15136@item
15137Non-GNU versions of @command{awk} treat marked strings
15138as the concatenation of a variable named @code{_} with the string
15139following it.@footnote{This is good fodder for an ``Obfuscated
15140@command{awk}'' contest.} Typically, the variable @code{_} has
15141the null string (@code{""}) as its value, leaving the original string constant as
15142the result.
15143
15144@item
15145By defining ``dummy'' functions to replace @code{dcgettext}, @code{dcngettext}
15146and @code{bindtextdomain}, the @command{awk} program can be made to run, but
15147all the messages are output in the original language.
15148For example:
15149
15150@cindex @code{bindtextdomain} function (@command{gawk}), portability and
15151@cindex @code{dcgettext} function (@command{gawk}), portability and
15152@cindex @code{dcngettext} function (@command{gawk}), portability and
@example
@c file eg/lib/libintl.awk
function bindtextdomain(dir, domain)
@{
    return dir
@}

function dcgettext(string, domain, category)
@{
    return string
@}

function dcngettext(string1, string2, number, domain, category)
@{
    return (number == 1 ? string1 : string2)
@}
@c endfile
@end example
15171
15172@item
15173The use of positional specifications in @code{printf} or
15174@code{sprintf} is @emph{not} portable.
15175To support @code{gettext} at the C level, many systems' C versions of
15176@code{sprintf} do support positional specifiers.  But it works only if
15177enough arguments are supplied in the function call.  Many versions of
15178@command{awk} pass @code{printf} formats and arguments unchanged to the
15179underlying C library version of @code{sprintf}, but only one format and
15180argument at a time.  What happens if a positional specification is
15181used is anybody's guess.
15182However, since the positional specifications are primarily for use in
15183@emph{translated} format strings, and since non-GNU @command{awk}s never
15184retrieve the translated string, this should not be a problem in practice.
15185@end itemize
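
The point about marked strings can be seen with a small experiment.
This sketch deliberately assigns a value to @code{_}, which a real
program should never do:

@example
BEGIN @{
    _ = "oops: "
    # gawk in its default mode should treat the next string as a marked
    # string and ignore the variable _; an awk without this feature
    # concatenates the value of _ with the string instead.
    print _"don't panic!"
@}
@end example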
15186@c ENDOFRANGE inap
15187
15188@node I18N Example
15189@section A Simple Internationalization Example
15190
15191Now let's look at a step-by-step example of how to internationalize and
15192localize a simple @command{awk} program, using @file{guide.awk} as our
15193original source:
15194
@example
@c file eg/prog/guide.awk
BEGIN @{
    TEXTDOMAIN = "guide"
    bindtextdomain(".")  # for testing
    print _"Don't Panic"
    print _"The Answer Is", 42
    print "Pardon me, Zaphod who?"
@}
@c endfile
@end example
15206
15207@noindent
15208Run @samp{gawk --gen-po} to create the @file{.po} file:
15209
@example
$ gawk --gen-po -f guide.awk > guide.po
@end example
15213
15214@noindent
15215This produces:
15216
@example
@c file eg/data/guide.po
#: guide.awk:4
msgid "Don't Panic"
msgstr ""

#: guide.awk:5
msgid "The Answer Is"
msgstr ""

@c endfile
@end example
15229
15230This original portable object file is saved and reused for each language
15231into which the application is translated.  The @code{msgid}
15232is the original string and the @code{msgstr} is the translation.
15233
15234@strong{Note:} Strings not marked with a leading underscore do not
15235appear in the @file{guide.po} file.
15236
15237Next, the messages must be translated.
15238Here is a translation to a hypothetical dialect of English,
15239called ``Mellow'':@footnote{Perhaps it would be better if it were
15240called ``Hippy.'' Ah, well.}
15241
@example
@group
$ cp guide.po guide-mellow.po
@var{Add translations to} guide-mellow.po @dots{}
@end group
@end example
15248
15249@noindent
15250Following are the translations:
15251
@example
@c file eg/data/guide-mellow.po
#: guide.awk:4
msgid "Don't Panic"
msgstr "Hey man, relax!"

#: guide.awk:5
msgid "The Answer Is"
msgstr "Like, the scoop is"

@c endfile
@end example
15264
15265@cindex Linux
15266@cindex GNU/Linux
15267The next step is to make the directory to hold the binary message object
15268file and then to create the @file{guide.mo} file.
15269The directory layout shown here is standard for GNU @code{gettext} on
15270GNU/Linux systems.  Other versions of @code{gettext} may use a different
15271layout:
15272
@example
$ mkdir en_US en_US/LC_MESSAGES
@end example
15276
15277@cindex @code{.po} files, converting to @code{.mo}
15278@cindex files, @code{.po}, converting to @code{.mo}
15279@cindex @code{.mo} files, converting from @code{.po}
15280@cindex files, @code{.mo}, converting from @code{.po}
15281@cindex portable object files, converting to message object files
15282@cindex files, portable object, converting to message object files
15283@cindex message object files, converting from portable object files
15284@cindex files, message object, converting from portable object files
15285@cindex @command{msgfmt} utility
15286The @command{msgfmt} utility does the conversion from human-readable
15287@file{.po} file to machine-readable @file{.mo} file.
15288By default, @command{msgfmt} creates a file named @file{messages}.
15289This file must be renamed and placed in the proper directory so that
15290@command{gawk} can find it:
15291
@example
$ msgfmt guide-mellow.po
$ mv messages en_US/LC_MESSAGES/guide.mo
@end example
15296
15297Finally, we run the program to test it:
15298
@example
$ gawk -f guide.awk
@print{} Hey man, relax!
@print{} Like, the scoop is 42
@print{} Pardon me, Zaphod who?
@end example
15305
15306If the three replacement functions for @code{dcgettext}, @code{dcngettext}
15307and @code{bindtextdomain}
15308(@pxref{I18N Portability})
15309are in a file named @file{libintl.awk},
15310then we can run @file{guide.awk} unchanged as follows:
15311
@example
$ gawk --posix -f guide.awk -f libintl.awk
@print{} Don't Panic
@print{} The Answer Is 42
@print{} Pardon me, Zaphod who?
@end example
15318
15319@node Gawk I18N
15320@section @command{gawk} Can Speak Your Language
15321
15322As of @value{PVERSION} 3.1, @command{gawk} itself has been internationalized
15323using the GNU @code{gettext} package.
15324@ifinfo
15325(GNU @code{gettext} is described in
15326complete detail in
15327@ref{Top}.)
15328@end ifinfo
15329@ifnotinfo
15330(GNU @code{gettext} is described in
15331complete detail in
15332@cite{GNU gettext tools}.)
15333@end ifnotinfo
15334As of this writing, the latest version of GNU @code{gettext} is
15335@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.11.5.tar.gz, @value{PVERSION} 0.11.5}.
15336
15337If a translation of @command{gawk}'s messages exists,
15338then @command{gawk} produces usage messages, warnings,
15339and fatal errors in the local language.
15340
15341@cindex @code{--with-included-gettext} configuration option
15342@cindex configuration option, @code{--with-included-gettext}
15343On systems that do not use @value{PVERSION} 2 (or later) of the GNU C library, you should
15344configure @command{gawk} with the @option{--with-included-gettext} option
15345before compiling and installing it.
15346@xref{Additional Configuration Options},
15347for more information.
15348@c ENDOFRANGE inloc
15349
15350@node Advanced Features
15351@chapter Advanced Features of @command{gawk}
15352@cindex advanced features, network connections, See Also networks, connections
15353@c STARTOFRANGE gawadv
15354@cindex @command{gawk}, features, advanced
15355@c STARTOFRANGE advgaw
15356@cindex advanced features, @command{gawk}
15357@ignore
15358Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com>
15359
15360    Found in Steve English's "signature" line:
15361
15362"Write documentation as if whoever reads it is a violent psychopath
15363who knows where you live."
15364@end ignore
15365@quotation
15366@i{Write documentation as if whoever reads it is
15367a violent psychopath who knows where you live.}@*
15368Steve English, as quoted by Peter Langston
15369@end quotation
15370
15371This @value{CHAPTER} discusses advanced features in @command{gawk}.
15372It's a bit of a ``grab bag'' of items that are otherwise unrelated
15373to each other.
15374First, a command-line option allows @command{gawk} to recognize
15375nondecimal numbers in input data, not just in @command{awk}
15376programs.  Next, two-way I/O, discussed briefly in earlier parts of this
15377@value{DOCUMENT}, is described in full detail, along with the basics
15378of TCP/IP networking and BSD portal files.  Finally, @command{gawk}
15379can @dfn{profile} an @command{awk} program, making it possible to tune
15380it for performance.
15381
15382@ref{Dynamic Extensions},
15383discusses the ability to dynamically add new built-in functions to
15384@command{gawk}.  As this feature is still immature and likely to change,
15385its description is relegated to an appendix.
15386
15387@menu
15388* Nondecimal Data::             Allowing nondecimal input data.
15389* Two-way I/O::                 Two-way communications with another process.
15390* TCP/IP Networking::           Using @command{gawk} for network programming.
15391* Portal Files::                Using @command{gawk} with BSD portals.
15392* Profiling::                   Profiling your @command{awk} programs.
15393@end menu
15394
15395@node Nondecimal Data
15396@section Allowing Nondecimal Input Data
15397@cindex @code{--non-decimal-data} option
15398@cindex advanced features, @command{gawk}, nondecimal input data
15399@c last comma does NOT start tertiary
15400@cindex input, data, nondecimal
15401@cindex constants, nondecimal
15402
15403If you run @command{gawk} with the @option{--non-decimal-data} option,
15404you can have nondecimal constants in your input data:
15405
15406@c line break here for small book format
15407@example
15408$ echo 0123 123 0x123 |
15409> gawk --non-decimal-data '@{ printf "%d, %d, %d\n",
15410>                                         $1, $2, $3 @}'
15411@print{} 83, 123, 291
15412@end example
15413
15414For this feature to work, write your program so that
15415@command{gawk} treats your data as numeric:
15416
15417@example
15418$ echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'
15419@print{} 0123 123 0x123
15420@end example
15421
15422@noindent
15423The @code{print} statement treats its expressions as strings.
15424Although the fields can act as numbers when necessary,
15425they are still strings, so @code{print} does not try to treat them
15426numerically.  You may need to add zero to a field to force it to
15427be treated as a number.  For example:
15428
15429@example
15430$ echo 0123 123 0x123 | gawk --non-decimal-data '
15431> @{ print $1, $2, $3
15432>   print $1 + 0, $2 + 0, $3 + 0 @}'
15433@print{} 0123 123 0x123
15434@print{} 83 123 291
15435@end example
15436
Because it is common to have decimal data with leading zeros, and because
treating such data as octal could lead to surprising results, the default
is to leave this facility disabled.  If you want it, you must explicitly
request it.
15440
15441@cindex programming conventions, @code{--non-decimal-data} option
15442@cindex @code{--non-decimal-data} option, @code{strtonum} function and
15443@cindex @code{strtonum} function (@command{gawk}), @code{--non-decimal-data} option and
15444@strong{Caution:}
15445@emph{Use of this option is not recommended.}
15446It can break old programs very badly.
15447Instead, use the @code{strtonum} function to convert your data
15448(@pxref{Nondecimal-numbers}).
15449This makes your programs easier to write and easier to read, and
15450leads to less surprising results.
15451
15452@node Two-way I/O
15453@section Two-Way Communications with Another Process
15454@cindex Brennan, Michael
15455@cindex programmers, attractiveness of
15456@smallexample
15457@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
15458From: brennan@@whidbey.com (Mike Brennan)
15459Newsgroups: comp.lang.awk
15460Subject: Re: Learn the SECRET to Attract Women Easily
15461Date: 4 Aug 1997 17:34:46 GMT
15462@c Organization: WhidbeyNet
15463@c Lines: 12
15464Message-ID: <5s53rm$eca@@news.whidbey.com>
15465@c References: <5s20dn$2e1@chronicle.concentric.net>
15466@c Reply-To: brennan@whidbey.com
15467@c NNTP-Posting-Host: asn202.whidbey.com
15468@c X-Newsreader: slrn (0.9.4.1 UNIX)
15469@c Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
15470
15471On 3 Aug 1997 13:17:43 GMT, Want More Dates???
15472<tracy78@@kilgrona.com> wrote:
15473>Learn the SECRET to Attract Women Easily
15474>
15475>The SCENT(tm)  Pheromone Sex Attractant For Men to Attract Women
15476
15477The scent of awk programmers is a lot more attractive to women than
15478the scent of perl programmers.
15479--
15480Mike Brennan
15481@c brennan@@whidbey.com
15482@end smallexample
15483
15484@c final comma is part of tertiary
15485@cindex advanced features, @command{gawk}, processes, communicating with
15486@cindex processes, two-way communications with
15487It is often useful to be able to
15488send data to a separate program for
15489processing and then read the result.  This can always be
15490done with temporary files:
15491
15492@example
15493# write the data for processing
15494tempfile = ("mydata." PROCINFO["pid"])
15495while (@var{not done with data})
15496    print @var{data} | ("subprogram > " tempfile)
15497close("subprogram > " tempfile)
15498
15499# read the results, remove tempfile when done
15500while ((getline newdata < tempfile) > 0)
15501    @var{process} newdata @var{appropriately}
15502close(tempfile)
15503system("rm " tempfile)
15504@end example
15505
15506@noindent
15507This works, but not elegantly.  Among other things, it requires that
15508the program be run in a directory that cannot be shared among users;
15509for example, @file{/tmp} will not do, as another user might happen
15510to be using a temporary file with the same name.
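
One way to sidestep the name collision, sketched here under the
assumption that the system provides a @command{mktemp} utility (it is
not part of @command{awk}), is to let @command{mktemp} choose a unique
@value{FN}.  The function name @code{sort_via_tempfile} and the sample
data are hypothetical, and @command{sort} stands in for the subprogram:

```shell
# Sketch only: assumes a mktemp utility is available on the system.
# The sample data ("pear", "apple", "fig") is made up for illustration.
sort_via_tempfile() {
    awk 'BEGIN {
        # ask mktemp for a unique temporary file name
        "mktemp" | getline tempfile
        close("mktemp")

        # write the data for processing
        cmd = "sort > " tempfile
        n = split("pear apple fig", data, " ")
        for (i = 1; i <= n; i++)
            print data[i] | cmd
        close(cmd)

        # read the results, remove tempfile when done
        while ((getline line < tempfile) > 0)
            print "got", line
        close(tempfile)
        system("rm " tempfile)
    }'
}

sort_via_tempfile
```

This still takes two passes over a file on disk; it only removes the
need to invent a safe name by hand.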
15511
15512@cindex coprocesses
15513@cindex input/output, two-way
15514@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
15516@cindex @command{csh} utility, @code{|&} operator, comparison with
15517Starting with @value{PVERSION} 3.1 of @command{gawk}, it is possible to
15518open a @emph{two-way} pipe to another process.  The second process is
15519termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}.
15520The two-way connection is created using the new @samp{|&} operator
15521(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
15522different from the same operator in the C shell, @command{csh}.}
15523
15524@example
15525do @{
15526    print @var{data} |& "subprogram"
15527    "subprogram" |& getline results
15528@} while (@var{data left to process})
15529close("subprogram")
15530@end example
15531
15532The first time an I/O operation is executed using the @samp{|&}
15533operator, @command{gawk} creates a two-way pipeline to a child process
15534that runs the other program.  Output created with @code{print}
15535or @code{printf} is written to the program's standard input, and
15536output from the program's standard output can be read by the @command{gawk}
15537program using @code{getline}.
15538As is the case with processes started by @samp{|}, the subprogram
15539can be any program, or pipeline of programs, that can be started by
15540the shell.
15541
15542There are some cautionary items to be aware of:
15543
15544@itemize @bullet
15545@item
15546As the code inside @command{gawk} currently stands, the coprocess's
15547standard error goes to the same place that the parent @command{gawk}'s
15548standard error goes. It is not possible to read the child's
15549standard error separately.
15550
15551@cindex deadlocks
15552@cindex buffering, input/output
15553@cindex @code{getline} command, deadlock and
15554@item
15555I/O buffering may be a problem.  @command{gawk} automatically
15556flushes all output down the pipe to the child process.
15557However, if the coprocess does not flush its output,
15558@command{gawk} may hang when doing a @code{getline} in order to read
15559the coprocess's results.  This could lead to a situation
15560known as @dfn{deadlock}, where each process is waiting for the
15561other one to do something.
15562@end itemize
15563
15564@cindex @code{close} function, two-way pipes and
15565It is possible to close just one end of the two-way pipe to
15566a coprocess, by supplying a second argument to the @code{close}
15567function of either @code{"to"} or @code{"from"}
15568(@pxref{Close Files And Pipes}).
15569These strings tell @command{gawk} to close the end of the pipe
15570that sends data to the process or the end that reads from it,
15571respectively.
15572
15573@cindex @command{sort} utility, coprocesses and
15574This is particularly necessary in order to use
15575the system @command{sort} utility as part of a coprocess;
15576@command{sort} must read @emph{all} of its input
15577data before it can produce any output.
15578The @command{sort} program does not receive an end-of-file indication
15579until @command{gawk} closes the write end of the pipe.
15580
15581When you have finished writing data to the @command{sort}
15582utility, you can close the @code{"to"} end of the pipe, and
15583then start reading sorted data via @code{getline}.
15584For example:
15585
15586@example
15587BEGIN @{
15588    command = "LC_ALL=C sort"
15589    n = split("abcdefghijklmnopqrstuvwxyz", a, "")
15590
15591    for (i = n; i > 0; i--)
15592        print a[i] |& command
15593    close(command, "to")
15594
15595    while ((command |& getline line) > 0)
15596        print "got", line
15597    close(command)
15598@}
15599@end example
15600
15601This program writes the letters of the alphabet in reverse order, one
15602per line, down the two-way pipe to @command{sort}.  It then closes the
15603write end of the pipe, so that @command{sort} receives an end-of-file
15604indication.  This causes @command{sort} to sort the data and write the
15605sorted data back to the @command{gawk} program.  Once all of the data
15606has been read, @command{gawk} terminates the coprocess and exits.
15607
15608As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
15609command ensures traditional Unix (ASCII) sorting from @command{sort}.
15610
Beginning with @command{gawk} 3.1.2, you may use pseudo-ttys (ptys) for
15612two-way communication instead of pipes, if your system supports them.
15613This is done on a per-command basis, by setting a special element
15614in the @code{PROCINFO} array
15615(@pxref{Auto-set}),
15616like so:
15617
15618@example
15619command = "sort -nr"           # command, saved in variable for convenience
15620PROCINFO[command, "pty"] = 1   # update PROCINFO
15621print @dots{} |& command       # start two-way pipe
15622@dots{}
15623@end example
15624
15625@noindent
15626Using ptys avoids the buffer deadlock issues described earlier, at some
15627loss in performance.  If your system does not have ptys, or if all the
15628system's ptys are in use, @command{gawk} automatically falls back to
15629using regular pipes.
15630
15631@node TCP/IP Networking
15632@section Using @command{gawk} for Network Programming
15633@cindex advanced features, @command{gawk}, network programming
15634@cindex networks, programming
15635@c STARTOFRANGE tcpip
15636@cindex TCP/IP
15637@cindex @code{/inet/} files (@command{gawk})
15638@cindex files, @code{/inet/} (@command{gawk})
15639@cindex @code{EMISTERED}
15640@quotation
15641@code{EMISTERED}: @i{A host is a host from coast to coast,@*
15642and no-one can talk to host that's close,@*
15643unless the host that isn't close@*
15644is busy hung or dead.}
15645@end quotation
15646
15647In addition to being able to open a two-way pipeline to a coprocess
15648on the same system
15649(@pxref{Two-way I/O}),
15650it is possible to make a two-way connection to
15651another process on another system across an IP networking connection.
15652
15653You can think of this as just a @emph{very long} two-way pipeline to
15654a coprocess.
15655The way @command{gawk} decides that you want to use TCP/IP networking is
15656by recognizing special @value{FN}s that begin with @samp{/inet/}.
15657
15658The full syntax of the special @value{FN} is
15659@file{/inet/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
15660The components are:
15661
15662@table @var
15663@item protocol
The protocol to use over IP.  This must be one of @samp{tcp},
@samp{udp}, or @samp{raw}, for a TCP, UDP, or raw IP connection,
respectively.  The use of TCP is recommended for most applications.
15667
15668@cindex raw sockets
15669@cindex sockets
15670@strong{Caution:} The use of raw sockets is not currently supported
15671in @value{PVERSION} 3.1 of @command{gawk}.
15672
15673@item local-port
15674@cindex @code{getservbyname} function (C library)
15675The local TCP or UDP port number to use.  Use a port number of @samp{0}
15676when you want the system to pick a port. This is what you should do
15677when writing a TCP or UDP client.
15678You may also use a well-known service name, such as @samp{smtp}
15679or @samp{http}, in which case @command{gawk} attempts to determine
15680the predefined port number using the C @code{getservbyname} function.
15681
15682@item remote-host
15683The IP address or fully-qualified domain name of the Internet
15684host to which you want to connect.
15685
15686@item remote-port
15687The TCP or UDP port number to use on the given @var{remote-host}.
15688Again, use @samp{0} if you don't care, or else a well-known
15689service name.
15690@end table
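
To make the layout of the four components concrete, here is a small
sketch that only assembles the special @value{FN}; it opens no
connection, so any @command{awk} can run it.  The shell function
@code{inet_name} and its variable names are hypothetical and are not
part of @command{gawk}.

```shell
# inet_name is a hypothetical helper, not part of gawk.  It joins the
# protocol, local port, remote host, and remote port (in that order)
# into the /inet/ special file name described above.
inet_name() {
    awk -v protocol="$1" -v lport="$2" -v host="$3" -v rport="$4" 'BEGIN {
        print "/inet/" protocol "/" lport "/" host "/" rport
    }'
}

# a TCP client: local port 0 lets the system pick a port
inet_name tcp 0 localhost daytime
```

The resulting string is what a @command{gawk} program would use with
the @samp{|&} operator.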
15691
15692Consider the following very simple example:
15693
15694@example
15695BEGIN @{
15696  Service = "/inet/tcp/0/localhost/daytime"
15697  Service |& getline
15698  print $0
15699  close(Service)
15700@}
15701@end example
15702
15703This program reads the current date and time from the local system's
15704TCP @samp{daytime} server.
15705It then prints the results and closes the connection.
15706
15707Because this topic is extensive, the use of @command{gawk} for
15708TCP/IP programming is documented separately.
15709@ifinfo
15710@xref{Top},
15711@end ifinfo
15712@ifnotinfo
15713See @cite{TCP/IP Internetworking with @command{gawk}},
15714which comes as part of the @command{gawk} distribution,
15715@end ifnotinfo
15716for a much more complete introduction and discussion, as well as
15717extensive examples.
15718
15719@node Portal Files
15720@section Using @command{gawk} with BSD Portals
15721@cindex advanced features, @command{gawk}, BSD portals
15722@cindex portal files
15723@cindex files, portal
15724@cindex BSD portals
15725@cindex @code{/p} files (@command{gawk})
15726@cindex files, @code{/p} (@command{gawk})
15727@cindex @code{--enable-portals} configuration option
15728@cindex operating systems, BSD-based
15729
15730Similar to the @file{/inet} special files, if @command{gawk}
15731is configured with the @option{--enable-portals} option
15732(@pxref{Quick Installation}),
15733then @command{gawk} treats
15734files whose pathnames begin with @code{/p} as 4.4 BSD-style portals.
15735
15736@cindex @code{|} (vertical bar), @code{|&} operator (I/O), two-way communications
15737@cindex vertical bar (@code{|}), @code{|&} operator (I/O), two-way communications
15738When used with the @samp{|&} operator, @command{gawk} opens the file
15739for two-way communications.  The operating system's portal mechanism
15740then manages creating the process associated with the portal and
15741the corresponding communications with the portal's process.
15742@c ENDOFRANGE tcpip
15743
15744@node Profiling
15745@section Profiling Your @command{awk} Programs
15746@c STARTOFRANGE awkp
15747@cindex @command{awk} programs, profiling
15748@c STARTOFRANGE proawk
15749@cindex profiling @command{awk} programs
15750@c STARTOFRANGE pgawk
15751@cindex @command{pgawk} program
15752@cindex profiling @command{gawk}, See @command{pgawk} program
15753
15754Beginning with @value{PVERSION} 3.1 of @command{gawk}, you may produce execution
15755traces of your @command{awk} programs.
15756This is done with a specially compiled version of @command{gawk},
15757called @command{pgawk} (``profiling @command{gawk}'').
15758
15759@cindex @code{awkprof.out} file
15760@cindex files, @code{awkprof.out}
15761@cindex @command{pgawk} program, @code{awkprof.out} file
15762@command{pgawk} is identical in every way to @command{gawk}, except that when
15763it has finished running, it creates a profile of your program in a file
15764named @file{awkprof.out}.
15765Because it is profiling, it also executes up to 45% slower than
15766@command{gawk} normally does.
15767
15768@cindex @code{--profile} option
15769As shown in the following example,
15770the @option{--profile} option can be used to change the name of the file
15771where @command{pgawk} will write the profile:
15772
15773@example
15774$ pgawk --profile=myprog.prof -f myprog.awk data1 data2
15775@end example
15776
15777@noindent
15778In the above example, @command{pgawk} places the profile in
15779@file{myprog.prof} instead of in @file{awkprof.out}.
15780
Regular @command{gawk} also accepts this option.  When called with just
@option{--profile}, @command{gawk} ``pretty prints'' the program into
@file{awkprof.out}, without any execution counts.  You may supply an
argument to @option{--profile} to change the @value{FN}.

Here is a sample session showing a simple @command{awk} program, its
input data, and the results from running @command{pgawk}.  First, the
@command{awk} program:
15787
15788@example
15789BEGIN @{ print "First BEGIN rule" @}
15790
15791END @{ print "First END rule" @}
15792
15793/foo/ @{
15794    print "matched /foo/, gosh"
15795    for (i = 1; i <= 3; i++)
15796        sing()
15797@}
15798
15799@{
15800    if (/foo/)
15801        print "if is true"
15802    else
15803        print "else is true"
15804@}
15805
15806BEGIN @{ print "Second BEGIN rule" @}
15807
15808END @{ print "Second END rule" @}
15809
15810function sing(    dummy)
15811@{
15812    print "I gotta be me!"
15813@}
15814@end example
15815
15816Following is the input data:
15817
15818@example
15819foo
15820bar
15821baz
15822foo
15823junk
15824@end example
15825
15826Here is the @file{awkprof.out} that results from running @command{pgawk}
15827on this program and data (this example also illustrates that @command{awk}
15828programmers sometimes have to work late):
15829
15830@cindex @code{BEGIN} pattern, @command{pgawk} program
15831@cindex @code{END} pattern, @command{pgawk} program
15832@example
15833        # gawk profile, created Sun Aug 13 00:00:15 2000
15834
15835        # BEGIN block(s)
15836
15837        BEGIN @{
15838     1          print "First BEGIN rule"
15839     1          print "Second BEGIN rule"
15840        @}
15841
15842        # Rule(s)
15843
15844     5  /foo/   @{ # 2
15845     2          print "matched /foo/, gosh"
15846     6          for (i = 1; i <= 3; i++) @{
15847     6                  sing()
15848                @}
15849        @}
15850
15851     5  @{
15852     5          if (/foo/) @{ # 2
15853     2                  print "if is true"
15854     3          @} else @{
15855     3                  print "else is true"
15856                @}
15857        @}
15858
15859        # END block(s)
15860
15861        END @{
15862     1          print "First END rule"
15863     1          print "Second END rule"
15864        @}
15865
15866        # Functions, listed alphabetically
15867
15868     6  function sing(dummy)
15869        @{
15870     6          print "I gotta be me!"
15871        @}
15872@end example
15873
15874This example illustrates many of the basic rules for profiling output.
15875The rules are as follows:
15876
15877@itemize @bullet
15878@item
The program is printed in the order @code{BEGIN} rule,
pattern/action rules, @code{END} rule, and functions, with the
functions listed alphabetically.
15882Multiple @code{BEGIN} and @code{END} rules are merged together.
15883
15884@cindex patterns, counts
15885@item
Pattern/action rules have two counts.
15887The first count, to the left of the rule, shows how many times
15888the rule's pattern was @emph{tested}.
15889The second count, to the right of the rule's opening left brace
15890in a comment,
15891shows how many times the rule's action was @emph{executed}.
15892The difference between the two indicates how many times the rule's
15893pattern evaluated to false.
15894
15895@item
15896Similarly,
15897the count for an @code{if}-@code{else} statement shows how many times
15898the condition was tested.
15899To the right of the opening left brace for the @code{if}'s body
15900is a count showing how many times the condition was true.
15901The count for the @code{else}
15902indicates how many times the test failed.
15903
15904@cindex loops, count for header
15905@item
15906The count for a loop header (such as @code{for}
15907or @code{while}) shows how many times the loop test was executed.
15908(Because of this, you can't just look at the count on the first
15909statement in a rule to determine how many times the rule was executed.
15910If the first statement is a loop, the count is misleading.)
15911
15912@cindex functions, user-defined, counts
15913@cindex user-defined, functions, counts
15914@item
15915For user-defined functions, the count next to the @code{function}
15916keyword indicates how many times the function was called.
15917The counts next to the statements in the body show how many times
15918those statements were executed.
15919
15920@cindex @code{@{@}} (braces), @command{pgawk} program
15921@cindex braces (@code{@{@}}), @command{pgawk} program
15922@item
15923The layout uses ``K&R'' style with tabs.
15924Braces are used everywhere, even when
15925the body of an @code{if}, @code{else}, or loop is only a single statement.
15926
15927@cindex @code{()} (parentheses), @command{pgawk} program
@cindex parentheses (@code{()}), @command{pgawk} program
15929@item
15930Parentheses are used only where needed, as indicated by the structure
15931of the program and the precedence rules.
15932@c extra verbiage here satisfies the copyeditor. ugh.
For example, @samp{(3 + 5) * 4} means add three and five, then multiply
the total by four.  However, @samp{3 + 5 * 4} has no parentheses, and
means @samp{3 + (5 * 4)}.
15936
15937@item
15938All string concatenations are parenthesized too.
15939(This could be made a bit smarter.)
15940
15941@item
15942Parentheses are used around the arguments to @code{print}
15943and @code{printf} only when
15944the @code{print} or @code{printf} statement is followed by a redirection.
15945Similarly, if
15946the target of a redirection isn't a scalar, it gets parenthesized.
15947
15948@item
15949@command{pgawk} supplies leading comments in
15950front of the @code{BEGIN} and @code{END} rules,
15951the pattern/action rules, and the functions.
15952
15953@end itemize
15954
15955The profiled version of your program may not look exactly like what you
15956typed when you wrote it.  This is because @command{pgawk} creates the
15957profiled version by ``pretty printing'' its internal representation of
15958the program.  The advantage to this is that @command{pgawk} can produce
15959a standard representation.  The disadvantage is that all source-code
15960comments are lost, as are the distinctions among multiple @code{BEGIN}
15961and @code{END} rules.  Also, things such as:
15962
15963@example
15964/foo/
15965@end example
15966
15967@noindent
15968come out as:
15969
15970@example
15971/foo/   @{
15972    print $0
15973@}
15974@end example
15975
15976@noindent
15977which is correct, but possibly surprising.
15978
15979@cindex profiling @command{awk} programs, dynamically
15980@cindex @command{pgawk} program, dynamic profiling
15981Besides creating profiles when a program has completed,
15982@command{pgawk} can produce a profile while it is running.
15983This is useful if your @command{awk} program goes into an
15984infinite loop and you want to see what has been executed.
15985To use this feature, run @command{pgawk} in the background:
15986
15987@example
15988$ pgawk -f myprog &
15989[1] 13992
15990@end example
15991
15992@c comma does NOT start secondary
15993@cindex @command{kill} command, dynamic profiling
15994@cindex @code{USR1} signal
15995@cindex signals, @code{USR1}/@code{SIGUSR1}
15996@noindent
The shell prints a job number and a process ID number; in this case,
the process ID is 13992.
15998Use the @command{kill} command to send the @code{USR1} signal
15999to @command{pgawk}:
16000
16001@example
16002$ kill -USR1 13992
16003@end example
16004
16005@noindent
16006As usual, the profiled version of the program is written to
16007@file{awkprof.out}, or to a different file if you use the @option{--profile}
16008option.
16009
16010Along with the regular profile, as shown earlier, the profile
16011includes a trace of any active functions:
16012
16013@example
16014# Function Call Stack:
16015
16016#   3. baz
16017#   2. bar
16018#   1. foo
16019# -- main --
16020@end example
16021
16022You may send @command{pgawk} the @code{USR1} signal as many times as you like.
16023Each time, the profile and function call trace are appended to the output
16024profile file.
16025
16026@cindex @code{HUP} signal
16027@cindex signals, @code{HUP}/@code{SIGHUP}
16028If you use the @code{HUP} signal instead of the @code{USR1} signal,
16029@command{pgawk} produces the profile and the function call trace and then exits.
16030
16031@cindex @code{INT} signal (MS-DOS)
16032@cindex signals, @code{INT}/@code{SIGINT} (MS-DOS)
16033@cindex @code{QUIT} signal (MS-DOS)
16034@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-DOS)
When @command{pgawk} runs on MS-DOS or MS-Windows, it uses the
@code{INT} and @code{QUIT} signals to produce the profile; in
the case of the @code{INT} signal, @command{pgawk} also exits.  This is
16038because these systems don't support the @command{kill} command, so the
16039only signals you can deliver to a program are those generated by the
16040keyboard.  The @code{INT} signal is generated by the
16041@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the
16042@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key.
16043@c ENDOFRANGE advgaw
16044@c ENDOFRANGE gawadv
16045@c ENDOFRANGE pgawk
16046@c ENDOFRANGE awkp
16047@c ENDOFRANGE proawk
16048
16049@node Invoking Gawk
16050@chapter Running @command{awk} and @command{gawk}
16051
This @value{CHAPTER} covers how to run @command{awk}, both POSIX-standard
16053and @command{gawk}-specific command-line options, and what
16054@command{awk} and
16055@command{gawk} do with non-option arguments.
16056It then proceeds to cover how @command{gawk} searches for source files,
16057obsolete options and/or features, and known bugs in @command{gawk}.
16058This @value{CHAPTER} rounds out the discussion of @command{awk}
16059as a program and as a language.
16060
While a number of the options and features described here were
discussed in passing earlier in this @value{DOCUMENT}, this
@value{CHAPTER} provides the full details.
16064
16065@menu
16066* Command Line::                How to run @command{awk}.
16067* Options::                     Command-line options and their meanings.
16068* Other Arguments::             Input file names and variable assignments.
16069* AWKPATH Variable::            Searching directories for @command{awk}
16070                                programs.
16071* Obsolete::                    Obsolete Options and/or features.
16072* Undocumented::                Undocumented Options and Features.
16073* Known Bugs::                  Known Bugs in @command{gawk}.
16074@end menu
16075
16076@node Command Line
16077@section Invoking @command{awk}
16078@cindex command line, invoking @command{awk} from
16079@cindex @command{awk}, invoking
16080@cindex arguments, command-line, invoking @command{awk}
16081@cindex options, command-line, invoking @command{awk}
16082
16083There are two ways to run @command{awk}---with an explicit program or with
16084one or more program files.  Here are templates for both of them; items
16085enclosed in [@dots{}] in these templates are optional:
16086
16087@example
awk @r{[@var{options}]} -f @var{progfile} @r{[@code{--}]} @var{file} @dots{}
16089awk @r{[@var{options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{}
16090@end example
16091
16092@cindex GNU long options
16093@cindex long options
16094@cindex options, long
16095Besides traditional one-letter POSIX-style options, @command{gawk} also
16096supports GNU long options.
16097
16098@cindex dark corner, invoking @command{awk}
16099@cindex lint checking, empty programs
16100It is possible to invoke @command{awk} with an empty program:
16101
16102@example
16103awk '' datafile1 datafile2
16104@end example
16105
16106@cindex @code{--lint} option
16107@noindent
16108Doing so makes little sense, though; @command{awk} exits
16109silently when given an empty program.
16110@value{DARKCORNER}
16111If @option{--lint} has
16112been specified on the command line, @command{gawk} issues a
16113warning that the program is empty.
16114
16115@node Options
16116@section Command-Line Options
16117@c STARTOFRANGE ocl
16118@cindex options, command-line
16119@c STARTOFRANGE clo
16120@cindex command line, options
16121@c STARTOFRANGE gnulo
16122@cindex GNU long options
16123@c STARTOFRANGE longo
16124@cindex options, long
16125
16126Options begin with a dash and consist of a single character.
16127GNU-style long options consist of two dashes and a keyword.
16128The keyword can be abbreviated, as long as the abbreviation allows the option
16129to be uniquely identified.  If the option takes an argument, then the
16130keyword is either immediately followed by an equals sign (@samp{=}) and the
16131argument's value, or the keyword and the argument's value are separated
16132by whitespace.
16133If a particular option with a value is given more than once, it is the
16134last value that counts.
16135
16136@cindex POSIX @command{awk}, GNU long options and
16137Each long option for @command{gawk} has a corresponding
16138POSIX-style option.
16139The long and short options are
16140interchangeable in all contexts.
16141The options and their meanings are as follows:
16142
16143@table @code
16144@item -F @var{fs}
16145@itemx --field-separator @var{fs}
16146@cindex @code{-F} option
16147@cindex @code{--field-separator} option
16148@cindex @code{FS} variable, @code{--field-separator} option and
16149Sets the @code{FS} variable to @var{fs}
16150(@pxref{Field Separators}).
16151
16152@item -f @var{source-file}
16153@itemx --file @var{source-file}
16154@cindex @code{-f} option
16155@cindex @code{--file} option
16156@cindex @command{awk} programs, location of
16157Indicates that the @command{awk} program is to be found in @var{source-file}
16158instead of in the first non-option argument.
16159
16160@item -v @var{var}=@var{val}
16161@itemx --assign @var{var}=@var{val}
16162@cindex @code{-v} option
16163@cindex @code{--assign} option
16164@cindex variables, setting
16165Sets the variable @var{var} to the value @var{val} @emph{before}
16166execution of the program begins.  Such variable values are available
16167inside the @code{BEGIN} rule
16168(@pxref{Other Arguments}).
16169
16170The @option{-v} option can only set one variable, but it can be used
16171more than once, setting another variable each time, like this:
16172@samp{awk @w{-v foo=1} @w{-v bar=2} @dots{}}.
16173
16174@c last comma is part of secondary
16175@cindex built-in variables, @code{-v} option, setting with
16176@c last comma is part of tertiary
16177@cindex variables, built-in, @code{-v} option, setting with
16178@strong{Caution:}  Using @option{-v} to set the values of the built-in
16179variables may lead to surprising results.  @command{awk} will reset the
16180values of those variables as it needs to, possibly ignoring any
16181predefined value you may have given.
16182
16183@item -mf @var{N}
16184@itemx -mr @var{N}
16185@cindex @code{-mf}/@code{-mr} options
16186@cindex memory, setting limits
16187Sets various memory limits to the value @var{N}.  The @samp{f} flag sets
16188the maximum number of fields and the @samp{r} flag sets the maximum
16189record size.  These two flags and the @option{-m} option are from the
16190Bell Laboratories research version of Unix @command{awk}.  They are provided
16191for compatibility but otherwise ignored by
16192@command{gawk}, since @command{gawk} has no predefined limits.
16193(The Bell Laboratories @command{awk} no longer needs these options;
16194it continues to accept them to avoid breaking old programs.)
16195
16196@item -W @var{gawk-opt}
16197@cindex @code{-W} option
16198Following the POSIX standard, implementation-specific
16199options are supplied as arguments to the @option{-W} option.  These options
16200also have corresponding GNU-style long options.
16201Note that the long options may be abbreviated, as long as
16202the abbreviations remain unique.
16203The full list of @command{gawk}-specific options is provided next.
16204
16205@item --
16206@cindex command line, options, end of
16207@cindex options, command-line, end of
16208Signals the end of the command-line options.  The following arguments
16209are not treated as options even if they begin with @samp{-}.  This
16210interpretation of @option{--} follows the POSIX argument parsing
16211conventions.
16212
16213@cindex @code{-} (hyphen), filenames beginning with
16214@cindex hyphen (@code{-}), filenames beginning with
16215This is useful if you have @value{FN}s that start with @samp{-},
16216or in shell scripts, if you have @value{FN}s that will be specified
16217by the user that could start with @samp{-}.
16218@end table
16219@c ENDOFRANGE gnulo
16220@c ENDOFRANGE longo
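
As a brief illustration of two of these options (the @value{FN}s
@file{prog.awk} and @file{-data.txt} are hypothetical), the first
command below sets a variable that is already visible in the
@code{BEGIN} rule, and the second uses @option{--} so that a @value{DF}
whose name starts with @samp{-} is not mistaken for an option:

@example
$ gawk -v greeting=hello 'BEGIN @{ print greeting @}'
@print{} hello
$ gawk -f prog.awk -- -data.txt
@end example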
16221
16222The previous list described options mandated by the POSIX standard,
16223as well as options available in the Bell Laboratories version of @command{awk}.
16224The following list describes @command{gawk}-specific options:
16225
16226@table @code
16227@item -W compat
16228@itemx -W traditional
16229@itemx --compat
16230@itemx --traditional
16231@cindex @code{--compat} option
16232@cindex @code{--traditional} option
16233@cindex compatibility mode (@command{gawk}), specifying
16234Specifies @dfn{compatibility mode}, in which the GNU extensions to
16235the @command{awk} language are disabled, so that @command{gawk} behaves just
16236like the Bell Laboratories research version of Unix @command{awk}.
16237@option{--traditional} is the preferred form of this option.
16238@xref{POSIX/GNU},
16239which summarizes the extensions.  Also see
16240@ref{Compatibility Mode}.
16241
16242@item -W copyright
16243@itemx --copyright
16244@cindex @code{--copyright} option
16245@cindex GPL (General Public License), printing
Prints the short version of the General Public License and then exits.
16247
16248@item -W copyleft
16249@itemx --copyleft
16250@cindex @code{--copyleft} option
16251Just like @option{--copyright}.
16252This option may disappear in a future version of @command{gawk}.
16253
16254@cindex @code{--dump-variables} option
16255@cindex @code{awkvars.out} file
16256@cindex files, @code{awkvars.out}
16257@cindex variables, global, printing list of
16258@item -W dump-variables@r{[}=@var{file}@r{]}
16259@itemx --dump-variables@r{[}=@var{file}@r{]}
16260Prints a sorted list of global variables, their types, and final values
16261to @var{file}.  If no @var{file} is provided, @command{gawk} prints this
16262list to the file named @file{awkvars.out} in the current directory.
16263
16264@c last comma is part of secondary
16265@cindex troubleshooting, typographical errors, global variables
16266Having a list of all global variables is a good way to look for
16267typographical errors in your programs.
16268You would also use this option if you have a large program with a lot of
16269functions, and you want to be sure that your functions don't
16270inadvertently use global variables that you meant to be local.
16271(This is a particularly easy mistake to make with simple variable
16272names like @code{i}, @code{j}, etc.)
16273
16274@item -W gen-po
16275@itemx --gen-po
16276@cindex @code{--gen-po} option
16277@cindex portable object files, generating
16278@cindex files, portable object, generating
16279Analyzes the source program and
16280generates a GNU @code{gettext} Portable Object file on standard
16281output for all string constants that have been marked for translation.
16282@xref{Internationalization},
16283for information about this option.
16284
16285@item -W help
16286@itemx -W usage
16287@itemx --help
16288@itemx --usage
16289@cindex @code{--help} option
16290@cindex @code{--usage} option
16291@cindex GNU long options, printing list of
16292@cindex options, printing list of
16293@cindex printing, list of options
Prints a ``usage'' message summarizing the short- and long-style options
that @command{gawk} accepts and then exits.
16296
16297@item -W lint@r{[}=fatal@r{]}
16298@itemx --lint@r{[}=fatal@r{]}
16299@cindex @code{--lint} option
16300@cindex lint checking, issuing warnings
16301@cindex warnings, issuing
16302Warns about constructs that are dubious or nonportable to
16303other @command{awk} implementations.
16304Some warnings are issued when @command{gawk} first reads your program.  Others
16305are issued at runtime, as your program executes.
16306With an optional argument of @samp{fatal},
16307lint warnings become fatal errors.
16308This may be drastic, but its use will certainly encourage the
16309development of cleaner @command{awk} programs.
16310With an optional argument of @samp{invalid}, only warnings about things that are
16311actually invalid are issued. (This is not fully implemented yet.)
16312
16313@item -W lint-old
16314@itemx --lint-old
16315@cindex @code{--lint-old} option
16316Warns about constructs that are not available in the original version of
16317@command{awk} from Version 7 Unix
16318(@pxref{V7/SVR3.1}).
16319
16320@item -W non-decimal-data
16321@itemx --non-decimal-data
16322@cindex @code{--non-decimal-data} option
16323@cindex hexadecimal, values, enabling interpretation of
16324@c comma is part of primary
16325@cindex octal values, enabling interpretation of
Enables automatic interpretation of octal and hexadecimal
16327values in input data
16328(@pxref{Nondecimal Data}).
16329
16330@cindex troubleshooting, @code{--non-decimal-data} option
16331@strong{Caution:} This option can severely break old programs.
16332Use with care.
16333
16334@item -W posix
16335@itemx --posix
16336@cindex @code{--posix} option
16337@cindex POSIX mode
16338@c last comma is part of tertiary
16339@cindex @command{gawk}, extensions, disabling
16340Operates in strict POSIX mode.  This disables all @command{gawk}
16341extensions (just like @option{--traditional}) and adds the following additional
16342restrictions:
16343
16344@c IMPORTANT! Keep this list in sync with the one in node POSIX
16345
16346@itemize @bullet
16347@cindex escape sequences, unrecognized
16348@item
16349@code{\x} escape sequences are not recognized
16350(@pxref{Escape Sequences}).
16351
16352@cindex newlines
16353@cindex whitespace, newlines as
16354@item
16355Newlines do not act as whitespace to separate fields when @code{FS} is
16356equal to a single space
16357(@pxref{Fields}).
16358
16359@item
16360Newlines are not allowed after @samp{?} or @samp{:}
16361(@pxref{Conditional Exp}).
16362
16363@item
16364The synonym @code{func} for the keyword @code{function} is not
16365recognized (@pxref{Definition Syntax}).
16366
16367@cindex @code{*} (asterisk), @code{**} operator
16368@cindex asterisk (@code{*}), @code{**} operator
16369@cindex @code{*} (asterisk), @code{**=} operator
16370@cindex asterisk (@code{*}), @code{**=} operator
16371@cindex @code{^} (caret), @code{^} operator
16372@cindex caret (@code{^}), @code{^} operator
16373@cindex @code{^} (caret), @code{^=} operator
16374@cindex caret (@code{^}), @code{^=} operator
16375@item
16376The @samp{**} and @samp{**=} operators cannot be used in
16377place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops},
16378and also @pxref{Assignment Ops}).
16379
16380@cindex @code{FS} variable, as TAB character
16381@item
Specifying @samp{-Ft} on the command line does not set the value
16383of @code{FS} to be a single TAB character
16384(@pxref{Field Separators}).
16385
16386@c comma does not start secondary
16387@cindex @code{fflush} function, unsupported
16388@item
16389The @code{fflush} built-in function is not supported
16390(@pxref{I/O Functions}).
16391@end itemize
16392
16393@c @cindex automatic warnings
16394@c @cindex warnings, automatic
16395@cindex @code{--traditional} option, @code{--posix} option and
16396@cindex @code{--posix} option, @code{--traditional} option and
16397If you supply both @option{--traditional} and @option{--posix} on the
16398command line, @option{--posix} takes precedence. @command{gawk}
16399also issues a warning if both options are supplied.
16400
16401@item -W profile@r{[}=@var{file}@r{]}
16402@itemx --profile@r{[}=@var{file}@r{]}
16403@cindex @code{--profile} option
16404@cindex @command{awk} programs, profiling, enabling
Enables profiling of @command{awk} programs
16406(@pxref{Profiling}).
16407By default, profiles are created in a file named @file{awkprof.out}.
16408The optional @var{file} argument allows you to specify a different
16409@value{FN} for the profile file.
16410
16411When run with @command{gawk}, the profile is just a ``pretty printed'' version
16412of the program.  When run with @command{pgawk}, the profile contains execution
16413counts for each statement in the program in the left margin, and function
16414call counts for each function.
16415
16416@item -W re-interval
16417@itemx --re-interval
16418@cindex @code{--re-interval} option
16419@cindex regular expressions, interval expressions and
16420Allows interval expressions
16421(@pxref{Regexp Operators})
16422in regexps.
16423Because interval expressions were traditionally not available in @command{awk},
16424@command{gawk} does not provide them by default. This prevents old @command{awk}
16425programs from breaking.
16426
16427@item -W source @var{program-text}
16428@itemx --source @var{program-text}
16429@cindex @code{--source} option
16430@cindex source code, mixing
16431Allows you to mix source code in files with source
16432code that you enter on the command line.
16433Program source code is taken from the @var{program-text}.
16434This is particularly useful
16435when you have library functions that you want to use from your command-line
16436programs (@pxref{AWKPATH Variable}).
16437
16438@item -W version
16439@itemx --version
16440@cindex @code{--version} option
16441@c last comma is part of tertiary
16442@cindex @command{gawk}, versions of, information about, printing
16443Prints version information for this particular copy of @command{gawk}.
16444This allows you to determine if your copy of @command{gawk} is up to date
16445with respect to whatever the Free Software Foundation is currently
16446distributing.
16447It is also useful for bug reports
16448(@pxref{Bugs}).
16449@end table
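
For instance, @option{--dump-variables} can expose a misspelled variable
name.  The following is only a sketch; the program and the output
@value{FN} @file{vars.out} are made up for illustration:

@example
gawk --dump-variables=vars.out 'BEGIN @{ total = 3; totl += 1 @}'
cat vars.out
@end example

@noindent
Both @code{total} and the accidental @code{totl} should appear in
@file{vars.out}, which makes the typo easy to spot.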
16450
16451As long as program text has been supplied,
16452any other options are flagged as invalid with a warning message but
16453are otherwise ignored.
16454
16455@cindex @code{-F} option, @code{-Ft} sets @code{FS} to TAB
16456In compatibility mode, as a special case, if the value of @var{fs} supplied
16457to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB
16458character (@code{"\t"}).  This is true only for @option{--traditional} and not
16459for @option{--posix}
16460(@pxref{Field Separators}).
16461
16462@cindex @code{-f} option, on command line
16463The @option{-f} option may be used more than once on the command line.
16464If it is, @command{awk} reads its program source from all of the named files, as
16465if they had been concatenated together into one big file.  This is
16466useful for creating libraries of @command{awk} functions.  These functions
16467can be written once and then retrieved from a standard place, instead
16468of having to be included into each individual program.
16469(As mentioned in
16470@ref{Definition Syntax},
16471function names must be unique.)
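
For example, assuming two hypothetical library files and a main program
file, all three can be combined like this:

@example
gawk -f strfuncs.awk -f datefuncs.awk -f report.awk mydata
@end example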
16472
16473Library functions can still be used, even if the program is entered at the terminal,
16474by specifying @samp{-f /dev/tty}.  After typing your program,
16475type @kbd{@value{CTL}-d} (the end-of-file character) to terminate it.
16476(You may also use @samp{-f -} to read program source from the standard
16477input but then you will not be able to also use the standard input as a
16478source of data.)
16479
16480Because it is clumsy using the standard @command{awk} mechanisms to mix source
16481file and command-line @command{awk} programs, @command{gawk} provides the
16482@option{--source} option.  This does not require you to pre-empt the standard
16483input for your source code; it allows you to easily mix command-line
16484and library source code
16485(@pxref{AWKPATH Variable}).
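
A minimal sketch, assuming a library file named @file{funcs.awk} that
defines a function called @code{double} (both names are invented for
this example):

@example
$ cat funcs.awk
@print{} function double(x) @{ return 2 * x @}
$ gawk -f funcs.awk --source 'BEGIN @{ print double(21) @}'
@print{} 42
@end example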
16486
16487@cindex @code{--source} option
16488If no @option{-f} or @option{--source} option is specified, then @command{gawk}
16489uses the first non-option command-line argument as the text of the
16490program source code.
16491
16492@cindex @code{POSIXLY_CORRECT} environment variable
16493@cindex lint checking, @code{POSIXLY_CORRECT} environment variable
16494@cindex POSIX mode
16495If the environment variable @env{POSIXLY_CORRECT} exists,
16496then @command{gawk} behaves in strict POSIX mode, exactly as if
16497you had supplied the @option{--posix} command-line option.
16498Many GNU programs look for this environment variable to turn on
16499strict POSIX mode. If @option{--lint} is supplied on the command line
16500and @command{gawk} turns on POSIX mode because of @env{POSIXLY_CORRECT},
16501then it issues a warning message indicating that POSIX
16502mode is in effect.
16503You would typically set this variable in your shell's startup file.
16504For a Bourne-compatible shell (such as @command{bash}), you would add these
16505lines to the @file{.profile} file in your home directory:
16506
16507@example
16508POSIXLY_CORRECT=true
16509export POSIXLY_CORRECT
16510@end example
16511
16512@cindex @command{csh} utility, @code{POSIXLY_CORRECT} environment variable
16513For a @command{csh}-compatible
16514shell,@footnote{Not recommended.}
16515you would add this line to the @file{.login} file in your home directory:
16516
16517@example
16518setenv POSIXLY_CORRECT true
16519@end example
16520
16521@cindex portability, @code{POSIXLY_CORRECT} environment variable
16522Having @env{POSIXLY_CORRECT} set is not recommended for daily use,
16523but it is good for testing the portability of your programs to other
16524environments.
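
For such occasional testing, a Bourne-compatible shell lets you set the
variable for a single command only, without touching your startup files
(@file{myprog.awk} and @file{mydata} are placeholder names):

@example
POSIXLY_CORRECT=true gawk -f myprog.awk mydata
@end example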
16525@c ENDOFRANGE ocl
16526@c ENDOFRANGE clo
16527
16528@node Other Arguments
16529@section Other Command-Line Arguments
16530@cindex command line, arguments
16531@cindex arguments, command-line
16532
16533Any additional arguments on the command line are normally treated as
16534input files to be processed in the order specified.   However, an
argument that has the form @code{@var{var}=@var{value}} assigns
16536the value @var{value} to the variable @var{var}---it does not specify a
16537file at all.
16538(This was discussed earlier in
16539@ref{Assignment Options}.)
16540
16541@cindex @code{ARGIND} variable, command-line arguments
16542@cindex @code{ARGC}/@code{ARGV} variables, command-line arguments
16543All these arguments are made available to your @command{awk} program in the
16544@code{ARGV} array (@pxref{Built-in Variables}).  Command-line options
16545and the program text (if present) are omitted from @code{ARGV}.
16546All other arguments, including variable assignments, are
16547included.   As each element of @code{ARGV} is processed, @command{gawk}
16548sets the variable @code{ARGIND} to the index in @code{ARGV} of the
16549current element.
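
The following command sketches what ends up in @code{ARGV}.  The
@option{-v} option and the program text are left out, while the
assignment @samp{n=1} and the @value{FN} @file{data} (which need not
exist, since only a @code{BEGIN} rule runs) are both present:

@example
$ gawk -v x=1 'BEGIN @{ for (i = 1; i < ARGC; i++)
>                  printf "ARGV[%d] = %s\n", i, ARGV[i] @}' n=1 data
@print{} ARGV[1] = n=1
@print{} ARGV[2] = data
@end example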
16550
16551@cindex input files, variable assignments and
16552The distinction between @value{FN} arguments and variable-assignment
16553arguments is made when @command{awk} is about to open the next input file.
16554At that point in execution, it checks the @value{FN} to see whether
16555it is really a variable assignment; if so, @command{awk} sets the variable
16556instead of reading a file.
16557
16558Therefore, the variables actually receive the given values after all
16559previously specified files have been read.  In particular, the values of
16560variables assigned in this fashion are @emph{not} available inside a
16561@code{BEGIN} rule
16562(@pxref{BEGIN/END}),
16563because such rules are run before @command{awk} begins scanning the argument list.
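
A short demonstration (the variable name @code{n} is arbitrary):

@example
$ echo line | gawk 'BEGIN @{ print "BEGIN: [" n "]" @}
>                         @{ print "main:  [" n "]" @}' n=1
@print{} BEGIN: []
@print{} main:  [1]
@end example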
16564
16565@cindex dark corner, escape sequences
16566The variable values given on the command line are processed for escape
16567sequences (@pxref{Escape Sequences}).
16568@value{DARKCORNER}
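
For example, the two-character sequence @samp{\n} in the value below
becomes a real newline by the time the program sees it:

@example
$ echo | gawk '@{ print s @}' s='one\ntwo'
@print{} one
@print{} two
@end example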
16569
16570In some earlier implementations of @command{awk}, when a variable assignment
16571occurred before any @value{FN}s, the assignment would happen @emph{before}
16572the @code{BEGIN} rule was executed.  @command{awk}'s behavior was thus
16573inconsistent; some command-line assignments were available inside the
16574@code{BEGIN} rule, while others were not.  Unfortunately,
16575some applications came to depend
16576upon this ``feature.''  When @command{awk} was changed to be more consistent,
16577the @option{-v} option was added to accommodate applications that depended
16578upon the old behavior.
16579
16580The variable assignment feature is most useful for assigning to variables
such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and
output formats, before the @value{DF}s are scanned.  It is also useful for
16583controlling state if multiple passes are needed over a @value{DF}.  For
16584example:
16585
16586@cindex files, multiple passes over
16587@example
16588awk 'pass == 1  @{ @var{pass 1 stuff} @}
16589     pass == 2  @{ @var{pass 2 stuff} @}' pass=1 mydata pass=2 mydata
16590@end example
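
To make this concrete, here is one possible two-pass program.  It
assumes (purely for illustration) that the first field of each record
in @file{mydata} is a positive number; the first pass computes the total
and the second prints each value with its percentage of that total:

@example
awk 'pass == 1  @{ total += $1 @}
     pass == 2  @{ printf "%s %.1f%%\n", $1, 100 * $1 / total @}' pass=1 mydata pass=2 mydata
@end example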
16591
16592Given the variable assignment feature, the @option{-F} option for setting
16593the value of @code{FS} is not
16594strictly necessary.  It remains for historical compatibility.
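
For instance, these two commands should print the same thing, the first
field of each line of the colon-separated file @file{/etc/passwd}:

@example
awk -F: '@{ print $1 @}' /etc/passwd
awk '@{ print $1 @}' FS=: /etc/passwd
@end example

@noindent
The two forms differ only if the program has a @code{BEGIN} rule that
depends on @code{FS}, because the command-line assignment takes effect
after @code{BEGIN} rules have run.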
16595
16596@node AWKPATH Variable
16597@section The @env{AWKPATH} Environment Variable
16598@cindex @env{AWKPATH} environment variable
16599@cindex directories, searching
16600@cindex search paths, for source files
16601@cindex differences in @command{awk} and @command{gawk}, @code{AWKPATH} environment variable
16602@ifinfo
16603The previous @value{SECTION} described how @command{awk} program files can be named
on the command line with the @option{-f} option.
16605@end ifinfo
16606In most @command{awk}
16607implementations, you must supply a precise path name for each program
16608file, unless the file is in the current directory.
16609But in @command{gawk}, if the @value{FN} supplied to the @option{-f} option
16610does not contain a @samp{/}, then @command{gawk} searches a list of
16611directories (called the @dfn{search path}), one by one, looking for a
16612file with the specified name.
16613
16614The search path is a string consisting of directory names
16615separated by colons.  @command{gawk} gets its search path from the
16616@env{AWKPATH} environment variable.  If that variable does not exist,
16617@command{gawk} uses a default path,
16618@samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk}
16619may use a different directory; it
16620will depend upon how @command{gawk} was built and installed. The actual
16621directory is the value of @samp{$(datadir)} generated when
16622@command{gawk} was configured.  You probably don't need to worry about this,
16623though.} (Programs written for use by
16624system administrators should use an @env{AWKPATH} variable that
16625does not include the current directory, @file{.}.)
16626
16627The search path feature is particularly useful for building libraries
16628of useful @command{awk} functions.  The library files can be placed in a
16629standard directory in the default path and then specified on
16630the command line with a short @value{FN}.  Otherwise, the full @value{FN}
16631would have to be typed for each file.
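
As a sketch, suppose your library files live in a directory such as
@file{$HOME/awklib} (the directory and the @value{FN}s here are only
examples).  In a Bourne-compatible shell:

@example
AWKPATH=$HOME/awklib:/usr/local/share/awk
export AWKPATH
gawk -f mylib.awk -f myprog.awk mydata
@end example

@noindent
Here @command{gawk} looks for @file{mylib.awk} and @file{myprog.awk}
in each directory of @env{AWKPATH}, in order.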
16632
16633By using both the @option{--source} and @option{-f} options, your command-line
16634@command{awk} programs can use facilities in @command{awk} library files
16635(@pxref{Library Functions}).
16636Path searching is not done if @command{gawk} is in compatibility mode.
16637This is true for both @option{--traditional} and @option{--posix}.
16638@xref{Options}.
16639
16640@strong{Note:} If you want files in the current directory to be found,
16641you must include the current directory in the path, either by including
16642@file{.} explicitly in the path or by writing a null entry in the
16643path.  (A null entry is indicated by starting or ending the path with a
16644colon or by placing two colons next to each other (@samp{::}).)  If the
16645current directory is not included in the path, then files cannot be
16646found in the current directory.  This path search mechanism is identical
16647to the shell's.
16648@c someday, @cite{The Bourne Again Shell}....
16649
16650Starting with @value{PVERSION} 3.0, if @env{AWKPATH} is not defined in the
16651environment, @command{gawk} places its default search path into
16652@code{ENVIRON["AWKPATH"]}. This makes it easy to determine
16653the actual search path that @command{gawk} will use
16654from within an @command{awk} program.
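
For example, the following one-liner prints the search path in effect:

@example
gawk 'BEGIN @{ print ENVIRON["AWKPATH"] @}'
@end example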
16655
16656While you can change @code{ENVIRON["AWKPATH"]} within your @command{awk}
16657program, this has no effect on the running program's behavior.  This makes
16658sense: the @env{AWKPATH} environment variable is used to find the program
16659source files.  Once your program is running, all the files have been
16660found, and @command{gawk} no longer needs to use @env{AWKPATH}.
16661
16662@node Obsolete
16663@section Obsolete Options and/or Features
16664
16665@cindex features, advanced, See advanced features
16666@cindex options, deprecated
16667@cindex features, deprecated
16668@cindex obsolete features
16669This @value{SECTION} describes features and/or command-line options from
16670previous releases of @command{gawk} that are either not available in the
16671current version or that are still supported but deprecated (meaning that
16672they will @emph{not} be in the next release).
16673
16674@c update this section for each release!
16675
16676@cindex @code{next file} statement, deprecated
16677@cindex @code{nextfile} statement, @code{next file} statement and
16678For @value{PVERSION} @value{VERSION} of @command{gawk}, there are no
16679deprecated command-line options
16680@c or other deprecated features
16681from the previous version of @command{gawk}.
16682The use of @samp{next file} (two words) for @code{nextfile} was deprecated
16683in @command{gawk} 3.0 but still worked.  Starting with @value{PVERSION} 3.1, the
16684two-word usage is no longer accepted.
16685
16686The process-related special files described in
16687@ref{Special Process},
16688work as described, but
16689are now considered deprecated.
16690@command{gawk} prints a warning message every time they are used.
16691(Use @code{PROCINFO} instead; see
16692@ref{Auto-set}.)
16693They will be removed from the next release of @command{gawk}.
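
For example, a program that used to read the process ID from
@file{/dev/pid} can obtain it from @code{PROCINFO} instead:

@example
gawk 'BEGIN @{ print PROCINFO["pid"] @}'
@end example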
16694
16695@ignore
16696This @value{SECTION}
16697is thus essentially a place holder,
16698in case some option becomes obsolete in a future version of @command{gawk}.
16699@end ignore
16700
16701@node Undocumented
16702@section Undocumented Options and Features
16703@cindex undocumented features
16704@cindex features, undocumented
16705@cindex Skywalker, Luke
16706@cindex Kenobi, Obi-Wan
16707@cindex Jedi knights
16708@cindex Knights, jedi
16709@quotation
16710@i{Use the Source, Luke!}@*
16711Obi-Wan
16712@end quotation
16713
16714This @value{SECTION} intentionally left
16715blank.
16716
16717@ignore
16718@c If these came out in the Info file or TeX document, then they wouldn't
16719@c be undocumented, would they?
16720
16721@command{gawk} has one undocumented option:
16722
16723@table @code
16724@item -W nostalgia
16725@itemx --nostalgia
16726Print the message @code{"awk: bailing out near line 1"} and dump core.
16727This option was inspired by the common behavior of very early versions of
Unix @command{awk} and by a T-shirt.
16729The message is @emph{not} subject to translation in non-English locales.
16730@c so there! nyah, nyah.
16731@end table
16732
16733Early versions of @command{awk} used to not require any separator (either
16734a newline or @samp{;}) between the rules in @command{awk} programs.  Thus,
16735it was common to see one-line programs like:
16736
16737@example
16738awk '@{ sum += $1 @} END @{ print sum @}'
16739@end example
16740
16741@command{gawk} actually supports this but it is purposely undocumented
16742because it is considered bad style.  The correct way to write such a program
16743is either
16744
16745@example
16746awk '@{ sum += $1 @} ; END @{ print sum @}'
16747@end example
16748
16749@noindent
16750or
16751
16752@example
16753awk '@{ sum += $1 @}
16754     END @{ print sum @}' data
16755@end example
16756
16757@noindent
16758@xref{Statements/Lines}, for a fuller
16759explanation.
16760
16761You can insert newlines after the @samp{;} in @code{for} loops.
16762This seems to have been a long-undocumented feature in Unix @command{awk}.
16763
16764Similarly, you may use @code{print} or @code{printf} statements in the
16765@var{init} and @var{increment} parts of a @code{for} loop.  This is another
long-undocumented ``feature'' of Unix @command{awk}.
16767
16768If the environment variable @env{WHINY_USERS} exists
16769when @command{gawk} is run,
16770then the associative @code{for} loop will go through the array
16771indices in sorted order.
16772The comparison used for sorting is simple string comparison;
16773any non-English or non-ASCII locales are not taken into account.
16774@code{IGNORECASE} does not affect the comparison either.
16775
16776In addition, if @env{WHINY_USERS} is set, the profiled version of a
16777program generated by @option{--profile} will print all 8-bit characters
16778verbatim, instead of using the octal equivalent.
16779
16780@end ignore
16781
16782@node Known Bugs
16783@section Known Bugs in @command{gawk}
16784@cindex @command{gawk}, debugging
16785@cindex debugging @command{gawk}
16786@cindex troubleshooting, @command{gawk}
16787
16788@itemize @bullet
16789@cindex troubleshooting, @code{-F} option
16790@cindex @code{-F} option, troubleshooting
16791@cindex @code{FS} variable, changing value of
16792@item
16793The @option{-F} option for changing the value of @code{FS}
16794(@pxref{Options})
16795is not necessary given the command-line variable
16796assignment feature; it remains only for backward compatibility.
16797
16798@item
16799Syntactically invalid single-character programs tend to overflow
16800the parse stack, generating a rather unhelpful message.  Such programs
16801are surprisingly difficult to diagnose in the completely general case,
16802and the effort to do so really is not worth it.
16803@end itemize
16804
16805@ignore
16806@c Try this
16807@iftex
16808@page
16809@headings off
16810@majorheading II@ @ @ Using @command{awk} and @command{gawk}
16811Part II shows how to use @command{awk} and @command{gawk} for problem solving.
16812There is lots of code here for you to read and learn from.
16813It contains the following chapters:
16814
16815@itemize @bullet
16816@item
16817@ref{Library Functions}.
16818
16819@item
16820@ref{Sample Programs}.
16821
16822@end itemize
16823
16824@page
16825@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
16826@oddheading  @| @| @strong{@thischapter}@ @ @ @thispage
16827@end iftex
16828@end ignore
16829
16830@node Library Functions
16831@chapter A Library of @command{awk} Functions
16832@c STARTOFRANGE libf
16833@cindex libraries of @command{awk} functions
16834@c STARTOFRANGE flib
16835@cindex functions, library
16836@c STARTOFRANGE fudlib
16837@cindex functions, user-defined, library of
16838
16839@ref{User-defined}, describes how to write
16840your own @command{awk} functions.  Writing functions is important, because
16841it allows you to encapsulate algorithms and program tasks in a single
16842place.  It simplifies programming, making program development more
16843manageable, and making programs more readable.
16844
16845One valuable way to learn a new programming language is to @emph{read}
16846programs in that language.  To that end, this @value{CHAPTER}
16847and @ref{Sample Programs},
16848provide a good-sized body of code for you to read,
16849and hopefully, to learn from.
16850
16851@c 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!!
16852This @value{CHAPTER} presents a library of useful @command{awk} functions.
16853Many of the sample programs presented later in this @value{DOCUMENT}
16854use these functions.
16855The functions are presented here in a progression from simple to complex.
16856
16857@cindex Texinfo
16858@ref{Extract Program},
16859presents a program that you can use to extract the source code for
16860these example library functions and programs from the Texinfo source
16861for this @value{DOCUMENT}.
16862(This has already been done as part of the @command{gawk} distribution.)
16863
16864If you have written one or more useful, general-purpose @command{awk} functions
16865and would like to contribute them to the author's collection of @command{awk}
16866programs, see
16867@ref{How To Contribute}, for more information.
16868
16869@cindex portability, example programs
16870The programs in this @value{CHAPTER} and in
16871@ref{Sample Programs},
16872freely use features that are @command{gawk}-specific.
Rewriting these programs for different implementations of @command{awk} is pretty straightforward.
16874
16875Diagnostic error messages are sent to @file{/dev/stderr}.
16876Use @samp{| "cat 1>&2"} instead of @samp{> "/dev/stderr"} if your system
16877does not have a @file{/dev/stderr}, or if you cannot use @command{gawk}.
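
The two forms look like this inside a program (the message text is
just a placeholder):

@example
print "myprog: something went wrong" > "/dev/stderr"
print "myprog: something went wrong" | "cat 1>&2"
@end example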
16878
16879A number of programs use @code{nextfile}
16880(@pxref{Nextfile Statement})
16881to skip any remaining input in the input file.
16882@ref{Nextfile Function},
16883shows you how to write a function that does the same thing.
16884
16885@c 12/2000: Thanks to Nelson Beebe for pointing out the output issue.
16886@cindex case sensitivity, example programs
16887@cindex @code{IGNORECASE} variable, in example programs
16888Finally, some of the programs choose to ignore upper- and lowercase
16889distinctions in their input. They do so by assigning one to @code{IGNORECASE}.
16890You can achieve almost the same effect@footnote{The effects are
16891not identical.  Output of the transformed
16892record will be in all lowercase, while @code{IGNORECASE} preserves the original
16893contents of the input record.} by adding the following rule to the
16894beginning of the program:
16895
16896@example
16897# ignore case
16898@{ $0 = tolower($0) @}
16899@end example
16900
16901@noindent
16902Also, verify that all regexp and string constants used in
16903comparisons use only lowercase letters.
16904
16905@menu
16906* Library Names::               How to best name private global variables in
16907                                library functions.
16908* General Functions::           Functions that are of general use.
16909* Data File Management::        Functions for managing command-line data
16910                                files.
16911* Getopt Function::             A function for processing command-line
16912                                arguments.
16913* Passwd Functions::            Functions for getting user information.
16914* Group Functions::             Functions for getting group information.
16915@end menu
16916
16917@node Library Names
16918@section Naming Library Function Global Variables
16919
16920@cindex names, arrays/variables
16921@cindex names, functions
16922@cindex namespace issues
16923@cindex @command{awk} programs, documenting
16924@cindex documentation, of @command{awk} programs
16925Due to the way the @command{awk} language evolved, variables are either
16926@dfn{global} (usable by the entire program) or @dfn{local} (usable just by
16927a specific function).  There is no intermediate state analogous to
16928@code{static} variables in C.
16929
16930@cindex variables, global, for library functions
16931@cindex private variables
16932@cindex variables, private
16933Library functions often need to have global variables that they can use to
16934preserve state information between calls to the function---for example,
16935@code{getopt}'s variable @code{_opti}
16936(@pxref{Getopt Function}).
16937Such variables are called @dfn{private}, since the only functions that need to
16938use them are the ones in the library.
16939
16940When writing a library function, you should try to choose names for your
16941private variables that will not conflict with any variables used by
16942either another library function or a user's main program.  For example, a
16943name like @samp{i} or @samp{j} is not a good choice, because user programs
16944often use variable names like these for their own purposes.
16945
16946@cindex programming conventions, private variable names
16947The example programs shown in this @value{CHAPTER} all start the names of their
16948private variables with an underscore (@samp{_}).  Users generally don't use
16949leading underscores in their variable names, so this convention immediately
16950decreases the chances that the variable name will be accidentally shared
16951with the user's program.
16952
16953@cindex @code{_} (underscore), in names of private variables
16954@cindex underscore (@code{_}), in names of private variables
16955In addition, several of the library functions use a prefix that helps
16956indicate what function or set of functions use the variables---for example,
16957@code{_pw_byname} in the user database routines
16958(@pxref{Passwd Functions}).
This convention is recommended, since it further decreases the
chance of inadvertent conflict among variable names.  Note that this
convention works equally well for variable names and for private
function names.@footnote{While all the library routines could have
16963been rewritten to use this convention, this was not done, in order to
16964show how my own @command{awk} programming style has evolved and to
16965provide some basis for this discussion.}
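
To make the prefix convention concrete, here is a sketch of a small stack
library.  It is purely hypothetical: the @samp{_stk_} prefix and the names
@code{stk_push}, @code{stk_pop}, and @code{_stk_empty} are invented for this
illustration and are not used elsewhere in this @value{CHAPTER}:

@example
# stack.awk --- hypothetical library using the _stk_ prefix

function stk_push(val)
@{
    _stk_data[++_stk_top] = val
@}

function stk_pop(    val)
@{
    if (_stk_empty())
        return ""
    val = _stk_data[_stk_top]
    delete _stk_data[_stk_top--]
    return val
@}

# private helper; the prefix marks it as internal to the library
function _stk_empty()
@{
    return (_stk_top < 1)
@}
@end example

The private variables and the private helper function share one prefix, so a
reader can tell at a glance which library owns them.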
16966
16967As a final note on variable naming, if a function makes global variables
16968available for use by a main program, it is a good convention to start that
16969variable's name with a capital letter---for
16970example, @code{getopt}'s @code{Opterr} and @code{Optind} variables
16971(@pxref{Getopt Function}).
16972The leading capital letter indicates that it is global, while the fact that
16973the variable name is not all capital letters indicates that the variable is
16974not one of @command{awk}'s built-in variables, such as @code{FS}.
16975
16976@cindex @code{--dump-variables} option
16977It is also important that @emph{all} variables in library
16978functions that do not need to save state are, in fact, declared
16979local.@footnote{@command{gawk}'s @option{--dump-variables} command-line
16980option is useful for verifying this.} If this is not done, the variable
16981could accidentally be used in the user's program, leading to bugs that
16982are very difficult to track down:
16983
16984@example
16985function lib_func(x, y,    l1, l2)
16986@{
16987    @dots{}
16988    @var{use variable} some_var   # some_var should be local
16989    @dots{}                   # but is not by oversight
16990@}
16991@end example
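
The following sketch shows how such an oversight can bite.  The function
@code{count_chars} and the variable names are made up for this illustration;
the loop variable @code{i} was left out of the function's list of local
variables, so it is the same @code{i} that the main program uses:

@example
function count_chars(str, c,    n)    # i is missing from this list
@{
    n = 0
    for (i = 1; i <= length(str); i++)
        if (substr(str, i, 1) == c)
            n++
    return n
@}

BEGIN @{
    for (i = 1; i <= 3; i++) @{
        total = count_chars("banana", "a")
        print "pass", i, "count", total
    @}
@}
@end example

@noindent
The function leaves @code{i} set to seven, so the loop in the @code{BEGIN}
rule ends after a single pass instead of three.  Adding @code{i} to the
function's local variables fixes the problem.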
16992
16993@cindex arrays, associative, library functions and
16994@cindex libraries of @command{awk} functions, associative arrays and
16995@cindex functions, library, associative arrays and
16996@cindex Tcl
16997A different convention, common in the Tcl community, is to use a single
16998associative array to hold the values needed by the library function(s), or
16999``package.''  This significantly decreases the number of actual global names
17000in use.  For example, the functions described in
17001@ref{Passwd Functions},
17002might have used array elements @code{@w{PW_data["inited"]}}, @code{@w{PW_data["total"]}},
17003@code{@w{PW_data["count"]}}, and @code{@w{PW_data["awklib"]}}, instead of
17004@code{@w{_pw_inited}}, @code{@w{_pw_awklib}}, @code{@w{_pw_total}},
17005and @code{@w{_pw_count}}.
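
A minimal sketch of the single-array style follows.  The counter package
shown here is hypothetical; the array name @code{CTR_data} and the function
names @code{ctr_bump} and @code{ctr_total} are assumptions made only for this
example:

@example
# counter.awk --- hypothetical package; CTR_data is its only global variable

function ctr_bump(name)
@{
    CTR_data["total"]++
    return ++CTR_data["count", name]
@}

function ctr_total()
@{
    return CTR_data["total"] + 0
@}
@end example

All of the package's state lives in one array, so only a single variable name
can collide with the user's program.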
17006
17007The conventions presented in this @value{SECTION} are exactly
17008that: conventions. You are not required to write your programs this
17009way---we merely recommend that you do so.
17010
17011@node General Functions
17012@section General Programming
17013
17014This @value{SECTION} presents a number of functions that are of general
17015programming use.
17016
17017@menu
17018* Nextfile Function::           Two implementations of a @code{nextfile}
17019                                function.
17020* Assert Function::             A function for assertions in @command{awk}
17021                                programs.
17022* Round Function::              A function for rounding if @code{sprintf} does
17023                                not do it correctly.
17024* Cliff Random Function::       The Cliff Random Number Generator.
17025* Ordinal Functions::           Functions for using characters as numbers and
17026                                vice versa.
17027* Join Function::               A function to join an array into a string.
17028* Gettimeofday Function::       A function to get formatted times.
17029@end menu
17030
17031@node Nextfile Function
17032@subsection Implementing @code{nextfile} as a Function
17033
17034@cindex input files, skipping
17035@c STARTOFRANGE libfnex
17036@cindex libraries of @command{awk} functions, @code{nextfile} statement
17037@c STARTOFRANGE flibnex
17038@cindex functions, library, @code{nextfile} statement
17039@c STARTOFRANGE nexim
17040@cindex @code{nextfile} statement, implementing
17041@cindex @command{gawk}, @code{nextfile} statement in
17042The @code{nextfile} statement, presented in
17043@ref{Nextfile Statement},
17044is a @command{gawk}-specific extension---it is not available in most other
17045implementations of @command{awk}.  This @value{SECTION} shows two versions of a
17046@code{nextfile} function that you can use to simulate @command{gawk}'s
17047@code{nextfile} statement if you cannot use @command{gawk}.
17048
17049A first attempt at writing a @code{nextfile} function is as follows:
17050
17051@example
17052# nextfile --- skip remaining records in current file
17053# this should be read in before the "main" awk program
17054
17055function nextfile()    @{ _abandon_ = FILENAME; next @}
17056_abandon_ == FILENAME  @{ next @}
17057@end example
17058
17059@cindex programming conventions, @code{nextfile} statement
17060Because it supplies a rule that must be executed first, this file should
17061be included before the main program. This rule compares the current
17062@value{DF}'s name (which is always in the @code{FILENAME} variable) to
17063a private variable named @code{_abandon_}.  If the @value{FN} matches,
17064then the action part of the rule executes a @code{next} statement to
17065go on to the next record.  (The use of @samp{_} in the variable name is
17066a convention.  It is discussed more fully in
17067@ref{Library Names}.)
17068
17069The use of the @code{next} statement effectively creates a loop that reads
17070all the records from the current @value{DF}.
17071The end of the file is eventually reached and
17072a new @value{DF} is opened, changing the value of @code{FILENAME}.
17073Once this happens, the comparison of @code{_abandon_} to @code{FILENAME}
17074fails, and execution continues with the first rule of the ``real'' program.
17075
17076The @code{nextfile} function itself simply sets the value of @code{_abandon_}
17077and then executes a @code{next} statement to start the
17078loop.
17079@ignore
17080@c If the function can't be used on other versions of awk, this whole
17081@c section is pointless, no?  Sigh.
17082@footnote{@command{gawk} is the only known @command{awk} implementation
17083that allows you to
17084execute @code{next} from within a function body. Some other workaround
17085is necessary if you are not using @command{gawk}.}
17086@end ignore
17087
17088@cindex @code{nextfile} user-defined function
17089This initial version has a subtle problem.
If the same @value{DF} is listed @emph{twice} on the command line,
17091one right after the other
17092or even with just a variable assignment between them,
17093this code skips right through the file a second time, even though
17094it should stop when it gets to the end of the first occurrence.
17095A second version of @code{nextfile} that remedies this problem
17096is shown here:
17097
17098@example
17099@c file eg/lib/nextfile.awk
17100# nextfile --- skip remaining records in current file
17101# correctly handle successive occurrences of the same file
17102@c endfile
17103@ignore
17104@c file eg/lib/nextfile.awk
17105#
17106# Arnold Robbins, arnold@@gnu.org, Public Domain
17107# May, 1993
17108
17109@c endfile
17110@end ignore
17111@c file eg/lib/nextfile.awk
17112# this should be read in before the "main" awk program
17113
17114function nextfile()   @{ _abandon_ = FILENAME; next @}
17115
17116_abandon_ == FILENAME @{
17117      if (FNR == 1)
17118          _abandon_ = ""
17119      else
17120          next
17121@}
17122@c endfile
17123@end example
17124
17125The @code{nextfile} function has not changed.  It makes @code{_abandon_}
17126equal to the current @value{FN} and then executes a @code{next} statement.
17127The @code{next} statement reads the next record and increments @code{FNR}
17128so that @code{FNR} is guaranteed to have a value of at least two.
17129However, if @code{nextfile} is called for the last record in the file,
17130then @command{awk} closes the current @value{DF} and moves on to the next
17131one.  Upon doing so, @code{FILENAME} is set to the name of the new file
17132and @code{FNR} is reset to one.  If this next file is the same as
17133the previous one, @code{_abandon_} is still equal to @code{FILENAME}.
17134However, @code{FNR} is equal to one, telling us that this is a new
17135occurrence of the file and not the one we were reading when the
17136@code{nextfile} function was executed.  In that case, @code{_abandon_}
17137is reset to the empty string, so that further executions of this rule
17138fail (until the next time that @code{nextfile} is called).
17139
17140If @code{FNR} is not one, then we are still in the original @value{DF}
17141and the program executes a @code{next} statement to skip through it.
17142
17143An important question to ask at this point is: given that the
17144functionality of @code{nextfile} can be provided with a library file,
17145why is it built into @command{gawk}?  Adding
17146features for little reason leads to larger, slower programs that are
17147harder to maintain.
17148The answer is that building @code{nextfile} into @command{gawk} provides
17149significant gains in efficiency.  If the @code{nextfile} function is executed
17150at the beginning of a large @value{DF}, @command{awk} still has to scan the entire
17151file, splitting it up into records,
17152@c at least conceptually
17153just to skip over it.  The built-in
17154@code{nextfile} can simply close the file immediately and proceed to the
17155next one, which saves a lot of time.  This is particularly important in
17156@command{awk}, because @command{awk} programs are generally I/O-bound (i.e.,
17157they spend most of their time doing input and output, instead of performing
17158computations).
17159@c ENDOFRANGE libfnex
17160@c ENDOFRANGE flibnex
17161@c ENDOFRANGE nexim
17162
17163@node Assert Function
17164@subsection Assertions
17165
17166@c STARTOFRANGE asse
17167@cindex assertions
17168@c STARTOFRANGE assef
17169@cindex @code{assert} function (C library)
17170@c STARTOFRANGE libfass
17171@cindex libraries of @command{awk} functions, assertions
17172@c STARTOFRANGE flibass
17173@cindex functions, library, assertions
17174@cindex @command{awk} programs, lengthy, assertions
17175When writing large programs, it is often useful to know
17176that a condition or set of conditions is true.  Before proceeding with a
17177particular computation, you make a statement about what you believe to be
17178the case.  Such a statement is known as an
17179@dfn{assertion}.  The C language provides an @code{<assert.h>} header file
17180and corresponding @code{assert} macro that the programmer can use to make
17181assertions.  If an assertion fails, the @code{assert} macro arranges to
17182print a diagnostic message describing the condition that should have
17183been true but was not, and then it kills the program.  In C, using
@code{assert} looks like this:
17185
17186@example
17187#include <assert.h>
17188
17189int myfunc(int a, double b)
17190@{
17191     assert(a <= 5 && b >= 17.1);
17192     @dots{}
17193@}
17194@end example
17195
17196If the assertion fails, the program prints a message similar to this:
17197
17198@example
17199prog.c:5: assertion failed: a <= 5 && b >= 17.1
17200@end example
17201
17202@cindex @code{assert} user-defined function
17203The C language makes it possible to turn the condition into a string for use
17204in printing the diagnostic message.  This is not possible in @command{awk}, so
17205this @code{assert} function also requires a string version of the condition
17206that is being tested.
17207Following is the function:
17208
17209@example
17210@c file eg/lib/assert.awk
17211# assert --- assert that a condition is true. Otherwise exit.
17212@c endfile
17213@ignore
17214@c file eg/lib/assert.awk
17215
17216#
17217# Arnold Robbins, arnold@@gnu.org, Public Domain
17218# May, 1993
17219
17220@c endfile
17221@end ignore
17222@c file eg/lib/assert.awk
17223function assert(condition, string)
17224@{
17225    if (! condition) @{
17226        printf("%s:%d: assertion failed: %s\n",
17227            FILENAME, FNR, string) > "/dev/stderr"
17228        _assert_exit = 1
17229        exit 1
17230    @}
17231@}
17232
17233@group
17234END @{
17235    if (_assert_exit)
17236        exit 1
17237@}
17238@end group
17239@c endfile
17240@end example
17241
17242The @code{assert} function tests the @code{condition} parameter. If it
17243is false, it prints a message to standard error, using the @code{string}
17244parameter to describe the failed condition.  It then sets the variable
17245@code{_assert_exit} to one and executes the @code{exit} statement.
17246The @code{exit} statement jumps to the @code{END} rule. If the @code{END}
rule finds @code{_assert_exit} to be true, it exits immediately.
17248
17249The purpose of the test in the @code{END} rule is to
17250keep any other @code{END} rules from running.  When an assertion fails, the
17251program should exit immediately.
17252If no assertions fail, then @code{_assert_exit} is still
17253false when the @code{END} rule is run normally, and the rest of the
17254program's @code{END} rules execute.
17255For all of this to work correctly, @file{assert.awk} must be the
17256first source file read by @command{awk}.
17257The function can be used in a program in the following way:
17258
17259@example
17260function myfunc(a, b)
17261@{
17262     assert(a <= 5 && b >= 17.1, "a <= 5 && b >= 17.1")
17263     @dots{}
17264@}
17265@end example
17266
17267@noindent
17268If the assertion fails, you see a message similar to the following:
17269
17270@example
17271mydata:1357: assertion failed: a <= 5 && b >= 17.1
17272@end example
17273
17274@cindex @code{END} pattern, @code{assert} user-defined function and
17275There is a small problem with this version of @code{assert}.
17276An @code{END} rule is automatically added
17277to the program calling @code{assert}.  Normally, if a program consists
17278of just a @code{BEGIN} rule, the input files and/or standard input are
17279not read. However, now that the program has an @code{END} rule, @command{awk}
17280attempts to read the input @value{DF}s or standard input
17281(@pxref{Using BEGIN/END}),
17282most likely causing the program to hang as it waits for input.
17283
17284@cindex @code{BEGIN} pattern, @code{assert} user-defined function and
17285There is a simple workaround to this:
17286make sure the @code{BEGIN} rule always ends
17287with an @code{exit} statement.
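
The following sketch shows the workaround in isolation.  The @code{END} rule
here merely stands in for the one that @file{assert.awk} supplies, and the
messages are placeholders:

@example
BEGIN @{
    print "setup done"
    exit    # without this, the END rule would make awk read input
@}

END @{
    print "cleanup"
@}
@end example

@noindent
Because of the @code{exit} statement, @command{awk} runs the @code{END} rule
and terminates without trying to read any input.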
17288@c ENDOFRANGE asse
17289@c ENDOFRANGE assef
17290@c ENDOFRANGE flibass
17291@c ENDOFRANGE libfass
17292
17293@node Round Function
17294@subsection Rounding Numbers
17295
17296@cindex rounding
17297@cindex rounding numbers
17298@cindex numbers, rounding
17299@cindex libraries of @command{awk} functions, rounding numbers
17300@cindex functions, library, rounding numbers
17301@cindex @code{print} statement, @code{sprintf} function and
17302@cindex @code{printf} statement, @code{sprintf} function and
17303@cindex @code{sprintf} function, @code{print}/@code{printf} statements and
17304The way @code{printf} and @code{sprintf}
17305(@pxref{Printf})
17306perform rounding often depends upon the system's C @code{sprintf}
17307subroutine.  On many machines, @code{sprintf} rounding is ``unbiased,''
17308which means it doesn't always round a trailing @samp{.5} up, contrary
17309to naive expectations.  In unbiased rounding, @samp{.5} rounds to even,
17310rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4.  This means
17311that if you are using a format that does rounding (e.g., @code{"%.0f"}),
17312you should check what your system does.  The following function does
traditional rounding; it might be useful if your @command{awk}'s @code{printf}
17314does unbiased rounding:
17315
17316@cindex @code{round} user-defined function
17317@example
17318@c file eg/lib/round.awk
17319# round.awk --- do normal rounding
17320@c endfile
17321@ignore
17322@c file eg/lib/round.awk
17323#
17324# Arnold Robbins, arnold@@gnu.org, Public Domain
17325# August, 1996
17326
17327@c endfile
17328@end ignore
17329@c file eg/lib/round.awk
17330function round(x,   ival, aval, fraction)
17331@{
17332   ival = int(x)    # integer part, int() truncates
17333
17334   # see if fractional part
17335   if (ival == x)   # no fraction
17336      return x
17337
17338   if (x < 0) @{
17339      aval = -x     # absolute value
17340      ival = int(aval)
17341      fraction = aval - ival
17342      if (fraction >= .5)
17343         return int(x) - 1   # -2.5 --> -3
17344      else
17345         return int(x)       # -2.3 --> -2
17346   @} else @{
17347      fraction = x - ival
17348      if (fraction >= .5)
17349         return ival + 1
17350      else
17351         return ival
17352   @}
17353@}
17354
17355# test harness
17356@{ print $0, round($0) @}
17357@c endfile
17358@end example
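
To check what your system does before deciding whether @code{round} is
needed, a one-rule program along the following lines may help.  It is only a
sketch; the three test values were chosen because they are exactly
representable in binary floating point:

@example
BEGIN @{
    # each value ends in exactly .5, so the rounding rule decides the result
    printf("%.0f %.0f %.0f\n", 0.5, 1.5, 2.5)
@}
@end example

@noindent
Output of @samp{0 2 2} would indicate unbiased (round-to-even) rounding,
while @samp{1 2 3} would indicate traditional rounding.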
17359
17360@node Cliff Random Function
17361@subsection The Cliff Random Number Generator
17362@cindex random numbers, Cliff
17363@cindex Cliff random numbers
17364@cindex numbers, Cliff random
17365@cindex functions, library, Cliff random numbers
17366
The Cliff random number
generator@footnote{@uref{http://mathworld.wolfram.com/CliffRandomNumberGenerator.html}}
is a very simple random number generator that ``passes the noise sphere test
for randomness by showing no structure.''
It is easily programmed, in fewer than 10 lines of @command{awk} code:
17372
17373@cindex @code{cliff_rand} user-defined function
17374@example
17375@c file eg/lib/cliff_rand.awk
17376# cliff_rand.awk --- generate Cliff random numbers
17377@c endfile
17378@ignore
17379@c file eg/lib/cliff_rand.awk
17380#
17381# Arnold Robbins, arnold@@gnu.org, Public Domain
17382# December 2000
17383
17384@c endfile
17385@end ignore
17386@c file eg/lib/cliff_rand.awk
17387BEGIN @{ _cliff_seed = 0.1 @}
17388
17389function cliff_rand()
17390@{
17391    _cliff_seed = (100 * log(_cliff_seed)) % 1
17392    if (_cliff_seed < 0)
17393        _cliff_seed = - _cliff_seed
17394    return _cliff_seed
17395@}
17396@c endfile
17397@end example
17398
17399This algorithm requires an initial ``seed'' of 0.1.  Each new value
17400uses the current seed as input for the calculation.
17401If the built-in @code{rand} function
17402(@pxref{Numeric Functions})
17403isn't random enough, you might try using this function instead.
17404
17405@node Ordinal Functions
17406@subsection Translating Between Characters and Numbers
17407
17408@cindex libraries of @command{awk} functions, character values as numbers
17409@cindex functions, library, character values as numbers
17410@cindex characters, values of as numbers
17411@cindex numbers, as values of characters
17412One commercial implementation of @command{awk} supplies a built-in function,
17413@code{ord}, which takes a character and returns the numeric value for that
17414character in the machine's character set.  If the string passed to
17415@code{ord} has more than one character, only the first one is used.
17416
17417The inverse of this function is @code{chr} (from the function of the same
17418name in Pascal), which takes a number and returns the corresponding character.
17419Both functions are written very nicely in @command{awk}; there is no real
17420reason to build them into the @command{awk} interpreter:
17421
17422@cindex @code{ord} user-defined function
17423@cindex @code{chr} user-defined function
17424@example
17425@c file eg/lib/ord.awk
17426# ord.awk --- do ord and chr
17427
17428# Global identifiers:
17429#    _ord_:        numerical values indexed by characters
17430#    _ord_init:    function to initialize _ord_
17431@c endfile
17432@ignore
17433@c file eg/lib/ord.awk
17434#
17435# Arnold Robbins, arnold@@gnu.org, Public Domain
17436# 16 January, 1992
17437# 20 July, 1992, revised
17438
17439@c endfile
17440@end ignore
17441@c file eg/lib/ord.awk
17442BEGIN    @{ _ord_init() @}
17443
17444function _ord_init(    low, high, i, t)
17445@{
17446    low = sprintf("%c", 7) # BEL is ascii 7
17447    if (low == "\a") @{    # regular ascii
17448        low = 0
17449        high = 127
17450    @} else if (sprintf("%c", 128 + 7) == "\a") @{
17451        # ascii, mark parity
17452        low = 128
17453        high = 255
17454    @} else @{        # ebcdic(!)
17455        low = 0
17456        high = 255
17457    @}
17458
17459    for (i = low; i <= high; i++) @{
17460        t = sprintf("%c", i)
17461        _ord_[t] = i
17462    @}
17463@}
17464@c endfile
17465@end example
17466
17467@cindex character sets
17468@cindex character encodings
17469@cindex ASCII
17470@cindex EBCDIC
17471@cindex mark parity
17472Some explanation of the numbers used by @code{chr} is worthwhile.
17473The most prominent character set in use today is ASCII. Although an
174748-bit byte can hold 256 distinct values (from 0 to 255), ASCII only
17475defines characters that use the values from 0 to 127.@footnote{ASCII
17476has been extended in many countries to use the values from 128 to 255
for country-specific characters.  If your system uses these extensions,
17478you can simplify @code{_ord_init} to simply loop from 0 to 255.}
17479In the now distant past,
17480at least one minicomputer manufacturer
17481@c Pr1me, blech
17482used ASCII, but with mark parity, meaning that the leftmost bit in the byte
17483is always 1.  This means that on those systems, characters
17484have numeric values from 128 to 255.
17485Finally, large mainframe systems use the EBCDIC character set, which
17486uses all 256 values.
17487While there are other character sets in use on some older systems,
17488they are not really worth worrying about:
17489
17490@example
17491@c file eg/lib/ord.awk
17492function ord(str,    c)
17493@{
17494    # only first character is of interest
17495    c = substr(str, 1, 1)
17496    return _ord_[c]
17497@}
17498
17499function chr(c)
17500@{
17501    # force c to be numeric by adding 0
17502    return sprintf("%c", c + 0)
17503@}
17504@c endfile
17505
17506#### test code ####
17507# BEGIN    \
17508# @{
17509#    for (;;) @{
17510#        printf("enter a character: ")
17511#        if (getline var <= 0)
17512#            break
17513#        printf("ord(%s) = %d\n", var, ord(var))
17514#    @}
17515# @}
17516@c endfile
17517@end example
17518
17519An obvious improvement to these functions is to move the code for the
17520@code{@w{_ord_init}} function into the body of the @code{BEGIN} rule.  It was
17521written this way initially for ease of development.
17522There is a ``test program'' in a @code{BEGIN} rule, to test the
17523function.  It is commented out for production use.
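
For systems that use an 8-bit extension of ASCII, the initialization can
simply loop from 0 to 255.  The following sketch shows what that might look
like.  The names @code{_ord8_init}, @code{_ord8_}, and @code{ord8} are
hypothetical, chosen so that they do not clash with @file{ord.awk}:

@example
# ord8.awk --- hypothetical 8-bit variant of ord.awk

BEGIN @{ _ord8_init() @}

function _ord8_init(    i)
@{
    # map every 8-bit value to its character
    for (i = 0; i <= 255; i++)
        _ord8_[sprintf("%c", i)] = i
@}

function ord8(str)
@{
    # only the first character is of interest
    return _ord8_[substr(str, 1, 1)]
@}

BEGIN @{
    print ord8("A"), ord8("abc")
@}
@end example

@noindent
On an ASCII-based system, this prints @samp{65 97}.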
17524
17525@node Join Function
17526@subsection Merging an Array into a String
17527
17528@cindex libraries of @command{awk} functions, merging arrays into strings
17529@cindex functions, library, merging arrays into strings
17530@cindex strings, merging arrays into
17531@cindex arrays, merging into strings
17532When doing string processing, it is often useful to be able to join
17533all the strings in an array into one long string.  The following function,
17534@code{join}, accomplishes this task.  It is used later in several of
17535the application programs
17536(@pxref{Sample Programs}).
17537
17538Good function design is important; this function needs to be general but it
17539should also have a reasonable default behavior.  It is called with an array
17540as well as the beginning and ending indices of the elements in the array to be
17541merged.  This assumes that the array indices are numeric---a reasonable
17542assumption since the array was likely created with @code{split}
17543(@pxref{String Functions}):
17544
17545@cindex @code{join} user-defined function
17546@example
17547@c file eg/lib/join.awk
17548# join.awk --- join an array into a string
17549@c endfile
17550@ignore
17551@c file eg/lib/join.awk
17552#
17553# Arnold Robbins, arnold@@gnu.org, Public Domain
17554# May 1993
17555
17556@c endfile
17557@end ignore
17558@c file eg/lib/join.awk
17559function join(array, start, end, sep,    result, i)
17560@{
17561    if (sep == "")
17562       sep = " "
17563    else if (sep == SUBSEP) # magic value
17564       sep = ""
17565    result = array[start]
17566    for (i = start + 1; i <= end; i++)
17567        result = result sep array[i]
17568    return result
17569@}
17570@c endfile
17571@end example
17572
17573An optional additional argument is the separator to use when joining the
17574strings back together.  If the caller supplies a nonempty value,
17575@code{join} uses it; if it is not supplied, it has a null
17576value.  In this case, @code{join} uses a single blank as a default
17577separator for the strings.  If the value is equal to @code{SUBSEP},
17578then @code{join} joins the strings with no separator between them.
17579@code{SUBSEP} serves as a ``magic'' value to indicate that there should
17580be no separation between the component strings.@footnote{It would
17581be nice if @command{awk} had an assignment operator for concatenation.
17582The lack of an explicit operator for concatenation makes string operations
17583more difficult than they really need to be.}
17584
17585@node Gettimeofday Function
17586@subsection Managing the Time of Day
17587
17588@cindex libraries of @command{awk} functions, managing, time
17589@cindex functions, library, managing time
17590@cindex timestamps, formatted
17591@cindex time, managing
17592The @code{systime} and @code{strftime} functions described in
17593@ref{Time Functions},
17594provide the minimum functionality necessary for dealing with the time of day
in human-readable form.  While @code{strftime} is extensive, the control
17596formats are not necessarily easy to remember or intuitively obvious when
17597reading a program.
17598
17599The following function, @code{gettimeofday}, populates a user-supplied array
17600with preformatted time information.  It returns a string with the current
17601time formatted in the same way as the @command{date} utility:
17602
17603@cindex @code{gettimeofday} user-defined function
17604@example
17605@c file eg/lib/gettime.awk
17606# gettimeofday.awk --- get the time of day in a usable format
17607@c endfile
17608@ignore
17609@c file eg/lib/gettime.awk
17610#
17611# Arnold Robbins, arnold@@gnu.org, Public Domain, May 1993
17612#
17613@c endfile
17614@end ignore
17615@c file eg/lib/gettime.awk
17616
17617# Returns a string in the format of output of date(1)
17618# Populates the array argument time with individual values:
17619#    time["second"]       -- seconds (0 - 59)
17620#    time["minute"]       -- minutes (0 - 59)
17621#    time["hour"]         -- hours (0 - 23)
#    time["althour"]      -- hours (1 - 12)
17623#    time["monthday"]     -- day of month (1 - 31)
17624#    time["month"]        -- month of year (1 - 12)
17625#    time["monthname"]    -- name of the month
17626#    time["shortmonth"]   -- short name of the month
17627#    time["year"]         -- year modulo 100 (0 - 99)
17628#    time["fullyear"]     -- full year
17629#    time["weekday"]      -- day of week (Sunday = 0)
#    time["altweekday"]   -- day of week (Monday = 1)
17631#    time["dayname"]      -- name of weekday
17632#    time["shortdayname"] -- short name of weekday
#    time["yearday"]      -- day of year (1 - 366)
17634#    time["timezone"]     -- abbreviation of timezone name
17635#    time["ampm"]         -- AM or PM designation
17636#    time["weeknum"]      -- week number, Sunday first day
17637#    time["altweeknum"]   -- week number, Monday first day
17638
17639function gettimeofday(time,    ret, now, i)
17640@{
17641    # get time once, avoids unnecessary system calls
17642    now = systime()
17643
17644    # return date(1)-style output
17645    ret = strftime("%a %b %d %H:%M:%S %Z %Y", now)
17646
17647    # clear out target array
17648    delete time
17649
17650    # fill in values, force numeric values to be
17651    # numeric by adding 0
17652    time["second"]       = strftime("%S", now) + 0
17653    time["minute"]       = strftime("%M", now) + 0
17654    time["hour"]         = strftime("%H", now) + 0
17655    time["althour"]      = strftime("%I", now) + 0
17656    time["monthday"]     = strftime("%d", now) + 0
17657    time["month"]        = strftime("%m", now) + 0
17658    time["monthname"]    = strftime("%B", now)
17659    time["shortmonth"]   = strftime("%b", now)
17660    time["year"]         = strftime("%y", now) + 0
17661    time["fullyear"]     = strftime("%Y", now) + 0
17662    time["weekday"]      = strftime("%w", now) + 0
17663    time["altweekday"]   = strftime("%u", now) + 0
17664    time["dayname"]      = strftime("%A", now)
17665    time["shortdayname"] = strftime("%a", now)
17666    time["yearday"]      = strftime("%j", now) + 0
17667    time["timezone"]     = strftime("%Z", now)
17668    time["ampm"]         = strftime("%p", now)
17669    time["weeknum"]      = strftime("%U", now) + 0
17670    time["altweeknum"]   = strftime("%W", now) + 0
17671
17672    return ret
17673@}
17674@c endfile
17675@end example
17676
17677The string indices are easier to use and read than the various formats
17678required by @code{strftime}.  The @code{alarm} program presented in
17679@ref{Alarm Program},
17680uses this function.
17681A more general design for the @code{gettimeofday} function would have
17682allowed the user to supply an optional timestamp value to use instead
17683of the current time.
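
One possible shape for such a design is sketched here.  The function name
@code{gettimeofday_at} and its @code{stamp} parameter are assumptions made
for this illustration, and only a few of the array elements are filled in to
keep the example short.  Like @code{gettimeofday}, it relies on
@command{gawk}'s @code{systime} and @code{strftime} functions:

@example
function gettimeofday_at(time, stamp,    now)
@{
    # use the caller's timestamp if one was given, else the current time
    now = (stamp == "") ? systime() : stamp + 0

    # clear out target array
    delete time

    time["hour"]     = strftime("%H", now) + 0
    time["minute"]   = strftime("%M", now) + 0
    time["second"]   = strftime("%S", now) + 0
    time["fullyear"] = strftime("%Y", now) + 0

    # return date(1)-style output
    return strftime("%a %b %d %H:%M:%S %Z %Y", now)
@}
@end example

@noindent
Calling @code{gettimeofday_at(now_parts)} with no timestamp behaves like the
original function for the elements shown, while
@code{gettimeofday_at(then_parts, 40000000)} describes a moment in 1971.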

@node Data File Management
@section @value{DDF} Management

@c STARTOFRANGE dataf
@cindex files, managing
@c STARTOFRANGE libfdataf
@cindex libraries of @command{awk} functions, managing, @value{DF}s
@c STARTOFRANGE flibdataf
@cindex functions, library, managing @value{DF}s
This @value{SECTION} presents functions that are useful for managing
command-line @value{DF}s.

@menu
* Filetrans Function::          A function for handling data file transitions.
* Rewind Function::             A function for rereading the current file.
* File Checking::               Checking that data files are readable.
* Empty Files::                 Checking for zero-length files.
* Ignoring Assigns::            Treating assignments as file names.
@end menu

@node Filetrans Function
@subsection Noting @value{DDF} Boundaries

@cindex files, managing, @value{DF} boundaries
@cindex files, initialization and cleanup
The @code{BEGIN} and @code{END} rules are each executed exactly once at
the beginning and end of your @command{awk} program, respectively
(@pxref{BEGIN/END}).
We (the @command{gawk} authors) once had a user who mistakenly thought that the
@code{BEGIN} rule is executed at the beginning of each @value{DF} and the
@code{END} rule is executed at the end of each @value{DF}.  When informed
that this was not the case, the user requested that we add new special
patterns to @command{gawk}, named @code{BEGIN_FILE} and @code{END_FILE}, that
would have the desired behavior.  He even supplied us the code to do so.

Adding these special patterns to @command{gawk} wasn't necessary;
the job can be done cleanly in @command{awk} itself, as illustrated
by the following library program.
It arranges to call two user-supplied functions, @code{beginfile} and
@code{endfile}, at the beginning and end of each @value{DF}.
Besides solving the problem in only nine(!) lines of code, it does so
@emph{portably}; this works with any implementation of @command{awk}:

@example
# transfile.awk
#
# Give the user a hook for filename transitions
#
# The user must supply functions beginfile() and endfile()
# that each take the name of the file being started or
# finished, respectively.
@c #
@c # Arnold Robbins, arnold@@gnu.org, Public Domain
@c # January 1992

FILENAME != _oldfilename \
@{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
@}

END   @{ endfile(FILENAME) @}
@end example

This file must be loaded before the user's ``main'' program, so that the
rule it supplies is executed first.

This rule relies on @command{awk}'s @code{FILENAME} variable that
automatically changes for each new @value{DF}.  The current @value{FN} is
saved in a private variable, @code{_oldfilename}.  If @code{FILENAME} does
not equal @code{_oldfilename}, then a new @value{DF} is being processed and
it is necessary to call @code{endfile} for the old file.  Because
@code{endfile} should only be called if a file has been processed, the
program first checks to make sure that @code{_oldfilename} is not the null
string.  The program then assigns the current @value{FN} to
@code{_oldfilename} and calls @code{beginfile} for the file.
Because, like all @command{awk} variables, @code{_oldfilename} is
initialized to the null string, this rule executes correctly even for the
first @value{DF}.

The program also supplies an @code{END} rule to do the final processing for
the last file.  Because this @code{END} rule comes before any @code{END} rules
supplied in the ``main'' program, @code{endfile} is called first.  Once
again the value of multiple @code{BEGIN} and @code{END} rules should be clear.

@cindex @code{beginfile} user-defined function
@cindex @code{endfile} user-defined function
This version has the same problem as the first version of @code{nextfile}
(@pxref{Nextfile Function}).
If the same @value{DF} occurs twice in a row on the command line, then
@code{endfile} and @code{beginfile} are not executed at the end of the
first pass and at the beginning of the second pass.
The following version solves the problem:

@example
@c file eg/lib/ftrans.awk
# ftrans.awk --- handle data file transitions
#
# user supplies beginfile() and endfile() functions
@c endfile
@ignore
@c file eg/lib/ftrans.awk
#
# Arnold Robbins, arnold@@gnu.org, Public Domain
# November 1992

@c endfile
@end ignore
@c file eg/lib/ftrans.awk
FNR == 1 @{
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
@}

END  @{ endfile(_filename_) @}
@c endfile
@end example

@ref{Wc Program},
shows how this library function can be used and
how it simplifies writing the main program.
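
As a smaller illustration, the following sketch is a main program that
reports the number of lines in each @value{DF}.  The file name
@file{countlines.awk} and the variable @code{_nlines} are made up for
this example:

@example
# countlines.awk --- hypothetical main program for ftrans.awk

function beginfile(name)
@{
    _nlines = 0         # reset the counter for each new file
@}

function endfile(name)
@{
    printf("%s: %d lines\n", name, _nlines)
@}

@{ _nlines++ @}
@end example

Assuming both files are in the current directory, it would be run
like this:

@example
awk -f ftrans.awk -f countlines.awk file1 file2
@end example

Because the rule in @file{ftrans.awk} comes first, @code{beginfile}
resets the counter before the first record of each file is counted.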

@node Rewind Function
@subsection Rereading the Current File

@cindex files, reading
Another request for a new built-in function was for a @code{rewind}
function that would make it possible to reread the current file.
The requesting user didn't want to have to use @code{getline}
(@pxref{Getline})
inside a loop.

However, as long as you are not in the @code{END} rule, it is
quite easy to arrange to immediately close the current input file
and then start over with it from the top.
For lack of a better name, we'll call it @code{rewind}:

@cindex @code{rewind} user-defined function
@example
@c file eg/lib/rewind.awk
# rewind.awk --- rewind the current file and start over
@c endfile
@ignore
@c file eg/lib/rewind.awk
#
# Arnold Robbins, arnold@@gnu.org, Public Domain
# September 2000

@c endfile
@end ignore
@c file eg/lib/rewind.awk
function rewind(    i)
@{
    # shift remaining arguments up
    for (i = ARGC; i > ARGIND; i--)
        ARGV[i] = ARGV[i-1]

    # make sure gawk knows to keep going
    ARGC++

    # make current file next to get done
    ARGV[ARGIND+1] = FILENAME

    # do it
    nextfile
@}
@c endfile
@end example

This code relies on the @code{ARGIND} variable
(@pxref{Auto-set}),
which is specific to @command{gawk}.
If you are not using
@command{gawk}, you can use ideas presented in
@ifnotinfo
the previous @value{SECTION}
@end ifnotinfo
@ifinfo
@ref{Filetrans Function},
@end ifinfo
to either update @code{ARGIND} on your own
or modify this code as appropriate.

The @code{rewind} function also relies on the @code{nextfile} keyword
(@pxref{Nextfile Statement}).
@xref{Nextfile Function},
for a function version of @code{nextfile}.
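
To see what the shifting loop does to the argument list, the following
sketch applies the same steps to an ordinary array instead of
@code{ARGV}.  The function name @code{requeue} and the sample
@value{FN}s are made up for this illustration; nothing in it depends
on @command{gawk}:

@example
# requeue --- mimic rewind()'s changes on a private array
function requeue(argv, argc, argind, name,    i)
@{
    # shift the arguments after the current one up by one slot
    for (i = argc; i > argind; i--)
        argv[i] = argv[i-1]
    # put the current file into the freed slot
    argv[argind+1] = name
    return argc + 1     # the new argument count
@}

BEGIN @{
    n = split("a.txt b.txt c.txt", list, " ") + 1
    list[0] = "awk"
    # pretend that b.txt (index 2) is the current file
    n = requeue(list, n, 2, "b.txt")
    for (k = 0; k < n; k++)
        printf("list[%d] = %s\n", k, list[k])
@}
@end example

The resulting list holds @file{b.txt} at both index 2 and index 3,
which is why the @code{nextfile} at the end of @code{rewind} starts
reading the same file again.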

@node File Checking
@subsection Checking for Readable @value{DDF}s

@cindex troubleshooting, readable @value{DF}s
@c comma is part of primary
@cindex readable @value{DF}s, checking
@cindex files, skipping
Normally, if you give @command{awk} a @value{DF} that isn't readable,
it stops with a fatal error.  There are times when you
might want to just ignore such files and keep going.  You can
do this by prepending the following program to your @command{awk}
program:

@cindex @code{readable.awk} program
@example
@c file eg/lib/readable.awk
# readable.awk --- library file to skip over unreadable files
@c endfile
@ignore
@c file eg/lib/readable.awk
#
# Arnold Robbins, arnold@@gnu.org, Public Domain
# October 2000

@c endfile
@end ignore
@c file eg/lib/readable.awk
BEGIN @{
    for (i = 1; i < ARGC; i++) @{
        if (ARGV[i] ~ /^[A-Za-z_][A-Za-z0-9_]*=.*/ \
            || ARGV[i] == "-")
            continue    # assignment or standard input
        else if ((getline junk < ARGV[i]) < 0) # unreadable
            delete ARGV[i]
        else
            close(ARGV[i])
    @}
@}
@c endfile
@end example

@cindex troubleshooting, @code{getline} function
In @command{gawk}, the @code{getline} won't be fatal (unless
@option{--posix} is in force).
Removing the element from @code{ARGV} with @code{delete}
skips the file (since it's no longer in the list).
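
The same test can be packaged as a function for use elsewhere in a
program.  This is only a sketch; the name @code{readable} is chosen
for the illustration and is not part of any library file presented
here:

@example
# readable --- return 1 if FILE can be opened for reading, else 0
function readable(file,    junk, status)
@{
    status = (getline junk < file)
    if (status < 0)
        return 0        # could not open or read the file
    close(file)         # so that later reads start from the top
    return 1
@}
@end example

An empty but readable file makes @code{getline} return zero rather
than a negative value, so it counts as readable.  The caveat about
@option{--posix} applies to this function as well.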

@c This doesn't handle /dev/stdin etc.  Not worth the hassle to mention or fix.

@node Empty Files
@subsection Checking for Zero-Length Files

All known @command{awk} implementations silently skip over zero-length files.
This is a by-product of @command{awk}'s implicit
read-a-record-and-match-against-the-rules loop: when @command{awk}
tries to read a record from an empty file, it immediately receives an
end of file indication, closes the file, and proceeds on to the next
command-line @value{DF}, @emph{without} executing any user-level
@command{awk} program code.

Using @command{gawk}'s @code{ARGIND} variable
(@pxref{Built-in Variables}), it is possible to detect when an empty
@value{DF} has been skipped.  Similar to the library file presented
in @ref{Filetrans Function}, the following library file calls a function named
@code{zerofile} that the user must provide.  The arguments passed are
the @value{FN} and the position in @code{ARGV} where it was found:

@cindex @code{zerofile.awk} program
@example
@c file eg/lib/zerofile.awk
# zerofile.awk --- library file to process empty input files
@c endfile
@ignore
@c file eg/lib/zerofile.awk
#
# Arnold Robbins, arnold@@gnu.org, Public Domain
# June 2003

@c endfile
@end ignore
@c file eg/lib/zerofile.awk
BEGIN @{ Argind = 0 @}

ARGIND > Argind + 1 @{
    for (Argind++; Argind < ARGIND; Argind++)
        zerofile(ARGV[Argind], Argind)
@}

ARGIND != Argind @{ Argind = ARGIND @}

END @{
    if (ARGIND > Argind)
        for (Argind++; Argind <= ARGIND; Argind++)
            zerofile(ARGV[Argind], Argind)
@}
@c endfile
@end example

The user-level variable @code{Argind} allows the @command{awk} program
to track its progress through @code{ARGV}.  Whenever the program detects
that @code{ARGIND} is greater than @samp{Argind + 1}, it means that one or
more empty files were skipped.  The action then calls @code{zerofile} for
each such file, incrementing @code{Argind} along the way.

The @samp{ARGIND != Argind} rule simply keeps @code{Argind} up to date
in the normal case.

Finally, the @code{END} rule catches the case of any empty files at
the end of the command-line arguments.  Note that the test in the
condition of the @code{for} loop uses the @samp{<=} operator,
not @code{<}.
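
A user-supplied @code{zerofile} function can be as simple as the
following sketch, which reports each skipped file and remembers it for
later.  The array @code{Empty} and the counter @code{Nempty} are
invented for this illustration:

@example
# hypothetical zerofile() for use with zerofile.awk
function zerofile(name, pos)
@{
    Empty[pos] = name   # remember where the empty file was
    Nempty++
    printf("%s (ARGV[%d]) is empty\n", name, pos) > "/dev/stderr"
@}
@end example

The main program can then consult @code{Nempty} in its own @code{END}
rule, provided that rule comes after the one in @file{zerofile.awk}.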

As an exercise, you might consider whether this same problem can
be solved without relying on @command{gawk}'s @code{ARGIND} variable.

As a second exercise, revise this code to handle the case where
an intervening value in @code{ARGV} is a variable assignment.

@ignore
# zerofile2.awk --- same thing, portably
BEGIN @{
    ARGIND = Argind = 0
    for (i = 1; i < ARGC; i++)
        Fnames[ARGV[i]]++

@}
FNR == 1 @{
    while (ARGV[ARGIND] != FILENAME)
        ARGIND++
    Seen[FILENAME]++
    if (Seen[FILENAME] == Fnames[FILENAME])
        do
            ARGIND++
        while (ARGV[ARGIND] != FILENAME)
@}
ARGIND > Argind + 1 @{
    for (Argind++; Argind < ARGIND; Argind++)
        zerofile(ARGV[Argind], Argind)
@}
ARGIND != Argind @{
    Argind = ARGIND
@}
END @{
    if (ARGIND < ARGC - 1)
        ARGIND = ARGC - 1
    if (ARGIND > Argind)
        for (Argind++; Argind <= ARGIND; Argind++)
            zerofile(ARGV[Argind], Argind)
@}
@end ignore

@node Ignoring Assigns
@subsection Treating Assignments as @value{FFN}s

@cindex assignments as filenames
@cindex filenames, assignments as
Occasionally, you might not want @command{awk} to process command-line
variable assignments
(@pxref{Assignment Options}).
In particular, if you have @value{FN}s that contain an @samp{=} character,
@command{awk} treats the @value{FN} as an assignment, and does not process it.

Some users have suggested an additional command-line option for @command{gawk}
to disable command-line assignments.  However, some simple programming with
a library file does the trick:

@cindex @code{noassign.awk} program
@example
@c file eg/lib/noassign.awk
# noassign.awk --- library file to avoid the need for a
# special option that disables command-line assignments
@c endfile
@ignore
@c file eg/lib/noassign.awk
#
# Arnold Robbins, arnold@@gnu.org, Public Domain
# October 1999

@c endfile
@end ignore
@c file eg/lib/noassign.awk
function disable_assigns(argc, argv,    i)
@{
    for (i = 1; i < argc; i++)
        if (argv[i] ~ /^[A-Za-z_][A-Za-z_0-9]*=.*/)
            argv[i] = ("./" argv[i])
@}

BEGIN @{
    if (No_command_assign)
        disable_assigns(ARGC, ARGV)
@}
@c endfile
@end example

You then run your program this way:

@example
awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk *
@end example

The function works by looping through the arguments.
It prepends @samp{./} to
any argument that matches the form
of a variable assignment, turning that argument into a @value{FN}.

The use of @code{No_command_assign} allows you to disable command-line
assignments at invocation time, by giving the variable a true value.
When not set, it is initially zero (i.e., false), so the command-line arguments
are left alone.
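
To check which arguments the pattern catches, the same regular
expression can be tried on a few sample strings.  The helper name
@code{is_assignment} and the sample arguments are made up for this
illustration:

@example
# is_assignment --- hypothetical helper using the pattern from
# disable_assigns()
function is_assignment(arg)
@{
    return (arg ~ /^[A-Za-z_][A-Za-z_0-9]*=.*/)
@}

BEGIN @{
    n = split("data.txt a=b.txt 9x=1 _v= sum=a+b", sample, " ")
    for (i = 1; i <= n; i++) @{
        if (is_assignment(sample[i]))
            print sample[i], "-> ./" sample[i]
        else
            print sample[i], "(left alone)"
    @}
@}
@end example

Only arguments that begin with a valid variable name followed by
@samp{=} are rewritten, so @samp{data.txt} and @samp{9x=1} are left
alone while the other three gain a leading @samp{./}.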
@c ENDOFRANGE dataf
@c ENDOFRANGE flibdataf
@c ENDOFRANGE libfdataf

@node Getopt Function
@section Processing Command-Line Options

@c STARTOFRANGE libfclo
@cindex libraries of @command{awk} functions, command-line options
@c STARTOFRANGE flibclo
@cindex functions, library, command-line options
@c STARTOFRANGE clop
@cindex command-line options, processing
@c STARTOFRANGE oclp
@cindex options, command-line, processing
@c STARTOFRANGE clibf
@cindex functions, library, C library
@cindex arguments, processing
Most utilities on POSIX-compatible systems take options, or ``switches,'' on
the command line that can be used to change the way a program behaves.
@command{awk} is an example of such a program
(@pxref{Options}).
Often, options take @dfn{arguments}; i.e., data that the program needs to
correctly obey the command-line option.  For example, @command{awk}'s
@option{-F} option requires a string to use as the field separator.
The first occurrence on the command line of either @option{--} or a
string that does not begin with @samp{-} ends the options.

@cindex @code{getopt} function (C library)
Modern Unix systems provide a C function named @code{getopt} for processing
command-line arguments.  The programmer provides a string describing the
one-letter options. If an option requires an argument, it is followed in the
string by a colon.  @code{getopt} is also passed the
count and values of the command-line arguments and is called in a loop.
@code{getopt} processes the command-line arguments for option letters.
Each time around the loop, it returns a single character representing the
next option letter that it finds, or @samp{?} if it finds an invalid option.
When it returns @minus{}1, there are no options left on the command line.

When using @code{getopt}, options that do not take arguments can be
grouped together.  Furthermore, options that take arguments require that the
argument be present.  The argument can immediately follow the option letter,
or it can be a separate command-line argument.

Given a hypothetical program that takes
three command-line options, @option{-a}, @option{-b}, and @option{-c}, where
@option{-b} requires an argument, all of the following are valid ways of
invoking the program:

@example
prog -a -b foo -c data1 data2 data3
prog -ac -bfoo -- data1 data2 data3
prog -acbfoo data1 data2 data3
@end example

Notice that when an option's argument is grouped with the option letter,
the rest of that command-line argument is taken to be the option's argument.
In this example, @option{-acbfoo} indicates that all of the
@option{-a}, @option{-b}, and @option{-c} options were supplied,
and that @samp{foo} is the argument to the @option{-b} option.

@code{getopt} provides four external variables that the programmer can use:

@table @code
@item optind
The index in the argument value array (@code{argv}) where the first
nonoption command-line argument can be found.

@item optarg
The string value of the argument to an option.

@item opterr
Usually @code{getopt} prints an error message when it finds an invalid
option.  Setting @code{opterr} to zero disables this feature.  (An
application might want to print its own error message.)

@item optopt
The letter representing the command-line option.
@c While not usually documented, most versions supply this variable.
@end table

The following C fragment shows how @code{getopt} might process command-line
arguments for @command{awk}:

@example
int
main(int argc, char *argv[])
@{
    @dots{}
    /* print our own message */
    opterr = 0;
    while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) @{
        switch (c) @{
        case 'f':    /* file */
            @dots{}
            break;
        case 'F':    /* field separator */
            @dots{}
            break;
        case 'v':    /* variable assignment */
            @dots{}
            break;
        case 'W':    /* extension */
            @dots{}
            break;
        case '?':
        default:
            usage();
            break;
        @}
    @}
    @dots{}
@}
@end example

As a side point, @command{gawk} actually uses the GNU @code{getopt_long}
function to process both normal and GNU-style long options
(@pxref{Options}).

The abstraction provided by @code{getopt} is very useful and is quite
handy in @command{awk} programs as well.  Following is an @command{awk}
version of @code{getopt}.  This function highlights one of the
greatest weaknesses in @command{awk}, which is that it is very poor at
manipulating single characters.  Repeated calls to @code{substr} are
necessary for accessing individual characters
(@pxref{String Functions}).@footnote{This
function was written before @command{gawk} acquired the ability to
split strings into single characters using @code{""} as the separator.
We have left it alone, since using @code{substr} is more portable.}

The discussion that follows walks through the code a bit at a time:

@cindex @code{getopt} user-defined function
@example
@c file eg/lib/getopt.awk
# getopt.awk --- do C library getopt(3) function in awk
@c endfile
@ignore
@c file eg/lib/getopt.awk
#
# Arnold Robbins, arnold@@gnu.org, Public Domain
#
# Initial version: March, 1991
# Revised: May, 1993

@c endfile
@end ignore
@c file eg/lib/getopt.awk
# External variables:
#    Optind -- index in ARGV of first nonoption argument
#    Optarg -- string value of argument to current option
#    Opterr -- if nonzero, print our own diagnostic
#    Optopt -- current option letter

# Returns:
#    -1     at end of options
#    ?      for unrecognized option
#    <c>    a character representing the current option

# Private Data:
#    _opti  -- index in multi-flag option, e.g., -abc
@c endfile
@end example

The function starts out with comments that list
the global variables it uses,
the return values and what they mean, and any global variables that
are ``private'' to this library function.  Such documentation is essential
for any program, and particularly for library functions.

The @code{getopt} function first checks that it was indeed called with a string of options
(the @code{options} parameter).  If @code{options} has a zero length,
@code{getopt} immediately returns @minus{}1:

@cindex @code{getopt} user-defined function
@example
@c file eg/lib/getopt.awk
function getopt(argc, argv, options,    thisopt, i)
@{
    if (length(options) == 0)    # no options given
        return -1

@group
    if (argv[Optind] == "--") @{  # all done
        Optind++
        _opti = 0
        return -1
@end group
    @} else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) @{
        _opti = 0
        return -1
    @}
@c endfile
@end example

The next thing to check for is the end of the options.  A @option{--}
ends the command-line options, as does any command-line argument that
does not begin with a @samp{-}.  @code{Optind} is used to step through
the array of command-line arguments; it retains its value across calls
to @code{getopt}, because it is a global variable.

The regular expression that is used, @code{@w{/^-[^: \t\n\f\r\v\b]/}}, is
perhaps a bit of overkill; it checks for a @samp{-} followed by anything
that is not whitespace and not a colon.
If the current command-line argument does not match this pattern,
it is not an option, and it ends option processing.
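
The pattern can be tried out on its own with a sketch such as the
following.  The helper name @code{looks_like_option} and the sample
strings are made up for this illustration and are not part of
@file{getopt.awk}:

@example
# looks_like_option --- hypothetical helper; same pattern as getopt
function looks_like_option(arg)
@{
    return (arg ~ /^-[^: \t\n\f\r\v\b]/)
@}

BEGIN @{
    n = split("-a -abc - -: file", sample, " ")
    for (i = 1; i <= n; i++) @{
        if (looks_like_option(sample[i]))
            print sample[i], "is an option"
        else
            print sample[i], "ends option processing"
    @}
@}
@end example

Of these samples, only @samp{-a} and @samp{-abc} are treated as options.
A lone @samp{-}, a @samp{-} followed by a colon, and a plain word all
end option processing.  Once an argument has passed this test,
@code{getopt} goes on to pick out the current option letter: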

@example
@c file eg/lib/getopt.awk
    if (_opti == 0)
        _opti = 2
    thisopt = substr(argv[Optind], _opti, 1)
    Optopt = thisopt
    i = index(options, thisopt)
    if (i == 0) @{
        if (Opterr)
            printf("%c -- invalid option\n",
                                  thisopt) > "/dev/stderr"
        if (_opti >= length(argv[Optind])) @{
            Optind++
            _opti = 0
        @} else
            _opti++
        return "?"
    @}
@c endfile
@end example

The @code{_opti} variable tracks the position in the current command-line
argument (@code{argv[Optind]}).  If multiple options are
grouped together with one @samp{-} (e.g., @option{-abx}), it is necessary
to return them to the user one at a time.

If @code{_opti} is equal to zero, it is set to two, which is the index in
the string of the next character to look at (we skip the @samp{-}, which
is at position one).  The variable @code{thisopt} holds the character,
obtained with @code{substr}.  It is saved in @code{Optopt} for the main
program to use.
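
The stepping that @code{_opti} performs across calls can be pictured
with an ordinary loop.  This sketch collects the letters of one grouped
argument in a single pass, rather than returning them one call at a
time as @code{getopt} does.  The function name @code{option_letters}
is invented for the illustration:

@example
# option_letters --- hypothetical helper, not part of getopt.awk
function option_letters(arg, letters,    pos, n)
@{
    n = 0
    # position 1 holds the "-", so start at position 2
    for (pos = 2; pos <= length(arg); pos++)
        letters[++n] = substr(arg, pos, 1)
    return n
@}

BEGIN @{
    n = option_letters("-abx", found)
    for (i = 1; i <= n; i++)
        print i, found[i]
@}
@end example

For @option{-abx}, the loop visits positions two through four and
stores @samp{a}, @samp{b}, and @samp{x}, in that order.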

If @code{thisopt} is not in the @code{options} string, then it is an
invalid option.  If @code{Opterr} is nonzero, @code{getopt} prints an error
message on the standard error that is similar to the message from the C
version of @code{getopt}.

Because the option is invalid, it is necessary to skip it and move on to the
next option character.  If @code{_opti} is greater than or equal to the
length of the current command-line argument, it is necessary to move on
to the next argument, so @code{Optind} is incremented and @code{_opti} is reset
to zero. Otherwise, @code{Optind} is left alone and @code{_opti} is merely
incremented.

In any case, because the option is invalid, @code{getopt} returns @samp{?}.
The main program can examine @code{Optopt} if it needs to know what the
invalid option letter actually is. Continuing on:

@example
@c file eg/lib/getopt.awk
    if (substr(options, i + 1, 1) == ":") @{
        # get option argument
        if (length(substr(argv[Optind], _opti + 1)) > 0)
            Optarg = substr(argv[Optind], _opti + 1)
        else
            Optarg = argv[++Optind]
        _opti = 0
    @} else
        Optarg = ""
@c endfile
@end example

If the option requires an argument, the option letter is followed by a colon
in the @code{options} string.  If there are remaining characters in the
current command-line argument (@code{argv[Optind]}), then the rest of that
string is assigned to @code{Optarg}.  Otherwise, the next command-line
argument is used (@samp{-xFOO} versus @samp{@w{-x FOO}}). In either case,
@code{_opti} is reset to zero, because there are no more characters left to
examine in the current command-line argument. Continuing:

@example
@c file eg/lib/getopt.awk
    if (_opti == 0 || _opti >= length(argv[Optind])) @{
        Optind++
        _opti = 0
    @} else
        _opti++
    return thisopt
@}
@c endfile
@end example

Finally, if @code{_opti} is either zero or greater than or equal to the
length of the current command-line argument, it means this element in
@code{argv} is through being processed, so @code{Optind} is incremented to
point to the next element in @code{argv}.  If neither condition is true, then only
@code{_opti} is incremented, so that the next option letter can be processed
on the next call to @code{getopt}.

The @code{BEGIN} rule initializes both @code{Opterr} and @code{Optind} to one.
@code{Opterr} is set to one, since the default behavior is for @code{getopt}
to print a diagnostic message upon seeing an invalid option.  @code{Optind}
is set to one, since there's no reason to look at the program name, which is
in @code{ARGV[0]}:

@example
@c file eg/lib/getopt.awk
BEGIN @{
    Opterr = 1    # default is to diagnose
    Optind = 1    # skip ARGV[0]

    # test program
    if (_getopt_test) @{
        while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
            printf("c = <%c>, optarg = <%s>\n",
                                       _go_c, Optarg)
        printf("non-option arguments:\n")
        for (; Optind < ARGC; Optind++)
            printf("\tARGV[%d] = <%s>\n",
                                    Optind, ARGV[Optind])
    @}
@}
@c endfile
@end example

The rest of the @code{BEGIN} rule is a simple test program.  Here is the
result of two sample runs of the test program:

@example
$ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
@print{} c = <a>, optarg = <>
@print{} c = <c>, optarg = <>
@print{} c = <b>, optarg = <ARG>
@print{} non-option arguments:
@print{}         ARGV[3] = <bax>
@print{}         ARGV[4] = <-x>

$ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
@print{} c = <a>, optarg = <>
@error{} x -- invalid option
@print{} c = <?>, optarg = <>
@print{} non-option arguments:
@print{}         ARGV[4] = <xyz>
@print{}         ARGV[5] = <abc>
@end example

In both runs,
the first @option{--} terminates the arguments to @command{awk}, so that it does
not try to interpret the @option{-a}, etc., as its own options.
Several of the sample programs presented in
@ref{Sample Programs},
use @code{getopt} to process their arguments.
@c ENDOFRANGE libfclo
@c ENDOFRANGE flibclo
@c ENDOFRANGE clop
@c ENDOFRANGE oclp

@node Passwd Functions
@section Reading the User Database

@c STARTOFRANGE libfudata
@cindex libraries of @command{awk} functions, user database, reading
@c STARTOFRANGE flibudata
@cindex functions, library, user database, reading
@c last comma is part of primary
@c STARTOFRANGE udatar
@cindex user database, reading
@c last comma is part of secondary
@c STARTOFRANGE dataur
@cindex database, users, reading
@cindex @code{PROCINFO} array
The @code{PROCINFO} array
(@pxref{Built-in Variables})
provides access to the current user's real and effective user and group ID
numbers, and if available, the user's supplementary group set.
However, because these are numbers, they do not provide very useful
information to the average user.  There needs to be some way to find the
user information associated with the user and group ID numbers.  This
@value{SECTION} presents a suite of functions for retrieving information from the
user database.  @xref{Group Functions},
for a similar suite that retrieves information from the group database.

@cindex @code{getpwent} function (C library)
@cindex @code{getpwent} user-defined function
@cindex users, information about, retrieving
@cindex login information
@cindex account information
@cindex password file
@cindex files, password
The POSIX standard does not define the file where user information is
kept.  Instead, it provides the @code{<pwd.h>} header file
and several C language subroutines for obtaining user information.
The primary function is @code{getpwent}, for ``get password entry.''
The ``password'' comes from the original user database file,
@file{/etc/passwd}, which stores user information, along with the
encrypted passwords (hence the name).

@cindex @command{pwcat} program
While an @command{awk} program could simply read @file{/etc/passwd}
directly, this file may not contain complete information about the
system's set of users.@footnote{It is often the case that password
information is stored in a network database.} To be sure you are able to
produce a readable and complete version of the user database, it is necessary
to write a small C program that calls @code{getpwent}.  @code{getpwent}
is defined as returning a pointer to a @code{struct passwd}.  Each time it
is called, it returns the next entry in the database.  When there are
no more entries, it returns @code{NULL}, the null pointer.  When this
happens, the C program should call @code{endpwent} to close the database.
Following is @command{pwcat}, a C program that ``cats'' the password database:

@c Use old style function header for portability to old systems (SunOS, HP/UX).

@example
@c file eg/lib/pwcat.c
/*
 * pwcat.c
 *
 * Generate a printable version of the password database
 */
@c endfile
@ignore
@c file eg/lib/pwcat.c
/*
 * Arnold Robbins, arnold@@gnu.org, May 1993
 * Public Domain
 */

#if HAVE_CONFIG_H
#include <config.h>
#endif

@c endfile
@end ignore
@c file eg/lib/pwcat.c
#include <stdio.h>
#include <pwd.h>

@c endfile
@ignore
@c file eg/lib/pwcat.c
#if defined (STDC_HEADERS)
#include <stdlib.h>
#endif

@c endfile
@end ignore
@c file eg/lib/pwcat.c
int
main(argc, argv)
int argc;
char **argv;
@{
    struct passwd *p;

    while ((p = getpwent()) != NULL)
        printf("%s:%s:%ld:%ld:%s:%s:%s\n",
            p->pw_name, p->pw_passwd, (long) p->pw_uid,
            (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);

    endpwent();
    return 0;
@}
@c endfile
@end example

18549If you don't understand C, don't worry about it.
18550The output from @command{pwcat} is the user database, in the traditional
18551@file{/etc/passwd} format of colon-separated fields.  The fields are:
18552
18553@ignore
18554@table @asis
18555@item Login name
18556The user's login name.
18557
18558@item Encrypted password
18559The user's encrypted password.  This may not be available on some systems.
18560
18561@item User-ID
18562The user's numeric user ID number.
18563(On some systems it's a C @code{long}, and not an @code{int}.  Thus
18564we cast it to @code{long} for all cases.)
18565
18566@item Group-ID
18567The user's numeric group ID number.
18568(Similar comments about @code{long} vs.@: @code{int} apply here.)
18569
18570@item Full name
18571The user's full name, and perhaps other information associated with the
18572user.
18573
18574@item Home directory
18575The user's login (or ``home'') directory (familiar to shell programmers as
18576@code{$HOME}).
18577
18578@item Login shell
18579The program that is run when the user logs in.  This is usually a
18580shell, such as @command{bash}.
18581@end table
18582@end ignore
18583
18584@multitable {Encrypted password} {1234567890123456789012345678901234567890123456}
18585@item Login name @tab The user's login name.
18586
18587@item Encrypted password @tab The user's encrypted password.  This may not be available on some systems.
18588
18589@item User-ID @tab The user's numeric user ID number.
18590
18591@item Group-ID @tab The user's numeric group ID number.
18592
18593@item Full name @tab The user's full name, and perhaps other information associated with the
18594user.
18595
18596@item Home directory @tab The user's login (or ``home'') directory (familiar to shell programmers as
18597@code{$HOME}).
18598
18599@item Login shell @tab The program that is run when the user logs in.  This is usually a
18600shell, such as @command{bash}.
18601@end multitable
18602
18603A few lines representative of @command{pwcat}'s output are as follows:
18604
18605@cindex Jacobs, Andrew
18606@cindex Robbins, Arnold
18607@cindex Robbins, Miriam
18608@example
18609$ pwcat
18610@print{} root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
18611@print{} nobody:*:65534:65534::/:
18612@print{} daemon:*:1:1::/:
18613@print{} sys:*:2:2::/:/bin/csh
18614@print{} bin:*:3:3::/bin:
18615@print{} arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
18616@print{} miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
18617@print{} andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
18618@dots{}
18619@end example
18620
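Because the fields are colon-separated, the output of @command{pwcat} is
easy to process with @command{awk}.  The following is a minimal sketch and
is not part of the library presented here.  It assumes that @command{pwcat}
has been compiled and is in your search path, and it prints each login name
together with the corresponding home directory:

@example
pwcat | awk -F: '@{ print $1, $6 @}'
@end example
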
18621With that introduction, following is a group of functions for getting user
18622information.  There are several functions here, corresponding to the C
18623functions of the same names:
18624
18625@c Exercise: simplify all these functions that return values.
18626@c Answer: return foo[key] returns "" if key not there, no need to check with `in'.
18627
18628@cindex @code{_pw_init} user-defined function
18629@example
18630@c file eg/lib/passwdawk.in
18631# passwd.awk --- access password file information
18632@c endfile
18633@ignore
18634@c file eg/lib/passwdawk.in
18635#
18636# Arnold Robbins, arnold@@gnu.org, Public Domain
18637# May 1993
18638# Revised October 2000
18639
18640@c endfile
18641@end ignore
18642@c file eg/lib/passwdawk.in
18643BEGIN @{
18644    # tailor this to suit your system
18645    _pw_awklib = "/usr/local/libexec/awk/"
18646@}
18647
18648function _pw_init(    oldfs, oldrs, olddol0, pwcat, using_fw)
18649@{
18650    if (_pw_inited)
18651        return
18652
18653    oldfs = FS
18654    oldrs = RS
18655    olddol0 = $0
18656    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
18657    FS = ":"
18658    RS = "\n"
18659
18660    pwcat = _pw_awklib "pwcat"
18661    while ((pwcat | getline) > 0) @{
18662        _pw_byname[$1] = $0
18663        _pw_byuid[$3] = $0
18664        _pw_bycount[++_pw_total] = $0
18665    @}
18666    close(pwcat)
18667    _pw_count = 0
18668    _pw_inited = 1
18669    FS = oldfs
18670    if (using_fw)
18671        FIELDWIDTHS = FIELDWIDTHS
18672    RS = oldrs
18673    $0 = olddol0
18674@}
18675@c endfile
18676@end example
18677
18678@cindex @code{BEGIN} pattern, @code{pwcat} program
18679The @code{BEGIN} rule sets a private variable to the directory where
18680@command{pwcat} is stored.  Because it is used to help out an @command{awk} library
18681routine, we have chosen to put it in @file{/usr/local/libexec/awk};
18682however, you might want it to be in a different directory on your system.
18683
18684The function @code{_pw_init} keeps three copies of the user information
18685in three associative arrays.  The arrays are indexed by username
18686(@code{_pw_byname}), by user ID number (@code{_pw_byuid}), and by order of
18687occurrence (@code{_pw_bycount}).
18688The variable @code{_pw_inited} is used for efficiency; @code{_pw_init}
18689needs only to be called once.
18690
18691@cindex @code{getline} command, @code{_pw_init} function
18692Because this function uses @code{getline} to read information from
18693@command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}.
18694It notes in the variable @code{using_fw} whether field splitting
18695with @code{FIELDWIDTHS} is in effect or not.
18696Doing so is necessary, since these functions could be called
18697from anywhere within a user's program, and the user may have his
18698or her
18699own way of splitting records and fields.
18700
The @code{using_fw} variable records whether @code{PROCINFO["FS"]}
is @code{"FIELDWIDTHS"}, which is the case when field splitting is being done with
18703@code{FIELDWIDTHS}.  This makes it possible to restore the correct
18704field-splitting mechanism later.  The test can only be true for
18705@command{gawk}.  It is false if using @code{FS} or on some other
18706@command{awk} implementation.
18707
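The same save-and-restore idiom applies to any library function that must
temporarily change how records are split.  The following sketch is for
illustration only; the function name @code{count_colon_fields} is
hypothetical and is not part of the library:

@example
# count_colon_fields --- hypothetical helper, for illustration only.
# Return the number of colon-separated fields in str without
# disturbing the caller's FS, FIELDWIDTHS, or $0.

function count_colon_fields(str,    oldfs, olddol0, using_fw, n)
@{
    oldfs = FS
    olddol0 = $0
    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")  # true only in gawk

    FS = ":"
    $0 = str        # assigning to $0 resplits it using the new FS
    n = NF

    FS = oldfs
    if (using_fw)
        FIELDWIDTHS = FIELDWIDTHS   # assignment reactivates FIELDWIDTHS
    $0 = olddol0    # resplit the caller's record by the caller's rules
    return n
@}
@end example

@noindent
In real code, @code{split(str, arr, ":")} does this particular job more
simply; the point of the sketch is only the order of the save and restore
steps.
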
18708The main part of the function uses a loop to read database lines, split
18709the line into fields, and then store the line into each array as necessary.
18710When the loop is done, @code{@w{_pw_init}} cleans up by closing the pipeline,
18711setting @code{@w{_pw_inited}} to one, and restoring @code{FS} (and @code{FIELDWIDTHS}
18712if necessary), @code{RS}, and @code{$0}.
18713The use of @code{@w{_pw_count}} is explained shortly.
18714
18715@c NEXT ED: All of these functions don't need the ... in ... test.  Just
18716@c return the array element, which will be "" if not already there.  Duh.
18717@cindex @code{getpwnam} function (C library)
18718The @code{getpwnam} function takes a username as a string argument. If that
18719user is in the database, it returns the appropriate line. Otherwise, it
18720returns the null string:
18721
18722@cindex @code{getpwnam} user-defined function
18723@example
18724@group
18725@c file eg/lib/passwdawk.in
18726function getpwnam(name)
18727@{
18728    _pw_init()
18729    if (name in _pw_byname)
18730        return _pw_byname[name]
18731    return ""
18732@}
18733@c endfile
18734@end group
18735@end example
18736
18737@cindex @code{getpwuid} function (C library)
18738Similarly,
18739the @code{getpwuid} function takes a user ID number argument. If that
18740user number is in the database, it returns the appropriate line. Otherwise, it
18741returns the null string:
18742
18743@cindex @code{getpwuid} user-defined function
18744@example
18745@c file eg/lib/passwdawk.in
18746function getpwuid(uid)
18747@{
18748    _pw_init()
18749    if (uid in _pw_byuid)
18750        return _pw_byuid[uid]
18751    return ""
18752@}
18753@c endfile
18754@end example
18755
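As a sketch of how a program might use these two functions, the following
looks up a user by name and prints the home directory and full name.
The username @samp{arnold} is taken from the sample data, and the example
assumes that the library is stored in a file named @file{passwd.awk}:

@example
# Run as: awk -f passwd.awk -f thisfile.awk
BEGIN @{
    entry = getpwnam("arnold")
    if (entry == "") @{
        print "no such user: arnold" > "/dev/stderr"
        exit 1
    @}
    split(entry, field, ":")     # same layout as pwcat's output
    print "home directory:", field[6]
    print "full name:", field[5]

    # a lookup by user ID returns the same kind of record
    if (getpwuid(field[3]) != "")
        print "user ID", field[3], "is in the database"
@}
@end example
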
18756@cindex @code{getpwent} function (C library)
18757The @code{getpwent} function simply steps through the database, one entry at
18758a time.  It uses @code{_pw_count} to track its current position in the
18759@code{_pw_bycount} array:
18760
18761@cindex @code{getpwent} user-defined function
18762@example
18763@c file eg/lib/passwdawk.in
18764function getpwent()
18765@{
18766    _pw_init()
18767    if (_pw_count < _pw_total)
18768        return _pw_bycount[++_pw_count]
18769    return ""
18770@}
18771@c endfile
18772@end example
18773
18774@cindex @code{endpwent} function (C library)
18775The @code{@w{endpwent}} function resets @code{@w{_pw_count}} to zero, so that
18776subsequent calls to @code{getpwent} start over again:
18777
18778@cindex @code{endpwent} user-defined function
18779@example
18780@c file eg/lib/passwdawk.in
18781function endpwent()
18782@{
18783    _pw_count = 0
18784@}
18785@c endfile
18786@end example
18787
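Together, @code{getpwent} and @code{endpwent} support the usual
``walk the whole database'' loop.  The following sketch counts the users
whose login shell is @file{/bin/sh}; the choice of shell is arbitrary and
is only for illustration:

@example
BEGIN @{
    while ((entry = getpwent()) != "") @{
        split(entry, field, ":")
        if (field[7] == "/bin/sh")
            count++
    @}
    endpwent()      # rewind, so that a later loop starts at the top
    print count + 0, "users have /bin/sh as their login shell"
@}
@end example
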
A conscious design decision in this suite is that each subroutine calls
18789@code{@w{_pw_init}} to initialize the database arrays.  The overhead of running
18790a separate process to generate the user database, and the I/O to scan it,
18791are only incurred if the user's main program actually calls one of these
18792functions.  If this library file is loaded along with a user's program, but
18793none of the routines are ever called, then there is no extra runtime overhead.
(The alternative is to move the body of @code{@w{_pw_init}} into a
18795@code{BEGIN} rule, which always runs @command{pwcat}.  This simplifies the
18796code but runs an extra process that may never be needed.)
18797
18798In turn, calling @code{_pw_init} is not too expensive, because the
18799@code{_pw_inited} variable keeps the program from reading the data more than
18800once.  If you are worried about squeezing every last cycle out of your
18801@command{awk} program, the check of @code{_pw_inited} could be moved out of
18802@code{_pw_init} and duplicated in all the other functions.  In practice,
18803this is not necessary, since most @command{awk} programs are I/O-bound, and it
18804clutters up the code.
18805
18806The @command{id} program in @ref{Id Program},
18807uses these functions.
18808@c ENDOFRANGE libfudata
18809@c ENDOFRANGE flibudata
18810@c ENDOFRANGE udatar
18811@c ENDOFRANGE dataur
18812
18813@node Group Functions
18814@section Reading the Group Database
18815
18816@c STARTOFRANGE libfgdata
18817@cindex libraries of @command{awk} functions, group database, reading
18818@c STARTOFRANGE flibgdata
18819@cindex functions, library, group database, reading
18820@c STARTOFRANGE gdatar
18821@cindex group database, reading
18822@c STARTOFRANGE datagr
18823@cindex database, group, reading
18824@cindex @code{PROCINFO} array
18825@cindex @code{getgrent} function (C library)
18826@cindex @code{getgrent} user-defined function
18827@c comma is part of primary
18828@cindex groups, information about
18829@cindex account information
18830@cindex group file
18831@cindex files, group
18832Much of the discussion presented in
18833@ref{Passwd Functions},
18834applies to the group database as well.  Although there has traditionally
18835been a well-known file (@file{/etc/group}) in a well-known format, the POSIX
18836standard only provides a set of C library routines
18837(@code{<grp.h>} and @code{getgrent})
18838for accessing the information.
18839Even though this file may exist, it likely does not have
18840complete information.  Therefore, as with the user database, it is necessary
18841to have a small C program that generates the group database as its output.
18842
18843@cindex @command{grcat} program
18844@command{grcat}, a C program that ``cats'' the group database,
18845is as follows:
18846
18847@example
18848@c file eg/lib/grcat.c
18849/*
18850 * grcat.c
18851 *
18852 * Generate a printable version of the group database
18853 */
18854@c endfile
18855@ignore
18856@c file eg/lib/grcat.c
18857/*
18858 * Arnold Robbins, arnold@@gnu.org, May 1993
18859 * Public Domain
18860 */
18861
18862/* For OS/2, do nothing. */
18863#if HAVE_CONFIG_H
18864#include <config.h>
18865#endif
18866
18867#if defined (STDC_HEADERS)
18868#include <stdlib.h>
18869#endif
18870
18871#ifndef HAVE_GETGRENT
18872int main() { return 0; }
18873#else
18874@c endfile
18875@end ignore
18876@c file eg/lib/grcat.c
18877#include <stdio.h>
18878#include <grp.h>
18879
18880int
18881main(argc, argv)
18882int argc;
18883char **argv;
18884@{
18885    struct group *g;
18886    int i;
18887
18888    while ((g = getgrent()) != NULL) @{
18889        printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
18890                                     (long) g->gr_gid);
18891        for (i = 0; g->gr_mem[i] != NULL; i++) @{
18892            printf("%s", g->gr_mem[i]);
18893@group
18894            if (g->gr_mem[i+1] != NULL)
18895                putchar(',');
18896        @}
18897@end group
18898        putchar('\n');
18899    @}
18900    endgrent();
18901    return 0;
18902@}
18903@c endfile
18904@end example
18905@ignore
18906@c file eg/lib/grcat.c
18907#endif /* HAVE_GETGRENT */
18908@c endfile
18909@end ignore
18910
18911Each line in the group database represents one group.  The fields are
18912separated with colons and represent the following information:
18913
18914@ignore
18915@table @asis
18916@item Group Name
18917The name of the group.
18918
18919@item Group Password
18920The encrypted group password. In practice, this field is never used. It is
18921usually empty or set to @samp{*}.
18922
18923@item Group ID Number
18924The numeric group ID number. This number is unique within the file.
18925(On some systems it's a C @code{long}, and not an @code{int}.  Thus
18926we cast it to @code{long} for all cases.)
18927
18928@item Group Member List
18929A comma-separated list of usernames.  These users are members of the group.
18930Modern Unix systems allow users to be members of several groups
18931simultaneously.  If your system does, then there are elements
18932@code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO}
18933for those group ID numbers.
18934(Note that @code{PROCINFO} is a @command{gawk} extension;
18935@pxref{Built-in Variables}.)
18936@end table
18937@end ignore
18938
18939@multitable {Encrypted password} {1234567890123456789012345678901234567890123456}
18940@item Group name @tab The group's name.
18941
18942@item Group password @tab The group's encrypted password. In practice, this field is never used;
18943it is usually empty or set to @samp{*}.
18944
18945@item Group-ID @tab
18946The group's numeric group ID number; this number should be unique within the file.
18947
18948@item Group member list @tab
18949A comma-separated list of usernames.  These users are members of the group.
18950Modern Unix systems allow users to be members of several groups
18951simultaneously.  If your system does, then there are elements
18952@code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO}
18953for those group ID numbers.
18954(Note that @code{PROCINFO} is a @command{gawk} extension;
18955@pxref{Built-in Variables}.)
18956@end multitable
18957
18958Here is what running @command{grcat} might produce:
18959
18960@example
18961$ grcat
18962@print{} wheel:*:0:arnold
18963@print{} nogroup:*:65534:
18964@print{} daemon:*:1:
18965@print{} kmem:*:2:
18966@print{} staff:*:10:arnold,miriam,andy
18967@print{} other:*:20:
18968@dots{}
18969@end example
18970
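As with @command{pwcat}, this output is simple to process directly.
The following sketch assumes that @command{grcat} is in your search path
and uses the group name @samp{staff} from the sample output.  It prints
the members of that group, one per line:

@example
grcat | awk -F: '$1 == "staff" @{
    n = split($4, member, ",")
    for (i = 1; i <= n; i++)
        print member[i]
@}'
@end example
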
18971Here are the functions for obtaining information from the group database.
18972There are several, modeled after the C library functions of the same names:
18973
18974@cindex @code{getline} command, @code{_gr_init} user-defined function
18975@cindex @code{_gr_init} user-defined function
18976@example
18977@c file eg/lib/groupawk.in
18978# group.awk --- functions for dealing with the group file
18979@c endfile
18980@ignore
18981@c file eg/lib/groupawk.in
18982#
18983# Arnold Robbins, arnold@@gnu.org, Public Domain
18984# May 1993
18985# Revised October 2000
18986
18987@c endfile
18988@end ignore
18989@c line break on _gr_init for smallbook
18990@c file eg/lib/groupawk.in
18991BEGIN    \
18992@{
18993    # Change to suit your system
18994    _gr_awklib = "/usr/local/libexec/awk/"
18995@}
18996
18997function _gr_init(    oldfs, oldrs, olddol0, grcat,
18998                             using_fw, n, a, i)
18999@{
19000    if (_gr_inited)
19001        return
19002
19003    oldfs = FS
19004    oldrs = RS
19005    olddol0 = $0
19006    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
19007    FS = ":"
19008    RS = "\n"
19009
19010    grcat = _gr_awklib "grcat"
19011    while ((grcat | getline) > 0) @{
19012        if ($1 in _gr_byname)
19013            _gr_byname[$1] = _gr_byname[$1] "," $4
19014        else
19015            _gr_byname[$1] = $0
19016        if ($3 in _gr_bygid)
19017            _gr_bygid[$3] = _gr_bygid[$3] "," $4
19018        else
19019            _gr_bygid[$3] = $0
19020
19021        n = split($4, a, "[ \t]*,[ \t]*")
19022        for (i = 1; i <= n; i++)
19023            if (a[i] in _gr_groupsbyuser)
19024                _gr_groupsbyuser[a[i]] = \
19025                    _gr_groupsbyuser[a[i]] " " $1
19026            else
19027                _gr_groupsbyuser[a[i]] = $1
19028
19029        _gr_bycount[++_gr_count] = $0
19030    @}
19031    close(grcat)
19032    _gr_count = 0
19033    _gr_inited++
19034    FS = oldfs
19035    if (using_fw)
19036        FIELDWIDTHS = FIELDWIDTHS
19037    RS = oldrs
19038    $0 = olddol0
19039@}
19040@c endfile
19041@end example
19042
19043The @code{BEGIN} rule sets a private variable to the directory where
19044@command{grcat} is stored.  Because it is used to help out an @command{awk} library
19045routine, we have chosen to put it in @file{/usr/local/libexec/awk}.  You might
19046want it to be in a different directory on your system.
19047
19048These routines follow the same general outline as the user database routines
19049(@pxref{Passwd Functions}).
19050The @code{@w{_gr_inited}} variable is used to
19051ensure that the database is scanned no more than once.
The @code{@w{_gr_init}} function first saves @code{FS}, @code{RS}, and
@code{$0}, notes whether @code{FIELDWIDTHS} is in use, and then sets
@code{FS} and @code{RS} to the correct values for
scanning the group information.
19055
The group information is stored in several associative arrays.
19057The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number
19058(@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}).
19059There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}),
19060which is a space-separated list of groups to which each user belongs.
19061
19062Unlike the user database, it is possible to have multiple records in the
19063database for the same group.  This is common when a group has a large number
19064of members.  A pair of such entries might look like the following:
19065
19066@example
19067tvpeople:*:101:johnny,jay,arsenio
19068tvpeople:*:101:david,conan,tom,joan
19069@end example
19070
For this reason, @code{_gr_init} looks to see if a group name or
group ID number has already been seen.  If it has, then the usernames are
simply concatenated onto the previous list of users.  (There is actually a
subtle problem with the code just presented.  Suppose that
the first record for a group has no names.  The code then adds the later
names with a leading comma.  It also doesn't check that there is a @code{$4}.)
19077
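One way to avoid both problems is to do the merging in a small helper
function.  The following is only a sketch: the name @code{_gr_merge} is
hypothetical, and it assumes that every record ends with the (possibly
empty) member list, as in the output of @command{grcat}:

@example
# _gr_merge --- hypothetical helper, for illustration only.
# old is the record saved so far ("" if none), line is the new
# record, and members is the new record's member list (may be "").

function _gr_merge(old, line, members)
@{
    if (old == "")          # first record for this group
        return line
    if (members == "")      # nothing to add
        return old
    if (old ~ /:$/)         # saved record has no members yet
        return old members
    return old "," members
@}
@end example

@noindent
With such a helper, the first @code{if}-@code{else} statement in the loop
could become @code{_gr_byname[$1] = _gr_merge(_gr_byname[$1], $0, $4)},
with a corresponding assignment to @code{_gr_bygid[$3]} replacing the second.
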
19078Finally, @code{_gr_init} closes the pipeline to @command{grcat}, restores
19079@code{FS} (and @code{FIELDWIDTHS} if necessary), @code{RS}, and @code{$0},
19080initializes @code{_gr_count} to zero
19081(it is used later), and makes @code{_gr_inited} nonzero.
19082
19083@cindex @code{getgrnam} function (C library)
19084The @code{getgrnam} function takes a group name as its argument, and if that
19085group exists, it is returned. Otherwise, @code{getgrnam} returns the null
19086string:
19087
19088@cindex @code{getgrnam} user-defined function
19089@example
19090@c file eg/lib/groupawk.in
19091function getgrnam(group)
19092@{
19093    _gr_init()
19094    if (group in _gr_byname)
19095        return _gr_byname[group]
19096    return ""
19097@}
19098@c endfile
19099@end example
19100
19101@cindex @code{getgrgid} function (C library)
The @code{getgrgid} function is similar; it takes a numeric group ID and
19103looks up the information associated with that group ID:
19104
19105@cindex @code{getgrgid} user-defined function
19106@example
19107@c file eg/lib/groupawk.in
19108function getgrgid(gid)
19109@{
19110    _gr_init()
19111    if (gid in _gr_bygid)
19112        return _gr_bygid[gid]
19113    return ""
19114@}
19115@c endfile
19116@end example
19117
19118@cindex @code{getgruser} function (C library)
19119The @code{getgruser} function does not have a C counterpart. It takes a
19120username and returns the list of groups that have the user as a member:
19121
19122@cindex @code{getgruser} function, user-defined
19123@example
19124@c file eg/lib/groupawk.in
19125function getgruser(user)
19126@{
19127    _gr_init()
19128    if (user in _gr_groupsbyuser)
19129        return _gr_groupsbyuser[user]
19130    return ""
19131@}
19132@c endfile
19133@end example
19134
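Because the result is a space-separated list, a caller would typically
split it into an array.  In the following sketch, the username
@samp{arnold} is taken from the sample data, and the library is assumed
to be loaded from a file named @file{group.awk}:

@example
# Run as: awk -f group.awk -f thisfile.awk
BEGIN @{
    n = split(getgruser("arnold"), grp, " ")
    for (i = 1; i <= n; i++)
        print grp[i]
@}
@end example

@noindent
For the sample @command{grcat} lines shown earlier, the list for
@samp{arnold} is @samp{wheel staff}, since groups are appended in the
order in which they appear in the database.
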
19135@cindex @code{getgrent} function (C library)
19136The @code{getgrent} function steps through the database one entry at a time.
19137It uses @code{_gr_count} to track its position in the list:
19138
19139@cindex @code{getgrent} user-defined function
19140@example
19141@c file eg/lib/groupawk.in
19142function getgrent()
19143@{
19144    _gr_init()
19145    if (++_gr_count in _gr_bycount)
19146        return _gr_bycount[_gr_count]
19147    return ""
19148@}
19149@c endfile
19150@end example
19151@c ENDOFRANGE clibf
19152
19153@cindex @code{endgrent} function (C library)
19154The @code{endgrent} function resets @code{_gr_count} to zero so that @code{getgrent} can
19155start over again:
19156
19157@cindex @code{endgrent} user-defined function
19158@example
19159@c file eg/lib/groupawk.in
19160function endgrent()
19161@{
19162    _gr_count = 0
19163@}
19164@c endfile
19165@end example
19166
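As a sketch of how these two functions might be used together, the
following loop reports every group record that lists no members.  Note
that @code{_gr_bycount} holds the records exactly as read, so a group that
is split across several records is visited once per record:

@example
BEGIN @{
    while ((entry = getgrent()) != "") @{
        split(entry, field, ":")
        if (field[4] == "")
            print field[1], "has no members listed"
    @}
    endgrent()      # rewind for any later scan
@}
@end example
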
19167As with the user database routines, each function calls @code{_gr_init} to
19168initialize the arrays.  Doing so only incurs the extra overhead of running
19169@command{grcat} if these functions are used (as opposed to moving the body of
19170@code{_gr_init} into a @code{BEGIN} rule).
19171
19172Most of the work is in scanning the database and building the various
19173associative arrays.  The functions that the user calls are themselves very
simple, relying on @command{awk}'s associative arrays to do the work.
19175
19176The @command{id} program in @ref{Id Program},
19177uses these functions.
19178@c ENDOFRANGE libfgdata
19179@c ENDOFRANGE flibgdata
19180@c ENDOFRANGE gdatar
19181@c ENDOFRANGE libf
19182@c ENDOFRANGE flib
19183@c ENDOFRANGE fudlib
19184@c ENDOFRANGE datagr
19185
19186@node Sample Programs
19187@chapter Practical @command{awk} Programs
19188@c STARTOFRANGE awkpex
19189@cindex @command{awk} programs, examples of
19190
19191@ref{Library Functions},
19192presents the idea that reading programs in a language contributes to
19193learning that language.  This @value{CHAPTER} continues that theme,
19194presenting a potpourri of @command{awk} programs for your reading
19195enjoyment.
19196@ifnotinfo
19197There are three sections.
19198The first describes how to run the programs presented
19199in this @value{CHAPTER}.
19200
19201The second presents @command{awk}
19202versions of several common POSIX utilities.
These are programs that you are probably already familiar with,
so the problems they solve are already understood.
19205By reimplementing these programs in @command{awk},
19206you can focus on the @command{awk}-related aspects of solving
19207the programming problem.
19208
19209The third is a grab bag of interesting programs.
19210These solve a number of different data-manipulation and management
19211problems.  Many of the programs are short, which emphasizes @command{awk}'s
19212ability to do a lot in just a few lines of code.
19213@end ifnotinfo
19214
19215Many of these programs use the library functions presented in
19216@ref{Library Functions}.
19217
19218@menu
19219* Running Examples::            How to run these examples.
19220* Clones::                      Clones of common utilities.
19221* Miscellaneous Programs::      Some interesting @command{awk} programs.
19222@end menu
19223
19224@node Running Examples
19225@section Running the Example Programs
19226
19227To run a given program, you would typically do something like this:
19228
19229@example
19230awk -f @var{program} -- @var{options} @var{files}
19231@end example
19232
19233@noindent
19234Here, @var{program} is the name of the @command{awk} program (such as
19235@file{cut.awk}), @var{options} are any command-line options for the
19236program that start with a @samp{-}, and @var{files} are the actual @value{DF}s.
19237
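Programs that rely on library functions need those files on the command
line as well, each with its own @option{-f} option.  As a sketch, assuming
that the @code{getopt} and @code{join} functions are stored in files named
@file{getopt.awk} and @file{join.awk} in the current directory, the
@command{cut} clone presented later in this @value{CHAPTER} could be run
like this:

@example
awk -f getopt.awk -f join.awk -f cut.awk -- -c1-8 myfiles > results
@end example
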
19238If your system supports the @samp{#!} executable interpreter mechanism
19239(@pxref{Executable Scripts}),
19240you can instead run your program directly:
19241
19242@example
19243cut.awk -c1-8 myfiles > results
19244@end example
19245
19246If your @command{awk} is not @command{gawk}, you may instead need to use this:
19247
19248@example
19249cut.awk -- -c1-8 myfiles > results
19250@end example
19251
19252@node Clones
19253@section Reinventing Wheels for Fun and Profit
19254@c last comma is part of secondary
19255@c STARTOFRANGE posimawk
19256@cindex POSIX, programs, implementing in @command{awk}
19257
19258This @value{SECTION} presents a number of POSIX utilities that are implemented in
19259@command{awk}.  Reinventing these programs in @command{awk} is often enjoyable,
19260because the algorithms can be very clearly expressed, and the code is usually
19261very concise and simple.  This is true because @command{awk} does so much for you.
19262
19263It should be noted that these programs are not necessarily intended to
19264replace the installed versions on your system.  Instead, their
19265purpose is to illustrate @command{awk} language programming for ``real world''
19266tasks.
19267
19268The programs are presented in alphabetical order.
19269
19270@menu
19271* Cut Program::                 The @command{cut} utility.
19272* Egrep Program::               The @command{egrep} utility.
19273* Id Program::                  The @command{id} utility.
19274* Split Program::               The @command{split} utility.
19275* Tee Program::                 The @command{tee} utility.
19276* Uniq Program::                The @command{uniq} utility.
19277* Wc Program::                  The @command{wc} utility.
19278@end menu
19279
19280@node Cut Program
19281@subsection Cutting out Fields and Columns
19282
19283@cindex @command{cut} utility
19284@c STARTOFRANGE cut
19285@cindex @command{cut} utility
19286@c STARTOFRANGE ficut
19287@cindex fields, cutting
19288@c STARTOFRANGE colcut
19289@cindex columns, cutting
19290The @command{cut} utility selects, or ``cuts,'' characters or fields
19291from its standard input and sends them to its standard output.
19292Fields are separated by tabs by default,
19293but you may supply a command-line option to change the field
19294@dfn{delimiter} (i.e., the field-separator character). @command{cut}'s
19295definition of fields is less general than @command{awk}'s.
19296
19297A common use of @command{cut} might be to pull out just the login name of
19298logged-on users from the output of @command{who}.  For example, the following
19299pipeline generates a sorted, unique list of the logged-on users:
19300
19301@example
19302who | cut -c1-8 | sort | uniq
19303@end example
19304
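For comparison, here is a sketch of a similar pipeline that uses
@command{awk} itself to pick out the login name.  It is not an exact
equivalent: it prints the whole first whitespace-separated field rather
than the first eight characters of each line:

@example
who | awk '@{ print $1 @}' | sort | uniq
@end example
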
19305The options for @command{cut} are:
19306
19307@table @code
19308@item -c @var{list}
19309Use @var{list} as the list of characters to cut out.  Items within the list
19310may be separated by commas, and ranges of characters can be separated with
19311dashes.  The list @samp{1-8,15,22-35} specifies characters 1 through
193128, 15, and 22 through 35.
19313
19314@item -f @var{list}
19315Use @var{list} as the list of fields to cut out.
19316
19317@item -d @var{delim}
19318Use @var{delim} as the field-separator character instead of the tab
19319character.
19320
19321@item -s
19322Suppress printing of lines that do not contain the field delimiter.
19323@end table
19324
19325The @command{awk} implementation of @command{cut} uses the @code{getopt} library
19326function (@pxref{Getopt Function})
19327and the @code{join} library function
19328(@pxref{Join Function}).
19329
19330The program begins with a comment describing the options, the library
19331functions needed, and a @code{usage} function that prints out a usage
19332message and exits.  @code{usage} is called if invalid arguments are
19333supplied:
19334
19335@cindex @code{cut.awk} program
19336@example
19337@c file eg/prog/cut.awk
19338# cut.awk --- implement cut in awk
19339@c endfile
19340@ignore
19341@c file eg/prog/cut.awk
19342#
19343# Arnold Robbins, arnold@@gnu.org, Public Domain
19344# May 1993
19345
19346@c endfile
19347@end ignore
19348@c file eg/prog/cut.awk
19349# Options:
19350#    -f list     Cut fields
19351#    -d c        Field delimiter character
19352#    -c list     Cut characters
19353#
19354#    -s          Suppress lines without the delimiter
19355#
19356# Requires getopt and join library functions
19357
19358@group
19359function usage(    e1, e2)
19360@{
19361    e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
19362    e2 = "usage: cut [-c list] [files...]"
19363    print e1 > "/dev/stderr"
19364    print e2 > "/dev/stderr"
19365    exit 1
19366@}
19367@end group
19368@c endfile
19369@end example
19370
19371@noindent
19372The variables @code{e1} and @code{e2} are used so that the function
19373fits nicely on the
19374@ifnotinfo
19375page.
19376@end ifnotinfo
19377@ifnottex
19378screen.
19379@end ifnottex
19380
19381@cindex @code{BEGIN} pattern, running @command{awk} programs and
19382@cindex @code{FS} variable, running @command{awk} programs and
19383Next comes a @code{BEGIN} rule that parses the command-line options.
19384It sets @code{FS} to a single TAB character, because that is @command{cut}'s
19385default field separator.  The output field separator is also set to be the
19386same as the input field separator.  Then @code{getopt} is used to step
19387through the command-line options.  Exactly one of the variables
19388@code{by_fields} or @code{by_chars} is set to true, to indicate that
19389processing should be done by fields or by characters, respectively.
19390When cutting by characters, the output field separator is set to the null
19391string:
19392
19393@example
19394@c file eg/prog/cut.awk
19395BEGIN    \
19396@{
19397    FS = "\t"    # default
19398    OFS = FS
19399    while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) @{
19400        if (c == "f") @{
19401            by_fields = 1
19402            fieldlist = Optarg
19403        @} else if (c == "c") @{
19404            by_chars = 1
19405            fieldlist = Optarg
19406            OFS = ""
19407        @} else if (c == "d") @{
19408            if (length(Optarg) > 1) @{
19409                printf("Using first character of %s" \
19410                " for delimiter\n", Optarg) > "/dev/stderr"
19411                Optarg = substr(Optarg, 1, 1)
19412            @}
19413            FS = Optarg
19414            OFS = FS
19415            if (FS == " ")    # defeat awk semantics
19416                FS = "[ ]"
19417        @} else if (c == "s")
19418            suppress++
19419        else
19420            usage()
19421    @}
19422
19423    for (i = 1; i < Optind; i++)
19424        ARGV[i] = ""
19425@c endfile
19426@end example
19427
19428@cindex field separators, spaces as
19429Special care is taken when the field delimiter is a space.  Using
19430a single space (@code{@w{" "}}) for the value of @code{FS} is
19431incorrect---@command{awk} would separate fields with runs of spaces,
19432tabs, and/or newlines, and we want them to be separated with individual
spaces.  Also, note that after @code{getopt} is through, we have to
clear out all the elements of @code{ARGV} from 1 up to, but not including,
@code{Optind}, so that @command{awk} does not try to process the
command-line options as @value{FN}s.
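
As a quick illustration of the difference (a sketch run from the shell,
not part of @file{cut.awk}), compare how the two settings split a line
containing two adjacent spaces:

@example
$ echo 'a  b' | gawk 'BEGIN @{ FS = " " @} @{ print NF @}'
@print{} 2
$ echo 'a  b' | gawk 'BEGIN @{ FS = "[ ]" @} @{ print NF @}'
@print{} 3
@end example

@noindent
With @code{"[ ]"}, each space is its own separator, so the empty field
between the two spaces is counted.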
19437
19438After dealing with the command-line options, the program verifies that the
19439options make sense.  Only one or the other of @option{-c} and @option{-f}
19440should be used, and both require a field list.  Then the program calls
19441either @code{set_fieldlist} or @code{set_charlist} to pull apart the
19442list of fields or characters:
19443
19444@example
19445@c file eg/prog/cut.awk
19446    if (by_fields && by_chars)
19447        usage()
19448
19449    if (by_fields == 0 && by_chars == 0)
19450        by_fields = 1    # default
19451
19452    if (fieldlist == "") @{
19453        print "cut: needs list for -c or -f" > "/dev/stderr"
19454        exit 1
19455    @}
19456
19457    if (by_fields)
19458        set_fieldlist()
19459    else
19460        set_charlist()
19461@}
19462@c endfile
19463@end example
19464
@code{set_fieldlist} splits the field list apart at the commas
into an array.  Then, for each element of the array, it looks to
see if it is actually a range, and if so, splits it apart.  The range
is verified to make sure the first number is smaller than the second.
Each number in the list is added to the @code{flist} array, which
simply lists the fields that will be printed.  Normal field splitting
is used; that is, the program lets @command{awk} handle the job of
splitting each record into fields:
19473
19474@example
19475@c file eg/prog/cut.awk
19476function set_fieldlist(        n, m, i, j, k, f, g)
19477@{
19478    n = split(fieldlist, f, ",")
19479    j = 1    # index in flist
19480    for (i = 1; i <= n; i++) @{
19481        if (index(f[i], "-") != 0) @{ # a range
19482            m = split(f[i], g, "-")
19483@group
19484            if (m != 2 || g[1] >= g[2]) @{
19485                printf("bad field list: %s\n",
19486                                  f[i]) > "/dev/stderr"
19487                exit 1
19488            @}
19489@end group
19490            for (k = g[1]; k <= g[2]; k++)
19491                flist[j++] = k
19492        @} else
19493            flist[j++] = f[i]
19494    @}
19495    nfields = j - 1
19496@}
19497@c endfile
19498@end example
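
The range expansion can be tried in isolation.  The following
standalone sketch (the list @code{"1,3-5"} is just sample input, and
error checking is omitted) prints the field numbers that would end up
in @code{flist}:

@example
BEGIN @{
    n = split("1,3-5", f, ",")
    for (i = 1; i <= n; i++) @{
        if (split(f[i], g, "-") == 2) @{    # a range such as 3-5
            for (k = g[1]; k <= g[2]; k++)
                print k
        @} else
            print f[i]
    @}
@}
@end example

@noindent
Run with @command{awk}, this prints 1, 3, 4, and 5, one per line.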
19499
19500The @code{set_charlist} function is more complicated than @code{set_fieldlist}.
19501The idea here is to use @command{gawk}'s @code{FIELDWIDTHS} variable
19502(@pxref{Constant Size}),
19503which describes constant-width input.  When using a character list, that is
19504exactly what we have.
19505
19506Setting up @code{FIELDWIDTHS} is more complicated than simply listing the
19507fields that need to be printed.  We have to keep track of the fields to
19508print and also the intervening characters that have to be skipped.
19509For example, suppose you wanted characters 1 through 8, 15, and
1951022 through 35.  You would use @samp{-c 1-8,15,22-35}.  The necessary value
19511for @code{FIELDWIDTHS} is @code{@w{"8 6 1 6 14"}}.  This yields five
19512fields, and the fields to print
19513are @code{$1}, @code{$3}, and @code{$5}.
19514The intermediate fields are @dfn{filler},
19515which is stuff in between the desired data.
19516@code{flist} lists the fields to print, and @code{t} tracks the
19517complete field list, including filler fields:
19518
19519@example
19520@c file eg/prog/cut.awk
function set_charlist(    field, i, j, f, g, n, m, t,
                          filler, last, len)
19523@{
19524    field = 1   # count total fields
19525    n = split(fieldlist, f, ",")
19526    j = 1       # index in flist
19527    for (i = 1; i <= n; i++) @{
19528        if (index(f[i], "-") != 0) @{ # range
19529            m = split(f[i], g, "-")
19530            if (m != 2 || g[1] >= g[2]) @{
19531                printf("bad character list: %s\n",
19532                               f[i]) > "/dev/stderr"
19533                exit 1
19534            @}
19535            len = g[2] - g[1] + 1
19536            if (g[1] > 1)  # compute length of filler
19537                filler = g[1] - last - 1
19538            else
19539                filler = 0
19540@group
19541            if (filler)
19542                t[field++] = filler
19543@end group
19544            t[field++] = len  # length of field
19545            last = g[2]
19546            flist[j++] = field - 1
19547        @} else @{
19548            if (f[i] > 1)
19549                filler = f[i] - last - 1
19550            else
19551                filler = 0
19552            if (filler)
19553                t[field++] = filler
19554            t[field++] = 1
19555            last = f[i]
19556            flist[j++] = field - 1
19557        @}
19558    @}
19559    FIELDWIDTHS = join(t, 1, field - 1)
19560    nfields = j - 1
19561@}
19562@c endfile
19563@end example
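
To see the effect of such a setting, here is a small sketch, separate
from @file{cut.awk}, that applies the @code{FIELDWIDTHS} value from the
earlier @samp{-c 1-8,15,22-35} example to a sample line made up of the
lowercase letters followed by the ten digits (@command{gawk} is
required, since @code{FIELDWIDTHS} is a @command{gawk} extension):

@example
$ echo abcdefghijklmnopqrstuvwxyz0123456789 |
> gawk 'BEGIN @{ FIELDWIDTHS = "8 6 1 6 14" @}
>       @{ print $1 "|" $3 "|" $5 @}'
@print{} abcdefgh|o|vwxyz012345678
@end example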
19564
19565Next is the rule that actually processes the data.  If the @option{-s} option
19566is given, then @code{suppress} is true.  The first @code{if} statement
19567makes sure that the input record does have the field separator.  If
19568@command{cut} is processing fields, @code{suppress} is true, and the field
19569separator character is not in the record, then the record is skipped.
19570
19571If the record is valid, then @command{gawk} has split the data
19572into fields, either using the character in @code{FS} or using fixed-length
19573fields and @code{FIELDWIDTHS}.  The loop goes through the list of fields
19574that should be printed.  The corresponding field is printed if it contains data.
19575If the next field also has data, then the separator character is
19576written out between the fields:
19577
19578@example
19579@c file eg/prog/cut.awk
19580@{
    if (by_fields && suppress && index($0, FS) == 0)
19582        next
19583
19584    for (i = 1; i <= nfields; i++) @{
19585        if ($flist[i] != "") @{
19586            printf "%s", $flist[i]
19587            if (i < nfields && $flist[i+1] != "")
19588                printf "%s", OFS
19589        @}
19590    @}
19591    print ""
19592@}
19593@c endfile
19594@end example
19595
19596This version of @command{cut} relies on @command{gawk}'s @code{FIELDWIDTHS}
19597variable to do the character-based cutting.  While it is possible in
19598other @command{awk} implementations to use @code{substr}
19599(@pxref{String Functions}),
19600it is also extremely painful.
19601The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem
19602of picking the input line apart by characters.
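
For comparison, here is a rough sketch of what the same selection
(@samp{-c 1-8,15,22-35}) looks like with @code{substr} when the
positions are written out by hand.  A general solution would have to
build these calls from the character list at runtime, which is where
the pain comes in:

@example
$ echo abcdefghijklmnopqrstuvwxyz0123456789 |
> awk '@{ print substr($0, 1, 8) substr($0, 15, 1) substr($0, 22, 14) @}'
@print{} abcdefghovwxyz012345678
@end example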
19603@c ENDOFRANGE cut
19604@c ENDOFRANGE ficut
19605@c ENDOFRANGE colcut
19606
19607@c Exercise: Rewrite using split with "".
19608
19609@node Egrep Program
19610@subsection Searching for Regular Expressions in Files
19611
19612@c STARTOFRANGE regexps
19613@cindex regular expressions, searching for
19614@c STARTOFRANGE sfregexp
19615@cindex searching, files for regular expressions
19616@c STARTOFRANGE fsregexp
19617@cindex files, searching for regular expressions
19618@cindex @command{egrep} utility
19619The @command{egrep} utility searches files for patterns.  It uses regular
19620expressions that are almost identical to those available in @command{awk}
19621(@pxref{Regexp}).
19622It is used in the following manner:
19623
19624@example
19625egrep @r{[} @var{options} @r{]} '@var{pattern}' @var{files} @dots{}
19626@end example
19627
19628The @var{pattern} is a regular expression.  In typical usage, the regular
19629expression is quoted to prevent the shell from expanding any of the
19630special characters as @value{FN} wildcards.  Normally, @command{egrep}
19631prints the lines that matched.  If multiple @value{FN}s are provided on
19632the command line, each output line is preceded by the name of the file
19633and a colon.
19634
19635The options to @command{egrep} are as follows:
19636
19637@table @code
19638@item -c
19639Print out a count of the lines that matched the pattern, instead of the
19640lines themselves.
19641
19642@item -s
19643Be silent.  No output is produced and the exit value indicates whether
19644the pattern was matched.
19645
19646@item -v
19647Invert the sense of the test. @command{egrep} prints the lines that do
19648@emph{not} match the pattern and exits successfully if the pattern is not
19649matched.
19650
19651@item -i
19652Ignore case distinctions in both the pattern and the input data.
19653
19654@item -l
19655Only print (list) the names of the files that matched, not the lines that matched.
19656
19657@item -e @var{pattern}
19658Use @var{pattern} as the regexp to match.  The purpose of the @option{-e}
19659option is to allow patterns that start with a @samp{-}.
19660@end table
19661
19662This version uses the @code{getopt} library function
19663(@pxref{Getopt Function})
19664and the file transition library program
19665(@pxref{Filetrans Function}).
19666
19667The program begins with a descriptive comment and then a @code{BEGIN} rule
19668that processes the command-line arguments with @code{getopt}.  The @option{-i}
19669(ignore case) option is particularly easy with @command{gawk}; we just use the
19670@code{IGNORECASE} built-in variable
19671(@pxref{Built-in Variables}):
19672
19673@cindex @code{egrep.awk} program
19674@example
19675@c file eg/prog/egrep.awk
19676# egrep.awk --- simulate egrep in awk
19677@c endfile
19678@ignore
19679@c file eg/prog/egrep.awk
19680#
19681# Arnold Robbins, arnold@@gnu.org, Public Domain
19682# May 1993
19683
19684@c endfile
19685@end ignore
19686@c file eg/prog/egrep.awk
19687# Options:
19688#    -c    count of lines
19689#    -s    silent - use exit value
19690#    -v    invert test, success if no match
19691#    -i    ignore case
19692#    -l    print filenames only
19693#    -e    argument is pattern
19694#
19695# Requires getopt and file transition library functions
19696
19697BEGIN @{
19698    while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) @{
19699        if (c == "c")
19700            count_only++
19701        else if (c == "s")
19702            no_print++
19703        else if (c == "v")
19704            invert++
19705        else if (c == "i")
19706            IGNORECASE = 1
19707        else if (c == "l")
19708            filenames_only++
19709        else if (c == "e")
19710            pattern = Optarg
19711        else
19712            usage()
19713    @}
19714@c endfile
19715@end example
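
As a brief illustration of what @code{IGNORECASE} does (a sketch using
made-up input, and assuming @command{gawk} is not running in
compatibility mode):

@example
$ echo 'Hello, World' | gawk 'BEGIN @{ IGNORECASE = 1 @} /hello/'
@print{} Hello, World
@end example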
19716
19717Next comes the code that handles the @command{egrep}-specific behavior. If no
19718pattern is supplied with @option{-e}, the first nonoption on the
19719command line is used.  The @command{awk} command-line arguments up to @code{ARGV[Optind]}
19720are cleared, so that @command{awk} won't try to process them as files.  If no
19721files are specified, the standard input is used, and if multiple files are
19722specified, we make sure to note this so that the @value{FN}s can precede the
19723matched lines in the output:
19724
19725@example
19726@c file eg/prog/egrep.awk
19727    if (pattern == "")
19728        pattern = ARGV[Optind++]
19729
19730    for (i = 1; i < Optind; i++)
19731        ARGV[i] = ""
19732    if (Optind >= ARGC) @{
19733        ARGV[1] = "-"
19734        ARGC = 2
19735    @} else if (ARGC - Optind > 1)
19736        do_filenames++
19737
19738#    if (IGNORECASE)
19739#        pattern = tolower(pattern)
19740@}
19741@c endfile
19742@end example
19743
19744The last two lines are commented out, since they are not needed in
19745@command{gawk}.  They should be uncommented if you have to use another version
19746of @command{awk}.
19747
19748The next set of lines should be uncommented if you are not using
19749@command{gawk}.  This rule translates all the characters in the input line
19750into lowercase if the @option{-i} option is specified.@footnote{It
19751also introduces a subtle bug;
19752if a match happens, we output the translated line, not the original.}
19753The rule is
19754commented out since it is not necessary with @command{gawk}:
19755
19756@c Exercise: Fix this, w/array and new line as key to original line
19757
19758@example
19759@c file eg/prog/egrep.awk
19760#@{
19761#    if (IGNORECASE)
19762#        $0 = tolower($0)
19763#@}
19764@c endfile
19765@end example
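
One way to avoid the bug mentioned in the footnote is to lowercase a
copy of the record for matching and leave @code{$0} alone.  The
following is only a sketch of that idea, not code from
@file{egrep.awk}:

@example
$ echo 'Hello, World' | awk 'tolower($0) ~ /hello/ @{ print @}'
@print{} Hello, World
@end example

@noindent
The original line is printed, since only the temporary value
returned by @code{tolower} was converted.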
19766
19767The @code{beginfile} function is called by the rule in @file{ftrans.awk}
19768when each new file is processed.  In this case, it is very simple; all it
19769does is initialize a variable @code{fcount} to zero. @code{fcount} tracks
19770how many lines in the current file matched the pattern
19771(naming the parameter @code{junk} shows we know that @code{beginfile}
19772is called with a parameter, but that we're not interested in its value):
19773
19774@example
19775@c file eg/prog/egrep.awk
19776function beginfile(junk)
19777@{
19778    fcount = 0
19779@}
19780@c endfile
19781@end example
19782
19783The @code{endfile} function is called after each file has been processed.
19784It affects the output only when the user wants a count of the number of lines that
19785matched.  @code{no_print} is true only if the exit status is desired.
19786@code{count_only} is true if line counts are desired.  @command{egrep}
19787therefore only prints line counts if printing and counting are enabled.
19788The output format must be adjusted depending upon the number of files to
19789process.  Finally, @code{fcount} is added to @code{total}, so that we
19790know the total number of lines that matched the pattern:
19791
19792@example
19793@c file eg/prog/egrep.awk
19794function endfile(file)
19795@{
19796    if (! no_print && count_only)
19797        if (do_filenames)
19798            print file ":" fcount
19799        else
19800            print fcount
19801
19802    total += fcount
19803@}
19804@c endfile
19805@end example
19806
19807The following rule does most of the work of matching lines. The variable
19808@code{matches} is true if the line matched the pattern. If the user
19809wants lines that did not match, the sense of @code{matches} is inverted
19810using the @samp{!} operator. @code{fcount} is incremented with the value of
19811@code{matches}, which is either one or zero, depending upon a
19812successful or unsuccessful match.  If the line does not match, the
19813@code{next} statement just moves on to the next record.
19814
19815@cindex @code{!} (exclamation point), @code{!} operator
19816@cindex exclamation point (@code{!}), @code{!} operator
19817A number of additional tests are made, but they are only done if we
19818are not counting lines.  First, if the user only wants exit status
19819(@code{no_print} is true), then it is enough to know that @emph{one}
19820line in this file matched, and we can skip on to the next file with
19821@code{nextfile}.  Similarly, if we are only printing @value{FN}s, we can
19822print the @value{FN}, and then skip to the next file with @code{nextfile}.
19823Finally, each line is printed, with a leading @value{FN} and colon
19824if necessary:
19825
19826@cindex @code{!} operator
19827@example
19828@c file eg/prog/egrep.awk
19829@{
19830    matches = ($0 ~ pattern)
19831    if (invert)
19832        matches = ! matches
19833
19834    fcount += matches    # 1 or 0
19835
19836    if (! matches)
19837        next
19838
19839    if (! count_only) @{
19840        if (no_print)
19841            nextfile
19842
19843        if (filenames_only) @{
19844            print FILENAME
19845            nextfile
19846        @}
19847
19848        if (do_filenames)
19849            print FILENAME ":" $0
19850        else
19851            print
19852    @}
19853@}
19854@c endfile
19855@end example
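
The counting works because a match expression has the numeric value
one or zero.  A short sketch, independent of @file{egrep.awk}, shows
the values involved (the variable names are made up for the example):

@example
$ awk 'BEGIN @{ m = ("abc" ~ /b/); n = ("abc" ~ /x/); print m, n, ! n @}'
@print{} 1 0 1
@end example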
19856
19857The @code{END} rule takes care of producing the correct exit status. If
19858there are no matches, the exit status is one; otherwise it is zero:
19859
19860@example
19861@c file eg/prog/egrep.awk
19862END    \
19863@{
19864    if (total == 0)
19865        exit 1
19866    exit 0
19867@}
19868@c endfile
19869@end example
19870
19871The @code{usage} function prints a usage message in case of invalid options,
19872and then exits:
19873
19874@example
19875@c file eg/prog/egrep.awk
19876function usage(    e)
19877@{
19878    e = "Usage: egrep [-csvil] [-e pat] [files ...]"
19879    e = e "\n\tegrep [-csvil] pat [files ...]"
19880    print e > "/dev/stderr"
19881    exit 1
19882@}
19883@c endfile
19884@end example
19885
19886The variable @code{e} is used so that the function fits nicely
19887on the printed page.
19888
19889@cindex @code{END} pattern, backslash continuation and
19890@cindex @code{\} (backslash), continuing lines and
19891@cindex backslash (@code{\}), continuing lines and
19892Just a note on programming style: you may have noticed that the @code{END}
19893rule uses backslash continuation, with the open brace on a line by
19894itself.  This is so that it more closely resembles the way functions
19895are written.  Many of the examples
19896in this @value{CHAPTER}
19897use this style. You can decide for yourself if you like writing
19898your @code{BEGIN} and @code{END} rules this way
19899or not.
19900@c ENDOFRANGE regexps
19901@c ENDOFRANGE sfregexp
19902@c ENDOFRANGE fsregexp
19903
19904@node Id Program
19905@subsection Printing out User Information
19906
19907@cindex printing, user information
19908@cindex users, information about, printing
19909@cindex @command{id} utility
19910The @command{id} utility lists a user's real and effective user ID numbers,
19911real and effective group ID numbers, and the user's group set, if any.
19912@command{id} only prints the effective user ID and group ID if they are
19913different from the real ones.  If possible, @command{id} also supplies the
19914corresponding user and group names.  The output might look like this:
19915
19916@example
19917$ id
19918@print{} uid=2076(arnold) gid=10(staff) groups=10(staff),4(tty)
19919@end example
19920
19921This information is part of what is provided by @command{gawk}'s
19922@code{PROCINFO} array (@pxref{Built-in Variables}).
19923However, the @command{id} utility provides a more palatable output than just
19924individual numbers.
19925
19926Here is a simple version of @command{id} written in @command{awk}.
19927It uses the user database library functions
19928(@pxref{Passwd Functions})
19929and the group database library functions
(@pxref{Group Functions}).
19931
19932The program is fairly straightforward.  All the work is done in the
19933@code{BEGIN} rule.  The user and group ID numbers are obtained from
19934@code{PROCINFO}.
19935The code is repetitive.  The entry in the user database for the real user ID
19936number is split into parts at the @samp{:}. The name is the first field.
19937Similar code is used for the effective user ID number and the group
19938numbers:
19939
19940@cindex @code{id.awk} program
19941@example
19942@c file eg/prog/id.awk
19943# id.awk --- implement id in awk
19944#
19945# Requires user and group library functions
19946@c endfile
19947@ignore
19948@c file eg/prog/id.awk
19949#
19950# Arnold Robbins, arnold@@gnu.org, Public Domain
19951# May 1993
19952# Revised February 1996
19953
19954@c endfile
19955@end ignore
19956@c file eg/prog/id.awk
19957# output is:
19958# uid=12(foo) euid=34(bar) gid=3(baz) \
19959#             egid=5(blat) groups=9(nine),2(two),1(one)
19960
19961@group
19962BEGIN    \
19963@{
19964    uid = PROCINFO["uid"]
19965    euid = PROCINFO["euid"]
19966    gid = PROCINFO["gid"]
19967    egid = PROCINFO["egid"]
19968@end group
19969
19970    printf("uid=%d", uid)
19971    pw = getpwuid(uid)
19972    if (pw != "") @{
19973        split(pw, a, ":")
19974        printf("(%s)", a[1])
19975    @}
19976
19977    if (euid != uid) @{
19978        printf(" euid=%d", euid)
19979        pw = getpwuid(euid)
19980        if (pw != "") @{
19981            split(pw, a, ":")
19982            printf("(%s)", a[1])
19983        @}
19984    @}
19985
19986    printf(" gid=%d", gid)
19987    pw = getgrgid(gid)
19988    if (pw != "") @{
19989        split(pw, a, ":")
19990        printf("(%s)", a[1])
19991    @}
19992
19993    if (egid != gid) @{
19994        printf(" egid=%d", egid)
19995        pw = getgrgid(egid)
19996        if (pw != "") @{
19997            split(pw, a, ":")
19998            printf("(%s)", a[1])
19999        @}
20000    @}
20001
20002    for (i = 1; ("group" i) in PROCINFO; i++) @{
20003        if (i == 1)
20004            printf(" groups=")
20005        group = PROCINFO["group" i]
20006        printf("%d", group)
20007        pw = getgrgid(group)
20008        if (pw != "") @{
20009            split(pw, a, ":")
20010            printf("(%s)", a[1])
20011        @}
20012        if (("group" (i+1)) in PROCINFO)
20013            printf(",")
20014    @}
20015
20016    print ""
20017@}
20018@c endfile
20019@end example
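
The name lookup that is repeated throughout the rule comes down to one
@code{split} call.  In this sketch, the record is a made-up user
database entry standing in for the value returned by @code{getpwuid}:

@example
BEGIN @{
    pw = "arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh"
    split(pw, a, ":")
    printf("uid=%d(%s)\n", a[3], a[1])
@}
@end example

@noindent
This prints @samp{uid=2076(arnold)}.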
20020
20021@cindex @code{in} operator
20022The test in the @code{for} loop is worth noting.
20023Any supplementary groups in the @code{PROCINFO} array have the
20024indices @code{"group1"} through @code{"group@var{N}"} for some
20025@var{N}, i.e., the total number of supplementary groups.
20026However, we don't know in advance how many of these groups
20027there are.
20028
20029This loop works by starting at one, concatenating the value with
20030@code{"group"}, and then using @code{in} to see if that value is
20031in the array.  Eventually, @code{i} is incremented past
20032the last group in the array and the loop exits.
20033
20034The loop is also correct if there are @emph{no} supplementary
20035groups; then the condition is false the first time it's
20036tested, and the loop body never executes.
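
The same technique can be tried on an ordinary array.  In this sketch,
@code{data} is a hypothetical stand-in for @code{PROCINFO}, filled in
by hand:

@example
BEGIN @{
    data["group1"] = 10
    data["group2"] = 4
    for (i = 1; ("group" i) in data; i++)
        print i, data["group" i]
@}
@end example

@noindent
The output is @samp{1 10} followed by @samp{2 4}; the test fails for
@code{"group3"}, which ends the loop.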
20037
20038@c exercise!!!
20039@ignore
20040The POSIX version of @command{id} takes arguments that control which
20041information is printed.  Modify this version to accept the same
20042arguments and perform in the same way.
20043@end ignore
20044
20045@node Split Program
20046@subsection Splitting a Large File into Pieces
20047
20048@c STARTOFRANGE filspl
20049@cindex files, splitting
20050@cindex @code{split} utility
20051The @code{split} program splits large text files into smaller pieces.
20052Usage is as follows:
20053
20054@example
20055split @r{[}-@var{count}@r{]} file @r{[} @var{prefix} @r{]}
20056@end example
20057
20058By default,
20059the output files are named @file{xaa}, @file{xab}, and so on. Each file has
200601000 lines in it, with the likely exception of the last file. To change the
20061number of lines in each file, supply a number on the command line
20062preceded with a minus; e.g., @samp{-500} for files with 500 lines in them
20063instead of 1000.  To change the name of the output files to something like
20064@file{myfileaa}, @file{myfileab}, and so on, supply an additional
20065argument that specifies the @value{FN} prefix.
20066
20067Here is a version of @code{split} in @command{awk}. It uses the @code{ord} and
20068@code{chr} functions presented in
20069@ref{Ordinal Functions}.
20070
20071The program first sets its defaults, and then tests to make sure there are
20072not too many arguments.  It then looks at each argument in turn.  The
20073first argument could be a minus sign followed by a number. If it is, this happens
20074to look like a negative number, so it is made positive, and that is the
20075count of lines.  The data @value{FN} is skipped over and the final argument
20076is used as the prefix for the output @value{FN}s:
20077
20078@cindex @code{split.awk} program
20079@example
20080@c file eg/prog/split.awk
20081# split.awk --- do split in awk
20082#
20083# Requires ord and chr library functions
20084@c endfile
20085@ignore
20086@c file eg/prog/split.awk
20087#
20088# Arnold Robbins, arnold@@gnu.org, Public Domain
20089# May 1993
20090
20091@c endfile
20092@end ignore
20093@c file eg/prog/split.awk
20094# usage: split [-num] [file] [outname]
20095
20096BEGIN @{
20097    outfile = "x"    # default
20098    count = 1000
20099    if (ARGC > 4)
20100        usage()
20101
20102    i = 1
20103    if (ARGV[i] ~ /^-[0-9]+$/) @{
20104        count = -ARGV[i]
20105        ARGV[i] = ""
20106        i++
20107    @}
20108    # test argv in case reading from stdin instead of file
20109    if (i in ARGV)
20110        i++    # skip data file name
20111    if (i in ARGV) @{
20112        outfile = ARGV[i]
20113        ARGV[i] = ""
20114    @}
20115
20116    s1 = s2 = "a"
20117    out = (outfile s1 s2)
20118@}
20119@c endfile
20120@end example
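
The conversion of the @samp{-@var{count}} argument relies on
@command{awk} treating the string as a number when it is negated.  A
short sketch, with a sample argument value:

@example
$ awk 'BEGIN @{ arg = "-500"; if (arg ~ /^-[0-9]+$/) print -arg @}'
@print{} 500
@end example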
20121
20122The next rule does most of the work. @code{tcount} (temporary count) tracks
20123how many lines have been printed to the output file so far. If it is greater
20124than @code{count}, it is time to close the current file and start a new one.
@code{s1} and @code{s2} track the current suffixes for the @value{FN}. If
they are both @samp{z}, the file is just too big.  If only @code{s2} is
@samp{z}, @code{s1} moves to the next letter in the alphabet and @code{s2}
starts over again at @samp{a}.  Otherwise, @code{s2} simply moves to the
next letter:
20129
20130@c else on separate line here for page breaking
20131@example
20132@c file eg/prog/split.awk
20133@{
20134    if (++tcount > count) @{
20135        close(out)
20136        if (s2 == "z") @{
20137            if (s1 == "z") @{
20138                printf("split: %s is too large to split\n",
20139                       FILENAME) > "/dev/stderr"
20140                exit 1
20141            @}
20142            s1 = chr(ord(s1) + 1)
20143            s2 = "a"
20144        @}
20145@group
20146        else
20147            s2 = chr(ord(s2) + 1)
20148@end group
20149        out = (outfile s1 s2)
20150        tcount = 1
20151    @}
20152    print > out
20153@}
20154@c endfile
20155@end example
20156
20157@c Exercise: do this with just awk builtin functions, index("abc..."), substr, etc.
20158
20159@noindent
20160The @code{usage} function simply prints an error message and exits:
20161
20162@example
20163@c file eg/prog/split.awk
20164function usage(   e)
20165@{
20166    e = "usage: split [-num] [file] [outname]"
20167    print e > "/dev/stderr"
20168    exit 1
20169@}
20170@c endfile
20171@end example
20172
20173@noindent
20174The variable @code{e} is used so that the function
20175fits nicely on the
20176@ifinfo
20177screen.
20178@end ifinfo
20179@ifnotinfo
20180page.
20181@end ifnotinfo
20182
20183This program is a bit sloppy; it relies on @command{awk} to automatically close the last file
20184instead of doing it in an @code{END} rule.
20185It also assumes that letters are contiguous in the character set,
20186which isn't true for EBCDIC systems.
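
One possible way around the character-set assumption is to step
through an explicit alphabet string with @code{index} and
@code{substr} instead of @code{ord} and @code{chr}.  The function name
@code{next_letter} below is made up for this sketch; it is not part of
@file{split.awk}:

@example
function next_letter(c,    letters)
@{
    letters = "abcdefghijklmnopqrstuvwxyz"
    # substr returns "" when asked for the letter after "z"
    return substr(letters, index(letters, c) + 1, 1)
@}

BEGIN @{ print next_letter("a"), next_letter("y") @}
@end example

@noindent
This prints @samp{b z}.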
20187@c BFD...
20188@c ENDOFRANGE filspl
20189
20190@node Tee Program
20191@subsection Duplicating Output into Multiple Files
20192
20193@c last comma is part of secondary
20194@cindex files, multiple, duplicating output into
20195@cindex output, duplicating into files
20196@cindex @code{tee} utility
20197The @code{tee} program is known as a ``pipe fitting.''  @code{tee} copies
20198its standard input to its standard output and also duplicates it to the
20199files named on the command line.  Its usage is as follows:
20200
20201@example
20202tee @r{[}-a@r{]} file @dots{}
20203@end example
20204
20205The @option{-a} option tells @code{tee} to append to the named files, instead of
20206truncating them and starting over.
20207
20208The @code{BEGIN} rule first makes a copy of all the command-line arguments
20209into an array named @code{copy}.
20210@code{ARGV[0]} is not copied, since it is not needed.
20211@code{tee} cannot use @code{ARGV} directly, since @command{awk} attempts to
20212process each @value{FN} in @code{ARGV} as input data.
20213
20214@cindex flag variables
20215If the first argument is @option{-a}, then the flag variable
20216@code{append} is set to true, and both @code{ARGV[1]} and
20217@code{copy[1]} are deleted. If @code{ARGC} is less than two, then no
20218@value{FN}s were supplied and @code{tee} prints a usage message and exits.
20219Finally, @command{awk} is forced to read the standard input by setting
20220@code{ARGV[1]} to @code{"-"} and @code{ARGC} to two:
20221
20222@c NEXT ED: Add more leading commentary in this program
20223@cindex @code{tee.awk} program
20224@example
20225@c file eg/prog/tee.awk
20226# tee.awk --- tee in awk
20227@c endfile
20228@ignore
20229@c file eg/prog/tee.awk
20230#
20231# Arnold Robbins, arnold@@gnu.org, Public Domain
20232# May 1993
20233# Revised December 1995
20234
20235@c endfile
20236@end ignore
20237@c file eg/prog/tee.awk
20238BEGIN    \
20239@{
20240    for (i = 1; i < ARGC; i++)
20241        copy[i] = ARGV[i]
20242
20243    if (ARGV[1] == "-a") @{
20244        append = 1
20245        delete ARGV[1]
20246        delete copy[1]
20247        ARGC--
20248    @}
20249    if (ARGC < 2) @{
20250        print "usage: tee [-a] file ..." > "/dev/stderr"
20251        exit 1
20252    @}
20253    ARGV[1] = "-"
20254    ARGC = 2
20255@}
20256@c endfile
20257@end example
20258
20259The single rule does all the work.  Since there is no pattern, it is
20260executed for each line of input.  The body of the rule simply prints the
20261line into each file on the command line, and then to the standard output:
20262
20263@example
20264@c file eg/prog/tee.awk
20265@{
20266    # moving the if outside the loop makes it run faster
20267    if (append)
20268        for (i in copy)
20269            print >> copy[i]
20270    else
20271        for (i in copy)
20272            print > copy[i]
20273    print
20274@}
20275@c endfile
20276@end example
20277
20278@noindent
20279It is also possible to write the loop this way:
20280
20281@example
20282for (i in copy)
20283    if (append)
20284        print >> copy[i]
20285    else
20286        print > copy[i]
20287@end example
20288
20289@noindent
20290This is more concise but it is also less efficient.  The @samp{if} is
20291tested for each record and for each output file.  By duplicating the loop
20292body, the @samp{if} is only tested once for each input record.  If there are
20293@var{N} input records and @var{M} output files, the first method only
20294executes @var{N} @samp{if} statements, while the second executes
20295@var{N}@code{*}@var{M} @samp{if} statements.
20296
20297Finally, the @code{END} rule cleans up by closing all the output files:
20298
20299@example
20300@c file eg/prog/tee.awk
20301END    \
20302@{
20303    for (i in copy)
20304        close(copy[i])
20305@}
20306@c endfile
20307@end example
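
Note that @samp{print > copy[i]} does not truncate the file on every
record.  In @command{awk}, a file opened with @samp{>} is emptied only
when it is first opened; later @code{print} statements in the same run
add to it.  A small sketch, using a hypothetical scratch file name:

@example
BEGIN @{
    file = "/tmp/tee-demo.txt"   # hypothetical scratch file
    print "one" > file
    print "two" > file           # still open: added after "one"
    close(file)
    while ((getline line < file) > 0)
        print line
@}
@end example

@noindent
Both lines are read back, in order.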
20308
20309@node Uniq Program
20310@subsection Printing Nonduplicated Lines of Text
20311
20312@c STARTOFRANGE prunt
20313@cindex printing, unduplicated lines of text
20314@c first comma is part of primary
20315@c STARTOFRANGE tpul
20316@cindex text, printing, unduplicated lines of
20317@cindex @command{uniq} utility
20318The @command{uniq} utility reads sorted lines of data on its standard
20319input, and by default removes duplicate lines.  In other words, it only
20320prints unique lines---hence the name.  @command{uniq} has a number of
20321options. The usage is as follows:
20322
20323@example
20324uniq @r{[}-udc @r{[}-@var{n}@r{]]} @r{[}+@var{n}@r{]} @r{[} @var{input file} @r{[} @var{output file} @r{]]}
20325@end example
20326
20327The options for @command{uniq} are:
20328
20329@table @code
20330@item -d
Print only repeated lines.
20332
20333@item -u
20334Print only nonrepeated lines.
20335
20336@item -c
20337Count lines. This option overrides @option{-d} and @option{-u}.  Both repeated
20338and nonrepeated lines are counted.
20339
20340@item -@var{n}
20341Skip @var{n} fields before comparing lines.  The definition of fields
20342is similar to @command{awk}'s default: nonwhitespace characters separated
20343by runs of spaces and/or tabs.
20344
20345@item +@var{n}
20346Skip @var{n} characters before comparing lines.  Any fields specified with
20347@samp{-@var{n}} are skipped first.
20348
20349@item @var{input file}
20350Data is read from the input file named on the command line, instead of from
20351the standard input.
20352
20353@item @var{output file}
20354The generated output is sent to the named output file, instead of to the
20355standard output.
20356@end table
20357
Normally @command{uniq} behaves as if both the @option{-d} and
@option{-u} options were provided.
20360
20361@command{uniq} uses the
20362@code{getopt} library function
20363(@pxref{Getopt Function})
20364and the @code{join} library function
20365(@pxref{Join Function}).
20366
20367The program begins with a @code{usage} function and then a brief outline of
20368the options and their meanings in a comment.
20369The @code{BEGIN} rule deals with the command-line arguments and options. It
20370uses a trick to get @code{getopt} to handle options of the form @samp{-25},
20371treating such an option as the option letter @samp{2} with an argument of
20372@samp{5}. If indeed two or more digits are supplied (@code{Optarg} looks
20373like a number), @code{Optarg} is
20374concatenated with the option digit and then the result is added to zero to make
20375it into a number.  If there is only one digit in the option, then
20376@code{Optarg} is not needed. In this case, @code{Optind} must be decremented so that
20377@code{getopt} processes it next time.  This code is admittedly a bit
20378tricky.
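
The arithmetic behind the trick can be seen in isolation.  The following
is only a sketch: the function name @code{digits_to_count} is hypothetical
and is not part of @file{uniq.awk}.  Its two parameters stand in for the
option letter returned by @code{getopt} and for @code{Optarg}:

@example
function digits_to_count(c, optarg)
@{
    if (optarg ~ /^[0-9]+$/)
        return (c optarg) + 0    # "2" and "5" become 25
    return c + 0                 # a lone digit stands for itself
@}

BEGIN @{
    print digits_to_count("2", "5")
    print digits_to_count("7", "")
@}
@end example

@noindent
This prints 25 and then 7.  Concatenation yields the string @code{"25"},
and adding zero converts that string to a number.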
20379
20380If no options are supplied, then the default is taken, to print both
20381repeated and nonrepeated lines.  The output file, if provided, is assigned
20382to @code{outputfile}.  Early on, @code{outputfile} is initialized to the
20383standard output, @file{/dev/stdout}:
20384
20385@cindex @code{uniq.awk} program
20386@example
20387@c file eg/prog/uniq.awk
20388@group
20389# uniq.awk --- do uniq in awk
20390#
20391# Requires getopt and join library functions
20392@end group
20393@c endfile
20394@ignore
20395@c file eg/prog/uniq.awk
20396#
20397# Arnold Robbins, arnold@@gnu.org, Public Domain
20398# May 1993
20399
20400@c endfile
20401@end ignore
20402@c file eg/prog/uniq.awk
20403function usage(    e)
20404@{
20405    e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
20406    print e > "/dev/stderr"
20407    exit 1
20408@}
20409
20410# -c    count lines. overrides -d and -u
20411# -d    only repeated lines
20412# -u    only non-repeated lines
20413# -n    skip n fields
20414# +n    skip n characters, skip fields first
20415
20416BEGIN   \
20417@{
20418    count = 1
20419    outputfile = "/dev/stdout"
20420    opts = "udc0:1:2:3:4:5:6:7:8:9:"
20421    while ((c = getopt(ARGC, ARGV, opts)) != -1) @{
20422        if (c == "u")
20423            non_repeated_only++
20424        else if (c == "d")
20425            repeated_only++
20426        else if (c == "c")
20427            do_count++
20428        else if (index("0123456789", c) != 0) @{
20429            # getopt requires args to options
20430            # this messes us up for things like -5
20431            if (Optarg ~ /^[0-9]+$/)
20432                fcount = (c Optarg) + 0
20433            else @{
20434                fcount = c + 0
20435                Optind--
20436            @}
20437        @} else
20438            usage()
20439    @}
20440
20441    if (ARGV[Optind] ~ /^\+[0-9]+$/) @{
20442        charcount = substr(ARGV[Optind], 2) + 0
20443        Optind++
20444    @}
20445
20446    for (i = 1; i < Optind; i++)
20447        ARGV[i] = ""
20448
20449    if (repeated_only == 0 && non_repeated_only == 0)
20450        repeated_only = non_repeated_only = 1
20451
20452    if (ARGC - Optind == 2) @{
20453        outputfile = ARGV[ARGC - 1]
20454        ARGV[ARGC - 1] = ""
20455    @}
20456@}
20457@c endfile
20458@end example
20459
20460The following function, @code{are_equal}, compares the current line,
20461@code{$0}, to the
20462previous line, @code{last}.  It handles skipping fields and characters.
20463If no field count and no character count are specified, @code{are_equal}
20464simply returns one or zero depending upon the result of a simple string
20465comparison of @code{last} and @code{$0}.  Otherwise, things get more
20466complicated.
20467If fields have to be skipped, each line is broken into an array using
20468@code{split}
20469(@pxref{String Functions});
20470the desired fields are then joined back into a line using @code{join}.
20471The joined lines are stored in @code{clast} and @code{cline}.
20472If no fields are skipped, @code{clast} and @code{cline} are set to
20473@code{last} and @code{$0}, respectively.
20474Finally, if characters are skipped, @code{substr} is used to strip off the
20475leading @code{charcount} characters in @code{clast} and @code{cline}.  The
20476two strings are then compared and @code{are_equal} returns the result:
20477
20478@example
20479@c file eg/prog/uniq.awk
20480function are_equal(    n, m, clast, cline, alast, aline)
20481@{
20482    if (fcount == 0 && charcount == 0)
20483        return (last == $0)
20484
20485    if (fcount > 0) @{
20486        n = split(last, alast)
20487        m = split($0, aline)
20488        clast = join(alast, fcount+1, n)
20489        cline = join(aline, fcount+1, m)
20490    @} else @{
20491        clast = last
20492        cline = $0
20493    @}
20494    if (charcount) @{
20495        clast = substr(clast, charcount + 1)
20496        cline = substr(cline, charcount + 1)
20497    @}
20498
20499    return (clast == cline)
20500@}
20501@c endfile
20502@end example
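
To see which text is actually compared, here is a self-contained sketch.
The function @code{compare_key} is hypothetical; it rebuilds the string
with a loop instead of calling @code{join}, on the assumption that the
remaining fields are rejoined with single spaces (the default separator of
the library's @code{join} function):

@example
function compare_key(str, fcount, charcount,    n, i, parts, sep, key)
@{
    key = str
    if (fcount > 0) @{
        n = split(str, parts)
        key = ""
        for (i = fcount + 1; i <= n; i++) @{
            key = key sep parts[i]
            sep = " "
        @}
    @}
    if (charcount > 0)
        key = substr(key, charcount + 1)
    return key
@}

BEGIN @{
    print compare_key("10:02 alice login", 1, 0)
    print compare_key("10:07 alice login", 1, 0)
    print compare_key("a bcdef", 1, 2)
@}
@end example

@noindent
The first two calls both print @samp{alice login}, so with one field
skipped the two lines compare equal even though their first fields differ.
The third call prints @samp{def}: the field is skipped first, and then two
characters are removed from what remains.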
20503
20504The following two rules are the body of the program.  The first one is
20505executed only for the very first line of data.  It sets @code{last} equal to
20506@code{$0}, so that subsequent lines of text have something to be compared to.
20507
The second rule does the work. The variable @code{equal} is one or zero,
depending upon the results of @code{are_equal}'s comparison. If @command{uniq}
is counting lines (@option{-c}) and the lines are equal, it increments the @code{count} variable.
Otherwise, it prints the count and the saved line, and resets @code{count},
since the two lines are not equal.

If @command{uniq} is not counting and the lines are equal, @code{count} is incremented.
Nothing is printed, since the point is to remove duplicates.
Otherwise, if @command{uniq} is printing repeated lines and more than
one copy was seen, or if it is printing nonrepeated lines
and only one copy was seen, then the saved line is printed, and @code{count}
is reset.
20520
20521Finally, similar logic is used in the @code{END} rule to print the final
20522line of input data:
20523
20524@example
20525@c file eg/prog/uniq.awk
20526NR == 1 @{
20527    last = $0
20528    next
20529@}
20530
20531@{
20532    equal = are_equal()
20533
20534    if (do_count) @{    # overrides -d and -u
20535        if (equal)
20536            count++
20537        else @{
20538            printf("%4d %s\n", count, last) > outputfile
20539            last = $0
20540            count = 1    # reset
20541        @}
20542        next
20543    @}
20544
20545    if (equal)
20546        count++
20547    else @{
20548        if ((repeated_only && count > 1) ||
20549            (non_repeated_only && count == 1))
20550                print last > outputfile
20551        last = $0
20552        count = 1
20553    @}
20554@}
20555
20556END @{
20557    if (do_count)
20558        printf("%4d %s\n", count, last) > outputfile
20559    else if ((repeated_only && count > 1) ||
20560            (non_repeated_only && count == 1))
20561        print last > outputfile
20562@}
20563@c endfile
20564@end example
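
The counting branch can be traced on a short run of data.  The sketch
below is a stand-in, not part of @file{uniq.awk}: it holds six hypothetical
one-word ``lines'' in an array instead of reading them from a file, and
applies the same steps as the @option{-c} code, including the final
@code{printf} that the @code{END} rule performs:

@example
BEGIN @{
    n = split("apple apple pear plum plum plum", word, " ")
    last = word[1]
    count = 1
    for (i = 2; i <= n; i++) @{
        if (word[i] == last)
            count++
        else @{
            printf("%4d %s\n", count, last)
            last = word[i]
            count = 1    # reset
        @}
    @}
    printf("%4d %s\n", count, last)    # what the END rule does
@}
@end example

@noindent
The output is:

@example
   2 apple
   1 pear
   3 plum
@end example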
20565@c ENDOFRANGE prunt
20566@c ENDOFRANGE tpul
20567
20568@node Wc Program
20569@subsection Counting Things
20570
20571@c STARTOFRANGE count
20572@cindex counting
20573@c STARTOFRANGE infco
20574@cindex input files, counting elements in
20575@c STARTOFRANGE woco
20576@cindex words, counting
20577@c STARTOFRANGE chco
20578@cindex characters, counting
20579@c STARTOFRANGE lico
20580@cindex lines, counting
20581@cindex @command{wc} utility
20582The @command{wc} (word count) utility counts lines, words, and characters in
20583one or more input files. Its usage is as follows:
20584
20585@example
20586wc @r{[}-lwc@r{]} @r{[} @var{files} @dots{} @r{]}
20587@end example
20588
20589If no files are specified on the command line, @command{wc} reads its standard
20590input. If there are multiple files, it also prints total counts for all
20591the files.  The options and their meanings are shown in the following list:
20592
20593@table @code
20594@item -l
20595Count only lines.
20596
20597@item -w
20598Count only words.
20599A ``word'' is a contiguous sequence of nonwhitespace characters, separated
20600by spaces and/or tabs.  Luckily, this is the normal way @command{awk} separates
20601fields in its input data.
20602
20603@item -c
20604Count only characters.
20605@end table
20606
20607Implementing @command{wc} in @command{awk} is particularly elegant,
20608since @command{awk} does a lot of the work for us; it splits lines into
20609words (i.e., fields) and counts them, it counts lines (i.e., records),
20610and it can easily tell us how long a line is.
20611
20612This uses the @code{getopt} library function
20613(@pxref{Getopt Function})
20614and the file-transition functions
20615(@pxref{Filetrans Function}).
20616
20617This version has one notable difference from traditional versions of
20618@command{wc}: it always prints the counts in the order lines, words,
20619and characters.  Traditional versions note the order of the @option{-l},
20620@option{-w}, and @option{-c} options on the command line, and print the
20621counts in that order.
20622
20623The @code{BEGIN} rule does the argument processing.  The variable
20624@code{print_total} is true if more than one file is named on the
20625command line:
20626
20627@cindex @code{wc.awk} program
20628@example
20629@c file eg/prog/wc.awk
20630# wc.awk --- count lines, words, characters
20631@c endfile
20632@ignore
20633@c file eg/prog/wc.awk
20634#
20635# Arnold Robbins, arnold@@gnu.org, Public Domain
20636# May 1993
20637@c endfile
20638@end ignore
20639@c file eg/prog/wc.awk
20640
20641# Options:
20642#    -l    only count lines
20643#    -w    only count words
20644#    -c    only count characters
20645#
20646# Default is to count lines, words, characters
20647#
20648# Requires getopt and file transition library functions
20649
20650BEGIN @{
20651    # let getopt print a message about
20652    # invalid options. we ignore them
20653    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{
20654        if (c == "l")
20655            do_lines = 1
20656        else if (c == "w")
20657            do_words = 1
20658        else if (c == "c")
20659            do_chars = 1
20660    @}
20661    for (i = 1; i < Optind; i++)
20662        ARGV[i] = ""
20663
20664    # if no options, do all
20665    if (! do_lines && ! do_words && ! do_chars)
20666        do_lines = do_words = do_chars = 1
20667
    print_total = (ARGC - i > 1)
20669@}
20670@c endfile
20671@end example
20672
20673The @code{beginfile} function is simple; it just resets the counts of lines,
20674words, and characters to zero, and saves the current @value{FN} in
20675@code{fname}:
20676
20677@c NEXT ED: make it lines = words = chars = 0
20678@example
20679@c file eg/prog/wc.awk
20680function beginfile(file)
20681@{
20682    chars = lines = words = 0
20683    fname = FILENAME
20684@}
20685@c endfile
20686@end example
20687
20688The @code{endfile} function adds the current file's numbers to the running
20689totals of lines, words, and characters.@footnote{@command{wc} can't just use the value of
20690@code{FNR} in @code{endfile}. If you examine
20691the code in
20692@ref{Filetrans Function}
20693you will see that
20694@code{FNR} has already been reset by the time
20695@code{endfile} is called.}  It then prints out those numbers
20696for the file that was just read. It relies on @code{beginfile} to reset the
20697numbers for the following @value{DF}:
20698@c ONE DAY: make the above footnote an exercise, instead of giving away the answer.
20699
20700@c NEXT ED: make order for += be lines, words, chars
20701@example
20702@c file eg/prog/wc.awk
20703function endfile(file)
20704@{
20705    tchars += chars
20706    tlines += lines
20707    twords += words
20708    if (do_lines)
20709        printf "\t%d", lines
20710@group
20711    if (do_words)
20712        printf "\t%d", words
20713@end group
20714    if (do_chars)
20715        printf "\t%d", chars
20716    printf "\t%s\n", fname
20717@}
20718@c endfile
20719@end example
20720
There is one rule that is executed for each line. It adds the length of
the record, plus one, to @code{chars}.  The extra one accounts for
the newline character separating records (the value
of @code{RS}), which is not part of the record itself and thus not included
in its length.  Next, @code{lines} is incremented for each line read,
20726and @code{words} is incremented by the value of @code{NF}, which is the
20727number of ``words'' on this line:
20728
20729@example
20730@c file eg/prog/wc.awk
20731# do per line
20732@{
20733    chars += length($0) + 1    # get newline
20734    lines++
20735    words += NF
20736@}
20737@c endfile
20738@end example
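
For a single hypothetical record, the quantities involved look like this.
The record is assigned to @code{$0} in a @code{BEGIN} rule only so that
the sketch runs without any input:

@example
BEGIN @{
    $0 = "hello brave new world"
    print NF, length($0), length($0) + 1
@}
@end example

@noindent
This prints @samp{4 21 22}: four words, 21 characters in the record, and
22 once the newline is counted.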
20739
20740Finally, the @code{END} rule simply prints the totals for all the files:
20741
20742@example
20743@c file eg/prog/wc.awk
20744END @{
20745    if (print_total) @{
20746        if (do_lines)
20747            printf "\t%d", tlines
20748        if (do_words)
20749            printf "\t%d", twords
20750        if (do_chars)
20751            printf "\t%d", tchars
20752        print "\ttotal"
20753    @}
20754@}
20755@c endfile
20756@end example
20757@c ENDOFRANGE count
20758@c ENDOFRANGE infco
20759@c ENDOFRANGE lico
20760@c ENDOFRANGE woco
20761@c ENDOFRANGE chco
20762@c ENDOFRANGE posimawk
20763
20764@node Miscellaneous Programs
20765@section A Grab Bag of @command{awk} Programs
20766
20767This @value{SECTION} is a large ``grab bag'' of miscellaneous programs.
20768We hope you find them both interesting and enjoyable.
20769
20770@menu
20771* Dupword Program::             Finding duplicated words in a document.
20772* Alarm Program::               An alarm clock.
20773* Translate Program::           A program similar to the @command{tr} utility.
20774* Labels Program::              Printing mailing labels.
20775* Word Sorting::                A program to produce a word usage count.
20776* History Sorting::             Eliminating duplicate entries from a history
20777                                file.
20778* Extract Program::             Pulling out programs from Texinfo source
20779                                files.
20780* Simple Sed::                  A Simple Stream Editor.
20781* Igawk Program::               A wrapper for @command{awk} that includes
20782                                files.
20783@end menu
20784
20785@node Dupword Program
20786@subsection Finding Duplicated Words in a Document
20787
20788@c last comma is part of secondary
20789@cindex words, duplicate, searching for
20790@cindex searching, for words
20791@c first comma is part of primary
20792@cindex documents, searching
20793A common error when writing large amounts of prose is to accidentally
20794duplicate words.  Typically you will see this in text as something like ``the
20795the program does the following@dots{}''  When the text is online, often
20796the duplicated words occur at the end of one line and the beginning of
20797another, making them very difficult to spot.
20798@c as here!
20799
20800This program, @file{dupword.awk}, scans through a file one line at a time
20801and looks for adjacent occurrences of the same word.  It also saves the last
20802word on a line (in the variable @code{prev}) for comparison with the first
20803word on the next line.
20804
20805@cindex Texinfo
The first statement makes sure that the line is all lowercase,
20807so that, for example, ``The'' and ``the'' compare equal to each other.
20808The next statement replaces nonalphanumeric and nonwhitespace characters
20809with spaces, so that punctuation does not affect the comparison either.
20810The characters are replaced with spaces so that formatting controls
20811don't create nonsense words (e.g., the Texinfo @samp{@@code@{NF@}}
20812becomes @samp{codeNF} if punctuation is simply deleted).  The record is
20813then resplit into fields, yielding just the actual words on the line,
20814and ensuring that there are no empty fields.
20815
20816If there are no fields left after removing all the punctuation, the
20817current record is skipped.  Otherwise, the program loops through each
20818word, comparing it to the previous one:
20819
20820@cindex @code{dupword.awk} program
20821@example
20822@c file eg/prog/dupword.awk
20823# dupword.awk --- find duplicate words in text
20824@c endfile
20825@ignore
20826@c file eg/prog/dupword.awk
20827#
20828# Arnold Robbins, arnold@@gnu.org, Public Domain
20829# December 1991
20830# Revised October 2000
20831
20832@c endfile
20833@end ignore
20834@c file eg/prog/dupword.awk
20835@{
20836    $0 = tolower($0)
20837    gsub(/[^[:alnum:][:blank:]]/, " ");
20838    $0 = $0         # re-split
20839    if (NF == 0)
20840        next
20841    if ($1 == prev)
20842        printf("%s:%d: duplicate %s\n",
20843            FILENAME, FNR, $1)
20844    for (i = 2; i <= NF; i++)
20845        if ($i == $(i-1))
20846            printf("%s:%d: duplicate %s\n",
20847                FILENAME, FNR, $i)
20848    prev = $NF
20849@}
20850@c endfile
20851@end example
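
The effect of the cleanup steps can be checked on one made-up line.  This
sketch is not part of @file{dupword.awk}; it applies the same three
statements to a hypothetical record containing Texinfo markup and then
prints the resulting fields separated by @samp{|}:

@example
BEGIN @{
    $0 = "See @@code@{NF@}; The the end."
    $0 = tolower($0)
    gsub(/[^[:alnum:][:blank:]]/, " ")
    $0 = $0         # re-split
    for (i = 1; i <= NF; i++)
        printf "%s%s", $i, (i < NF ? "|" : "\n")
@}
@end example

@noindent
The output is @samp{see|code|nf|the|the|end}.  The markup has become two
separate words, and the fourth and fifth fields are now identical, which
is exactly what the main loop looks for.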
20852
20853@node Alarm Program
20854@subsection An Alarm Clock Program
20855@cindex insomnia, cure for
20856@cindex Robbins, Arnold
20857@quotation
20858@i{Nothing cures insomnia like a ringing alarm clock.}@*
20859Arnold Robbins
20860@end quotation
20861
20862@c STARTOFRANGE tialarm
20863@cindex time, alarm clock example program
20864@c STARTOFRANGE alaex
20865@cindex alarm clock example program
20866The following program is a simple ``alarm clock'' program.
20867You give it a time of day and an optional message.  At the specified time,
20868it prints the message on the standard output. In addition, you can give it
20869the number of times to repeat the message as well as a delay between
20870repetitions.
20871
20872This program uses the @code{gettimeofday} function from
20873@ref{Gettimeofday Function}.
20874
20875All the work is done in the @code{BEGIN} rule.  The first part is argument
20876checking and setting of defaults: the delay, the count, and the message to
20877print.  If the user supplied a message without the ASCII BEL
20878character (known as the ``alert'' character, @code{"\a"}), then it is added to
20879the message.  (On many systems, printing the ASCII BEL generates an
20880audible alert. Thus when the alarm goes off, the system calls attention
20881to itself in case the user is not looking at the computer or terminal.)
20882Here is the program:
20883
20884@cindex @code{alarm.awk} program
20885@example
20886@c file eg/prog/alarm.awk
20887# alarm.awk --- set an alarm
20888#
20889# Requires gettimeofday library function
20890@c endfile
20891@ignore
20892@c file eg/prog/alarm.awk
20893#
20894# Arnold Robbins, arnold@@gnu.org, Public Domain
20895# May 1993
20896
20897@c endfile
20898@end ignore
20899@c file eg/prog/alarm.awk
20900# usage: alarm time [ "message" [ count [ delay ] ] ]
20901
20902BEGIN    \
20903@{
20904    # Initial argument sanity checking
20905    usage1 = "usage: alarm time ['message' [count [delay]]]"
20906    usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
20907
20908    if (ARGC < 2) @{
20909        print usage1 > "/dev/stderr"
20910        print usage2 > "/dev/stderr"
20911        exit 1
20912    @} else if (ARGC == 5) @{
20913        delay = ARGV[4] + 0
20914        count = ARGV[3] + 0
20915        message = ARGV[2]
20916    @} else if (ARGC == 4) @{
20917        count = ARGV[3] + 0
20918        message = ARGV[2]
20919    @} else if (ARGC == 3) @{
20920        message = ARGV[2]
    @}

    # check the time format no matter how many arguments were given
    if (ARGV[1] !~ /[0-9]?[0-9]:[0-9][0-9]/) @{
20922        print usage1 > "/dev/stderr"
20923        print usage2 > "/dev/stderr"
20924        exit 1
20925    @}
20926
20927    # set defaults for once we reach the desired time
20928    if (delay == 0)
20929        delay = 180    # 3 minutes
20930@group
20931    if (count == 0)
20932        count = 5
20933@end group
20934    if (message == "")
20935        message = sprintf("\aIt is now %s!\a", ARGV[1])
20936    else if (index(message, "\a") == 0)
20937        message = "\a" message "\a"
20938@c endfile
20939@end example
20940
The next part of the code turns the alarm time into hours and minutes,
20942converts it (if necessary) to a 24-hour clock, and then turns that
20943time into a count of the seconds since midnight.  Next it turns the current
20944time into a count of seconds since midnight.  The difference between the two
20945is how long to wait before setting off the alarm:
20946
20947@example
20948@c file eg/prog/alarm.awk
20949    # split up alarm time
20950    split(ARGV[1], atime, ":")
20951    hour = atime[1] + 0    # force numeric
20952    minute = atime[2] + 0  # force numeric
20953
20954    # get current broken down time
20955    gettimeofday(now)
20956
20957    # if time given is 12-hour hours and it's after that
20958    # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
20959    # then add 12 to real hour
20960    if (hour < 12 && now["hour"] > hour)
20961        hour += 12
20962
20963    # set target time in seconds since midnight
20964    target = (hour * 60 * 60) + (minute * 60)
20965
20966    # get current time in seconds since midnight
20967    current = (now["hour"] * 60 * 60) + \
20968               (now["minute"] * 60) + now["second"]
20969
20970    # how long to sleep for
20971    naptime = target - current
20972    if (naptime <= 0) @{
20973        print "time is in the past!" > "/dev/stderr"
20974        exit 1
20975    @}
20976@c endfile
20977@end example
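
As a worked example of this arithmetic, suppose the alarm is requested for
@samp{5:30} and the current time is 9:15:20 in the morning.  Both values
are hypothetical, and in this sketch the @code{now} array is filled in by
hand rather than by @code{gettimeofday}:

@example
BEGIN @{
    hour = 5; minute = 30                 # alarm 5:30
    now["hour"] = 9; now["minute"] = 15   # hypothetical current time
    now["second"] = 20

    if (hour < 12 && now["hour"] > hour)
        hour += 12                        # 5:30 is taken to mean 17:30

    target = (hour * 60 * 60) + (minute * 60)
    current = (now["hour"] * 60 * 60) + \
               (now["minute"] * 60) + now["second"]
    naptime = target - current
    print target, current, naptime
@}
@end example

@noindent
This prints @samp{63000 33320 29680}.  The wait of 29,680 seconds is
8 hours, 14 minutes, and 40 seconds.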
20978
20979@cindex @command{sleep} utility
20980Finally, the program uses the @code{system} function
20981(@pxref{I/O Functions})
20982to call the @command{sleep} utility.  The @command{sleep} utility simply pauses
20983for the given number of seconds.  If the exit status is not zero,
20984the program assumes that @command{sleep} was interrupted and exits. If
20985@command{sleep} exited with an OK status (zero), then the program prints the
20986message in a loop, again using @command{sleep} to delay for however many
20987seconds are necessary:
20988
20989@example
20990@c file eg/prog/alarm.awk
20991    # zzzzzz..... go away if interrupted
20992    if (system(sprintf("sleep %d", naptime)) != 0)
20993        exit 1
20994
20995    # time to notify!
20996    command = sprintf("sleep %d", delay)
20997    for (i = 1; i <= count; i++) @{
20998        print message
20999        # if sleep command interrupted, go away
21000        if (system(command) != 0)
21001            break
21002    @}
21003
21004    exit 0
21005@}
21006@c endfile
21007@end example
21008@c ENDOFRANGE tialarm
21009@c ENDOFRANGE alaex
21010
21011@node Translate Program
21012@subsection Transliterating Characters
21013
21014@c STARTOFRANGE chtra
21015@cindex characters, transliterating
21016@cindex @command{tr} utility
21017The system @command{tr} utility transliterates characters.  For example, it is
21018often used to map uppercase letters into lowercase for further processing:
21019
21020@example
21021@var{generate data} | tr 'A-Z' 'a-z' | @var{process data} @dots{}
21022@end example
21023
21024@command{tr} requires two lists of characters.@footnote{On some older
21025System V systems,
21026@ifset ORA
21027including Solaris,
21028@end ifset
21029@command{tr} may require that the lists be written as
21030range expressions enclosed in square brackets (@samp{[a-z]}) and quoted,
21031to prevent the shell from attempting a @value{FN} expansion.  This is
21032not a feature.}  When processing the input, the first character in the
21033first list is replaced with the first character in the second list,
21034the second character in the first list is replaced with the second
21035character in the second list, and so on.  If there are more characters
21036in the ``from'' list than in the ``to'' list, the last character of the
21037``to'' list is used for the remaining characters in the ``from'' list.
21038
21039Some time ago,
21040@c early or mid-1989!
21041a user proposed that a transliteration function should
21042be added to @command{gawk}.
21043@c Wishing to avoid gratuitous new features,
21044@c at least theoretically
21045The following program was written to
21046prove that character transliteration could be done with a user-level
21047function.  This program is not as complete as the system @command{tr} utility
21048but it does most of the job.
21049
21050The @command{translate} program demonstrates one of the few weaknesses
21051of standard @command{awk}: dealing with individual characters is very
21052painful, requiring repeated use of the @code{substr}, @code{index},
21053and @code{gsub} built-in functions
21054(@pxref{String Functions}).@footnote{This
21055program was written before @command{gawk} acquired the ability to
21056split each character in a string into separate array elements.}
21057@c Exercise: How might you use this new feature to simplify the program?
21058There are two functions.  The first, @code{stranslate}, takes three
21059arguments:
21060
21061@table @code
21062@item from
21063A list of characters from which to translate.
21064
21065@item to
21066A list of characters to which to translate.
21067
21068@item target
21069The string on which to do the translation.
21070@end table
21071
21072Associative arrays make the translation part fairly easy. @code{t_ar} holds
21073the ``to'' characters, indexed by the ``from'' characters.  Then a simple
21074loop goes through @code{from}, one character at a time.  For each character
21075in @code{from}, if the character appears in @code{target}, @code{gsub}
21076is used to change it to the corresponding @code{to} character.
21077
21078The @code{translate} function simply calls @code{stranslate} using @code{$0}
21079as the target.  The main program sets two global variables, @code{FROM} and
21080@code{TO}, from the command line, and then changes @code{ARGV} so that
21081@command{awk} reads from the standard input.
21082
21083Finally, the processing rule simply calls @code{translate} for each record:
21084
21085@cindex @code{translate.awk} program
21086@example
21087@c file eg/prog/translate.awk
21088# translate.awk --- do tr-like stuff
21089@c endfile
21090@ignore
21091@c file eg/prog/translate.awk
21092#
21093# Arnold Robbins, arnold@@gnu.org, Public Domain
21094# August 1989
21095
21096@c endfile
21097@end ignore
21098@c file eg/prog/translate.awk
21099# Bugs: does not handle things like: tr A-Z a-z, it has
21100# to be spelled out. However, if `to' is shorter than `from',
21101# the last character in `to' is used for the rest of `from'.
21102
21103function stranslate(from, to, target,     lf, lt, t_ar, i, c)
21104@{
21105    lf = length(from)
21106    lt = length(to)
21107    for (i = 1; i <= lt; i++)
21108        t_ar[substr(from, i, 1)] = substr(to, i, 1)
21109    if (lt < lf)
21110        for (; i <= lf; i++)
21111            t_ar[substr(from, i, 1)] = substr(to, lt, 1)
21112    for (i = 1; i <= lf; i++) @{
21113        c = substr(from, i, 1)
21114        if (index(target, c) > 0)
21115            gsub(c, t_ar[c], target)
21116    @}
21117    return target
21118@}
21119
21120function translate(from, to)
21121@{
21122    return $0 = stranslate(from, to, $0)
21123@}
21124
21125# main program
21126BEGIN @{
21127@group
21128    if (ARGC < 3) @{
21129        print "usage: translate from to" > "/dev/stderr"
21130        exit
21131    @}
21132@end group
21133    FROM = ARGV[1]
21134    TO = ARGV[2]
21135    ARGC = 2
21136    ARGV[1] = "-"
21137@}
21138
21139@{
21140    translate(FROM, TO)
21141    print
21142@}
21143@c endfile
21144@end example
21145
21146While it is possible to do character transliteration in a user-level
21147function, it is not necessarily efficient, and we (the @command{gawk}
21148authors) started to consider adding a built-in function.  However,
21149shortly after writing this program, we learned that the System V Release 4
21150@command{awk} had added the @code{toupper} and @code{tolower} functions
21151(@pxref{String Functions}).
21152These functions handle the vast majority of the
21153cases where character transliteration is necessary, and so we chose to
21154simply add those functions to @command{gawk} as well and then leave well
21155enough alone.
21156
21157An obvious improvement to this program would be to set up the
21158@code{t_ar} array only once, in a @code{BEGIN} rule. However, this
21159assumes that the ``from'' and ``to'' lists
21160will never change throughout the lifetime of the program.
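
Two other properties of @code{stranslate} are worth knowing.  Because it
calls @code{gsub} once for each ``from'' character, a character produced
by one substitution can be changed again by a later one; swapping
@samp{a} and @samp{b} turns both into @samp{a}.  Also, each ``from''
character is used as a regular expression, so a character such as
@samp{.} matches more than itself.  One way around both, offered only as
a sketch, is to build the result in a single pass over the target.  The
name @code{stranslate1} is hypothetical:

@example
function stranslate1(from, to, target,     lf, lt, n, t_ar, i, c, result)
@{
    lf = length(from)
    lt = length(to)
    for (i = 1; i <= lf; i++)
        t_ar[substr(from, i, 1)] = substr(to, (i <= lt ? i : lt), 1)

    # copy target one character at a time, translating as we go
    n = length(target)
    for (i = 1; i <= n; i++) @{
        c = substr(target, i, 1)
        if (c in t_ar)
            c = t_ar[c]
        result = result c
    @}
    return result
@}

BEGIN @{
    print stranslate1("abc", "xyz", "aabbcc")
    print stranslate1("ab", "ba", "abba")
    print stranslate1(".", "!", "a.b")
@}
@end example

@noindent
This prints @samp{xxyyzz}, @samp{baab}, and @samp{a!b}.  As in the
original, a short ``to'' list is padded with its last character.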
21161@c ENDOFRANGE chtra
21162
21163@node Labels Program
21164@subsection Printing Mailing Labels
21165
21166@c STARTOFRANGE prml
21167@cindex printing, mailing labels
21168@c comma is part of primary
21169@c STARTOFRANGE mlprint
21170@cindex mailing labels, printing
21171Here is a ``real world''@footnote{``Real world'' is defined as
21172``a program actually used to get something done.''}
21173program.  This
21174script reads lists of names and
21175addresses and generates mailing labels.  Each page of labels has 20 labels
21176on it, 2 across and 10 down.  The addresses are guaranteed to be no more
21177than 5 lines of data.  Each address is separated from the next by a blank
21178line.
21179
21180The basic idea is to read 20 labels worth of data.  Each line of each label
21181is stored in the @code{line} array.  The single rule takes care of filling
21182the @code{line} array and printing the page when 20 labels have been read.
21183
21184The @code{BEGIN} rule simply sets @code{RS} to the empty string, so that
21185@command{awk} splits records at blank lines
21186(@pxref{Records}).
21187It sets @code{MAXLINES} to 100, since 100 is the maximum number
21188of lines on the page (20 * 5 = 100).
21189
21190Most of the work is done in the @code{printpage} function.
21191The label lines are stored sequentially in the @code{line} array.  But they
21192have to print horizontally; @code{line[1]} next to @code{line[6]},
21193@code{line[2]} next to @code{line[7]}, and so on.  Two loops are used to
21194accomplish this.  The outer loop, controlled by @code{i}, steps through
21195every 10 lines of data; this is each row of labels.  The inner loop,
21196controlled by @code{j}, goes through the lines within the row.
21197As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in
21198the row, and @samp{i+j+5} is the entry next to it.  The output ends up
21199looking something like this:
21200
21201@example
21202line 1          line 6
21203line 2          line 7
21204line 3          line 8
21205line 4          line 9
21206line 5          line 10
21207@dots{}
21208@end example
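
The index arithmetic can be checked without any label data.  The sketch
below is not part of the program; it assumes a hypothetical page holding
four labels (20 lines) and prints only the pairs of subscripts that the
two loops would visit:

@example
BEGIN @{
    Nlines = 20    # hypothetical: four labels of five lines each
    for (i = 1; i <= Nlines; i += 10) @{
        for (j = 0; j < 5; j++)
            printf "line[%d]  line[%d]\n", i + j, i + j + 5
        print ""
    @}
@}
@end example

@noindent
The first group of output pairs @code{line[1]} through @code{line[5]}
with @code{line[6]} through @code{line[10]}.  The second group pairs
@code{line[11]} through @code{line[15]} with @code{line[16]} through
@code{line[20]}.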
21209
21210As a final note, an extra blank line is printed at lines 21 and 61, to keep
21211the output lined up on the labels.  This is dependent on the particular
21212brand of labels in use when the program was written.  You will also note
21213that there are 2 blank lines at the top and 2 blank lines at the bottom.
21214
21215The @code{END} rule arranges to flush the final page of labels; there may
21216not have been an even multiple of 20 labels in the data:
21217
21218@cindex @code{labels.awk} program
21219@example
21220@c file eg/prog/labels.awk
21221# labels.awk --- print mailing labels
21222@c endfile
21223@ignore
21224@c file eg/prog/labels.awk
21225#
21226# Arnold Robbins, arnold@@gnu.org, Public Domain
21227# June 1992
21228@c endfile
21229@end ignore
21230@c file eg/prog/labels.awk
21231
21232# Each label is 5 lines of data that may have blank lines.
21233# The label sheets have 2 blank lines at the top and 2 at
21234# the bottom.
21235
21236BEGIN    @{ RS = "" ; MAXLINES = 100 @}
21237
21238function printpage(    i, j)
21239@{
21240    if (Nlines <= 0)
21241        return
21242
21243    printf "\n\n"        # header
21244
21245    for (i = 1; i <= Nlines; i += 10) @{
21246        if (i == 21 || i == 61)
21247            print ""
21248        for (j = 0; j < 5; j++) @{
21249            if (i + j > MAXLINES)
21250                break
21251            printf "   %-41s %s\n", line[i+j], line[i+j+5]
21252        @}
21253        print ""
21254    @}
21255
21256    printf "\n\n"        # footer
21257
21258    for (i in line)
21259        line[i] = ""
21260@}
21261
21262# main rule
21263@{
21264    if (Count >= 20) @{
21265        printpage()
21266        Count = 0
21267        Nlines = 0
21268    @}
21269    n = split($0, a, "\n")
21270    for (i = 1; i <= n; i++)
21271        line[++Nlines] = a[i]
21272    for (; i <= 5; i++)
21273        line[++Nlines] = ""
21274    Count++
21275@}
21276
21277END    \
21278@{
21279    printpage()
21280@}
21281@c endfile
21282@end example
21283@c ENDOFRANGE prml
21284@c ENDOFRANGE mlprint
21285
21286@node Word Sorting
21287@subsection Generating Word-Usage Counts
21288
21289@c last comma is part of secondary
21290@c STARTOFRANGE worus
21291@cindex words, usage counts, generating
21292@c NEXT ED: Rewrite this whole section and example
21293The following @command{awk} program prints
21294the number of occurrences of each word in its input.  It illustrates the
21295associative nature of @command{awk} arrays by using strings as subscripts.  It
21296also demonstrates the @samp{for @var{index} in @var{array}} mechanism.
21297Finally, it shows how @command{awk} is used in conjunction with other
21298utility programs to do a useful task of some complexity with a minimum of
21299effort.  Some explanations follow the program listing:
21300
21301@example
21302# Print list of word frequencies
21303@{
21304    for (i = 1; i <= NF; i++)
21305        freq[$i]++
21306@}
21307
21308END @{
21309    for (word in freq)
21310        printf "%s\t%d\n", word, freq[word]
21311@}
21312@end example
21313
21314@c Exercise: Use asort() here
21315
21316This program has two rules.  The
21317first rule, because it has an empty pattern, is executed for every input line.
21318It uses @command{awk}'s field-accessing mechanism
21319(@pxref{Fields}) to pick out the individual words from
21320the line, and the built-in variable @code{NF} (@pxref{Built-in Variables})
21321to know how many fields are available.
21322For each input word, it increments an element of the array @code{freq} to
21323reflect that the word has been seen an additional time.
21324
21325The second rule, because it has the pattern @code{END}, is not executed
21326until the input has been exhausted.  It prints out the contents of the
21327@code{freq} table that has been built up inside the first action.
21328This program has several problems that would prevent it from being
21329useful by itself on real text files:
21330
21331@itemize @bullet
21332@item
21333Words are detected using the @command{awk} convention that fields are
21334separated just by whitespace.  Other characters in the input (except
21335newlines) don't have any special meaning to @command{awk}.  This means that
21336punctuation characters count as part of words.
21337
21338@item
21339The @command{awk} language considers upper- and lowercase characters to be
21340distinct.  Therefore, ``bartender'' and ``Bartender'' are not treated
21341as the same word.  This is undesirable, since in normal text, words
21342are capitalized if they begin sentences, and a frequency analyzer should not
21343be sensitive to capitalization.
21344
21345@item
21346The output does not come out in any useful order.  You're more likely to be
21347interested in which words occur most frequently or in having an alphabetized
21348table of how frequently each word occurs.
21349@end itemize
21350
21351@cindex @command{sort} utility
21352The way to solve these problems is to use some of @command{awk}'s more advanced
21353features.  First, we use @code{tolower} to remove
21354case distinctions.  Next, we use @code{gsub} to remove punctuation
21355characters.  Finally, we use the system @command{sort} utility to process the
21356output of the @command{awk} script.  Here is the new version of
21357the program:
21358
21359@cindex @code{wordfreq.awk} program
21360@example
21361@c file eg/prog/wordfreq.awk
21362# wordfreq.awk --- print list of word frequencies
21363
21364@{
21365    $0 = tolower($0)    # remove case distinctions
21366    # remove punctuation
21367    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
21368    for (i = 1; i <= NF; i++)
21369        freq[$i]++
21370@}
21371
21372END @{
21373    for (word in freq)
21374        printf "%s\t%d\n", word, freq[word]
21375@}
21376@c endfile
21377@end example
21378
21379Assuming we have saved this program in a file named @file{wordfreq.awk},
21380and that the data is in @file{file1}, the following pipeline:
21381
21382@example
21383awk -f wordfreq.awk file1 | sort -k 2nr
21384@end example
21385
21386@noindent
21387produces a table of the words appearing in @file{file1} in order of
21388decreasing frequency.  The @command{awk} program suitably massages the
21389data and produces a word frequency table, which is not ordered.
21390
21391The @command{awk} script's output is then sorted by the @command{sort}
21392utility and printed on the terminal.  The options given to @command{sort}
specify that the sort key is the second field of each input line (skipping
21394one field), that the sort keys should be treated as numeric quantities
21395(otherwise @samp{15} would come before @samp{5}), and that the sorting
21396should be done in descending (reverse) order.
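
The effect of the numeric flag can be seen with a small hand-made sample.
The two input lines here are invented for illustration and are not output
from @file{wordfreq.awk}:

@example
printf 'the\t5\nof\t15\n' | sort -k 2nr
@end example

@noindent
With the @samp{n} flag, the line for @samp{of} (count 15) sorts ahead of
the line for @samp{the} (count 5).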
21397
21398The @command{sort} could even be done from within the program, by changing
21399the @code{END} action to:
21400
21401@example
21402@c file eg/prog/wordfreq.awk
21403END @{
21404    sort = "sort -k 2nr"
21405    for (word in freq)
21406        printf "%s\t%d\n", word, freq[word] | sort
21407    close(sort)
21408@}
21409@c endfile
21410@end example
21411
21412This way of sorting must be used on systems that do not
21413have true pipes at the command-line (or batch-file) level.
21414See the general operating system documentation for more information on how
21415to use the @command{sort} program.
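
When an alphabetized table is wanted, the external @command{sort} can
sometimes be avoided.  The following sketch of an alternative @code{END}
rule assumes a version of @command{gawk} that provides the @code{asorti}
extension function; the array name @code{words} is introduced here only
for illustration:

@example
END @{
    # words[1..n] receives the subscripts of freq, in sorted order
    n = asorti(freq, words)
    for (i = 1; i <= n; i++)
        printf "%s\t%d\n", words[i], freq[words[i]]
@}
@end example

@noindent
This orders the table by word, not by frequency, and it is not portable
to other versions of @command{awk}.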
21416@c ENDOFRANGE worus
21417
21418@node History Sorting
21419@subsection Removing Duplicates from Unsorted Text
21420
21421@c last comma is part of secondary
21422@c STARTOFRANGE lidu
21423@cindex lines, duplicate, removing
21424The @command{uniq} program
21425(@pxref{Uniq Program}),
21426removes duplicate lines from @emph{sorted} data.
21427
21428Suppose, however, you need to remove duplicate lines from a @value{DF} but
21429that you want to preserve the order the lines are in.  A good example of
21430this might be a shell history file.  The history file keeps a copy of all
21431the commands you have entered, and it is not unusual to repeat a command
21432several times in a row.  Occasionally you might want to compact the history
21433by removing duplicate entries.  Yet it is desirable to maintain the order
21434of the original commands.
21435
21436This simple program does the job.  It uses two arrays.  The @code{data}
21437array is indexed by the text of each line.
For each line, @code{data[$0]} is incremented.
If a particular line has not
been seen before, then @code{data[$0]} is zero before the increment.
In this case, @code{count} is incremented and the text of the line is
stored in @code{lines[count]}.
21442Each element of @code{lines} is a unique command, and the indices of
21443@code{lines} indicate the order in which those lines are encountered.
21444The @code{END} rule simply prints out the lines, in order:
21445
21446@cindex Rakitzis, Byron
21447@cindex @code{histsort.awk} program
21448@example
21449@c file eg/prog/histsort.awk
21450# histsort.awk --- compact a shell history file
21451# Thanks to Byron Rakitzis for the general idea
21452@c endfile
21453@ignore
21454@c file eg/prog/histsort.awk
21455#
21456# Arnold Robbins, arnold@@gnu.org, Public Domain
21457# May 1993
21458
21459@c endfile
21460@end ignore
21461@c file eg/prog/histsort.awk
21462@group
21463@{
21464    if (data[$0]++ == 0)
21465        lines[++count] = $0
21466@}
21467@end group
21468
21469END @{
21470    for (i = 1; i <= count; i++)
21471        print lines[i]
21472@}
21473@c endfile
21474@end example
21475
21476This program also provides a foundation for generating other useful
21477information.  For example, using the following @code{print} statement in the
21478@code{END} rule indicates how often a particular command is used:
21479
21480@example
21481print data[lines[i]], lines[i]
21482@end example
21483
21484This works because @code{data[$0]} is incremented each time a line is
21485seen.
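
When the counts are not needed afterward and each line can be printed as
soon as it is first seen, the same test can serve directly as a pattern.
This one-rule sketch is an alternative formulation, not the program shown
earlier; the array name @code{seen} is hypothetical:

@example
# print each distinct line once, in order of first appearance
!seen[$0]++
@end example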
21486@c ENDOFRANGE lidu
21487
21488@node Extract Program
21489@subsection Extracting Programs from Texinfo Source Files
21490
21491@c STARTOFRANGE texse
21492@cindex Texinfo, extracting programs from source files
21493@c last comma is part of secondary
21494@c STARTOFRANGE fitex
21495@cindex files, Texinfo, extracting programs from
21496@ifnotinfo
21497Both this chapter and the previous chapter
21498(@ref{Library Functions})
21499present a large number of @command{awk} programs.
21500@end ifnotinfo
21501@ifinfo
21502The nodes
21503@ref{Library Functions},
21504and @ref{Sample Programs},
21505are the top level nodes for a large number of @command{awk} programs.
21506@end ifinfo
21507If you want to experiment with these programs, it is tedious to have to type
21508them in by hand.  Here we present a program that can extract parts of a
21509Texinfo input file into separate files.
21510
21511@cindex Texinfo
21512This @value{DOCUMENT} is written in Texinfo, the GNU project's document
21513formatting
21514language.
21515A single Texinfo source file can be used to produce both
21516printed and online documentation.
21517@ifnotinfo
21518Texinfo is fully documented in the book
21519@cite{Texinfo---The GNU Documentation Format},
21520available from the Free Software Foundation.
21521@end ifnotinfo
21522@ifinfo
21523The Texinfo language is described fully, starting with
21524@ref{Top}.
21525@end ifinfo
21526
21527For our purposes, it is enough to know three things about Texinfo input
21528files:
21529
21530@itemize @bullet
21531@item
21532The ``at'' symbol (@samp{@@}) is special in Texinfo, much as
21533the backslash (@samp{\}) is in C
21534or @command{awk}.  Literal @samp{@@} symbols are represented in Texinfo source
21535files as @samp{@@@@}.
21536
21537@item
21538Comments start with either @samp{@@c} or @samp{@@comment}.
21539The file-extraction program works by using special comments that start
21540at the beginning of a line.
21541
21542@item
21543Lines containing @samp{@@group} and @samp{@@end group} commands bracket
21544example text that should not be split across a page boundary.
21545(Unfortunately, @TeX{} isn't always smart enough to do things exactly right,
21546and we have to give it some help.)
21547@end itemize
21548
21549The following program, @file{extract.awk}, reads through a Texinfo source
21550file and does two things, based on the special comments.
21551Upon seeing @samp{@w{@@c system @dots{}}},
21552it runs a command, by extracting the command text from the
21553control line and passing it on to the @code{system} function
21554(@pxref{I/O Functions}).
21555Upon seeing @samp{@@c file @var{filename}}, each subsequent line is sent to
21556the file @var{filename}, until @samp{@@c endfile} is encountered.
21557The rules in @file{extract.awk} match either @samp{@@c} or
21558@samp{@@comment} by letting the @samp{omment} part be optional.
21559Lines containing @samp{@@group} and @samp{@@end group} are simply removed.
21560@file{extract.awk} uses the @code{join} library function
21561(@pxref{Join Function}).
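
A typical invocation might look like the following sketch, which assumes
that @file{join.awk} and @file{extract.awk} are both in the current
directory and that the Texinfo source is named @file{gawk.texi}:

@example
gawk -f join.awk -f extract.awk gawk.texi
@end example

@noindent
The library file is loaded with its own @option{-f} option because
@file{extract.awk} calls @code{join} but does not define it.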
21562
21563The example programs in the online Texinfo source for @cite{@value{TITLE}}
21564(@file{gawk.texi}) have all been bracketed inside @samp{file} and
21565@samp{endfile} lines.  The @command{gawk} distribution uses a copy of
21566@file{extract.awk} to extract the sample programs and install many
21567of them in a standard directory where @command{gawk} can find them.
21568The Texinfo file looks something like this:
21569
21570@example
21571@dots{}
21572This program has a @@code@{BEGIN@} rule,
21573that prints a nice message:
21574
21575@@example
21576@@c file examples/messages.awk
21577BEGIN @@@{ print "Don't panic!" @@@}
@@c endfile
21579@@end example
21580
21581It also prints some final advice:
21582
21583@@example
21584@@c file examples/messages.awk
21585END @@@{ print "Always avoid bored archeologists!" @@@}
@@c endfile
21587@@end example
21588@dots{}
21589@end example
21590
21591@file{extract.awk} begins by setting @code{IGNORECASE} to one, so that
21592mixed upper- and lowercase letters in the directives won't matter.
21593
21594The first rule handles calling @code{system}, checking that a command is
21595given (@code{NF} is at least three) and also checking that the command
21596exits with a zero exit status, signifying OK:
21597
21598@cindex @code{extract.awk} program
21599@example
21600@c file eg/prog/extract.awk
21601# extract.awk --- extract files and run programs
21602#                 from texinfo files
21603@c endfile
21604@ignore
21605@c file eg/prog/extract.awk
21606#
21607# Arnold Robbins, arnold@@gnu.org, Public Domain
21608# May 1993
21609# Revised September 2000
21610
21611@c endfile
21612@end ignore
21613@c file eg/prog/extract.awk
21614BEGIN    @{ IGNORECASE = 1 @}
21615
21616/^@@c(omment)?[ \t]+system/    \
21617@{
21618    if (NF < 3) @{
21619        e = (FILENAME ":" FNR)
21620        e = (e  ": badly formed `system' line")
21621        print e > "/dev/stderr"
21622        next
21623    @}
21624    $1 = ""
21625    $2 = ""
21626    stat = system($0)
21627    if (stat != 0) @{
21628        e = (FILENAME ":" FNR)
21629        e = (e ": warning: system returned " stat)
21630        print e > "/dev/stderr"
21631    @}
21632@}
21633@c endfile
21634@end example
21635
21636@noindent
The variable @code{e} is used so that the rule
21638fits nicely on the
21639@ifnotinfo
21640page.
21641@end ifnotinfo
21642@ifnottex
21643screen.
21644@end ifnottex
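
Assigning the empty string to @code{$1} and @code{$2} causes @code{$0}
to be rebuilt from the fields, which leaves only the command text.  A
small sketch with an invented control line shows the effect:

@example
BEGIN @{
    $0 = "@@c system echo hello"
    $1 = ""
    $2 = ""
    # the two emptied fields leave two leading spaces in $0
    print "[" $0 "]"
@}
@end example

@noindent
The brackets show that the command text is preceded by two spaces, one
for each emptied field.  These are harmless when the string is passed to
@code{system}.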
21645
21646The second rule handles moving data into files.  It verifies that a
21647@value{FN} is given in the directive.  If the file named is not the
21648current file, then the current file is closed.  Keeping the current file
21649open until a new file is encountered allows the use of the @samp{>}
21650redirection for printing the contents, keeping open file management
21651simple.
21652
21653The @samp{for} loop does the work.  It reads lines using @code{getline}
21654(@pxref{Getline}).
21655For an unexpected end of file, it calls the @code{@w{unexpected_eof}}
21656function.  If the line is an ``endfile'' line, then it breaks out of
21657the loop.
21658If the line is an @samp{@@group} or @samp{@@end group} line, then it
21659ignores it and goes on to the next line.
21660Similarly, comments within examples are also ignored.
21661
21662Most of the work is in the following few lines.  If the line has no @samp{@@}
21663symbols, the program can print it directly.
21664Otherwise, each leading @samp{@@} must be stripped off.
21665To remove the @samp{@@} symbols, the line is split into separate elements of
21666the array @code{a}, using the @code{split} function
21667(@pxref{String Functions}).
21668The @samp{@@} symbol is used as the separator character.
21669Each element of @code{a} that is empty indicates two successive @samp{@@}
21670symbols in the original line.  For each two empty elements (@samp{@@@@} in
21671the original file), we have to add a single @samp{@@} symbol back in.
21672
21673When the processing of the array is finished, @code{join} is called with the
21674value of @code{SUBSEP}, to rejoin the pieces back into a single
21675line.  That line is then printed to the output file:
21676
21677@example
21678@c file eg/prog/extract.awk
21679/^@@c(omment)?[ \t]+file/    \
21680@{
21681    if (NF != 3) @{
21682        e = (FILENAME ":" FNR ": badly formed `file' line")
21683        print e > "/dev/stderr"
21684        next
21685    @}
21686    if ($3 != curfile) @{
21687        if (curfile != "")
21688            close(curfile)
21689        curfile = $3
21690    @}
21691
21692    for (;;) @{
21693        if ((getline line) <= 0)
21694            unexpected_eof()
21695        if (line ~ /^@@c(omment)?[ \t]+endfile/)
21696            break
21697        else if (line ~ /^@@(end[ \t]+)?group/)
21698            continue
        else if (line ~ /^@@c(omment)?[ \t]+/)
21700            continue
21701        if (index(line, "@@") == 0) @{
21702            print line > curfile
21703            continue
21704        @}
21705        n = split(line, a, "@@")
21706        # if a[1] == "", means leading @@,
21707        # don't add one back in.
21708        for (i = 2; i <= n; i++) @{
21709            if (a[i] == "") @{ # was an @@@@
21710                a[i] = "@@"
21711                if (a[i+1] == "")
21712                    i++
21713            @}
21714        @}
21715        print join(a, 1, n, SUBSEP) > curfile
21716    @}
21717@}
21718@c endfile
21719@end example
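
The @samp{@@} handling can be tried on its own.  The following sketch
repeats the same loop inside a hypothetical function named
@code{unescape} and, to stay self-contained, concatenates the pieces
directly instead of calling @code{join}.  The sample line is invented:

@example
function unescape(line,    a, n, i, out)
@{
    n = split(line, a, "@@")
    for (i = 2; i <= n; i++) @{
        if (a[i] == "") @{       # was an @@@@
            a[i] = "@@"
            if (a[i+1] == "")
                i++
        @}
    @}
    out = ""
    for (i = 1; i <= n; i++)    # rejoin with no separator
        out = out a[i]
    return out
@}

BEGIN @{
    print unescape("BEGIN @@@{ print \"arnold@@@@gnu.org\" @@@}")
@}
@end example

@noindent
Each escaped brace loses its leading @samp{@@}, and the doubled
@samp{@@} in the address comes out as a single one.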
21720
21721An important thing to note is the use of the @samp{>} redirection.
21722Output done with @samp{>} only opens the file once; it stays open and
21723subsequent output is appended to the file
21724(@pxref{Redirection}).
21725This makes it easy to mix program text and explanatory prose for the same
21726sample source file (as has been done here!) without any hassle.  The file is
21727only closed when a new data @value{FN} is encountered or at the end of the
21728input file.
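
The following sketch illustrates this behavior outside of
@file{extract.awk}.  The @value{FN} is hypothetical; any writable
location would do:

@example
BEGIN @{
    out = "/tmp/redirect-demo.txt"    # hypothetical file name
    print "first"  > out              # opens (and truncates) the file
    print "second" > out              # same open file: appended
    close(out)
    while ((getline line < out) > 0)
        print line
    close(out)
@}
@end example

@noindent
Both lines end up in the file, because the second @code{print} does not
reopen it.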
21729
21730Finally, the function @code{@w{unexpected_eof}} prints an appropriate
21731error message and then exits.
21732The @code{END} rule handles the final cleanup, closing the open file:
21733
21734@c function lb put on same line for page breaking. sigh
21735@example
21736@c file eg/prog/extract.awk
21737@group
21738function unexpected_eof() @{
21739    printf("%s:%d: unexpected EOF or error\n",
21740        FILENAME, FNR) > "/dev/stderr"
21741    exit 1
21742@}
21743@end group
21744
21745END @{
21746    if (curfile)
21747        close(curfile)
21748@}
21749@c endfile
21750@end example
21751@c ENDOFRANGE texse
21752@c ENDOFRANGE fitex
21753
21754@node Simple Sed
21755@subsection A Simple Stream Editor
21756
21757@cindex @command{sed} utility
21758@cindex stream editors
21759The @command{sed} utility is a stream editor, a program that reads a
21760stream of data, makes changes to it, and passes it on.
21761It is often used to make global changes to a large file or to a stream
21762of data generated by a pipeline of commands.
21763While @command{sed} is a complicated program in its own right, its most common
21764use is to perform global substitutions in the middle of a pipeline:
21765
21766@example
21767command1 < orig.data | sed 's/old/new/g' | command2 > result
21768@end example
21769
21770Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp
@samp{old} on each input line and replace every occurrence on that line
with the text @samp{new}.  This is similar to
21773@command{awk}'s @code{gsub} function
21774(@pxref{String Functions}).
21775
21776The following program, @file{awksed.awk}, accepts at least two command-line
21777arguments: the pattern to look for and the text to replace it with. Any
21778additional arguments are treated as data @value{FN}s to process. If none
21779are provided, the standard input is used:
21780
21781@cindex Brennan, Michael
21782@cindex @command{awksed.awk} program
21783@c @cindex simple stream editor
21784@c @cindex stream editor, simple
21785@example
21786@c file eg/prog/awksed.awk
21787# awksed.awk --- do s/foo/bar/g using just print
21788#    Thanks to Michael Brennan for the idea
21789@c endfile
21790@ignore
21791@c file eg/prog/awksed.awk
21792#
21793# Arnold Robbins, arnold@@gnu.org, Public Domain
21794# August 1995
21795
21796@c endfile
21797@end ignore
21798@c file eg/prog/awksed.awk
21799function usage()
21800@{
21801    print "usage: awksed pat repl [files...]" > "/dev/stderr"
21802    exit 1
21803@}
21804
21805BEGIN @{
21806    # validate arguments
21807    if (ARGC < 3)
21808        usage()
21809
21810    RS = ARGV[1]
21811    ORS = ARGV[2]
21812
21813    # don't use arguments as files
21814    ARGV[1] = ARGV[2] = ""
21815@}
21816
21817@group
21818# look ma, no hands!
21819@{
21820    if (RT == "")
21821        printf "%s", $0
21822    else
21823        print
21824@}
21825@end group
21826@c endfile
21827@end example
21828
21829The program relies on @command{gawk}'s ability to have @code{RS} be a regexp,
21830as well as on the setting of @code{RT} to the actual text that terminates the
21831record (@pxref{Records}).
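
A short sketch, separate from @file{awksed.awk}, shows the two variables
at work.  It requires @command{gawk}, and the regexp is chosen only for
illustration:

@example
BEGIN @{ RS = "[0-9]+" @}

@{ printf "text=[%s] RT=[%s]\n", $0, RT @}
@end example

@noindent
For input such as @samp{a1b22c}, the first two records are @samp{a} and
@samp{b}, with @code{RT} set to @samp{1} and @samp{22}, respectively.
For the last record, @code{RT} is empty.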
21832
21833The idea is to have @code{RS} be the pattern to look for. @command{gawk}
21834automatically sets @code{$0} to the text between matches of the pattern.
21835This is text that we want to keep, unmodified.  Then, by setting @code{ORS}
21836to the replacement text, a simple @code{print} statement outputs the
21837text we want to keep, followed by the replacement text.
21838
21839There is one wrinkle to this scheme, which is what to do if the last record
21840doesn't end with text that matches @code{RS}.  Using a @code{print}
21841statement unconditionally prints the replacement text, which is not correct.
21842However, if the file did not end in text that matches @code{RS}, @code{RT}
21843is set to the null string.  In this case, we can print @code{$0} using
21844@code{printf}
21845(@pxref{Printf}).
21846
21847The @code{BEGIN} rule handles the setup, checking for the right number
21848of arguments and calling @code{usage} if there is a problem. Then it sets
21849@code{RS} and @code{ORS} from the command-line arguments and sets
21850@code{ARGV[1]} and @code{ARGV[2]} to the null string, so that they are
21851not treated as @value{FN}s
21852(@pxref{ARGC and ARGV}).
21853
21854The @code{usage} function prints an error message and exits.
21855Finally, the single rule handles the printing scheme outlined above,
21856using @code{print} or @code{printf} as appropriate, depending upon the
21857value of @code{RT}.
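
As a usage sketch, assuming the program has been saved as
@file{awksed.awk} in the current directory and that @command{gawk} is
installed, the following pipeline replaces both occurrences of
@samp{old} in an invented input line:

@example
echo "old wine in old bottles" | gawk -f awksed.awk old new
@end example

@noindent
The result should be @samp{new wine in new bottles}.  The trailing
newline survives because it is part of the final record, which is
printed with @code{printf} and no added @code{ORS}.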
21858
21859@ignore
21860Exercise, compare the performance of this version with the more
21861straightforward:
21862
21863BEGIN {
21864    pat = ARGV[1]
21865    repl = ARGV[2]
21866    ARGV[1] = ARGV[2] = ""
21867}
21868
21869{ gsub(pat, repl); print }
21870
21871Exercise: what are the advantages and disadvantages of this version versus sed?
21872  Advantage: egrep regexps
21873             speed (?)
21874  Disadvantage: no & in replacement text
21875
21876Others?
21877@end ignore
21878
21879@node Igawk Program
21880@subsection An Easy Way to Use Library Functions
21881
21882@c STARTOFRANGE libfex
21883@cindex libraries of @command{awk} functions, example program for using
21884@c STARTOFRANGE flibex
21885@cindex functions, library, example program for using
21886Using library functions in @command{awk} can be very beneficial. It
21887encourages code reuse and the writing of general functions. Programs are
21888smaller and therefore clearer.
21889However, using library functions is only easy when writing @command{awk}
21890programs; it is painful when running them, requiring multiple @option{-f}
21891options.  If @command{gawk} is unavailable, then so too is the @env{AWKPATH}
21892environment variable and the ability to put @command{awk} functions into a
21893library directory (@pxref{Options}).
21894It would be nice to be able to write programs in the following manner:
21895
21896@example
21897# library functions
21898@@include getopt.awk
21899@@include join.awk
21900@dots{}
21901
21902# main program
21903BEGIN @{
21904    while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
21905        @dots{}
21906    @dots{}
21907@}
21908@end example
21909
21910The following program, @file{igawk.sh}, provides this service.
21911It simulates @command{gawk}'s searching of the @env{AWKPATH} variable
21912and also allows @dfn{nested} includes; i.e., a file that is included
21913with @samp{@@include} can contain further @samp{@@include} statements.
21914@command{igawk} makes an effort to only include files once, so that nested
21915includes don't accidentally include a library function twice.
21916
21917@command{igawk} should behave just like @command{gawk} externally.  This
21918means it should accept all of @command{gawk}'s command-line arguments,
21919including the ability to have multiple source files specified via
21920@option{-f}, and the ability to mix command-line and library source files.
21921
21922The program is written using the POSIX Shell (@command{sh}) command
21923language.@footnote{Fully explaining the @command{sh} language is beyond
21924the scope of this book. We provide some minimal explanations, but see
21925a good shell programming book if you wish to understand things in more
21926depth.} It works as follows:
21927
21928@enumerate
21929@item
21930Loop through the arguments, saving anything that doesn't represent
21931@command{awk} source code for later, when the expanded program is run.
21932
21933@item
21934For any arguments that do represent @command{awk} text, put the arguments into
21935a shell variable that will be expanded.  There are two cases:
21936
21937@enumerate a
21938@item
21939Literal text, provided with @option{--source} or @option{--source=}.  This
21940text is just appended directly.
21941
21942@item
21943Source @value{FN}s, provided with @option{-f}.  We use a neat trick and append
21944@samp{@@include @var{filename}} to the shell variable's contents.  Since the file-inclusion
21945program works the way @command{gawk} does, this gets the text
21946of the file included into the program at the correct point.
21947@end enumerate
21948
21949@item
21950Run an @command{awk} program (naturally) over the shell variable's contents to expand
21951@samp{@@include} statements.  The expanded program is placed in a second
21952shell variable.
21953
21954@item
21955Run the expanded program with @command{gawk} and any other original command-line
21956arguments that the user supplied (such as the data @value{FN}s).
21957@end enumerate
21958
This program uses shell variables extensively: for storing command-line arguments,
for the text of the @command{awk} program that will expand the user's program, for the
21961user's original program, and for the expanded program.  Doing so removes some
21962potential problems that might arise were we to use temporary files instead,
21963at the cost of making the script somewhat more complicated.
21964
21965The initial part of the program turns on shell tracing if the first
21966argument is @samp{debug}.
21967
21968The next part loops through all the command-line arguments.
21969There are several cases of interest:
21970
21971@table @code
21972@item --
21973This ends the arguments to @command{igawk}.  Anything else should be passed on
21974to the user's @command{awk} program without being evaluated.
21975
21976@item -W
21977This indicates that the next option is specific to @command{gawk}.  To make
argument processing easier, the @option{-W} is prepended to the
21979remaining arguments and the loop continues.  (This is an @command{sh}
21980programming trick.  Don't worry about it if you are not familiar with
21981@command{sh}.)
21982
21983@item -v@r{,} -F
21984These are saved and passed on to @command{gawk}.
21985
21986@item -f@r{,} --file@r{,} --file=@r{,} -Wfile=
21987The @value{FN} is appended to the shell variable @code{program} with an
21988@samp{@@include} statement.
21989The @command{expr} utility is used to remove the leading option part of the
21990argument (e.g., @samp{--file=}).
21991(Typical @command{sh} usage would be to use the @command{echo} and @command{sed}
21992utilities to do this work.  Unfortunately, some versions of @command{echo} evaluate
21993escape sequences in their arguments, possibly mangling the program text.
21994Using @command{expr} avoids this problem.)
21995
21996@item --source@r{,} --source=@r{,} -Wsource=
21997The source text is appended to @code{program}.
21998
21999@item --version@r{,} -Wversion
22000@command{igawk} prints its version number, runs @samp{gawk --version}
22001to get the @command{gawk} version information, and then exits.
22002@end table
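
The @option{-W} handling can be tried separately from @command{igawk}.
This sketch uses an invented argument list and the same @samp{set --}
idiom as the script:

@example
set -- -W version data.txt   # hypothetical argument list
shift                        # drop the bare -W
set -- -W"$@@"                # glue -W onto the next argument
echo "$1"                    # -Wversion
echo "$2"                    # data.txt
@end example

@noindent
After the @samp{set}, the loop sees a single @samp{-Wversion} argument
and can handle it with the same @code{case} branch as the one-word form.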
22003
22004If none of the @option{-f}, @option{--file}, @option{-Wfile}, @option{--source},
22005or @option{-Wsource} arguments are supplied, then the first nonoption argument
22006should be the @command{awk} program.  If there are no command-line
22007arguments left, @command{igawk} prints an error message and exits.
22008Otherwise, the first argument is appended to @code{program}.
22009In any case, after the arguments have been processed,
22010@code{program} contains the complete text of the original @command{awk}
22011program.
22012
22013The program is as follows:
22014
22015@cindex @code{igawk.sh} program
22016@example
22017@c file eg/prog/igawk.sh
22018#! /bin/sh
22019# igawk --- like gawk but do @@include processing
22020@c endfile
22021@ignore
22022@c file eg/prog/igawk.sh
22023#
22024# Arnold Robbins, arnold@@gnu.org, Public Domain
22025# July 1993
22026
22027@c endfile
22028@end ignore
22029@c file eg/prog/igawk.sh
22030if [ "$1" = debug ]
22031then
22032    set -x
22033    shift
22034fi
22035
# A literal newline, so that program text is formatted correctly
22037n='
22038'
22039
22040# Initialize variables to empty
22041program=
22042opts=
22043
22044while [ $# -ne 0 ] # loop over arguments
22045do
22046    case $1 in
22047    --)     shift; break;;
22048
22049    -W)     shift
            # The $@{x?'message here'@} construct prints a
            # diagnostic if $x is unset
22052            set -- -W"$@{@@?'missing operand'@}"
22053            continue;;
22054
22055    -[vF])  opts="$opts $1 '$@{2?'missing operand'@}'"
22056            shift;;
22057
22058    -[vF]*) opts="$opts '$1'" ;;
22059
22060    -f)     program="$program$n@@include $@{2?'missing operand'@}"
22061            shift;;
22062
22063    -f*)    f=`expr "$1" : '-f\(.*\)'`
22064            program="$program$n@@include $f";;
22065
22066    -[W-]file=*)
22067            f=`expr "$1" : '-.file=\(.*\)'`
22068            program="$program$n@@include $f";;
22069
22070    -[W-]file)
22071            program="$program$n@@include $@{2?'missing operand'@}"
22072            shift;;
22073
22074    -[W-]source=*)
22075            t=`expr "$1" : '-.source=\(.*\)'`
22076            program="$program$n$t";;
22077
22078    -[W-]source)
22079            program="$program$n$@{2?'missing operand'@}"
22080            shift;;
22081
22082    -[W-]version)
22083            echo igawk: version 2.0 1>&2
22084            gawk --version
22085            exit 0 ;;
22086
22087    -[W-]*) opts="$opts '$1'" ;;
22088
22089    *)      break;;
22090    esac
22091    shift
22092done
22093
22094if [ -z "$program" ]
22095then
22096     program=$@{1?'missing program'@}
22097     shift
22098fi
22099
22100# At this point, `program' has the program.
22101@c endfile
22102@end example
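
The @samp{$@{2?'missing operand'@}} expansions used in the loop can be
explored with a small sketch.  The variable name @code{x} and the message
are arbitrary, and the exact wording of the diagnostic varies from shell
to shell:

@example
unset x
# x is unset: the subshell prints a diagnostic and exits
( echo "$@{x?missing operand@}" )

x=
# x is set but null: no diagnostic, prints []
echo "[$@{x?missing operand@}]"
@end example

@noindent
The form with a colon, @samp{$@{x:?missing operand@}}, also rejects a
variable that is set to the null string.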
22103
22104The @command{awk} program to process @samp{@@include} directives
22105is stored in the shell variable @code{expand_prog}.  Doing this keeps
22106the shell script readable.  The @command{awk} program
22107reads through the user's program, one line at a time, using @code{getline}
22108(@pxref{Getline}).  The input
22109@value{FN}s and @samp{@@include} statements are managed using a stack.
22110As each @samp{@@include} is encountered, the current @value{FN} is
22111``pushed'' onto the stack and the file named in the @samp{@@include}
22112directive becomes the current @value{FN}.  As each file is finished,
22113the stack is ``popped,'' and the previous input file becomes the current
22114input file again.  The process is started by making the original file
22115the first one on the stack.
22116
22117The @code{pathto} function does the work of finding the full path to
22118a file.  It simulates @command{gawk}'s behavior when searching the
22119@env{AWKPATH} environment variable
22120(@pxref{AWKPATH Variable}).
22121If a @value{FN} has a @samp{/} in it, no path search is done. Otherwise,
22122the @value{FN} is concatenated with the name of each directory in
22123the path, and an attempt is made to open the generated @value{FN}.
22124The only way to test if a file can be read in @command{awk} is to go
22125ahead and try to read it with @code{getline}; this is what @code{pathto}
22126does.@footnote{On some very old versions of @command{awk}, the test
22127@samp{getline junk < t} can loop forever if the file exists but is empty.
22128Caveat emptor.} If the file can be read, it is closed and the @value{FN}
22129is returned:
22130
22131@ignore
22132An alternative way to test for the file's existence would be to call
22133@samp{system("test -r " t)}, which uses the @command{test} utility to
22134see if the file exists and is readable.  The disadvantage to this method
22135is that it requires creating an extra process and can thus be slightly
22136slower.
22137@end ignore
22138
@example
@c file eg/prog/igawk.sh
expand_prog='

function pathto(file,    i, t, junk)
@{
    if (index(file, "/") != 0)
        return file

    for (i = 1; i <= ndirs; i++) @{
        t = (pathlist[i] "/" file)
@group
        if ((getline junk < t) > 0) @{
            # found it
            close(t)
            return t
        @}
@end group
    @}
    return ""
@}
@c endfile
@end example
22162
22163The main program is contained inside one @code{BEGIN} rule.  The first thing it
22164does is set up the @code{pathlist} array that @code{pathto} uses.  After
22165splitting the path on @samp{:}, null elements are replaced with @code{"."},
22166which represents the current directory:
22167
@example
@c file eg/prog/igawk.sh
BEGIN @{
    path = ENVIRON["AWKPATH"]
    ndirs = split(path, pathlist, ":")
    for (i = 1; i <= ndirs; i++) @{
        if (pathlist[i] == "")
            pathlist[i] = "."
    @}
@c endfile
@end example
22179
22180The stack is initialized with @code{ARGV[1]}, which will be @file{/dev/stdin}.
22181The main loop comes next.  Input lines are read in succession. Lines that
22182do not start with @samp{@@include} are printed verbatim.
22183If the line does start with @samp{@@include}, the @value{FN} is in @code{$2}.
22184@code{pathto} is called to generate the full path.  If it cannot, then we
22185print an error message and continue.
22186
22187The next thing to check is if the file is included already.  The
22188@code{processed} array is indexed by the full @value{FN} of each included
22189file and it tracks this information for us.  If the file is
22190seen again, a warning message is printed. Otherwise, the new @value{FN} is
22191pushed onto the stack and processing continues.
22192
22193Finally, when @code{getline} encounters the end of the input file, the file
22194is closed and the stack is popped.  When @code{stackptr} is less than zero,
22195the program is done:
22196
@example
@c file eg/prog/igawk.sh
    stackptr = 0
    input[stackptr] = ARGV[1] # ARGV[1] is first file

    for (; stackptr >= 0; stackptr--) @{
        while ((getline < input[stackptr]) > 0) @{
            if (tolower($1) != "@@include") @{
                print
                continue
            @}
            fpath = pathto($2)
@group
            if (fpath == "") @{
                printf("igawk:%s:%d: cannot find %s\n",
                    input[stackptr], FNR, $2) > "/dev/stderr"
                continue
            @}
@end group
            if (! (fpath in processed)) @{
                processed[fpath] = input[stackptr]
                input[++stackptr] = fpath  # push onto stack
            @} else
                print $2, "included in", input[stackptr],
                    "already included in",
                    processed[fpath] > "/dev/stderr"
        @}
        close(input[stackptr])
    @}
@}'  # close quote ends `expand_prog' variable

processed_program=`gawk -- "$expand_prog" /dev/stdin <<EOF
$program
EOF
`
@c endfile
@end example
22234
The shell construct @samp{@var{command} << @var{marker}} is called a @dfn{here document}.
Everything in the shell script up to a line consisting solely of the @var{marker} is fed to @var{command} as input.
The shell processes the contents of the here document for variable and command substitution
(and possibly other things as well, depending upon the shell).
22239
22240The shell construct @samp{`@dots{}`} is called @dfn{command substitution}.
22241The output of the command between the two backquotes (grave accents) is substituted
22242into the command line.  It is saved as a single string, even if the results
22243contain whitespace.
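
As a small illustration of how the two constructs combine, the following
sketch captures the output of a here document in a variable, much as
@command{igawk} does for @code{processed_program}.  The variable names
@code{greeting} and @code{result} are invented for this example and are
not part of @command{igawk}:

@example
greeting=hello

# The shell expands $greeting inside the here document before
# cat reads it; the backquotes then capture everything cat prints
result=`cat <<EOF
$greeting, world
second   line
EOF
`

# Quoting "$result" on output preserves the newline and the spacing
echo "$result"
@end example

Command substitution removes trailing newlines from the captured output
but leaves interior whitespace alone, so the final @command{echo} prints
both lines as written.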
22244
22245The expanded program is saved in the variable @code{processed_program}.
22246It's done in these steps:
22247
22248@enumerate
22249@item
22250Run @command{gawk} with the @samp{@@include}-processing program (the
22251value of the @code{expand_prog} shell variable) on standard input.
22252
22253@item
22254Standard input is the contents of the user's program, from the shell variable @code{program}.
22255Its contents are fed to @command{gawk} via a here document.
22256
22257@item
22258The results of this processing are saved in the shell variable @code{processed_program} by using command substitution.
22259@end enumerate
22260
22261The last step is to call @command{gawk} with the expanded program,
22262along with the original
22263options and command-line arguments that the user supplied.
22264
22265@c this causes more problems than it solves, so leave it out.
22266@ignore
22267The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk}
22268to handle an interesting case. Suppose that the user's program only has
22269a @code{BEGIN} rule and there are no @value{DF}s to read.
22270The program should exit without reading any @value{DF}s.
22271However, suppose that an included library file defines an @code{END}
22272rule of its own. In this case, @command{gawk} will hang, reading standard
22273input. In order to avoid this, @file{/dev/null} is explicitly added to the
22274command-line. Reading from @file{/dev/null} always returns an immediate
22275end of file indication.
22276
22277@c Hmm. Add /dev/null if $# is 0?  Still messes up ARGV. Sigh.
22278@end ignore
22279
@example
@c file eg/prog/igawk.sh
eval gawk $opts -- '"$processed_program"' '"$@@"'
@c endfile
@end example
22285
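The @command{eval} command is a shell construct that reruns the shell's parsing
process on its arguments.  Here, it lets the single quotes stored in @code{opts}
take effect, while @samp{"$processed_program"} and @samp{"$@@"} are expanded
only on that second pass, which keeps everything properly quoted.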
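
The effect of @command{eval} is easiest to see on a value that carries its
own quotes, as @code{opts} does.  The following is a minimal sketch, not part
of @command{igawk}; the option value and the variable names @code{plain} and
@code{evaled} are made up for the demonstration:

@example
# Built the way igawk builds it: single quotes stored inside the value
opts="-v 'greeting=hello world'"

# Without eval, the stored quotes are ordinary characters and the
# value is split at each space, giving three separate words
plain=`printf '<%s>\n' $opts`

# With eval, the expanded text is parsed a second time, so the
# stored quotes group the words again, giving two words
evaled=`eval printf "'<%s>\n'" $opts`

echo "$plain"
echo "$evaled"
@end example

The first @command{echo} shows the value broken into three words, with the
quote characters still attached; the second shows @samp{-v} followed by the
single word @samp{greeting=hello world}.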
22288
22289This version of @command{igawk} represents my fourth attempt at this program.
22290There are four key simplifications that make the program work better:
22291
22292@itemize @bullet
22293@item
22294Using @samp{@@include} even for the files named with @option{-f} makes building
22295the initial collected @command{awk} program much simpler; all the
22296@samp{@@include} processing can be done once.
22297
22298@item
22299Not trying to save the line read with @code{getline}
22300in the @code{pathto} function when testing for the
22301file's accessibility for use with the main program simplifies things
22302considerably.
22303@c what problem does this engender though - exercise
22304@c answer, reading from "-" or /dev/stdin
22305
22306@item
22307Using a @code{getline} loop in the @code{BEGIN} rule does it all in one
22308place.  It is not necessary to call out to a separate loop for processing
22309nested @samp{@@include} statements.
22310
22311@item
22312Instead of saving the expanded program in a temporary file, putting it in a shell variable
22313avoids some potential security problems.
22314This has the disadvantage that the script relies upon more features
22315of the @command{sh} language, making it harder to follow for those who
22316aren't familiar with @command{sh}.
22317@end itemize
22318
Also, this program illustrates that it is often worthwhile to combine
@command{sh} and @command{awk} programming.  You can usually
accomplish quite a lot, without having to resort to low-level programming
in C or C++, and it is frequently easier to do certain kinds of string
and argument manipulation using the shell than it is in @command{awk}.
22324
22325Finally, @command{igawk} shows that it is not always necessary to add new
22326features to a program; they can often be layered on top.  With @command{igawk},
22327there is no real reason to build @samp{@@include} processing into
22328@command{gawk} itself.
22329
22330@cindex search paths, for source files
22331@c comma is part of primary
22332@cindex source files, search path for
22333@c last comma is part of secondary
22334@cindex files, source, search path for
22335@cindex directories, searching
22336As an additional example of this, consider the idea of having two
22337files in a directory in the search path:
22338
22339@table @file
22340@item default.awk
22341This file contains a set of default library functions, such
22342as @code{getopt} and @code{assert}.
22343
22344@item site.awk
22345This file contains library functions that are specific to a site or
22346installation; i.e., locally developed functions.
22347Having a separate file allows @file{default.awk} to change with
22348new @command{gawk} releases, without requiring the system administrator to
22349update it each time by adding the local functions.
22350@end table
22351
22352One user
22353@c Karl Berry, karl@ileaf.com, 10/95
22354suggested that @command{gawk} be modified to automatically read these files
22355upon startup.  Instead, it would be very simple to modify @command{igawk}
22356to do this. Since @command{igawk} can process nested @samp{@@include}
22357directives, @file{default.awk} could simply contain @samp{@@include}
22358statements for the desired library functions.
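
A minimal sketch of such a change follows, assuming that both files are
installed in a directory named in @env{AWKPATH}.  It is not part of the
distributed @command{igawk}; the one added assignment would go just after
the option-processing loop, and the @code{BEGIN} rule here merely stands in
for the user's program:

@example
# A literal newline, as in igawk itself
n='
'

# Stand-in for the program collected from the command line
program='BEGIN @{ print "hello" @}'

# Hypothetical addition: read the two library files first
program="@@include default.awk$n@@include site.awk$n$program"

echo "$program"
@end example

With this ordering, the @samp{@@include} processing shown earlier expands
the library files ahead of the user's own code.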
22359@c ENDOFRANGE libfex
22360@c ENDOFRANGE flibex
22361@c ENDOFRANGE awkpex
22362
22363@c Exercise: make this change
22364
22365@ignore
22366@c Try this
22367@iftex
22368@page
22369@headings off
22370@majorheading III@ @ @ Appendixes
22371Part III provides the appendixes, the Glossary, and two licenses that cover
22372the @command{gawk} source code and this @value{DOCUMENT}, respectively.
22373It contains the following appendixes:
22374
22375@itemize @bullet
22376@item
22377@ref{Language History}.
22378
22379@item
22380@ref{Installation}.
22381
22382@item
22383@ref{Notes}.
22384
22385@item
22386@ref{Basic Concepts}.
22387
22388@item
22389@ref{Glossary}.
22390
22391@item
22392@ref{Copying}.
22393
22394@item
22395@ref{GNU Free Documentation License}.
22396@end itemize
22397
22398@page
22399@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
22400@oddheading  @| @| @strong{@thischapter}@ @ @ @thispage
22401@end iftex
22402@end ignore
22403
22404@node Language History
22405@appendix The Evolution of the @command{awk} Language
22406
22407This @value{DOCUMENT} describes the GNU implementation of @command{awk}, which follows
22408the POSIX specification.
22409Many long-time @command{awk} users learned @command{awk} programming
22410with the original @command{awk} implementation in Version 7 Unix.
22411(This implementation was the basis for @command{awk} in Berkeley Unix,
22412through 4.3-Reno.  Subsequent versions of Berkeley Unix, and systems
22413derived from 4.4BSD-Lite, use various versions of @command{gawk}
22414for their @command{awk}.)
This @value{APPENDIX} briefly describes the
evolution of the @command{awk} language, with cross-references to other parts
of the @value{DOCUMENT} where you can find more information.
22418
22419@menu
22420* V7/SVR3.1::                   The major changes between V7 and System V
22421                                Release 3.1.
22422* SVR4::                        Minor changes between System V Releases 3.1
22423                                and 4.
22424* POSIX::                       New features from the POSIX standard.
22425* BTL::                         New features from the Bell Laboratories
22426                                version of @command{awk}.
22427* POSIX/GNU::                   The extensions in @command{gawk} not in POSIX
22428                                @command{awk}.
22429* Contributors::                The major contributors to @command{gawk}.
22430@end menu
22431
22432@node V7/SVR3.1
22433@appendixsec Major Changes Between V7 and SVR3.1
22434@c STARTOFRANGE gawkv
22435@cindex @command{awk}, versions of
22436@c STARTOFRANGE gawkv1
22437@cindex @command{awk}, versions of, changes between V7 and SVR3.1
22438
22439The @command{awk} language evolved considerably between the release of
22440Version 7 Unix (1978) and the new version that was first made generally available in
22441System V Release 3.1 (1987).  This @value{SECTION} summarizes the changes, with
22442cross-references to further details:
22443
22444@itemize @bullet
22445@item
22446The requirement for @samp{;} to separate rules on a line
22447(@pxref{Statements/Lines}).
22448
22449@item
22450User-defined functions and the @code{return} statement
22451(@pxref{User-defined}).
22452
22453@item
22454The @code{delete} statement (@pxref{Delete}).
22455
22456@item
22457The @code{do}-@code{while} statement
22458(@pxref{Do Statement}).
22459
22460@item
22461The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand}, and
22462@code{srand} (@pxref{Numeric Functions}).
22463
22464@item
22465The built-in functions @code{gsub}, @code{sub}, and @code{match}
22466(@pxref{String Functions}).
22467
22468@item
22469The built-in functions @code{close} and @code{system}
22470(@pxref{I/O Functions}).
22471
22472@item
22473The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART},
22474and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}).
22475
22476@item
22477The conditional expression using the ternary operator @samp{?:}
22478(@pxref{Conditional Exp}).
22479
22480@item
22481The exponentiation operator @samp{^}
22482(@pxref{Arithmetic Ops}) and its assignment operator
22483form @samp{^=} (@pxref{Assignment Ops}).
22484
22485@item
22486C-compatible operator precedence, which breaks some old @command{awk}
22487programs (@pxref{Precedence}).
22488
22489@item
22490Regexps as the value of @code{FS}
22491(@pxref{Field Separators}) and as the
22492third argument to the @code{split} function
22493(@pxref{String Functions}).
22494
22495@item
22496Dynamic regexps as operands of the @samp{~} and @samp{!~} operators
22497(@pxref{Regexp Usage}).
22498
22499@item
22500The escape sequences @samp{\b}, @samp{\f}, and @samp{\r}
22501(@pxref{Escape Sequences}).
22502(Some vendors have updated their old versions of @command{awk} to
22503recognize @samp{\b}, @samp{\f}, and @samp{\r}, but this is not
22504something you can rely on.)
22505
22506@item
22507Redirection of input for the @code{getline} function
22508(@pxref{Getline}).
22509
22510@item
22511Multiple @code{BEGIN} and @code{END} rules
22512(@pxref{BEGIN/END}).
22513
22514@item
22515Multidimensional arrays
22516(@pxref{Multi-dimensional}).
22517@end itemize
22518@c ENDOFRANGE gawkv1
22519
22520@node SVR4
22521@appendixsec Changes Between SVR3.1 and SVR4
22522
22523@cindex @command{awk}, versions of, changes between SVR3.1 and SVR4
22524The System V Release 4 (1989) version of Unix @command{awk} added these features
22525(some of which originated in @command{gawk}):
22526
22527@itemize @bullet
22528@item
22529The @code{ENVIRON} variable (@pxref{Built-in Variables}).
22530@c gawk and MKS awk
22531
22532@item
22533Multiple @option{-f} options on the command line
22534(@pxref{Options}).
22535@c MKS awk
22536
22537@item
22538The @option{-v} option for assigning variables before program execution begins
22539(@pxref{Options}).
22540@c GNU, Bell Laboratories & MKS together
22541
22542@item
22543The @option{--} option for terminating command-line options.
22544
22545@item
22546The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences
22547(@pxref{Escape Sequences}).
22548@c GNU, for ANSI C compat
22549
22550@item
22551A defined return value for the @code{srand} built-in function
22552(@pxref{Numeric Functions}).
22553
22554@item
22555The @code{toupper} and @code{tolower} built-in string functions
22556for case translation
22557(@pxref{String Functions}).
22558
22559@item
22560A cleaner specification for the @samp{%c} format-control letter in the
22561@code{printf} function
22562(@pxref{Control Letters}).
22563
22564@item
22565The ability to dynamically pass the field width and precision (@code{"%*.*d"})
22566in the argument list of the @code{printf} function
22567(@pxref{Control Letters}).
22568
22569@item
22570The use of regexp constants, such as @code{/foo/}, as expressions, where
22571they are equivalent to using the matching operator, as in @samp{$0 ~ /foo/}
22572(@pxref{Using Constant Regexps}).
22573
22574@item
22575Processing of escape sequences inside command-line variable assignments
22576(@pxref{Assignment Options}).
22577@end itemize
22578
22579@node POSIX
22580@appendixsec Changes Between SVR4 and POSIX @command{awk}
22581@cindex @command{awk}, versions of, changes between SVR4 and POSIX @command{awk}
22582@cindex POSIX @command{awk}, changes in @command{awk} versions
22583
22584The POSIX Command Language and Utilities standard for @command{awk} (1992)
22585introduced the following changes into the language:
22586
22587@itemize @bullet
22588@item
22589The use of @option{-W} for implementation-specific options
22590(@pxref{Options}).
22591
22592@item
22593The use of @code{CONVFMT} for controlling the conversion of numbers
22594to strings (@pxref{Conversion}).
22595
22596@item
22597The concept of a numeric string and tighter comparison rules to go
22598with it (@pxref{Typing and Comparison}).
22599
22600@item
22601More complete documentation of many of the previously undocumented
22602features of the language.
22603@end itemize
22604
22605The following common extensions are not permitted by the POSIX
22606standard:
22607
22608@c IMPORTANT! Keep this list in sync with the one in node Options
22609
22610@itemize @bullet
22611@item
22612@code{\x} escape sequences are not recognized
22613(@pxref{Escape Sequences}).
22614
22615@item
22616Newlines do not act as whitespace to separate fields when @code{FS} is
22617equal to a single space
22618(@pxref{Fields}).
22619
22620@item
22621Newlines are not allowed after @samp{?} or @samp{:}
22622(@pxref{Conditional Exp}).
22623
22624@item
22625The synonym @code{func} for the keyword @code{function} is not
22626recognized (@pxref{Definition Syntax}).
22627
22628@item
22629The operators @samp{**} and @samp{**=} cannot be used in
22630place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops},
22631and @ref{Assignment Ops}).
22632
22633@item
22634Specifying @samp{-Ft} on the command line does not set the value
22635of @code{FS} to be a single TAB character
22636(@pxref{Field Separators}).
22637
22638@item
22639The @code{fflush} built-in function is not supported
22640(@pxref{I/O Functions}).
22641@end itemize
22642@c ENDOFRANGE gawkv
22643
22644@node BTL
22645@appendixsec Extensions in the Bell Laboratories @command{awk}
22646
22647@cindex @command{awk}, versions of, See Also Bell Laboratories @command{awk}
22648@cindex extensions, Bell Laboratories @command{awk}
22649@cindex Bell Laboratories @command{awk} extensions
22650@cindex Kernighan, Brian
22651Brian Kernighan, one of the original designers of Unix @command{awk},
22652has made his version available via his home page
22653(@pxref{Other Versions}).
22654This @value{SECTION} describes extensions in his version of @command{awk} that are
22655not in POSIX @command{awk}:
22656
22657@itemize @bullet
22658@item
22659The @samp{-mf @var{N}} and @samp{-mr @var{N}} command-line options
22660to set the maximum number of fields and the maximum
22661record size, respectively
22662(@pxref{Options}).
22663As a side note, his @command{awk} no longer needs these options;
22664it continues to accept them to avoid breaking old programs.
22665
22666@item
22667The @code{fflush} built-in function for flushing buffered output
22668(@pxref{I/O Functions}).
22669
22670@item
22671The @samp{**} and @samp{**=} operators
22672(@pxref{Arithmetic Ops}
22673and
22674@ref{Assignment Ops}).
22675
22676@item
22677The use of @code{func} as an abbreviation for @code{function}
22678(@pxref{Definition Syntax}).
22679
22680@ignore
22681@item
22682The @code{SYMTAB} array, that allows access to @command{awk}'s internal symbol
22683table. This feature is not documented, largely because
22684it is somewhat shakily implemented. For instance, you cannot access arrays
22685or array elements through it.
22686@end ignore
22687@end itemize
22688
22689The Bell Laboratories @command{awk} also incorporates the following extensions,
22690originally developed for @command{gawk}:
22691
22692@itemize @bullet
22693@item
22694The @samp{\x} escape sequence
22695(@pxref{Escape Sequences}).
22696
22697@item
22698The @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
22699special files
22700(@pxref{Special Files}).
22701
22702@item
22703The ability for @code{FS} and for the third
22704argument to @code{split} to be null strings
22705(@pxref{Single Character Fields}).
22706
22707@item
22708The @code{nextfile} statement
22709(@pxref{Nextfile Statement}).
22710
22711@item
22712The ability to delete all of an array at once with @samp{delete @var{array}}
22713(@pxref{Delete}).
22714@end itemize
22715
22716@node POSIX/GNU
22717@appendixsec Extensions in @command{gawk} Not in POSIX @command{awk}
22718
22719@ignore
22720I've tried to follow this general order, esp. for the 3.0 and 3.1 sections:
22721       variables
22722       special files
22723       language changes (e.g., hex constants)
22724       differences in standard awk functions
22725       new gawk functions
22726       new keywords
22727       new command-line options
22728       new ports
22729Within each category, be alphabetical.
22730@end ignore
22731
22732@c STARTOFRANGE fripls
22733@cindex compatibility mode (@command{gawk}), extensions
22734@c STARTOFRANGE exgnot
22735@cindex extensions, in @command{gawk}, not in POSIX @command{awk}
22736@c STARTOFRANGE posnot
22737@cindex POSIX, @command{gawk} extensions not included in
22738The GNU implementation, @command{gawk}, adds a large number of features.
22739This @value{SECTION} lists them in the order they were added to @command{gawk}.
They can all be disabled with either the @option{--traditional} or
the @option{--posix} option
(@pxref{Options}).
22743
22744Version 2.10 of @command{gawk} introduced the following features:
22745
22746@itemize @bullet
22747@item
22748The @env{AWKPATH} environment variable for specifying a path search for
22749the @option{-f} command-line option
22750(@pxref{Options}).
22751
22752@item
22753The @code{IGNORECASE} variable and its effects
22754(@pxref{Case-sensitivity}).
22755
22756@item
22757The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and
22758@file{/dev/fd/@var{N}} special @value{FN}s
22759(@pxref{Special Files}).
22760@end itemize
22761
22762Version 2.13 of @command{gawk} introduced the following features:
22763
22764@itemize @bullet
22765@item
22766The @code{FIELDWIDTHS} variable and its effects
22767(@pxref{Constant Size}).
22768
22769@item
22770The @code{systime} and @code{strftime} built-in functions for obtaining
22771and printing timestamps
22772(@pxref{Time Functions}).
22773
22774@item
The @option{-W lint} option to provide error and portability checking,
both of the source code and at runtime
(@pxref{Options}).
22778
22779@item
22780The @option{-W compat} option to turn off the GNU extensions
22781(@pxref{Options}).
22782
22783@item
22784The @option{-W posix} option for full POSIX compliance
22785(@pxref{Options}).
22786@end itemize
22787
22788Version 2.14 of @command{gawk} introduced the following feature:
22789
22790@itemize @bullet
22791@item
22792The @code{next file} statement for skipping to the next @value{DF}
22793(@pxref{Nextfile Statement}).
22794@end itemize
22795
22796Version 2.15 of @command{gawk} introduced the following features:
22797
22798@itemize @bullet
22799@item
22800The @code{ARGIND} variable, which tracks the movement of @code{FILENAME}
22801through @code{ARGV}  (@pxref{Built-in Variables}).
22802
22803@item
22804The @code{ERRNO} variable, which contains the system error message when
22805@code{getline} returns @minus{}1 or @code{close} fails
22806(@pxref{Built-in Variables}).
22807
22808@item
22809The @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and
22810@file{/dev/user} @value{FN} interpretation
22811(@pxref{Special Files}).
22812
22813@item
22814The ability to delete all of an array at once with @samp{delete @var{array}}
22815(@pxref{Delete}).
22816
22817@item
22818The ability to use GNU-style long-named options that start with @option{--}
22819(@pxref{Options}).
22820
22821@item
22822The @option{--source} option for mixing command-line and library-file
22823source code
22824(@pxref{Options}).
22825@end itemize
22826
22827Version 3.0 of @command{gawk} introduced the following features:
22828
22829@itemize @bullet
22830@item
22831@code{IGNORECASE} changed, now applying to string comparison as well
22832as regexp operations
22833(@pxref{Case-sensitivity}).
22834
22835@item
22836The @code{RT} variable that contains the input text that
22837matched @code{RS}
22838(@pxref{Records}).
22839
22840@item
22841Full support for both POSIX and GNU regexps
22842(@pxref{Regexp}).
22843
22844@item
22845The @code{gensub} function for more powerful text manipulation
22846(@pxref{String Functions}).
22847
22848@item
22849The @code{strftime} function acquired a default time format,
22850allowing it to be called with no arguments
22851(@pxref{Time Functions}).
22852
22853@item
22854The ability for @code{FS} and for the third
22855argument to @code{split} to be null strings
22856(@pxref{Single Character Fields}).
22857
22858@item
22859The ability for @code{RS} to be a regexp
22860(@pxref{Records}).
22861
22862@item
22863The @code{next file} statement became @code{nextfile}
22864(@pxref{Nextfile Statement}).
22865
22866@item
22867The @option{--lint-old} option to
22868warn about constructs that are not available in
22869the original Version 7 Unix version of @command{awk}
22870(@pxref{V7/SVR3.1}).
22871
22872@item
22873The @option{-m} option and the @code{fflush} function from the
22874Bell Laboratories research version of @command{awk}
22875(@pxref{Options}; also
22876@pxref{I/O Functions}).
22877
22878@item
22879The @option{--re-interval} option to provide interval expressions in regexps
22880(@pxref{Regexp Operators}).
22881
22882@item
22883The @option{--traditional} option was added as a better name for
22884@option{--compat} (@pxref{Options}).
22885
22886@item
22887The use of GNU Autoconf to control the configuration process
22888(@pxref{Quick Installation}).
22889
22890@item
22891Amiga support
22892(@pxref{Amiga Installation}).
22893
22894@end itemize
22895
22896Version 3.1 of @command{gawk} introduced the following features:
22897
22898@itemize @bullet
22899@item
22900The @code{BINMODE} special variable for non-POSIX systems,
22901which allows binary I/O for input and/or output files
22902(@pxref{PC Using}).
22903
22904@item
22905The @code{LINT} special variable, which dynamically controls lint warnings
22906(@pxref{Built-in Variables}).
22907
22908@item
22909The @code{PROCINFO} array for providing process-related information
22910(@pxref{Built-in Variables}).
22911
22912@item
22913The @code{TEXTDOMAIN} special variable for setting an application's
22914internationalization text domain
22915(@pxref{Built-in Variables},
22916and
22917@ref{Internationalization}).
22918
22919@item
22920The ability to use octal and hexadecimal constants in @command{awk}
22921program source code
22922(@pxref{Nondecimal-numbers}).
22923
22924@item
22925The @samp{|&} operator for two-way I/O to a coprocess
22926(@pxref{Two-way I/O}).
22927
22928@item
22929The @file{/inet} special files for TCP/IP networking using @samp{|&}
22930(@pxref{TCP/IP Networking}).
22931
22932@item
22933The optional second argument to @code{close} that allows closing one end
22934of a two-way pipe to a coprocess
22935(@pxref{Two-way I/O}).
22936
22937@item
22938The optional third argument to the @code{match} function
22939for capturing text-matching subexpressions within a regexp
22940(@pxref{String Functions}).
22941
22942@item
22943Positional specifiers in @code{printf} formats for
22944making translations easier
22945(@pxref{Printf Ordering}).
22946
22947@item
22948The @code{asort} and @code{asorti} functions for sorting arrays
22949(@pxref{Array Sorting}).
22950
22951@item
22952The @code{bindtextdomain}, @code{dcgettext} and @code{dcngettext} functions
22953for internationalization
22954(@pxref{Programmer i18n}).
22955
22956@item
22957The @code{extension} built-in function and the ability to add
22958new built-in functions dynamically
22959(@pxref{Dynamic Extensions}).
22960
22961@item
22962The @code{mktime} built-in function for creating timestamps
22963(@pxref{Time Functions}).
22964
22965@item
22966The
22967@code{and},
22968@code{or},
22969@code{xor},
22970@code{compl},
22971@code{lshift},
22972@code{rshift},
22973and
22974@code{strtonum} built-in
22975functions
22976(@pxref{Bitwise Functions}).
22977
22978@item
22979@cindex @code{next file} statement
22980The support for @samp{next file} as two words was removed completely
22981(@pxref{Nextfile Statement}).
22982
22983@item
22984The @option{--dump-variables} option to print a list of all global variables
22985(@pxref{Options}).
22986
22987@item
22988The @option{--gen-po} command-line option and the use of a leading
22989underscore to mark strings that should be translated
22990(@pxref{String Extraction}).
22991
22992@item
22993The @option{--non-decimal-data} option to allow non-decimal
22994input data
22995(@pxref{Nondecimal Data}).
22996
22997@item
22998The @option{--profile} option and @command{pgawk}, the
22999profiling version of @command{gawk}, for producing execution
23000profiles of @command{awk} programs
23001(@pxref{Profiling}).
23002
23003@item
23004The @option{--enable-portals} configuration option to enable special treatment of
23005pathnames that begin with @file{/p} as BSD portals
23006(@pxref{Portal Files}).
23007
23008@item
23009The use of GNU Automake to help in standardizing the configuration process
23010(@pxref{Quick Installation}).
23011
23012@item
23013The use of GNU @code{gettext} for @command{gawk}'s own message output
23014(@pxref{Gawk I18N}).
23015
23016@item
23017BeOS support
23018(@pxref{BeOS Installation}).
23019
23020@item
23021Tandem support
23022(@pxref{Tandem Installation}).
23023
23024@item
23025The Atari port became officially unsupported
23026(@pxref{Atari Installation}).
23027
23028@item
23029The source code now uses new-style function definitions, with
23030@command{ansi2knr} to convert the code on systems with old compilers.
23031
23032@item
23033The @option{--disable-lint} configuration option to disable lint checking
23034at compile time
23035(@pxref{Additional Configuration Options}).
23036
23037@end itemize
23038
23039@c XXX ADD MORE STUFF HERE
23040
23041@c ENDOFRANGE fripls
23042@c ENDOFRANGE exgnot
23043@c ENDOFRANGE posnot
23044
23045@node Contributors
23046@appendixsec Major Contributors to @command{gawk}
23047@cindex @command{gawk}, list of contributors to
23048@quotation
23049@i{Always give credit where credit is due.}@*
23050Anonymous
23051@end quotation
23052
23053This @value{SECTION} names the major contributors to @command{gawk}
23054and/or this @value{DOCUMENT}, in approximate chronological order:
23055
23056@itemize @bullet
23057@item
23058@cindex Aho, Alfred
23059@cindex Weinberger, Peter
23060@cindex Kernighan, Brian
23061Dr.@: Alfred V.@: Aho,
23062Dr.@: Peter J.@: Weinberger, and
23063Dr.@: Brian W.@: Kernighan, all of Bell Laboratories,
23064designed and implemented Unix @command{awk},
23065from which @command{gawk} gets the majority of its feature set.
23066
23067@item
23068@cindex Rubin, Paul
23069Paul Rubin
23070did the initial design and implementation in 1986, and wrote
23071the first draft (around 40 pages) of this @value{DOCUMENT}.
23072
23073@item
23074@cindex Fenlason, Jay
23075Jay Fenlason
23076finished the initial implementation.
23077
23078@item
23079@cindex Close, Diane
23080Diane Close
23081revised the first draft of this @value{DOCUMENT}, bringing it
23082to around 90 pages.
23083
23084@item
23085@cindex Stallman, Richard
23086Richard Stallman
23087helped finish the implementation and the initial draft of this
23088@value{DOCUMENT}.
23089He is also the founder of the FSF and the GNU project.
23090
23091@item
23092@cindex Woods, John
23093John Woods
23094contributed parts of the code (mostly fixes) in
23095the initial version of @command{gawk}.
23096
23097@item
23098@cindex Trueman, David
23099In 1988,
23100David Trueman
23101took over primary maintenance of @command{gawk},
23102making it compatible with ``new'' @command{awk}, and
23103greatly improving its performance.
23104
23105@item
23106@cindex Rankin, Pat
23107Pat Rankin
23108provided the VMS port and its documentation.
23109
23110@item
23111@cindex Kwok, Conrad
23112@cindex Garfinkle, Scott
23113@cindex Williams, Kent
23114Conrad Kwok,
23115Scott Garfinkle,
23116and
23117Kent Williams
23118did the initial ports to MS-DOS with various versions of MSC.
23119
23120@item
23121@cindex Peterson, Hal
23122Hal Peterson
23123provided help in porting @command{gawk} to Cray systems.
23124
23125@item
23126@cindex Rommel, Kai Uwe
23127Kai Uwe Rommel
23128provided the initial port to OS/2 and its documentation.
23129
23130@item
23131@cindex Jaegermann, Michal
23132Michal Jaegermann
23133provided the port to Atari systems and its documentation.
23134He continues to provide portability checking with DEC Alpha
23135systems, and has done a lot of work to make sure @command{gawk}
23136works on non-32-bit systems.
23137
23138@item
23139@cindex Fish, Fred
23140Fred Fish
23141provided the port to Amiga systems and its documentation.
23142
23143@item
23144@cindex Deifik, Scott
23145Scott Deifik
23146currently maintains the MS-DOS port.
23147
23148@item
23149@cindex Grigera, Juan
23150Juan Grigera
23151maintains the port to Windows32 systems.
23152
23153@item
23154@cindex Hankerson, Darrel
23155Dr.@: Darrel Hankerson
23156acts as coordinator for the various ports to different PC platforms
23157and creates binary distributions for various PC operating systems.
23158He is also instrumental in keeping the documentation up to date for
23159the various PC platforms.
23160
23161@item
23162@cindex Zoulas, Christos
23163Christos Zoulas
23164provided the @code{extension}
23165built-in function for dynamically adding new modules.
23166
23167@item
23168@cindex Kahrs, J@"urgen
23169J@"urgen Kahrs
23170contributed the initial version of the TCP/IP networking
23171code and documentation, and motivated the inclusion of the @samp{|&} operator.
23172
23173@item
23174@cindex Davies, Stephen
23175Stephen Davies
23176provided the port to Tandem systems and its documentation.
23177
23178@item
23179@cindex Brown, Martin
23180Martin Brown
23181provided the port to BeOS and its documentation.
23182
23183@item
23184@cindex Peters, Arno
23185Arno Peters
23186did the initial work to convert @command{gawk} to use
23187GNU Automake and @code{gettext}.
23188
23189@item
23190@cindex Broder, Alan J.@:
23191Alan J.@: Broder
23192provided the initial version of the @code{asort} function
23193as well as the code for the new optional third argument to the @code{match} function.
23194
23195@item
23196@cindex Buening, Andreas
23197Andreas Buening
23198updated the @command{gawk} port for OS/2.
23199
@item
@cindex Hasegawa, Isamu
Isamu Hasegawa,
of IBM in Japan, contributed support for multibyte characters.

@item
@cindex Benzinger, Michael
Michael Benzinger contributed the initial code for @code{switch} statements.

@item
@cindex McPhee, Patrick
Patrick T.J.@: McPhee contributed the code for dynamic loading in Windows32
environments.
23210
23211@item
23212@cindex Robbins, Arnold
23213Arnold Robbins
23214has been working on @command{gawk} since 1988, at first
23215helping David Trueman, and as the primary maintainer since around 1994.
23216@end itemize
23217
23218@node Installation
23219@appendix Installing @command{gawk}
23220
23221@c last two commas are part of see also
23222@cindex operating systems, See Also GNU/Linux, PC operating systems, Unix
23223@c STARTOFRANGE gligawk
23224@cindex @command{gawk}, installing
23225@c STARTOFRANGE ingawk
23226@cindex installing @command{gawk}
23227This appendix provides instructions for installing @command{gawk} on the
23228various platforms that are supported by the developers.  The primary
23229developer supports GNU/Linux (and Unix), whereas the other ports are
23230contributed.
23231@xref{Bugs},
23232for the electronic mail addresses of the people who did
23233the respective ports.
23234
23235@menu
23236* Gawk Distribution::           What is in the @command{gawk} distribution.
23237* Unix Installation::           Installing @command{gawk} under various
23238                                versions of Unix.
23239* Non-Unix Installation::       Installation on Other Operating Systems.
23240* Unsupported::                 Systems whose ports are no longer supported.
23241* Bugs::                        Reporting Problems and Bugs.
23242* Other Versions::              Other freely available @command{awk}
23243                                implementations.
23244@end menu
23245
23246@node Gawk Distribution
23247@appendixsec The @command{gawk} Distribution
23248@cindex source code, @command{gawk}
23249
23250This @value{SECTION} describes how to get the @command{gawk}
23251distribution, how to extract it, and then what is in the various files and
23252subdirectories.
23253
23254@menu
23255* Getting::                     How to get the distribution.
23256* Extracting::                  How to extract the distribution.
23257* Distribution contents::       What is in the distribution.
23258@end menu
23259
23260@node Getting
23261@appendixsubsec Getting the @command{gawk} Distribution
23262@c last comma is part of secondary
23263@cindex @command{gawk}, source code, obtaining
23264There are three ways to get GNU software:
23265
23266@itemize @bullet
23267@item
23268Copy it from someone else who already has it.
23269
23270@cindex FSF (Free Software Foundation)
23271@cindex Free Software Foundation (FSF)
23272@item
23273Order @command{gawk} directly from the Free Software Foundation.
23274Software distributions are available for
GNU/Linux, Unix, and MS-Windows, in several CD packages.
23276Their address is:
23277
23278@display
23279Free Software Foundation
2328059 Temple Place, Suite 330
23281Boston, MA  02111-1307 USA
23282Phone: +1-617-542-5942
23283Fax (including Japan): +1-617-542-2652
23284Email: @email{gnu@@gnu.org}
23285URL: @uref{http://www.gnu.org}
23286@end display
23287
23288@noindent
23289Ordering from the FSF directly contributes to the support of the foundation
23290and to the production of more free software.
23291
23292@item
23293Retrieve @command{gawk} by using anonymous @command{ftp} to the Internet host
23294@code{ftp.gnu.org}, in the directory @file{/gnu/gawk}.
23295@end itemize
23296
23297The GNU software archive is mirrored around the world.
23298The up-to-date list of mirror sites is available from
23299@uref{http://www.gnu.org/order/ftp.html, the main FSF web site}.
23300Try to use one of the mirrors; they
23301will be less busy, and you can usually find one closer to your site.
23302
23303@node Extracting
23304@appendixsubsec Extracting the Distribution
23305@command{gawk} is distributed as a @code{tar} file compressed with the
23306GNU Zip program, @code{gzip}.
23307
23308Once you have the distribution (for example,
23309@file{gawk-@value{VERSION}.@value{PATCHLEVEL}.tar.gz}),
23310use @code{gzip} to expand the
23311file and then use @code{tar} to extract it.  You can use the following
23312pipeline to produce the @command{gawk} distribution:
23313
23314@example
23315# Under System V, add 'o' to the tar options
23316gzip -d -c gawk-@value{VERSION}.@value{PATCHLEVEL}.tar.gz | tar -xvpf -
23317@end example
23318
23319@noindent
23320This creates a directory named @file{gawk-@value{VERSION}.@value{PATCHLEVEL}}
23321in the current directory.
23322
23323The distribution @value{FN} is of the form
23324@file{gawk-@var{V}.@var{R}.@var{P}.tar.gz}.
23325The @var{V} represents the major version of @command{gawk},
23326the @var{R} represents the current release of version @var{V}, and
23327the @var{P} represents a @dfn{patch level}, meaning that minor bugs have
23328been fixed in the release.  The current patch level is @value{PATCHLEVEL},
23329but when retrieving distributions, you should get the version with the highest
23330version, release, and patch level.  (Note, however, that patch levels greater than
23331or equal to 80 denote ``beta'' or nonproduction software; you might not want
23332to retrieve such a version unless you don't mind experimenting.)
23333If you are not on a Unix system, you need to make other arrangements
23334for getting and extracting the @command{gawk} distribution.  You should consult
23335a local expert.
23336
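As a rough illustration of the @var{V}.@var{R}.@var{P} naming scheme
described above, the following Bourne shell fragment picks the patch level
out of a distribution @value{FN} and reports whether it falls in the
``beta'' range.  The @value{FN} shown is hypothetical, chosen only to
exercise the test; substitute the name of the file you actually retrieved:

@example
# Hypothetical file name, used only for illustration
file=gawk-3.1.80.tar.gz

# Strip the prefix and suffix, leaving V.R.P
ver=`echo "$file" | sed -e 's/^gawk-//' -e 's/\.tar\.gz$//'`

# The patch level is the third dot-separated field
patch=`echo "$ver" | cut -d. -f3`

if [ "$patch" -ge 80 ]
then
    echo "$ver: beta or nonproduction release"
else
    echo "$ver: production release"
fi
@end example
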
23337@node Distribution contents
23338@appendixsubsec Contents of the @command{gawk} Distribution
23339@c STARTOFRANGE gawdis
23340@cindex @command{gawk}, distribution
23341
23342The @command{gawk} distribution has a number of C source files,
23343documentation files,
23344subdirectories, and files related to the configuration process
23345(@pxref{Unix Installation}),
23346as well as several subdirectories related to different non-Unix
23347operating systems:
23348
23349@table @asis
23350@item Various @samp{.c}, @samp{.y}, and @samp{.h} files
23351The actual @command{gawk} source code.
23352@end table
23353
23354@table @file
23355@item README
23356@itemx README_d/README.*
23357Descriptive files: @file{README} for @command{gawk} under Unix and the
23358rest for the various hardware and software combinations.
23359
23360@item INSTALL
23361A file providing an overview of the configuration and installation process.
23362
23363@item ChangeLog
23364A detailed list of source code changes as bugs are fixed or improvements made.
23365
23366@item NEWS
23367A list of changes to @command{gawk} since the last release or patch.
23368
23369@item COPYING
23370The GNU General Public License.
23371
23372@item FUTURES
23373A brief list of features and changes being contemplated for future
23374releases, with some indication of the time frame for the feature, based
23375on its difficulty.
23376
23377@item LIMITATIONS
23378A list of those factors that limit @command{gawk}'s performance.
23379Most of these depend on the hardware or operating system software and
23380are not limits in @command{gawk} itself.
23381
23382@item POSIX.STD
23383A description of one area in which the POSIX standard for @command{awk} is
23384incorrect as well as how @command{gawk} handles the problem.
23385
23386@c comma is part of primary
23387@cindex artificial intelligence, @command{gawk} and
23388@item doc/awkforai.txt
23389A short article describing why @command{gawk} is a good language for
23390AI (Artificial Intelligence) programming.
23391
23392@item doc/README.card
23393@itemx doc/ad.block
23394@itemx doc/awkcard.in
23395@itemx doc/cardfonts
23396@itemx doc/colors
23397@itemx doc/macros
23398@itemx doc/no.colors
23399@itemx doc/setter.outline
23400The @command{troff} source for a five-color @command{awk} reference card.
23401A modern version of @command{troff} such as GNU @command{troff} (@command{groff}) is
23402needed to produce the color version. See the file @file{README.card}
23403for instructions if you have an older @command{troff}.
23404
23405@item doc/gawk.1
23406The @command{troff} source for a manual page describing @command{gawk}.
23407This is distributed for the convenience of Unix users.
23408
23409@cindex Texinfo
23410@item doc/gawk.texi
23411The Texinfo source file for this @value{DOCUMENT}.
23412It should be processed with @TeX{} to produce a printed document, and
23413with @command{makeinfo} to produce an Info or HTML file.
23414
23415@item doc/awk.info
23416The generated Info file for this @value{DOCUMENT}.
23417
23418@item doc/gawkinet.texi
23419The Texinfo source file for
23420@ifinfo
23421@xref{Top}.
23422@end ifinfo
23423@ifnotinfo
23424@cite{TCP/IP Internetworking with @command{gawk}}.
23425@end ifnotinfo
23426It should be processed with @TeX{} to produce a printed document and
23427with @command{makeinfo} to produce an Info or HTML file.
23428
23429@item doc/gawkinet.info
23430The generated Info file for
23431@cite{TCP/IP Internetworking with @command{gawk}}.
23432
23433@item doc/igawk.1
23434The @command{troff} source for a manual page describing the @command{igawk}
23435program presented in
23436@ref{Igawk Program}.
23437
23438@item doc/Makefile.in
23439The input file used during the configuration process to generate the
23440actual @file{Makefile} for creating the documentation.
23441
23442@item Makefile.am
23443@itemx */Makefile.am
23444Files used by the GNU @command{automake} software for generating
23445the @file{Makefile.in} files used by @command{autoconf} and
23446@command{configure}.
23447
23448@item Makefile.in
23449@itemx acconfig.h
23450@itemx acinclude.m4
23451@itemx aclocal.m4
23452@itemx configh.in
23453@itemx configure.in
23454@itemx configure
23455@itemx custom.h
23456@itemx missing_d/*
23457@itemx m4/*
23458These files and subdirectories are used when configuring @command{gawk}
23459for various Unix systems.  They are explained in
23460@ref{Unix Installation}.
23461
23462@item intl/*
23463@itemx po/*
23464The @file{intl} directory provides the GNU @code{gettext} library, which implements
@command{gawk}'s internationalization features, while the @file{po} directory
contains message translations.
23467
23468@item awklib/extract.awk
23469@itemx awklib/Makefile.am
23470@itemx awklib/Makefile.in
23471@itemx awklib/eg/*
23472The @file{awklib} directory contains a copy of @file{extract.awk}
23473(@pxref{Extract Program}),
23474which can be used to extract the sample programs from the Texinfo
23475source file for this @value{DOCUMENT}. It also contains a @file{Makefile.in} file, which
23476@command{configure} uses to generate a @file{Makefile}.
23477@file{Makefile.am} is used by GNU Automake to create @file{Makefile.in}.
23478The library functions from
23479@ref{Library Functions},
23480and the @command{igawk} program from
23481@ref{Igawk Program},
23482are included as ready-to-use files in the @command{gawk} distribution.
23483They are installed as part of the installation process.
23484The rest of the programs in this @value{DOCUMENT} are available in appropriate
23485subdirectories of @file{awklib/eg}.
23486
23487@item unsupported/atari/*
23488Files needed for building @command{gawk} on an Atari ST
23489(@pxref{Atari Installation}, for details).
23490
23491@item unsupported/tandem/*
23492Files needed for building @command{gawk} on a Tandem
23493(@pxref{Tandem Installation}, for details).
23494
23495@item posix/*
23496Files needed for building @command{gawk} on POSIX-compliant systems.
23497
23498@item pc/*
23499Files needed for building @command{gawk} under MS-DOS, MS Windows and OS/2
23500(@pxref{PC Installation}, for details).
23501
23502@item vms/*
23503Files needed for building @command{gawk} under VMS
23504(@pxref{VMS Installation}, for details).
23505
23506@item test/*
23507A test suite for
23508@command{gawk}.  You can use @samp{make check} from the top-level @command{gawk}
23509directory to run your version of @command{gawk} against the test suite.
23510If @command{gawk} successfully passes @samp{make check}, then you can
23511be confident of a successful port.
23512@end table
23513@c ENDOFRANGE gawdis
23514
23515@node Unix Installation
23516@appendixsec Compiling and Installing @command{gawk} on Unix
23517
23518Usually, you can compile and install @command{gawk} by typing only two
23519commands.  However, if you use an unusual system, you may need
23520to configure @command{gawk} for your system yourself.
23521
23522@menu
23523* Quick Installation::               Compiling @command{gawk} under Unix.
23524* Additional Configuration Options:: Other compile-time options.
23525* Configuration Philosophy::         How it's all supposed to work.
23526@end menu
23527
23528@node Quick Installation
23529@appendixsubsec Compiling @command{gawk} for Unix
23530
23531@c @cindex installation, unix
23532After you have extracted the @command{gawk} distribution, @command{cd}
23533to @file{gawk-@value{VERSION}.@value{PATCHLEVEL}}.  Like most GNU software,
23534@command{gawk} is configured
23535automatically for your Unix system by running the @command{configure} program.
23536This program is a Bourne shell script that is generated automatically using
23537GNU @command{autoconf}.
23538@ifnotinfo
23539(The @command{autoconf} software is
23540described fully in
23541@cite{Autoconf---Generating Automatic Configuration Scripts},
23542which is available from the Free Software Foundation.)
23543@end ifnotinfo
23544@ifinfo
23545(The @command{autoconf} software is described fully starting with
23546@ref{Top}.)
23547@end ifinfo
23548
23549To configure @command{gawk}, simply run @command{configure}:
23550
23551@example
23552sh ./configure
23553@end example
23554
23555This produces a @file{Makefile} and @file{config.h} tailored to your system.
23556The @file{config.h} file describes various facts about your system.
23557You might want to edit the @file{Makefile} to
23558change the @code{CFLAGS} variable, which controls
23559the command-line options that are passed to the C compiler (such as
23560optimization levels or compiling for debugging).
23561
23562Alternatively, you can add your own values for most @command{make}
23563variables on the command line, such as @code{CC} and @code{CFLAGS}, when
23564running @command{configure}:
23565
23566@example
23567CC=cc CFLAGS=-g sh ./configure
23568@end example
23569
23570@noindent
23571See the file @file{INSTALL} in the @command{gawk} distribution for
23572all the details.
23573
23574After you have run @command{configure} and possibly edited the @file{Makefile},
23575type:
23576
23577@example
23578make
23579@end example
23580
23581@noindent
23582Shortly thereafter, you should have an executable version of @command{gawk}.
23583That's all there is to it!
23584To verify that @command{gawk} is working properly,
23585run @samp{make check}.  All of the tests should succeed.
23586If these steps do not work, or if any of the tests fail,
23587check the files in the @file{README_d} directory to see if you've
23588found a known problem.  If the failure is not described there,
23589please send in a bug report
(@pxref{Bugs}).
23591
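The steps above can also be chained on a single command line, so that each
stage runs only if the previous one succeeded.  This is merely a convenience
and assumes that the default configuration suits your system:

@example
sh ./configure && make && make check
@end example
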
23592@node Additional Configuration Options
23593@appendixsubsec Additional Configuration Options
23594@cindex @command{gawk}, configuring, options
23595@c comma is part of primary
23596@cindex configuration options, @command{gawk}
23597
23598There are several additional options you may use on the @command{configure}
23599command line when compiling @command{gawk} from scratch, including:
23600
23601@table @code
23602@cindex @code{--enable-portals} configuration option
23603@cindex configuration option, @code{--enable-portals}
23604@item --enable-portals
23605Treat pathnames that begin
23606with @file{/p} as BSD portal files when doing two-way I/O with
23607the @samp{|&} operator
23608(@pxref{Portal Files}).
23609
23610@cindex @code{--enable-switch} configuration option
23611@cindex configuration option, @code{--enable-switch}
23612@item --enable-switch
23613Enable the recognition and execution of C-style @code{switch} statements
23614in @command{awk} programs
(@pxref{Switch Statement}).
23616
23617@cindex Linux
23618@cindex GNU/Linux
23619@cindex @code{--with-included-gettext} configuration option
23620@cindex @code{--with-included-gettext} configuration option, configuring @command{gawk} with
23621@cindex configuration option, @code{--with-included-gettext}
23622@item --with-included-gettext
23623Use the version of the @code{gettext} library that comes with @command{gawk}.
23624This option should be used on systems that do @emph{not} use @value{PVERSION} 2 (or later)
23625of the GNU C library.
23626All known modern GNU/Linux systems use Glibc 2.  Use this option on any other system.
23627
23628@cindex @code{--disable-lint} configuration option
23629@cindex configuration option, @code{--disable-lint}
23630@item --disable-lint
This option disables all lint checking within @command{gawk}.  The
23632@option{--lint} and @option{--lint-old} options
23633(@pxref{Options})
23634are accepted, but silently do nothing.
23635Similarly, setting the @code{LINT} variable
23636(@pxref{User-modified})
23637has no effect on the running @command{awk} program.
23638
23639When used with GCC's automatic dead-code-elimination, this option
23640cuts almost 200K bytes off the size of the @command{gawk}
23641executable on GNU/Linux x86 systems.  Results on other systems and
23642with other compilers are likely to vary.
23643Using this option may bring you some slight performance improvement.
23644
23645Using this option will cause some of the tests in the test suite
23646to fail.  This option may be removed at a later date.
23647
23648@cindex @code{--disable-nls} configuration option
23649@cindex configuration option, @code{--disable-nls}
23650@item --disable-nls
23651Disable all message-translation facilities.
23652This is usually not desirable, but it may bring you some slight performance
23653improvement.
23654You should also use this option if @option{--with-included-gettext}
23655doesn't work on your system.
23656@end table
23657
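Options may be combined on one @command{configure} command line.  As a
sketch, the following (arbitrarily chosen) combination enables
@code{switch} statements and leaves out the message-translation facilities:

@example
sh ./configure --enable-switch --disable-nls
@end example
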
23658@node Configuration Philosophy
23659@appendixsubsec The Configuration Process
23660
23661@cindex @command{gawk}, configuring
23662This @value{SECTION} is of interest only if you know something about using the
23663C language and the Unix operating system.
23664
23665The source code for @command{gawk} generally attempts to adhere to formal
23666standards wherever possible.  This means that @command{gawk} uses library
23667routines that are specified by the ISO C standard and by the POSIX
23668operating system interface standard.  When using an ISO C compiler,
23669function prototypes are used to help improve the compile-time checking.
23670
23671Many Unix systems do not support all of either the ISO or the
23672POSIX standards.  The @file{missing_d} subdirectory in the @command{gawk}
23673distribution contains replacement versions of those functions that are
23674most likely to be missing.
23675
23676The @file{config.h} file that @command{configure} creates contains
23677definitions that describe features of the particular operating system
where you are attempting to compile @command{gawk}.  The three things
described by this file are: what header files are available, so that
they can be correctly included; what (supposedly) standard functions
are actually available in your C libraries; and various miscellaneous
facts about your variant of Unix.  For example, there may not be an
23683@code{st_blksize} element in the @code{stat} structure.  In this case,
23684@samp{HAVE_ST_BLKSIZE} is undefined.
23685
23686@cindex @code{custom.h} file
23687It is possible for your C compiler to lie to @command{configure}. It may
23688do so by not exiting with an error when a library function is not
23689available.  To get around this, edit the file @file{custom.h}.
23690Use an @samp{#ifdef} that is appropriate for your system, and either
23691@code{#define} any constants that @command{configure} should have defined but
23692didn't, or @code{#undef} any constants that @command{configure} defined and
23693should not have.  @file{custom.h} is automatically included by
23694@file{config.h}.
23695
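As a sketch of what such an entry might look like, the following shell
command appends a block to @file{custom.h}; making the same change with a
text editor works equally well.  The macro @code{MY_ODD_SYSTEM} is
hypothetical and stands for whatever symbol your compiler predefines, and
the example assumes that @command{configure} wrongly concluded that
@code{st_blksize} is available:

@example
cat >> custom.h << 'EOF'
#ifdef MY_ODD_SYSTEM
/* configure defined this, but struct stat has no st_blksize here */
#undef HAVE_ST_BLKSIZE
#endif
EOF
@end example
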
23696It is also possible that the @command{configure} program generated by
23697@command{autoconf} will not work on your system in some other fashion.
23698If you do have a problem, the file @file{configure.in} is the input for
23699@command{autoconf}.  You may be able to change this file and generate a
23700new version of @command{configure} that works on your system
23701(@pxref{Bugs},
23702for information on how to report problems in configuring @command{gawk}).
23703The same mechanism may be used to send in updates to @file{configure.in}
23704and/or @file{custom.h}.
23705
23706@node Non-Unix Installation
23707@appendixsec Installation on Other Operating Systems
23708
23709This @value{SECTION} describes how to install @command{gawk} on
23710various non-Unix systems.
23711
23712@menu
23713* Amiga Installation::          Installing @command{gawk} on an Amiga.
23714* BeOS Installation::           Installing @command{gawk} on BeOS.
23715* PC Installation::             Installing and Compiling @command{gawk} on
23716                                MS-DOS and OS/2.
23717* VMS Installation::            Installing @command{gawk} on VMS.
23718@end menu
23719
23720@node Amiga Installation
23721@appendixsubsec Installing @command{gawk} on an Amiga
23722
23723@cindex amiga
23724@cindex installation, amiga
23725You can install @command{gawk} on an Amiga system using a Unix emulation
23726environment, available via anonymous @command{ftp} from
23727@code{ftp.ninemoons.com} in the directory @file{pub/ade/current}.
23728This includes a shell based on @command{pdksh}.  The primary component of
23729this environment is a Unix emulation library, @file{ixemul.lib}.
23730@c could really use more background here, who wrote this, etc.
23731
23732A more complete distribution for the Amiga is available on
23733the Geek Gadgets CD-ROM, available from:
23734
23735@display
23736CRONUS
237371840 E. Warner Road #105-265
23738Tempe, AZ 85284  USA
23739US Toll Free: (800) 804-0833
23740Phone: +1-602-491-0442
23741FAX: +1-602-491-0048
23742Email: @email{info@@ninemoons.com}
23743WWW: @uref{http://www.ninemoons.com}
23744Anonymous @command{ftp} site: @code{ftp.ninemoons.com}
23745@end display
23746
23747Once you have the distribution, you can configure @command{gawk} simply by
23748running @command{configure}:
23749
23750@example
23751configure -v m68k-amigaos
23752@end example
23753
23754Then run @command{make} and you should be all set!
23755If these steps do not work, please send in a bug report
23756(@pxref{Bugs}).
23757
23758@node BeOS Installation
23759@appendixsubsec Installing @command{gawk} on BeOS
23760@cindex BeOS
23761@cindex installation, beos
23762
23763@c From email contributed by Martin Brown, mc@whoever.com
Since BeOS DR9, all the tools that you should need to build @command{gawk} are
23765included with BeOS. The process is basically identical to the Unix process
23766of running @command{configure} and then @command{make}. Full instructions are given below.
23767
23768You can compile @command{gawk} under BeOS by extracting the standard sources
23769and running @command{configure}. You @emph{must} specify the location
23770prefix for the installation directory. For BeOS DR9 and beyond, the best directory to
23771use is @file{/boot/home/config}, so the @command{configure} command is:
23772
23773@example
23774configure --prefix=/boot/home/config
23775@end example
23776
23777This installs the compiled application into @file{/boot/home/config/bin},
23778which is already specified in the standard @env{PATH}.
23779
23780Once the configuration process is completed, you can run @command{make},
23781and then @samp{make install}:
23782
23783@example
23784$ make
23785@dots{}
23786$ make install
23787@end example
23788
23789BeOS uses @command{bash} as its shell; thus, you use @command{gawk} the same way you would
23790under Unix.
23791If these steps do not work, please send in a bug report
23792(@pxref{Bugs}).
23793
23794@c Rewritten by Scott Deifik <scottd@amgen.com>
23795@c and Darrel Hankerson <hankedr@mail.auburn.edu>
23796
23797@node PC Installation
23798@appendixsubsec Installation on PC Operating Systems
23799
23800@c first comma is part of primary
23801@cindex PC operating systems, @command{gawk} on, installing
23802@c {PC, gawk on} is the secondary term
23803@cindex operating systems, PC, @command{gawk} on, installing
23804This @value{SECTION} covers installation and usage of @command{gawk} on x86 machines
23805running DOS, any version of Windows, or OS/2.
23806In this @value{SECTION}, the term ``Windows32''
23807refers to any of Windows-95/98/ME/NT/2000.
23808
The limitations of DOS (and DOS shells under Windows or OS/2) have meant
23810that various ``DOS extenders'' are often used with programs such as
23811@command{gawk}.  The varying capabilities of Microsoft Windows 3.1
23812and Windows32 can add to the confusion.  For an overview of the
23813considerations, please refer to @file{README_d/README.pc} in the
23814distribution.
23815
23816@menu
23817* PC Binary Installation::      Installing a prepared distribution.
23818* PC Compiling::                Compiling @command{gawk} for MS-DOS, Windows32,
23819                                and OS/2.
23820* PC Dynamic::                  Compiling @command{gawk} for dynamic libraries.
23821* PC Using::                    Running @command{gawk} on MS-DOS, Windows32 and
23822                                OS/2.
23823* Cygwin::                      Building and running @command{gawk} for
23824                                Cygwin.
23825@end menu
23826
23827@node PC Binary Installation
23828@appendixsubsubsec Installing a Prepared Distribution for PC Systems
23829
23830If you have received a binary distribution prepared by the DOS
23831maintainers, then @command{gawk} and the necessary support files appear
23832under the @file{gnu} directory, with executables in @file{gnu/bin},
23833libraries in @file{gnu/lib/awk}, and manual pages under @file{gnu/man}.
23834This is designed for easy installation to a @file{/gnu} directory on your
23835drive---however, the files can be installed anywhere provided @env{AWKPATH} is
23836set properly.  Regardless of the installation directory, the first line of
23837@file{igawk.cmd} and @file{igawk.bat} (in @file{gnu/bin}) may need to be
23838edited.
23839
23840The binary distribution contains a separate file describing the
23841contents. In particular, it may include more than one version of the
23842@command{gawk} executable.
23843
23844OS/2 (32 bit, EMX) binary distributions are prepared for the @file{/usr}
directory of your preferred drive. Set @env{UNIXROOT} to your installation
drive (e.g., @samp{e:}) if you want to install @command{gawk} onto a drive
other than the hardcoded default @samp{c:}. Executables appear in @file{/usr/bin},
libraries under @file{/usr/share/awk}, manual pages under @file{/usr/man},
Texinfo documentation under @file{/usr/info}, and NLS files under @file{/usr/share/locale}.
If you already have a file @file{/usr/info/dir} from another package,
@emph{do not overwrite it!} Instead, enter the following commands at your prompt
(replace @samp{x:} with your installation drive):
23853
23854@example
23855install-info --info-dir=x:/usr/info x:/usr/info/awk.info
23856install-info --info-dir=x:/usr/info x:/usr/info/gawkinet.info
23857@end example
23858
As with the DOS distributions, the files can be installed anywhere, provided
@env{AWKPATH} is set properly.
23861
23862The binary distribution may contain a separate file containing additional
23863or more detailed installation instructions.
23864
23865@node PC Compiling
23866@appendixsubsubsec Compiling @command{gawk} for PC Operating Systems
23867
23868@command{gawk} can be compiled for MS-DOS, Windows32, and OS/2 using the GNU
23869development tools from DJ Delorie (DJGPP; MS-DOS only) or Eberhard
23870Mattes (EMX; MS-DOS, Windows32 and OS/2).  Microsoft Visual C/C++ can be used
23871to build a Windows32 version, and Microsoft C/C++ can be
23872used to build 16-bit versions for MS-DOS and OS/2.
23873@c FIXME:
23874(As of @command{gawk} 3.1.2, the MSC version doesn't work. However,
23875the maintainer is working on fixing it.)
23876The file
23877@file{README_d/README.pc} in the @command{gawk} distribution contains
23878additional notes, and @file{pc/Makefile} contains important information on
23879compilation options.
23880
To build @command{gawk} for MS-DOS, Windows32, and OS/2 (16 bit only), copy
the files in the @file{pc} directory (@emph{except} for @file{ChangeLog}) to
the directory with the rest of the @command{gawk} sources. (For 32 bit OS/2
with EMX, you can use the @command{configure} script instead and skip the
following paragraphs; details are given below.) The @file{Makefile} contains
a configuration section with comments and may need to be edited in order to
work with your @command{make} utility.
23887
23888The @file{Makefile} contains a number of targets for building various MS-DOS,
23889Windows32, and OS/2 versions. A list of targets is printed if the @command{make}
23890command is given without a target. As an example, to build @command{gawk}
23891using the DJGPP tools, enter @samp{make djgpp}.
23892
23893Using @command{make} to run the standard tests and to install @command{gawk}
23894requires additional Unix-like tools, including @command{sh}, @command{sed}, and
23895@command{cp}. In order to run the tests, the @file{test/*.ok} files may need to
23896be converted so that they have the usual DOS-style end-of-line markers. Most
23897of the tests work properly with Stewartson's shell along with the
23898companion utilities or appropriate GNU utilities.  However, some editing of
23899@file{test/Makefile} is required. It is recommended that you copy the file
23900@file{pc/Makefile.tst} over the file @file{test/Makefile} as a
23901replacement. Details can be found in @file{README_d/README.pc}
23902and in the file @file{pc/Makefile.tst}.
23903
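As a rough sketch of the preparation described above, assuming an
@command{sh}-like shell with @command{cp} and @command{mv} available and a
freshly built @command{gawk} in the current directory, the following commands
replace @file{test/Makefile} and rewrite one expected-output file with
DOS-style line endings.  The @var{name} placeholder and the @file{tmp.ok}
scratch file are hypothetical; @file{README_d/README.pc} remains the
authoritative guide:

@example
$ cp pc/Makefile.tst test/Makefile
$ ./gawk -v BINMODE=2 -v ORS="\r\n" '@{ print @}' test/@var{name}.ok > tmp.ok
$ mv tmp.ok test/@var{name}.ok
@end example

@noindent
The @code{BINMODE} and @code{ORS} settings are the ones discussed in
@ref{PC Using}; repeat the last two commands for each file that needs
converting.
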
23904The 32 bit EMX version of @command{gawk} works ``out of the box'' under OS/2.
23905In principle, it is possible to compile @command{gawk} the following way:
23906
23907@example
23908$ ./configure
23909$ make
23910@end example
23911
23912This is not recommended, though. To get an OMF executable you should
23913use the following commands at your @command{sh} prompt:
23914
23915@example
23916$ CPPFLAGS="-D__ST_MT_ERRNO__"
23917$ export CPPFLAGS
23918$ CFLAGS="-O2 -Zomf -Zmt"
23919$ export CFLAGS
23920$ LDFLAGS="-s -Zcrtdll -Zlinker /exepack:2 -Zlinker /pm:vio -Zstack 0x8000"
23921$ export LDFLAGS
23922$ RANLIB="echo"
23923$ export RANLIB
23924$ ./configure --prefix=c:/usr --without-included-gettext
23925$ make AR=emxomfar
23926@end example
23927
23928These are just suggestions. You may use any other set of (self-consistent)
23929environment variables and compiler flags.
23930
23931To get an FHS-compliant file hierarchy it is recommended to use the additional
23932@command{configure} options @option{--infodir=c:/usr/share/info}, @option{--mandir=c:/usr/share/man}
23933and @option{--libexecdir=c:/usr/lib}.
23934
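For instance, the OMF build shown earlier might be configured as follows.
This is only a sketch that combines the options named in the text; the
@file{c:/usr} prefix is carried over from the earlier example and should be
adjusted to your installation:

@example
$ ./configure --prefix=c:/usr --without-included-gettext \
>   --infodir=c:/usr/share/info --mandir=c:/usr/share/man \
>   --libexecdir=c:/usr/lib
@end example
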
23935The internal @code{gettext} library tends to be problematic. It is therefore recommended
23936to use either an external one (@option{--without-included-gettext}) or to disable
23937NLS entirely (@option{--disable-nls}).
23938
If you use GCC 2.95 or newer, it is also recommended to set:
23940
23941@example
23942$ LIBS="-lgcc"
23943$ export LIBS
23944@end example
23945
23946You can also get an @code{a.out} executable if you prefer:
23947
23948@example
23949$ CPPFLAGS="-D__ST_MT_ERRNO__"
23950$ export CPPFLAGS
23951$ CFLAGS="-O2 -Zmt"
23952$ export CFLAGS
$ LDFLAGS="-s -Zstack 0x8000"
$ export LDFLAGS
$ LIBS="-lgcc"
$ export LIBS
23955$ unset RANLIB
23956$ ./configure --prefix=c:/usr --without-included-gettext
23957$ make
23958@end example
23959
23960@strong{Note:} Even if the compiled @command{gawk.exe} (@code{a.out}) executable
23961contains a DOS header, it does @emph{not} work under DOS. To compile an executable
23962that runs under DOS, @code{"-DPIPES_SIMULATED"} must be added to @env{CPPFLAGS}.
23963But then some nonstandard extensions of @command{gawk} (e.g., @samp{|&}) do not work!
23964
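A DOS-capable @code{a.out} build might therefore set @env{CPPFLAGS} as in
the following sketch, which simply appends the macro to the value used in
the earlier example; the remaining variables and the @command{configure}
invocation are assumed to stay as shown there:

@example
$ CPPFLAGS="-D__ST_MT_ERRNO__ -DPIPES_SIMULATED"
$ export CPPFLAGS
@end example
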
23965After compilation the internal tests can be performed. Enter
23966@samp{make check CMP="diff -a"} at your command prompt. All tests
23967but the @code{pid} test are expected to work properly. The @code{pid}
23968test fails because child processes are not started by @code{fork()}.
23969
23970@samp{make install} works as expected.
23971
@strong{Note:} Most OS/2 ports of GNU @command{make} are not able to handle
the Makefiles of this package. If you encounter any problems with @command{make},
try GNU Make 3.79.1 or a later version. You should find the latest
version at @uref{http://www.unixos2.org/sw/pub/binary/make/} or at
@uref{ftp://hobbes.nmsu.edu/pub/os2/}.
23977
23978@node PC Dynamic
23979@appendixsubsubsec Compiling @command{gawk} For Dynamic Libraries
23980
23981@c From README_d/README.pcdynamic
23982@c 11 June 2003
23983
23984To compile @command{gawk} with dynamic extension support,
23985uncomment the definitions of @code{DYN_FLAGS}, @code{DYN_EXP},
23986@code{DYN_OBJ}, and @code{DYN_MAKEXP} in the configuration section of
23987the @file{Makefile}. There are two definitions for @code{DYN_MAKEXP}:
23988pick the one that matches your target.
23989
23990To build some of the example extension libraries, @command{cd} to the
23991extension directory and copy @file{Makefile.pc} to @file{Makefile}. You
23992can then build using the same two targets. To run the example
23993@command{awk} scripts, you'll need to either change the call to
23994the @code{extension} function to match the name of the library (for
23995instance, change @code{"./ordchr.so"} to @code{"ordchr.dll"} or simply
23996@code{"ordchr"}), or rename the library to match the call (for instance,
23997rename @file{ordchr.dll} to @file{ordchr.so}).
23998
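As an illustration of the first approach, a call might look like the
following sketch.  It assumes an @command{sh}-like shell for the quoting,
that @file{ordchr.dll} was built from the example sources, and that its
initialization routine is named @code{dlload} and provides @code{ord}, as in
the examples shipped with @command{gawk}; check the extension source if in
doubt:

@example
$ gawk 'BEGIN @{ extension("ordchr", "dlload"); print ord("A") @}'
@end example
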
23999If you build @command{gawk.exe} with one compiler but want to build
24000an extension library with the other, you need to copy the import
24001library. Visual C uses a library called @file{gawk.lib}, while MinGW uses
24002a library called @file{libgawk.a}. These files are equivalent and will
24003interoperate if you give them the correct name.  The resulting shared
24004libraries are also interoperable.
24005
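For example, after building @command{gawk.exe} with Visual C, the import
library could be made available to MinGW under the name it expects.  This
sketch assumes an @command{sh}-like shell with @command{cp} and that both
files live in the build directory:

@example
$ cp gawk.lib libgawk.a
@end example

@noindent
Copying in the other direction (@file{libgawk.a} to @file{gawk.lib}) serves
a Visual C build of an extension against a MinGW-built @command{gawk.exe}.
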
24006To create your own extension library, you can use the examples as models,
24007but you're essentially on your own. Post to @code{comp.lang.awk} or
24008send electronic mail to @email{ptjm@@interlog.com} if you have problems getting
24009started. If you need to access functions or variables which are not
24010exported by @command{gawk.exe}, add them to @file{gawkw32.def} and
24011rebuild. You should also add @code{ATTRIBUTE_EXPORTED} to the declaration
24012in @file{awk.h} of any variables you add to @file{gawkw32.def}.
24013
24014Note that extension libraries have the name of the @command{awk}
24015executable embedded in them at link time, so they will work only
24016with @command{gawk.exe}. In particular, they won't work if you
24017rename @command{gawk.exe} to @command{awk.exe} or if you try to use
@command{pgawk.exe}. To use such libraries while profiling, temporarily rename
@command{pgawk.exe} to @command{gawk.exe}. Alternatively, you can avoid the
restriction by changing the program name in the definition of @code{DYN_MAKEXP}
for your compiler.
24022
24023On Windows32, libraries are sought first in the current directory, then in
24024the directory containing @command{gawk.exe}, and finally through the
24025@env{PATH} environment variable.
24026
24027@node PC Using
24028@appendixsubsubsec Using @command{gawk} on PC Operating Systems
24029@c STARTOFRANGE opgawx
24030@cindex operating systems, PC, @command{gawk} on
24031@c STARTOFRANGE pcgawon
24032@cindex PC operating systems, @command{gawk} on
24033
24034With the exception of the Cygwin environment,
24035the @samp{|&} operator and TCP/IP networking
24036(@pxref{TCP/IP Networking})
24037are not supported for MS-DOS or MS-Windows. EMX (OS/2 only) does support
24038at least the @samp{|&} operator.
24039
24040@cindex search paths
24041@cindex @command{gawk}, OS/2 version of
24042@cindex @command{gawk}, MS-DOS version of
24043@cindex @code{;} (semicolon), @code{AWKPATH} variable and
24044@cindex semicolon (@code{;}), @code{AWKPATH} variable and
24045@cindex @code{AWKPATH} environment variable
24046The OS/2 and MS-DOS versions of @command{gawk} search for program files as
24047described in @ref{AWKPATH Variable}.
24048However, semicolons (rather than colons) separate elements
24049in the @env{AWKPATH} variable. If @env{AWKPATH} is not set or is empty,
24050then the default search path for OS/2 (16 bit) and MS-DOS versions is
24051@code{@w{".;c:/lib/awk;c:/gnu/lib/awk"}}.
24052
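For example, to add a private library directory to the default path under
MS-DOS or OS/2 (16 bit), something like the following could be entered at
the @command{command.com} or @command{cmd.exe} prompt.  The directory
@file{d:/myawk} and the file names are hypothetical:

@example
set AWKPATH=.;c:/lib/awk;c:/gnu/lib/awk;d:/myawk
gawk -f report.awk data.txt
@end example

@noindent
Here @command{gawk} looks for @file{report.awk} in each listed directory in
turn.
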
The search path for OS/2 (32 bit, EMX) is determined by the prefix directory
(most likely @file{/usr} or @file{c:/usr}) that was specified as an option to
the @command{configure} script, as is the case for the Unix versions.
24056If @file{c:/usr} is the prefix directory then the default search path contains @file{.}
24057and @file{c:/usr/share/awk}.
24058Additionally, to support binary distributions of @command{gawk} for OS/2
24059systems whose drive @samp{c:} might not support long file names or might not exist
24060at all, there is a special environment variable. If @env{UNIXROOT} specifies
24061a drive then this specific drive is also searched for program files.
For example, if @env{UNIXROOT} is set to @file{e:}, the complete default search path is
24063@code{@w{".;c:/usr/share/awk;e:/usr/share/awk"}}.
24064
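A minimal sketch of such a setup, assuming @command{gawk} was unpacked onto
drive @samp{e:} and that the variable is set at the @command{cmd.exe} prompt
(it could equally be set in @file{CONFIG.SYS}):

@example
set UNIXROOT=e:
gawk -f report.awk data.txt
@end example

@noindent
With this setting and a @file{c:/usr} prefix, the hypothetical
@file{report.awk} is sought in @file{.}, @file{c:/usr/share/awk}, and
@file{e:/usr/share/awk}.
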
24065An @command{sh}-like shell (as opposed to @command{command.com} under MS-DOS
24066or @command{cmd.exe} under OS/2) may be useful for @command{awk} programming.
24067Ian Stewartson has written an excellent shell for MS-DOS and OS/2,
24068Daisuke Aoyama has ported GNU @command{bash} to MS-DOS using the DJGPP tools,
24069and several shells are available for OS/2, including @command{ksh}.  The file
24070@file{README_d/README.pc} in the @command{gawk} distribution contains
24071information on these shells.  Users of Stewartson's shell on DOS should
24072examine its documentation for handling command lines; in particular,
24073the setting for @command{gawk} in the shell configuration may need to be
24074changed and the @code{ignoretype} option may also be of interest.
24075
24076@cindex differences in @command{awk} and @command{gawk}, @code{BINMODE} variable
24077@cindex @code{BINMODE} variable
Under OS/2 and DOS, @command{gawk} (like many other text programs) silently
translates end-of-line @code{"\r\n"} to @code{"\n"} on input and @code{"\n"}
24080to @code{"\r\n"} on output.  A special @code{BINMODE} variable allows
24081control over these translations and is interpreted as follows:
24082
24083@itemize @bullet
24084@item
If @code{BINMODE} is @code{"r"}, or
24086@code{(BINMODE & 1)} is nonzero, then
24087binary mode is set on read (i.e., no translations on reads).
24088
24089@item
24090If @code{BINMODE} is @code{"w"}, or
24091@code{(BINMODE & 2)} is nonzero, then
24092binary mode is set on write (i.e., no translations on writes).
24093
24094@item
If @code{BINMODE} is @code{"rw"} or @code{"wr"},
binary mode is set for both read and write
(same as @samp{BINMODE=3}).
24098
24099@item
@code{BINMODE=@var{non-null-string}}, for any other string, is
the same as @samp{BINMODE=3} (i.e., no translations on
reads or writes).  In this case, @command{gawk} also issues a warning
message.
24104@end itemize
24105
24106@noindent
24107The modes for standard input and standard output are set one time
24108only (after the
24109command line is read, but before processing any of the @command{awk} program).
24110Setting @code{BINMODE} for standard input or
24111standard output is accomplished by using an
24112appropriate @samp{-v BINMODE=@var{N}} option on the command line.
24113@code{BINMODE} is set at the time a file or pipe is opened and cannot be
24114changed mid-stream.
24115
24116The name @code{BINMODE} was chosen to match @command{mawk}
24117(@pxref{Other Versions}).
24118Both @command{mawk} and @command{gawk} handle @code{BINMODE} similarly; however,
24119@command{mawk} adds a @samp{-W BINMODE=@var{N}} option and an environment
24120variable that can set @code{BINMODE}, @code{RS}, and @code{ORS}.  The
24121files @file{binmode[1-3].awk} (under @file{gnu/lib/awk} in some of the
24122prepared distributions) have been chosen to match @command{mawk}'s @samp{-W
24123BINMODE=@var{N}} option.  These can be changed or discarded; in particular,
24124the setting of @code{RS} giving the fewest ``surprises'' is open to debate.
24125@command{mawk} uses @samp{RS = "\r\n"} if binary mode is set on read, which is
24126appropriate for files with the DOS-style end-of-line.
24127
24128To illustrate, the following examples set binary mode on writes for standard
24129output and other files, and set @code{ORS} as the ``usual'' DOS-style
24130end-of-line:
24131
24132@example
24133gawk -v BINMODE=2 -v ORS="\r\n" @dots{}
24134@end example
24135
24136@noindent
24137or:
24138
24139@example
24140gawk -v BINMODE=w -f binmode2.awk @dots{}
24141@end example
24142
24143@noindent
24144These give the same result as the @samp{-W BINMODE=2} option in
24145@command{mawk}.
24146The following changes the record separator to @code{"\r\n"} and sets binary
24147mode on reads, but does not affect the mode on standard input:
24148
24149@example
24150gawk -v RS="\r\n" --source "BEGIN @{ BINMODE = 1 @}" @dots{}
24151@end example
24152
24153@noindent
24154or:
24155
24156@example
24157gawk -f binmode1.awk @dots{}
24158@end example
24159
24160@noindent
24161With proper quoting, in the first example the setting of @code{RS} can be
24162moved into the @code{BEGIN} rule.
24163
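For instance, under an @command{sh}-like shell, where single quotes protect
the double quotes and backslashes from the shell, the first example might be
written as the following sketch.  Other command interpreters, such as
@command{command.com}, need different quoting:

@example
gawk --source 'BEGIN @{ BINMODE = 1; RS = "\r\n" @}' @dots{}
@end example
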
24164@node Cygwin
24165@appendixsubsubsec Using @command{gawk} In The Cygwin Environment
24166
24167@command{gawk} can be used ``out of the box'' under Windows if you are
24168using the Cygwin environment.@footnote{@uref{http://www.cygwin.com}}
This environment provides an excellent simulation of Unix, using
GNU tools such as @command{bash}, the GNU Compiler Collection (GCC),
and GNU Make.  Compilation and installation for Cygwin
are the same as for a Unix system:
24173
24174@example
24175tar -xvpzf gawk-@value{VERSION}.@value{PATCHLEVEL}.tar.gz
24176cd gawk-@value{VERSION}.@value{PATCHLEVEL}
24177./configure
24178make
24179@end example
24180
24181When compared to GNU/Linux on the same system, the @samp{configure}
24182step on Cygwin takes considerably longer.  However, it does finish,
24183and then the @samp{make} proceeds as usual.
24184
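Assuming the standard @command{make} targets described elsewhere in this
appendix, the build can then be checked and installed from the same
directory.  This is a sketch; the installation step may need suitable
permissions for the target directories:

@example
make check
make install
@end example
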
24185@strong{Note:} The @samp{|&} operator and TCP/IP networking
24186(@pxref{TCP/IP Networking})
24187are fully supported in the Cygwin environment.  This is not true
24188for any other environment for MS-DOS or MS-Windows.
24189
24190@node VMS Installation
24191@appendixsubsec How to Compile and Install @command{gawk} on VMS
24192
24193@c based on material from Pat Rankin <rankin@eql.caltech.edu>
24194@c now rankin@pactechdata.com
24195
24196@cindex installation, vms
24197This @value{SUBSECTION} describes how to compile and install @command{gawk} under VMS.
24198
24199@menu
24200* VMS Compilation::             How to compile @command{gawk} under VMS.
24201* VMS Installation Details::    How to install @command{gawk} under VMS.
24202* VMS Running::                 How to run @command{gawk} under VMS.
24203* VMS POSIX::                   Alternate instructions for VMS POSIX.
24204@end menu
24205
24206@node VMS Compilation
24207@appendixsubsubsec Compiling @command{gawk} on VMS
24208
24209To compile @command{gawk} under VMS, there is a @code{DCL} command procedure that
24210issues all the necessary @code{CC} and @code{LINK} commands. There is
24211also a @file{Makefile} for use with the @code{MMS} utility.  From the source
24212directory, use either:
24213
24214@example
24215$ @@[.VMS]VMSBUILD.COM
24216@end example
24217
24218@noindent
24219or:
24220
24221@example
24222$ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK
24223@end example
24224
24225Depending upon which C compiler you are using, follow one of the sets
24226of instructions in this table:
24227
24228@table @asis
24229@item VAX C V3.x
24230Use either @file{vmsbuild.com} or @file{descrip.mms} as is.  These use
24231@code{CC/OPTIMIZE=NOLINE}, which is essential for Version 3.0.
24232
24233@item VAX C V2.x
24234You must have Version 2.3 or 2.4; older ones won't work.  Edit either
24235@file{vmsbuild.com} or @file{descrip.mms} according to the comments in them.
24236For @file{vmsbuild.com}, this just entails removing two @samp{!} delimiters.
24237Also edit @file{config.h} (which is a copy of file @file{[.config]vms-conf.h})
24238and comment out or delete the two lines @samp{#define __STDC__ 0} and
24239@samp{#define VAXC_BUILTINS} near the end.
24240
24241@item GNU C
24242Edit @file{vmsbuild.com} or @file{descrip.mms}; the changes are different
24243from those for VAX C V2.x but equally straightforward.  No changes to
24244@file{config.h} are needed.
24245
24246@item DEC C
24247Edit @file{vmsbuild.com} or @file{descrip.mms} according to their comments.
24248No changes to @file{config.h} are needed.
24249@end table
24250
24251@command{gawk} has been tested under VAX/VMS 5.5-1 using VAX C V3.2, and
24252GNU C 1.40 and 2.3.  It should work without modifications for VMS V4.6 and up.
24253
24254@node VMS Installation Details
24255@appendixsubsubsec Installing @command{gawk} on VMS
24256
24257To install @command{gawk}, all you need is a ``foreign'' command, which is
24258a @code{DCL} symbol whose value begins with a dollar sign. For example:
24259
24260@example
24261$ GAWK :== $disk1:[gnubin]GAWK
24262@end example
24263
24264@noindent
24265Substitute the actual location of @command{gawk.exe} for
24266@samp{$disk1:[gnubin]}. The symbol should be placed in the
24267@file{login.com} of any user who wants to run @command{gawk},
24268so that it is defined every time the user logs on.
24269Alternatively, the symbol may be placed in the system-wide
24270@file{sylogin.com} procedure, which allows all users
24271to run @command{gawk}.
24272
24273Optionally, the help entry can be loaded into a VMS help library:
24274
24275@example
24276$ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP
24277@end example
24278
24279@noindent
24280(You may want to substitute a site-specific help library rather than
24281the standard VMS library @samp{HELPLIB}.)  After loading the help text,
24282the command:
24283
24284@example
24285$ HELP GAWK
24286@end example
24287
24288@noindent
24289provides information about both the @command{gawk} implementation and the
24290@command{awk} programming language.
24291
24292The logical name @samp{AWK_LIBRARY} can designate a default location
24293for @command{awk} program files.  For the @option{-f} option, if the specified
24294@value{FN} has no device or directory path information in it, @command{gawk}
24295looks in the current directory first, then in the directory specified
24296by the translation of @samp{AWK_LIBRARY} if the file is not found.
24297If, after searching in both directories, the file still is not found,
24298@command{gawk} appends the suffix @samp{.awk} to the filename and retries
24299the file search.  If @samp{AWK_LIBRARY} is not defined, that
24300portion of the file search fails benignly.
24301
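As a sketch, the logical name might be defined and used as follows.  The
directory @samp{DISK1:[AWKLIB]} and the file names are hypothetical, and the
definition would normally go in @file{login.com} alongside the @code{GAWK}
symbol:

@example
$ DEFINE AWK_LIBRARY DISK1:[AWKLIB]
$ gawk -f report data.txt
@end example

@noindent
If neither directory contains a file named @file{report}, @command{gawk}
retries the search with @file{report.awk}.
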
24302@node VMS Running
24303@appendixsubsubsec Running @command{gawk} on VMS
24304
24305Command-line parsing and quoting conventions are significantly different
24306on VMS, so examples in this @value{DOCUMENT} or from other sources often need minor
24307changes.  They @emph{are} minor though, and all @command{awk} programs
24308should run correctly.
24309
24310Here are a couple of trivial tests:
24311
24312@example
24313$ gawk -- "BEGIN @{print ""Hello, World!""@}"
24314$ gawk -"W" version
24315! could also be -"W version" or "-W version"
24316@end example
24317
24318@noindent
24319Note that uppercase and mixed-case text must be quoted.
24320
24321The VMS port of @command{gawk} includes a @code{DCL}-style interface in addition
24322to the original shell-style interface (see the help entry for details).
24323One side effect of dual command-line parsing is that if there is only a
24324single parameter (as in the quoted string program above), the command
24325becomes ambiguous.  To work around this, the normally optional @option{--}
24326flag is required to force Unix style rather than @code{DCL} parsing.  If any
24327other dash-type options (or multiple parameters such as @value{DF}s to
24328process) are present, there is no ambiguity and @option{--} can be omitted.
24329
24330@c @cindex directory search
24331@c @cindex path, search
24332@cindex search paths
24333@cindex search paths, for source files
24334The default search path, when looking for @command{awk} program files specified
24335by the @option{-f} option, is @code{"SYS$DISK:[],AWK_LIBRARY:"}.  The logical
24336name @samp{AWKPATH} can be used to override this default.  The format
24337of @samp{AWKPATH} is a comma-separated list of directory specifications.
24338When defining it, the value should be quoted so that it retains a single
24339translation and not a multitranslation @code{RMS} searchlist.
24340
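For example, the following sketch defines a search path with three entries;
the @samp{DISK1:[AWKLIB]} and @samp{DISK2:[SHARED.AWK]} directories are
hypothetical.  The double quotes keep the whole list as a single
translation:

@example
$ DEFINE AWKPATH "SYS$DISK:[],DISK1:[AWKLIB],DISK2:[SHARED.AWK]"
@end example
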
24341@node VMS POSIX
24342@appendixsubsubsec Building and Using @command{gawk} on VMS POSIX
24343
24344Ignore the instructions above, although @file{vms/gawk.hlp} should still
24345be made available in a help library.  The source tree should be unpacked
24346into a container file subsystem rather than into the ordinary VMS filesystem.
24347Make sure that the two scripts, @file{configure} and
24348@file{vms/posix-cc.sh}, are executable; use @samp{chmod +x} on them if
24349necessary.  Then execute the following two commands:
24350
24351@example
24352psx> CC=vms/posix-cc.sh configure
24353psx> make CC=c89 gawk
24354@end example
24355
24356@noindent
24357The first command constructs files @file{config.h} and @file{Makefile} out
24358of templates, using a script to make the C compiler fit @command{configure}'s
24359expectations.  The second command compiles and links @command{gawk} using
24360the C compiler directly; ignore any warnings from @command{make} about being
24361unable to redefine @code{CC}.  @command{configure} takes a very long
24362time to execute, but at least it provides incremental feedback as it runs.
24363
24364This has been tested with VAX/VMS V6.2, VMS POSIX V2.0, and DEC C V5.2.
24365
24366Once built, @command{gawk} works like any other shell utility.  Unlike
24367the normal VMS port of @command{gawk}, no special command-line manipulation is
24368needed in the VMS POSIX environment.
24369
24370@node Unsupported
24371@appendixsec Unsupported Operating System Ports
24372
This section describes systems for which
24374the @command{gawk} port is no longer supported.
24375
24376@menu
24377* Atari Installation::          Installing @command{gawk} on the Atari ST.
24378* Tandem Installation::         Installing @command{gawk} on a Tandem.
24379@end menu
24380
24381@node Atari Installation
24382@appendixsubsec Installing @command{gawk} on the Atari ST
24383
The Atari port is no longer supported.  It is
included for those who might want to use it, but it is no longer being
actively maintained.
24387
24388@c based on material from Michal Jaegermann <michal@gortel.phys.ualberta.ca>
24389@cindex atari
24390@cindex installation, atari
24391There are no substantial differences when installing @command{gawk} on
24392various Atari models.  Compiled @command{gawk} executables do not require
24393a large amount of memory with most @command{awk} programs, and should run on all
Motorola processor-based models (referred to from here on as ST, even if
that is not exactly right).
24396
24397In order to use @command{gawk}, you need to have a shell, either text or
24398graphics, that does not map all the characters of a command line to
24399uppercase.  Maintaining case distinction in option flags is very
24400important (@pxref{Options}).
24401These days this is the default and it may only be a problem for some
24402very old machines.  If your system does not preserve the case of option
24403flags, you need to upgrade your tools.  Support for I/O
24404redirection is necessary to make it easy to import @command{awk} programs
24405from other environments.  Pipes are nice to have but not vital.
24406
24407@menu
24408* Atari Compiling::             Compiling @command{gawk} on Atari.
24409* Atari Using::                 Running @command{gawk} on Atari.
24410@end menu
24411
24412@node Atari Compiling
24413@appendixsubsubsec Compiling @command{gawk} on the Atari ST
24414
24415A proper compilation of @command{gawk} sources when @code{sizeof(int)}
24416differs from @code{sizeof(void *)} requires an ISO C compiler. An initial
24417port was done with @command{gcc}.  You may actually prefer executables
24418where @code{int}s are four bytes wide but the other variant works as well.
24419
24420You may need quite a bit of memory when trying to recompile the @command{gawk}
24421sources, as some source files (@file{regex.c} in particular) are quite
24422big.  If you run out of memory compiling such a file, try reducing the
24423optimization level for this particular file, which may help.
24424
24425@cindex Linux
24426@cindex GNU/Linux
With a reasonable shell (@command{bash} will do), you have a pretty good chance
that the @command{configure} utility will succeed, particularly if
you run GNU/Linux, MiNT, or a similar operating system.  Otherwise,
sample versions of @file{config.h} and @file{Makefile.st} are given in the
24431@file{atari} subdirectory and can be edited and copied to the
24432corresponding files in the main source directory.  Even if
24433@command{configure} produces something, it might be advisable to compare
24434its results with the sample versions and possibly make adjustments.
24435
24436Some @command{gawk} source code fragments depend on a preprocessor define
24437@samp{atarist}.  This basically assumes the TOS environment with @command{gcc}.
24438Modify these sections as appropriate if they are not right for your
24439environment.  Also see the remarks about @env{AWKPATH} and @code{envsep} in
24440@ref{Atari Using}.
24441
24442As shipped, the sample @file{config.h} claims that the @code{system}
24443function is missing from the libraries, which is not true, and an
24444alternative implementation of this function is provided in
24445@file{unsupported/atari/system.c}.
24446Depending upon your particular combination of
24447shell and operating system, you might want to change the file to indicate
24448that @code{system} is available.
24449
24450@node Atari Using
24451@appendixsubsubsec Running @command{gawk} on the Atari ST
24452
24453An executable version of @command{gawk} should be placed, as usual,
24454anywhere in your @env{PATH} where your shell can find it.
24455
24456While executing, the Atari version of @command{gawk} creates a number of temporary files.  When
24457using @command{gcc} libraries for TOS, @command{gawk} looks for either of
24458the environment variables, @env{TEMP} or @env{TMPDIR}, in that order.
24459If either one is found, its value is assumed to be a directory for
24460temporary files.  This directory must exist, and if you can spare the
24461memory, it is a good idea to put it on a RAM drive.  If neither
@env{TEMP} nor @env{TMPDIR} is found, then @command{gawk} uses the
24463current directory for its temporary files.
24464
24465The ST version of @command{gawk} searches for its program files, as described in
24466@ref{AWKPATH Variable}.
24467The default value for the @env{AWKPATH} variable is taken from
24468@code{DEFPATH} defined in @file{Makefile}. The sample @command{gcc}/TOS
24469@file{Makefile} for the ST in the distribution sets @code{DEFPATH} to
24470@code{@w{".,c:\lib\awk,c:\gnu\lib\awk"}}.  The search path can be
24471modified by explicitly setting @env{AWKPATH} to whatever you want.
24472Note that colons cannot be used on the ST to separate elements in the
24473@env{AWKPATH} variable, since they have another reserved meaning.
24474Instead, you must use a comma to separate elements in the path.  When
24475recompiling, the separating character can be modified by initializing
24476the @code{envsep} variable in @file{unsupported/atari/gawkmisc.atr} to another
24477value.
24478
24479Although @command{awk} allows great flexibility in doing I/O redirections
24480from within a program, this facility should be used with care on the ST
24481running under TOS.  In some circumstances, the OS routines for file-handle
24482pool processing lose track of certain events, causing the
24483computer to crash and requiring a reboot.  Often a warm reboot is
24484sufficient.  Fortunately, this happens infrequently and in rather
24485esoteric situations.  In particular, avoid having one part of an
24486@command{awk} program using @code{print} statements explicitly redirected
24487to @file{/dev/stdout}, while other @code{print} statements use the
24488default standard output, and a calling shell has redirected standard
24489output to a file.
24490@c 10/2000: Is this still true, now that gawk does /dev/stdout internally?
24491
24492When @command{gawk} is compiled with the ST version of @command{gcc} and its
24493usual libraries, it accepts both @samp{/} and @samp{\} as path separators.
While this is convenient, it should be remembered that this makes one
technically valid character (@samp{/}) unavailable in @value{FN}s.
24496It may also create problems for external programs called via the @code{system}
24497function, which may not support this convention.  Whenever it is possible
24498that a file created by @command{gawk} will be used by some other program,
24499use only backslashes.  Also remember that in @command{awk}, backslashes in
24500strings have to be doubled in order to get literal backslashes
24501(@pxref{Escape Sequences}).
24502
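A short sketch of the doubling rule, assuming @command{bash} (whose single
quotes pass backslashes through unchanged) and a hypothetical output file
@file{c:\tmp\out.txt}:

@example
gawk 'BEGIN @{ print "done" > "c:\\tmp\\out.txt" @}'
@end example

@noindent
Inside the @command{awk} string, each @samp{\\} yields one literal
backslash, so the file written is @file{c:\tmp\out.txt}.
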
24503@node Tandem Installation
24504@appendixsubsec Installing @command{gawk} on a Tandem
24505@cindex tandem
24506@cindex installation, tandem
24507
24508The Tandem port is only minimally supported.
24509The port's contributor no longer has access to a Tandem system.
24510
24511@c This section based on README.Tandem by Stephen Davies (scldad@sdc.com.au)
24512The Tandem port was done on a Cyclone machine running D20.
24513The port is pretty clean and all facilities seem to work except for
24514the I/O piping facilities
24515(@pxref{Getline/Pipe},
24516@ref{Getline/Variable/Pipe},
24517and
24518@ref{Redirection}),
because piping is just too foreign a concept for Tandem.
24520
24521To build a Tandem executable from source, download all of the files so
24522that the @value{FN}s on the Tandem box conform to the restrictions of D20.
24523For example, @file{array.c} becomes @file{ARRAYC}, and @file{awk.h}
24524becomes @file{AWKH}.  The totally Tandem-specific files are in the
24525@file{tandem} ``subvolume'' (@file{unsupported/tandem} in the @command{gawk}
24526distribution) and should be copied to the main source directory before
24527building @command{gawk}.
24528
24529The file @file{compit} can then be used to compile and bind an executable.
24530Alas, there is no @command{configure} or @command{make}.
24531
24532Usage is the same as for Unix, except that D20 requires all @samp{@{} and
24533@samp{@}} characters to be escaped with @samp{~} on the command line
24534(but @emph{not} in script files). Also, the standard Tandem syntax for
24535@samp{/in filename,out filename/} must be used instead of the usual
24536Unix @samp{<} and @samp{>} for file redirection.  (Redirection options
on @code{getline}, @code{print}, etc., are supported.)
24538
24539The @samp{-mr @var{val}} option
24540(@pxref{Options})
24541has been ``stolen'' to enable Tandem users to process fixed-length
24542records with no ``end-of-line'' character. That is, @samp{-mr 74} tells
24543@command{gawk} to read the input file as fixed 74-byte records.
24544@c ENDOFRANGE opgawx
24545@c ENDOFRANGE pcgawon
24546
24547@node Bugs
24548@appendixsec Reporting Problems and Bugs
24549@cindex archeologists
24550@quotation
24551@i{There is nothing more dangerous than a bored archeologist.}@*
24552The Hitchhiker's Guide to the Galaxy
24553@end quotation
24554@c the radio show, not the book. :-)
24555
24556@c STARTOFRANGE dbugg
24557@cindex debugging @command{gawk}, bug reports
24558@c STARTOFRANGE tblgawb
24559@cindex troubleshooting, @command{gawk}, bug reports
24560If you have problems with @command{gawk} or think that you have found a bug,
24561please report it to the developers; we cannot promise to do anything
24562but we might well want to fix it.
24563
24564Before reporting a bug, make sure you have actually found a real bug.
24565Carefully reread the documentation and see if it really says you can do
24566what you're trying to do.  If it's not clear whether you should be able
24567to do something or not, report that too; it's a bug in the documentation!
24568
24569Before reporting a bug or trying to fix it yourself, try to isolate it
24570to the smallest possible @command{awk} program and input @value{DF} that
24571reproduces the problem.  Then send us the program and @value{DF},
24572some idea of what kind of Unix system you're using,
24573the compiler you used to compile @command{gawk}, and the exact results
24574@command{gawk} gave you.  Also say what you expected to occur; this helps
24575us decide whether the problem is really in the documentation.
24576
24577@cindex @code{bug-gawk@@gnu.org} bug reporting address
24578@cindex email address for bug reports, @code{bug-gawk@@gnu.org}
24579@cindex bug reports, email address, @code{bug-gawk@@gnu.org}
24580Once you have a precise problem, send email to @email{bug-gawk@@gnu.org}.
24581
24582@cindex Robbins, Arnold
24583Please include the version number of @command{gawk} you are using.
24584You can get this information with the command @samp{gawk --version}.
Using the bug reporting address automatically sends a carbon copy of your
mail to me.  If necessary, I can be reached directly at
24587@email{arnold@@gnu.org}.  The bug reporting address is preferred since the
24588email list is archived at the GNU Project.
24589@emph{All email should be in English, since that is my native language.}
24590
24591@cindex @code{comp.lang.awk} newsgroup
24592@strong{Caution:} Do @emph{not} try to report bugs in @command{gawk} by
24593posting to the Usenet/Internet newsgroup @code{comp.lang.awk}.
24594While the @command{gawk} developers do occasionally read this newsgroup,
24595there is no guarantee that we will see your posting.  The steps described
above are the officially recognized ways for reporting bugs.
24597
24598Non-bug suggestions are always welcome as well.  If you have questions
24599about things that are unclear in the documentation or are just obscure
24600features, ask me; I will try to help you out, although I
24601may not have the time to fix the problem.  You can send me electronic
24602mail at the Internet address noted previously.
24603
24604If you find bugs in one of the non-Unix ports of @command{gawk}, please send
24605an electronic mail message to the person who maintains that port.  They
24606are named in the following list, as well as in the @file{README} file in the @command{gawk}
24607distribution.  Information in the @file{README} file should be considered
24608authoritative if it conflicts with this @value{DOCUMENT}.
24609
24610The people maintaining the non-Unix ports of @command{gawk} are
24611as follows:
24612
24613@ignore
24614@table @asis
24615@cindex Fish, Fred
24616@item Amiga
24617Fred Fish, @email{fnf@@ninemoons.com}.
24618
24619@cindex Brown, Martin
24620@item BeOS
24621Martin Brown, @email{mc@@whoever.com}.
24622
24623@cindex Deifik, Scott
24624@cindex Hankerson, Darrel
24625@item MS-DOS
24626Scott Deifik, @email{scottd@@amgen.com} and
24627Darrel Hankerson, @email{hankedr@@mail.auburn.edu}.
24628
24629@cindex Grigera, Juan
24630@item MS-Windows
24631Juan Grigera, @email{juan@@biophnet.unlp.edu.ar}.
24632
24633@item OS/2
24634The Unix for OS/2 team, @email{gawk-maintainer@@unixos2.org}.
24635
24636@cindex Davies, Stephen
24637@item Tandem
24638Stephen Davies, @email{scldad@@sdc.com.au}.
24639
24640@cindex Rankin, Pat
24641@item VMS
24642Pat Rankin, @email{rankin@@pactechdata.com}.
24643@end table
24644@end ignore
24645
24646@multitable {MS-Windows} {123456789012345678901234567890123456789001234567890}
24647@cindex Fish, Fred
24648@item Amiga @tab Fred Fish, @email{fnf@@ninemoons.com}.
24649
24650@cindex Brown, Martin
24651@item BeOS @tab Martin Brown, @email{mc@@whoever.com}.
24652
24653@cindex Deifik, Scott
24654@cindex Hankerson, Darrel
24655@item MS-DOS @tab Scott Deifik, @email{scottd@@amgen.com} and
24656Darrel Hankerson, @email{hankedr@@mail.auburn.edu}.
24657
24658@cindex Grigera, Juan
24659@item MS-Windows @tab Juan Grigera, @email{juan@@biophnet.unlp.edu.ar}.
24660
24661@item OS/2 @tab The Unix for OS/2 team, @email{gawk-maintainer@@unixos2.org}.
24662
24663@cindex Davies, Stephen
24664@item Tandem @tab Stephen Davies, @email{scldad@@sdc.com.au}.
24665
24666@cindex Rankin, Pat
24667@item VMS @tab Pat Rankin, @email{rankin@@pactechdata.com}.
24668@end multitable
24669
24670If your bug is also reproducible under Unix, please send a copy of your
24671report to the @email{bug-gawk@@gnu.org} email list as well.
24672@c ENDOFRANGE dbugg
24673@c ENDOFRANGE tblgawb
24674
24675@node Other Versions
24676@appendixsec Other Freely Available @command{awk} Implementations
24677@c STARTOFRANGE awkim
24678@cindex @command{awk}, implementations
24679@ignore
24680From: emory!amc.com!brennan (Michael Brennan)
24681Subject: C++ comments in awk programs
24682To: arnold@gnu.ai.mit.edu (Arnold Robbins)
24683Date: Wed, 4 Sep 1996 08:11:48 -0700 (PDT)
24684
24685@end ignore
24686@cindex Brennan, Michael
24687@quotation
24688@i{It's kind of fun to put comments like this in your awk code.}@*
24689@ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course}@*
24690Michael Brennan
24691@end quotation
24692
There are four other freely available @command{awk} implementations.
24694This @value{SECTION} briefly describes where to get them:
24695
24696@table @asis
24697@cindex Kernighan, Brian
24698@cindex source code, Bell Laboratories @command{awk}
24699@item Unix @command{awk}
24700Brian Kernighan has made his implementation of
24701@command{awk} freely available.
24702You can retrieve this version via the World Wide Web from
24703his home page.@footnote{@uref{http://cm.bell-labs.com/who/bwk}}
24704It is available in several archive formats:
24705
24706@table @asis
24707@item Shell archive
24708@uref{http://cm.bell-labs.com/who/bwk/awk.shar}
24709
24710@item Compressed @command{tar} file
24711@uref{http://cm.bell-labs.com/who/bwk/awk.tar.gz}
24712
24713@item Zip file
24714@uref{http://cm.bell-labs.com/who/bwk/awk.zip}
24715@end table
24716
24717This version requires an ISO C (1990 standard) compiler;
24718the C compiler from
24719GCC (the GNU Compiler Collection)
24720works quite nicely.
24721
24722@xref{BTL},
24723for a list of extensions in this @command{awk} that are not in POSIX @command{awk}.
24724
24725@cindex Brennan, Michael
24726@cindex @command{mawk} program
24727@cindex source code, @command{mawk}
24728@item @command{mawk}
24729Michael Brennan has written an independent implementation of @command{awk},
24730called @command{mawk}.  It is available under the GPL
24731(@pxref{Copying}),
24732just as @command{gawk} is.
24733
24734You can get it via anonymous @command{ftp} to the host
24735@code{@w{ftp.whidbey.net}}.  Change directory to @file{/pub/brennan}.
24736Use ``binary'' or ``image'' mode, and retrieve @file{mawk1.3.3.tar.gz}
24737(or the latest version that is there).
24738
24739@command{gunzip} may be used to decompress this file. Installation
24740is similar to @command{gawk}'s
24741(@pxref{Unix Installation}).
24742
24743@cindex extensions, @command{mawk}
24744@command{mawk} has the following extensions that are not in POSIX @command{awk}:
24745
24746@itemize @bullet
24747@item
24748The @code{fflush} built-in function for flushing buffered output
24749(@pxref{I/O Functions}).
24750
24751@item
24752The @samp{**} and @samp{**=} operators
24753(@pxref{Arithmetic Ops}
24754and also see
24755@ref{Assignment Ops}).
24756
24757@item
24758The use of @code{func} as an abbreviation for @code{function}
24759(@pxref{Definition Syntax}).
24760
24761@item
24762The @samp{\x} escape sequence
24763(@pxref{Escape Sequences}).
24764
24765@item
The @file{/dev/stdout} and @file{/dev/stderr}
24767special files
24768(@pxref{Special Files}).
24769Use @code{"-"} instead of @code{"/dev/stdin"} with @command{mawk}.
24770
24771@item
24772The ability for @code{FS} and for the third
24773argument to @code{split} to be null strings
24774(@pxref{Single Character Fields}).
24775
24776@item
24777The ability to delete all of an array at once with @samp{delete @var{array}}
24778(@pxref{Delete}).
24779
24780@item
24781The ability for @code{RS} to be a regexp
24782(@pxref{Records}).
24783
24784@item
24785The @code{BINMODE} special variable for non-Unix operating systems
24786(@pxref{PC Using}).
24787@end itemize
24788
24789The next version of @command{mawk} will support @code{nextfile}.
24790
24791@cindex Sumner, Andrew
24792@cindex @command{awka} compiler for @command{awk}
24793@cindex source code, @command{awka}
24794@item @command{awka}
24795Written by Andrew Sumner,
24796@command{awka} translates @command{awk} programs into C, compiles them,
24797and links them with a library of functions that provides the core
24798@command{awk} functionality.
24799It also has a number of extensions.
24800
24801The @command{awk} translator is released under the GPL, and the library
24802is under the LGPL.
24803
24804To get @command{awka}, go to @uref{http://awka.sourceforge.net}.
24805You can reach Andrew Sumner at @email{andrew@@zbcom.net}.
24806
24807@cindex Beebe, Nelson H.F.
24808@cindex @command{pawk} profiling Bell Labs @command{awk}
24809@item @command{pawk}
24810Nelson H.F.@: Beebe at the University of Utah has modified
24811the Bell Labs @command{awk} to provide timing and profiling information.
24812It is different from @command{pgawk}
24813(@pxref{Profiling}),
24814in that it uses CPU-based profiling, not line-count
24815profiling.  You may find it at either
24816@uref{ftp://ftp.math.utah.edu/pub/pawk/pawk-20020210.tar.gz}
24817or
24818@uref{http://www.math.utah.edu/pub/pawk/pawk-20020210.tar.gz}.
24819
24820@end table
24821@c ENDOFRANGE gligawk
24822@c ENDOFRANGE ingawk
24823@c ENDOFRANGE awkim
24824
24825@node Notes
24826@appendix Implementation Notes
24827@c STARTOFRANGE gawii
24828@cindex @command{gawk}, implementation issues
24829@c STARTOFRANGE impis
24830@cindex implementation issues, @command{gawk}
24831
24832This appendix contains information mainly of interest to implementors and
24833maintainers of @command{gawk}.  Everything in it applies specifically to
24834@command{gawk} and not to other implementations.
24835
24836@menu
24837* Compatibility Mode::          How to disable certain @command{gawk}
24838                                extensions.
24839* Additions::                   Making Additions To @command{gawk}.
24840* Dynamic Extensions::          Adding new built-in functions to
24841                                @command{gawk}.
24842* Future Extensions::           New features that may be implemented one day.
24843@end menu
24844
24845@node Compatibility Mode
24846@appendixsec Downward Compatibility and Debugging
24847@cindex @command{gawk}, implementation issues, downward compatibility
24848@cindex @command{gawk}, implementation issues, debugging
24849@cindex troubleshooting, @command{gawk}
24850@c first comma is part of primary
24851@cindex implementation issues, @command{gawk}, debugging
24852
24853@xref{POSIX/GNU},
24854for a summary of the GNU extensions to the @command{awk} language and program.
24855All of these features can be turned off by invoking @command{gawk} with the
24856@option{--traditional} option or with the @option{--posix} option.
24857
24858If @command{gawk} is compiled for debugging with @samp{-DDEBUG}, then there
24859is one more option available on the command line:
24860
24861@table @code
24862@item -W parsedebug
24863@itemx --parsedebug
24864Prints out the parse stack information as the program is being parsed.
24865@end table
24866
24867This option is intended only for serious @command{gawk} developers
24868and not for the casual user.  It probably has not even been compiled into
24869your version of @command{gawk}, since it slows down execution.
24870
24871@node Additions
24872@appendixsec Making Additions to @command{gawk}
24873
24874If you find that you want to enhance @command{gawk} in a significant
24875fashion, you are perfectly free to do so.  That is the point of having
24876free software; the source code is available and you are free to change
24877it as you want (@pxref{Copying}).
24878
24879This @value{SECTION} discusses the ways you might want to change @command{gawk}
24880as well as any considerations you should bear in mind.
24881
24882@menu
24883* Adding Code::                 Adding code to the main body of
24884                                @command{gawk}.
24885* New Ports::                   Porting @command{gawk} to a new operating
24886                                system.
24887@end menu
24888
24889@node Adding Code
24890@appendixsubsec Adding New Features
24891
24892@c STARTOFRANGE adfgaw
24893@cindex adding, features to @command{gawk}
24894@c STARTOFRANGE fadgaw
24895@cindex features, adding to @command{gawk}
24896@c STARTOFRANGE gawadf
24897@cindex @command{gawk}, features, adding
24898You are free to add any new features you like to @command{gawk}.
24899However, if you want your changes to be incorporated into the @command{gawk}
24900distribution, there are several steps that you need to take in order to
24901make it possible for me to include your changes:
24902
24903@enumerate 1
24904@item
24905Before building the new feature into @command{gawk} itself,
24906consider writing it as an extension module
24907(@pxref{Dynamic Extensions}).
24908If that's not possible, continue with the rest of the steps in this list.
24909
24910@item
24911Get the latest version.
24912It is much easier for me to integrate changes if they are relative to
24913the most recent distributed version of @command{gawk}.  If your version of
24914@command{gawk} is very old, I may not be able to integrate them at all.
24915(@xref{Getting},
24916for information on getting the latest version of @command{gawk}.)
24917
24918@item
24919@ifnotinfo
24920Follow the @cite{GNU Coding Standards}.
24921@end ifnotinfo
24922@ifinfo
24923See @inforef{Top, , Version, standards, GNU Coding Standards}.
24924@end ifinfo
24925This document describes how GNU software should be written. If you haven't
24926read it, please do so, preferably @emph{before} starting to modify @command{gawk}.
24927(The @cite{GNU Coding Standards} are available from
24928the GNU Project's
24929@command{ftp}
24930site, at
24931@uref{ftp://ftp.gnu.org/gnu/GNUinfo/standards.text}.
24932An HTML version, suitable for reading with a WWW browser, is
24933available at
24934@uref{http://www.gnu.org/prep/standards_toc.html}.
24935Texinfo, Info, and DVI versions are also available.)
24936
24937@cindex @command{gawk}, coding style in
24938@item
24939Use the @command{gawk} coding style.
24940The C code for @command{gawk} follows the instructions in the
24941@cite{GNU Coding Standards}, with minor exceptions.  The code is formatted
using the traditional ``K&R'' style, particularly as regards the placement
24943of braces and the use of tabs.  In brief, the coding rules for @command{gawk}
24944are as follows:
24945
24946@itemize @bullet
24947@item
24948Use ANSI/ISO style (prototype) function headers when defining functions.
24949
24950@item
24951Put the name of the function at the beginning of its own line.
24952
24953@item
24954Put the return type of the function, even if it is @code{int}, on the
24955line above the line with the name and arguments of the function.
24956
24957@item
24958Put spaces around parentheses used in control structures
24959(@code{if}, @code{while}, @code{for}, @code{do}, @code{switch},
24960and @code{return}).
24961
24962@item
24963Do not put spaces in front of parentheses used in function calls.
24964
24965@item
24966Put spaces around all C operators and after commas in function calls.
24967
24968@item
24969Do not use the comma operator to produce multiple side effects, except
24970in @code{for} loop initialization and increment parts, and in macro bodies.
24971
24972@item
24973Use real tabs for indenting, not spaces.
24974
24975@item
24976Use the ``K&R'' brace layout style.
24977
24978@item
24979Use comparisons against @code{NULL} and @code{'\0'} in the conditions of
24980@code{if}, @code{while}, and @code{for} statements, as well as in the @code{case}s
24981of @code{switch} statements, instead of just the
24982plain pointer or character value.
24983
24984@item
24985Use the @code{TRUE}, @code{FALSE} and @code{NULL} symbolic constants
24986and the character constant @code{'\0'} where appropriate, instead of @code{1}
24987and @code{0}.
24988
24989@item
24990Use the @code{ISALPHA}, @code{ISDIGIT}, etc.@: macros, instead of the
24991traditional lowercase versions; these macros are better behaved for
24992non-ASCII character sets.
24993
24994@item
24995Provide one-line descriptive comments for each function.
24996
24997@item
24998Do not use @samp{#elif}. Many older Unix C compilers cannot handle it.
24999
25000@item
25001Do not use the @code{alloca} function for allocating memory off the stack.
Its use causes more portability trouble than is justified by the minor benefit of not having
25003to free the storage. Instead, use @code{malloc} and @code{free}.
25004@end itemize
25005
25006@strong{Note:}
25007If I have to reformat your code to follow the coding style used in
25008@command{gawk}, I may not bother to integrate your changes at all.
25009
25010@item
25011Be prepared to sign the appropriate paperwork.
25012In order for the FSF to distribute your changes, you must either place
25013those changes in the public domain and submit a signed statement to that
25014effect, or assign the copyright in your changes to the FSF.
25015Both of these actions are easy to do and @emph{many} people have done so
25016already. If you have questions, please contact me
25017(@pxref{Bugs}),
25018or @email{gnu@@gnu.org}.
25019
25020@cindex Texinfo
25021@item
25022Update the documentation.
25023Along with your new code, please supply new sections and/or chapters
25024for this @value{DOCUMENT}.  If at all possible, please use real
25025Texinfo, instead of just supplying unformatted ASCII text (although
25026even that is better than no documentation at all).
25027Conventions to be followed in @cite{@value{TITLE}} are provided
25028after the @samp{@@bye} at the end of the Texinfo source file.
25029If possible, please update the @command{man} page as well.
25030
25031You will also have to sign paperwork for your documentation changes.
25032
25033@item
25034Submit changes as context diffs or unified diffs.
25035Use @samp{diff -c -r -N} or @samp{diff -u -r -N} to compare
25036the original @command{gawk} source tree with your version.
25037(I find context diffs to be more readable but unified diffs are
25038more compact.)
25039I recommend using the GNU version of @command{diff}.
25040Send the output produced by either run of @command{diff} to me when you
25041submit your changes.
25042(@xref{Bugs}, for the electronic mail
25043information.)
25044
25045Using this format makes it easy for me to apply your changes to the
25046master version of the @command{gawk} source code (using @code{patch}).
25047If I have to apply the changes manually, using a text editor, I may
25048not do so, particularly if there are lots of changes.
25049
25050@item
25051Include an entry for the @file{ChangeLog} file with your submission.
25052This helps further minimize the amount of work I have to do,
25053making it easier for me to accept patches.
25054@end enumerate
25055
25056Although this sounds like a lot of work, please remember that while you
25057may write the new code, I have to maintain it and support it. If it
25058isn't possible for me to do that with a minimum of extra work, then I
25059probably will not.
25060@c ENDOFRANGE adfgaw
25061@c ENDOFRANGE gawadf
25062@c ENDOFRANGE fadgaw
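
The coding rules listed in step 4 above can be illustrated with a short
sketch.  It is not taken from the @command{gawk} sources: the function
names @code{all_digits} and @code{copy_upper} are invented for this
example, and the @code{TRUE}, @code{FALSE}, and @code{ISDIGIT}
definitions are local stand-ins for the ones the rules refer to.
Indentation uses real tabs.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins (assumptions) for the symbolic constants named in the rules. */
#define TRUE	1
#define FALSE	0

/* Hypothetical stand-in for ISDIGIT; the cast keeps isdigit() well defined. */
#define ISDIGIT(c)	(isdigit((unsigned char) (c)) != 0)

/* all_digits --- return TRUE if s is non-empty and holds only digits */

int
all_digits(const char *s)
{
	const char *cp;

	/* explicit comparisons against NULL and '\0' */
	if (s == NULL || *s == '\0')
		return FALSE;

	for (cp = s; *cp != '\0'; cp++) {
		if (! ISDIGIT(*cp))
			return FALSE;
	}

	return TRUE;
}

/* copy_upper --- return a malloc'ed uppercase copy of s; caller frees it */

char *
copy_upper(const char *s)
{
	char *result, *cp;

	if (s == NULL)
		return NULL;

	/* malloc and free, not alloca */
	result = (char *) malloc(strlen(s) + 1);
	if (result == NULL)
		return NULL;

	/* the comma operator is acceptable in a for loop's increment part */
	for (cp = result; *s != '\0'; s++, cp++)
		*cp = (char) toupper((unsigned char) *s);
	*cp = '\0';

	return result;
}
```

Each function has a one-line descriptive comment, its return type sits on
the line above its name, and the braces follow the ``K&R'' layout.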
25063
25064@node New Ports
25065@appendixsubsec Porting @command{gawk} to a New Operating System
25066@cindex portability, @command{gawk}
25067@cindex operating systems, porting @command{gawk} to
25068
25069@cindex porting @command{gawk}
25070If you want to port @command{gawk} to a new operating system, there are
25071several steps:
25072
25073@enumerate 1
25074@item
25075Follow the guidelines in
25076@ifinfo
25077@ref{Adding Code},
25078@end ifinfo
25079@ifnotinfo
25080the previous @value{SECTION}
25081@end ifnotinfo
25082concerning coding style, submission of diffs, and so on.
25083
25084@item
25085When doing a port, bear in mind that your code must coexist peacefully
25086with the rest of @command{gawk} and the other ports. Avoid gratuitous
25087changes to the system-independent parts of the code. If at all possible,
25088avoid sprinkling @samp{#ifdef}s just for your port throughout the
25089code.
25090
25091If the changes needed for a particular system affect too much of the
25092code, I probably will not accept them.  In such a case, you can, of course,
25093distribute your changes on your own, as long as you comply
25094with the GPL
25095(@pxref{Copying}).
25096
25097@item
25098A number of the files that come with @command{gawk} are maintained by other
25099people at the Free Software Foundation.  Thus, you should not change them
25100unless it is for a very good reason; i.e., changes are not out of the
25101question, but changes to these files are scrutinized extra carefully.
25102The files are @file{getopt.h}, @file{getopt.c},
25103@file{getopt1.c}, @file{regex.h}, @file{regex.c}, @file{dfa.h},
25104@file{dfa.c}, @file{install-sh}, and @file{mkinstalldirs}.
25105
25106@item
25107Be willing to continue to maintain the port.
25108Non-Unix operating systems are supported by volunteers who maintain
the code needed to compile and run @command{gawk} on their systems. If no one
25110volunteers to maintain a port, it becomes unsupported and it may
25111be necessary to remove it from the distribution.
25112
25113@item
25114Supply an appropriate @file{gawkmisc.???} file.
25115Each port has its own @file{gawkmisc.???} that implements certain
25116operating system specific functions. This is cleaner than a plethora of
25117@samp{#ifdef}s scattered throughout the code.  The @file{gawkmisc.c} in
25118the main source directory includes the appropriate
25119@file{gawkmisc.???} file from each subdirectory.
25120Be sure to update it as well.
25121
25122Each port's @file{gawkmisc.???} file has a suffix reminiscent of the machine
25123or operating system for the port---for example, @file{pc/gawkmisc.pc} and
25124@file{vms/gawkmisc.vms}. The use of separate suffixes, instead of plain
25125@file{gawkmisc.c}, makes it possible to move files from a port's subdirectory
25126into the main subdirectory, without accidentally destroying the real
25127@file{gawkmisc.c} file.  (Currently, this is only an issue for the
25128PC operating system ports.)
25129
25130@item
25131Supply a @file{Makefile} as well as any other C source and header files that are
25132necessary for your operating system.  All your code should be in a
25133separate subdirectory, with a name that is the same as, or reminiscent
25134of, either your operating system or the computer system.  If possible,
25135try to structure things so that it is not necessary to move files out
25136of the subdirectory into the main source directory.  If that is not
25137possible, then be sure to avoid using names for your files that
25138duplicate the names of files in the main source directory.
25139
25140@item
25141Update the documentation.
25142Please write a section (or sections) for this @value{DOCUMENT} describing the
25143installation and compilation steps needed to compile and/or install
25144@command{gawk} for your system.
25145
25146@item
25147Be prepared to sign the appropriate paperwork.
25148In order for the FSF to distribute your code, you must either place
25149your code in the public domain and submit a signed statement to that
25150effect, or assign the copyright in your code to the FSF.
25151@ifinfo
25152Both of these actions are easy to do and @emph{many} people have done so
25153already. If you have questions, please contact me, or
25154@email{gnu@@gnu.org}.
25155@end ifinfo
25156@end enumerate
25157
25158Following these steps makes it much easier to integrate your changes
25159into @command{gawk} and have them coexist happily with other
25160operating systems' code that is already there.
25161
25162In the code that you supply and maintain, feel free to use a
25163coding style and brace layout that suits your taste.
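
The advice above about keeping system-specific code in one place, rather
than sprinkling @samp{#ifdef}s throughout the source, can be sketched as
follows.  This is an illustration only and is not the contents of any
@file{gawkmisc.???} file: the macro @code{MY_NEW_OS}, the
@code{PORT_DIR_SEP} definition, and the function @code{port_basename}
are all hypothetical names.

```c
#include <string.h>

/*
 * The port-specific choice is made once, here.  MY_NEW_OS is a
 * hypothetical macro that a port's Makefile might define.
 * Plain #ifdef/#else is used; #elif is avoided.
 */
#ifdef MY_NEW_OS
#define PORT_DIR_SEP	'\\'
#else /* not MY_NEW_OS */
#define PORT_DIR_SEP	'/'
#endif /* not MY_NEW_OS */

/* port_basename --- return a pointer to the last component of path */

const char *
port_basename(const char *path)
{
	const char *cp;

	if (path == NULL)
		return NULL;

	cp = strrchr(path, PORT_DIR_SEP);
	if (cp == NULL)
		return path;	/* no separator: the whole path is the name */

	return cp + 1;
}
```

System-independent code can then call @code{port_basename} without any
conditional compilation of its own.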
25164
25165@node Dynamic Extensions
25166@appendixsec Adding New Built-in Functions to @command{gawk}
25167@cindex Robinson, Will
25168@cindex robot, the
25169@cindex Lost In Space
25170@quotation
25171@i{Danger Will Robinson!  Danger!!@*
25172Warning! Warning!}@*
25173The Robot
25174@end quotation
25175
25176@c STARTOFRANGE gladfgaw
25177@cindex @command{gawk}, functions, adding
25178@c STARTOFRANGE adfugaw
25179@cindex adding, functions to @command{gawk}
25180@c STARTOFRANGE fubadgaw
25181@cindex functions, built-in, adding to @command{gawk}
25182Beginning with @command{gawk} 3.1, it is possible to add new built-in
25183functions to @command{gawk} using dynamically loaded libraries. This
25184facility is available on systems (such as GNU/Linux) that support
25185the @code{dlopen} and @code{dlsym} functions.
25186This @value{SECTION} describes how to write and use dynamically
loaded extensions for @command{gawk}.
25188Experience with programming in
25189C or C++ is necessary when reading this @value{SECTION}.
25190
25191@strong{Caution:} The facilities described in this @value{SECTION}
25192are very much subject to change in the next @command{gawk} release.
25193Be aware that you may have to re-do everything, perhaps from scratch,
25194upon the next release.
25195
25196@menu
25197* Internals::                   A brief look at some @command{gawk} internals.
* Sample Library::              An example of new functions.
25199@end menu
25200
25201@node Internals
25202@appendixsubsec A Minimal Introduction to @command{gawk} Internals
25203@c STARTOFRANGE gawint
25204@cindex @command{gawk}, internals
25205
25206The truth is that @command{gawk} was not designed for simple extensibility.
25207The facilities for adding functions using shared libraries work, but
25208are something of a ``bag on the side.''  Thus, this tour is
25209brief and simplistic; would-be @command{gawk} hackers are encouraged to
25210spend some time reading the source code before trying to write
25211extensions based on the material presented here.  Of particular note
25212are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}.
25213Reading @file{awk.y} in order to see how the parse tree is built
25214would also be of use.
25215
25216@cindex @code{awk.h} file (internal)
25217With the disclaimers out of the way, the following types, structure
25218members, functions, and macros are declared in @file{awk.h} and are of
25219use when writing extensions.  The next @value{SECTION}
25220shows how they are used:
25221
25222@table @code
25223@cindex floating-point, numbers, @code{AWKNUM} internal type
25224@cindex numbers, floating-point, @code{AWKNUM} internal type
25225@cindex @code{AWKNUM} internal type
25226@item AWKNUM
25227An @code{AWKNUM} is the internal type of @command{awk}
25228floating-point numbers.  Typically, it is a C @code{double}.
25229
25230@cindex @code{NODE} internal type
25231@cindex strings, @code{NODE} internal type
25232@cindex numbers, @code{NODE} internal type
25233@item NODE
25234Just about everything is done using objects of type @code{NODE}.
25235These contain both strings and numbers, as well as variables and arrays.
25236
25237@cindex @code{force_number} internal function
25238@cindex numeric, values
25239@item AWKNUM force_number(NODE *n)
25240This macro forces a value to be numeric. It returns the actual
25241numeric value contained in the node.
25242It may end up calling an internal @command{gawk} function.
25243
25244@cindex @code{force_string} internal function
25245@item void force_string(NODE *n)
25246This macro guarantees that a @code{NODE}'s string value is current.
25247It may end up calling an internal @command{gawk} function.
25248It also guarantees that the string is zero-terminated.
25249
25250@c comma is part of primary
25251@cindex parameters, number of
25252@cindex @code{param_cnt} internal variable
25253@item n->param_cnt
25254The number of parameters actually passed in a function call at runtime.
25255
25256@cindex @code{stptr} internal variable
25257@cindex @code{stlen} internal variable
25258@item n->stptr
25259@itemx n->stlen
25260The data and length of a @code{NODE}'s string value, respectively.
25261The string is @emph{not} guaranteed to be zero-terminated.
25262If you need to pass the string value to a C library function, save
25263the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it,
25264call the routine, and then restore the value.
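
The save-and-restore sequence just described might look like the
following sketch.  It does not use the real @code{NODE} type:
@code{MOCK_NODE} is a cut-down, hypothetical stand-in with only the two
members discussed here, and @code{node_to_long} is an invented name.
The sketch assumes, as the advice above implies, that the byte at
@code{n->stptr[n->stlen]} is writable.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gawk's NODE; the real type is declared in awk.h. */
typedef struct mock_node {
	char *stptr;	/* string data, not necessarily zero-terminated */
	size_t stlen;	/* number of bytes in the string value */
} MOCK_NODE;

/* node_to_long --- hand a node's string value to strtol() safely */

long
node_to_long(MOCK_NODE *n)
{
	char save;
	long result;

	save = n->stptr[n->stlen];	/* remember the byte just past the string */
	n->stptr[n->stlen] = '\0';	/* terminate the string temporarily */
	result = strtol(n->stptr, NULL, 10);
	n->stptr[n->stlen] = save;	/* put the original byte back */

	return result;
}
```

Restoring the byte matters because the storage just past the string may
belong to other data.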
25265
25266@cindex @code{type} internal variable
25267@item n->type
25268The type of the @code{NODE}. This is a C @code{enum}. Values should
25269be either @code{Node_var} or @code{Node_var_array} for function
25270parameters.
25271
25272@cindex @code{vname} internal variable
25273@item n->vname
25274The ``variable name'' of a node.  This is not of much use inside
25275externally written extensions.
25276
25277@cindex arrays, associative, clearing
25278@cindex @code{assoc_clear} internal function
25279@item void assoc_clear(NODE *n)
25280Clears the associative array pointed to by @code{n}.
25281Make sure that @samp{n->type == Node_var_array} first.
25282
25283@cindex arrays, elements, installing
25284@cindex @code{assoc_lookup} internal function
25285@item NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference)
25286Finds, and installs if necessary, array elements.
25287@code{symbol} is the array, @code{subs} is the subscript.
The subscript is usually a value created with @code{tmp_string} (see below).
25289@code{reference} should be @code{TRUE} if it is an error to use the
25290value before it is created. Typically, @code{FALSE} is the
25291correct value to use from extension functions.
25292
25293@cindex strings
25294@cindex @code{make_string} internal function
25295@item NODE *make_string(char *s, size_t len)
25296Take a C string and turn it into a pointer to a @code{NODE} that
25297can be stored appropriately.  This is permanent storage; understanding
25298of @command{gawk} memory management is helpful.
25299
25300@cindex numbers
25301@cindex @code{make_number} internal function
25302@item NODE *make_number(AWKNUM val)
25303Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that
25304can be stored appropriately.  This is permanent storage; understanding
25305of @command{gawk} memory management is helpful.
25306
25307@cindex @code{tmp_string} internal function
@item NODE *tmp_string(char *s, size_t len)
25309Take a C string and turn it into a pointer to a @code{NODE} that
25310can be stored appropriately.  This is temporary storage; understanding
25311of @command{gawk} memory management is helpful.
25312
25313@cindex @code{tmp_number} internal function
25314@item NODE *tmp_number(AWKNUM val)
25315Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that
25316can be stored appropriately.  This is temporary storage;
25317understanding of @command{gawk} memory management is helpful.
25318
25319@c comma is part of primary
25320@cindex nodes, duplicating
25321@cindex @code{dupnode} internal function
25322@item NODE *dupnode(NODE *n)
25323Duplicate a node.  In most cases, this increments an internal
25324reference count instead of actually duplicating the entire @code{NODE};
25325understanding of @command{gawk} memory management is helpful.
25326
25327@cindex memory, releasing
25328@cindex @code{free_temp} internal macro
25329@item void free_temp(NODE *n)
25330This macro releases the memory associated with a @code{NODE}
25331allocated with @code{tmp_string} or @code{tmp_number}.
25332Understanding of @command{gawk} memory management is helpful.
25333
25334@cindex @code{make_builtin} internal function
25335@item void make_builtin(char *name, NODE *(*func)(NODE *), int count)
25336Register a C function pointed to by @code{func} as new built-in
25337function @code{name}. @code{name} is a regular C string. @code{count}
25338is the maximum number of arguments that the function takes.
25339The function should be written in the following manner:
25340
@example
/* do_xxx --- do xxx function for gawk */

NODE *
do_xxx(NODE *tree)
@{
    @dots{}
@}
@end example
25350
25351@cindex arguments, retrieving
25352@cindex @code{get_argument} internal function
25353@item NODE *get_argument(NODE *tree, int i)
25354This function is called from within a C extension function to get
25355the @code{i}-th argument from the function call.
25356The first argument is argument zero.
25357
25358@c last comma is part of secondary
25359@cindex functions, return values, setting
25360@cindex @code{set_value} internal function
25361@item void set_value(NODE *tree)
25362This function is called from within a C extension function to set
25363the return value from the extension function.  This value is
25364what the @command{awk} program sees as the return value from the
25365new @command{awk} function.
25366
25367@cindex @code{ERRNO} variable
25368@cindex @code{update_ERRNO} internal function
25369@item void update_ERRNO(void)
25370This function is called from within a C extension function to set
25371the value of @command{gawk}'s @code{ERRNO} variable, based on the current
25372value of the C @code{errno} variable.
25373It is provided as a convenience.
25374@end table
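
The save-and-restore idiom described above for @code{n->stptr} and
@code{n->stlen} can be sketched in isolation as follows.  This is only an
illustration: @code{struct fake_node} and @code{node_to_long} are
hypothetical names invented here and are not part of @command{gawk}.
The sketch assumes, as the description above implies, that the byte at
@code{stptr[stlen]} may be read and written.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two NODE fields used here; NOT gawk's NODE. */
struct fake_node {
    char *stptr;    /* string data, not necessarily zero-terminated */
    size_t stlen;   /* length of the string data */
};

/*
 * node_to_long: pass the node's string value to strtol(), which needs
 * a zero-terminated string.  Assumes stptr[stlen] is accessible.
 */
long
node_to_long(struct fake_node *n)
{
    char save;
    long result;

    save = n->stptr[n->stlen];      /* save the byte just past the string */
    n->stptr[n->stlen] = '\0';      /* terminate the string temporarily */
    result = strtol(n->stptr, NULL, 10);
    n->stptr[n->stlen] = save;      /* restore the original byte */

    return result;
}
```

Restoring the saved byte matters because the string data may be part of a
larger buffer whose following contents are still in use.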
25375
25376An argument that is supposed to be an array needs to be handled with
25377some extra code, in case the array being passed in is actually
25378from a function parameter.
25379
25380In versions of @command{gawk} up to and including 3.1.2, the
25381following boilerplate code shows how to do this:
25382
@smallexample
NODE *the_arg;

the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */

/* if a parameter, get it off the stack */
if (the_arg->type == Node_param_list)
    the_arg = stack_ptr[the_arg->param_cnt];

/* parameter referenced an array, get it */
if (the_arg->type == Node_array_ref)
    the_arg = the_arg->orig_array;

/* check type */
if (the_arg->type != Node_var && the_arg->type != Node_var_array)
    fatal("newfunc: third argument is not an array");

/* force it to be an array, if necessary, clear it */
the_arg->type = Node_var_array;
assoc_clear(the_arg);
@end smallexample
25404
25405For versions 3.1.3 and later, the internals changed.  In particular,
25406the interface was actually @emph{simplified} drastically.  The
25407following boilerplate code now suffices:
25408
@smallexample
NODE *the_arg;

the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */

/* force it to be an array: */
the_arg = get_array(the_arg);

/* if necessary, clear it: */
assoc_clear(the_arg);
@end smallexample
25420
25421Again, you should spend time studying the @command{gawk} internals;
25422don't just blindly copy this code.
25423@c ENDOFRANGE gawint
25424
25425@node Sample Library
25426@appendixsubsec Directory and File Operation Built-ins
25427@c comma is part of primary
25428@c STARTOFRANGE chdirg
25429@cindex @code{chdir} function, implementing in @command{gawk}
25430@c comma is part of primary
25431@c STARTOFRANGE statg
25432@cindex @code{stat} function, implementing in @command{gawk}
25433@c last comma is part of secondary
25434@c STARTOFRANGE filre
25435@cindex files, information about, retrieving
25436@c STARTOFRANGE dirch
25437@cindex directories, changing
25438
25439Two useful functions that are not in @command{awk} are @code{chdir}
25440(so that an @command{awk} program can change its directory) and
25441@code{stat} (so that an @command{awk} program can gather information about
25442a file).
25443This @value{SECTION} implements these functions for @command{gawk} in an
25444external extension library.
25445
25446@menu
25447* Internal File Description::   What the new functions will do.
25448* Internal File Ops::           The code for internal file operations.
25449* Using Internal File Ops::     How to use an external extension.
25450@end menu
25451
25452@node Internal File Description
25453@appendixsubsubsec Using @code{chdir} and @code{stat}
25454
25455This @value{SECTION} shows how to use the new functions at the @command{awk}
25456level once they've been integrated into the running @command{gawk}
25457interpreter.
25458Using @code{chdir} is very straightforward. It takes one argument,
25459the new directory to change to:
25460
@example
@dots{}
newdir = "/home/arnold/funstuff"
ret = chdir(newdir)
if (ret < 0) @{
    printf("could not change to %s: %s\n",
                   newdir, ERRNO) > "/dev/stderr"
    exit 1
@}
@dots{}
@end example
25472
25473The return value is negative if the @code{chdir} failed,
25474and @code{ERRNO}
25475(@pxref{Built-in Variables})
25476is set to a string indicating the error.
25477
25478Using @code{stat} is a bit more complicated.
25479The C @code{stat} function fills in a structure that has a fair
25480amount of information.
25481The right way to model this in @command{awk} is to fill in an associative
25482array with the appropriate information:
25483
25484@c broke printf for page breaking
@example
file = "/home/arnold/.profile"
fdata[1] = "x"    # force `fdata' to be an array
ret = stat(file, fdata)
if (ret < 0) @{
    printf("could not stat %s: %s\n",
             file, ERRNO) > "/dev/stderr"
    exit 1
@}
printf("size of %s is %d bytes\n", file, fdata["size"])
@end example
25496
25497The @code{stat} function always clears the data array, even if
25498the @code{stat} fails.  It fills in the following elements:
25499
25500@table @code
25501@item "name"
25502The name of the file that was @code{stat}'ed.
25503
25504@item "dev"
25505@itemx "ino"
25506The file's device and inode numbers, respectively.
25507
25508@item "mode"
25509The file's mode, as a numeric value. This includes both the file's
25510type and its permissions.
25511
25512@item "nlink"
25513The number of hard links (directory entries) the file has.
25514
25515@item "uid"
25516@itemx "gid"
The numeric IDs of the file's owning user and group, respectively.
25518
25519@item "size"
25520The size in bytes of the file.
25521
25522@item "blocks"
25523The number of disk blocks the file actually occupies. This may not
25524be a function of the file's size if the file has holes.
25525
25526@item "atime"
25527@itemx "mtime"
25528@itemx "ctime"
25529The file's last access, modification, and inode update times,
25530respectively.  These are numeric timestamps, suitable for formatting
25531with @code{strftime}
25532(@pxref{Built-in}).
25533
25534@item "pmode"
25535The file's ``printable mode.''  This is a string representation of
25536the file's type and permissions, such as what is produced by
25537@samp{ls -l}---for example, @code{"drwxr-xr-x"}.
25538
25539@item "type"
25540A printable string representation of the file's type.  The value
25541is one of the following:
25542
25543@table @code
25544@item "blockdev"
25545@itemx "chardev"
25546The file is a block or character device (``special file'').
25547
25548@ignore
25549@item "door"
25550The file is a Solaris ``door'' (special file used for
25551interprocess communications).
25552@end ignore
25553
25554@item "directory"
25555The file is a directory.
25556
25557@item "fifo"
The file is a named pipe (also known as a FIFO).
25559
25560@item "file"
25561The file is just a regular file.
25562
25563@item "socket"
25564The file is an @code{AF_UNIX} (``Unix domain'') socket in the
25565filesystem.
25566
25567@item "symlink"
25568The file is a symbolic link.
25569@end table
25570@end table
25571
25572Several additional elements may be present depending upon the operating
25573system and the type of the file.  You can test for them in your @command{awk}
25574program by using the @code{in} operator
25575(@pxref{Reference to Elements}):
25576
25577@table @code
25578@item "blksize"
25579The preferred block size for I/O to the file. This field is not
25580present on all POSIX-like systems in the C @code{stat} structure.
25581
25582@item "linkval"
25583If the file is a symbolic link, this element is the name of the
25584file the link points to (i.e., the value of the link).
25585
25586@item "rdev"
25587@itemx "major"
25588@itemx "minor"
25589If the file is a block or character device file, then these values
25590represent the numeric device number and the major and minor components
25591of that number, respectively.
25592@end table
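
At the C level, the @code{"rdev"}, @code{"major"}, and @code{"minor"}
values correspond to the @code{st_rdev} field and to what the
@code{major} and @code{minor} macros extract from it.  The following is
a minimal sketch under the assumption of a GNU/Linux system that
provides @code{<sys/sysmacros.h>}; the function name
@code{device_numbers} is hypothetical and does not appear in
@command{gawk}.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major() and minor(); assumes GNU/Linux */

/*
 * device_numbers: hypothetical helper.  Stores the major and minor
 * device numbers of `path' and returns 0 if it is a block or
 * character device; returns -1 on error or for any other file type.
 */
int
device_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat sbuf;

    if (lstat(path, &sbuf) < 0)
        return -1;

    if (!S_ISBLK(sbuf.st_mode) && !S_ISCHR(sbuf.st_mode))
        return -1;  /* st_rdev is only meaningful for device files */

    *maj = major(sbuf.st_rdev);
    *min = minor(sbuf.st_rdev);
    return 0;
}
```

This mirrors why the elements are described as optional: for files that
are not devices there is nothing meaningful to report.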
25593
25594@node Internal File Ops
25595@appendixsubsubsec C Code for @code{chdir} and @code{stat}
25596
25597Here is the C code for these extensions.  They were written for
25598GNU/Linux.  The code needs some more work for complete portability
25599to other POSIX-compliant systems:@footnote{This version is edited
25600slightly for presentation.  The complete version can be found in
25601@file{extension/filefuncs.c} in the @command{gawk} distribution.}
25602
25603@c break line for page breaking
@example
#include "awk.h"

#include <sys/sysmacros.h>

/*  do_chdir --- provide dynamically loaded
                 chdir() builtin for gawk */

static NODE *
do_chdir(tree)
NODE *tree;
@{
    NODE *newdir;
    int ret = -1;

    newdir = get_argument(tree, 0);
@end example
25621
25622The file includes the @code{"awk.h"} header file for definitions
25623for the @command{gawk} internals.  It includes @code{<sys/sysmacros.h>}
25624for access to the @code{major} and @code{minor} macros.
25625
25626@cindex programming conventions, @command{gawk} internals
25627By convention, for an @command{awk} function @code{foo}, the function that
25628implements it is called @samp{do_foo}.  The function should take
25629a @samp{NODE *} argument, usually called @code{tree}, that
25630represents the argument list to the function.  The @code{newdir}
25631variable represents the new directory to change to, retrieved
25632with @code{get_argument}.  Note that the first argument is
25633numbered zero.
25634
The following code actually performs the @code{chdir}. It first forces
the argument to be a string and passes the string value to the
@code{chdir} system call. If the @code{chdir} fails, @code{ERRNO}
is updated.
The result of @code{force_string} has to be freed with @code{free_temp}:
25640
@example
    if (newdir != NULL) @{
        (void) force_string(newdir);
        ret = chdir(newdir->stptr);
        if (ret < 0)
            update_ERRNO();

        free_temp(newdir);
    @}
@end example
25651
Finally, the function returns the return value to the @command{awk} level,
using @code{set_value}. Then it must return a value from the call to
the new built-in (this value is ignored by the interpreter):
25655
@example
    /* Set the return value */
    set_value(tmp_number((AWKNUM) ret));

    /* Just to make the interpreter happy */
    return tmp_number((AWKNUM) 0);
@}
@end example
25664
The @code{stat} built-in is more involved.  First comes a function
that turns a numeric mode into a printable representation
(e.g., octal 0644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
25668
25669@c break line for page breaking
25670@example
25671/* format_mode --- turn a stat mode field
25672                   into something readable */
25673
25674static char *
25675format_mode(fmode)
25676unsigned long fmode;
25677@{
25678    @dots{}
25679@}
25680@end example
25681
25682Next comes the actual @code{do_stat} function itself.  First come the
25683variable declarations and argument checking:
25684
25685@ignore
25686Changed message for page breaking. Used to be:
25687    "stat: called with incorrect number of arguments (%d), should be 2",
25688@end ignore
@example
/* do_stat --- provide a stat() function for gawk */

static NODE *
do_stat(tree)
NODE *tree;
@{
    NODE *file, *array;
    struct stat sbuf;
    int ret;
    char *msg;
    NODE **aptr;
    char *pmode;    /* printable mode */
    char *type = "unknown";

    /* check arg count */
    if (tree->param_cnt != 2)
        fatal(
    "stat: called with %d arguments, should be 2",
            tree->param_cnt);
@end example
25710
25711Then comes the actual work. First, we get the arguments.
25712Then, we always clear the array.  To get the file information,
25713we use @code{lstat}, in case the file is a symbolic link.
25714If there's an error, we set @code{ERRNO} and return:
25715
25716@c comment made multiline for page breaking
@example
    /*
     * file is first arg,
     * array to hold results is second
     */
    file = get_argument(tree, 0);
    array = get_argument(tree, 1);

    /* empty out the array */
    assoc_clear(array);

    /* lstat the file, if error, set ERRNO and return */
    (void) force_string(file);
    ret = lstat(file->stptr, & sbuf);
    if (ret < 0) @{
        update_ERRNO();

        set_value(tmp_number((AWKNUM) ret));

        free_temp(file);
        return tmp_number((AWKNUM) 0);
    @}
@end example
25740
25741Now comes the tedious part: filling in the array.  Only a few of the
25742calls are shown here, since they all follow the same pattern:
25743
@example
    /* fill in the array */
    aptr = assoc_lookup(array, tmp_string("name", 4), FALSE);
    *aptr = dupnode(file);

    aptr = assoc_lookup(array, tmp_string("mode", 4), FALSE);
    *aptr = make_number((AWKNUM) sbuf.st_mode);

    aptr = assoc_lookup(array, tmp_string("pmode", 5), FALSE);
    pmode = format_mode(sbuf.st_mode);
    *aptr = make_string(pmode, strlen(pmode));
@end example
25756
25757When done, we free the temporary value containing the @value{FN},
25758set the return value, and return:
25759
@example
    free_temp(file);

    /* Set the return value */
    set_value(tmp_number((AWKNUM) ret));

    /* Just to make the interpreter happy */
    return tmp_number((AWKNUM) 0);
@}
@end example
25770
25771@cindex programming conventions, @command{gawk} internals
25772Finally, it's necessary to provide the ``glue'' that loads the
25773new function(s) into @command{gawk}.  By convention, each library has
25774a routine named @code{dlload} that does the job:
25775
@example
/* dlload --- load new builtins in this library */

NODE *
dlload(tree, dl)
NODE *tree;
void *dl;
@{
    make_builtin("chdir", do_chdir, 1);
    make_builtin("stat", do_stat, 2);
    return tmp_number((AWKNUM) 0);
@}
@end example
25789
25790And that's it!  As an exercise, consider adding functions to
25791implement system calls such as @code{chown}, @code{chmod}, and @code{umask}.
25792
25793@node Using Internal File Ops
25794@appendixsubsubsec Integrating the Extensions
25795
25796@c last comma is part of secondary
25797@cindex @command{gawk}, interpreter, adding code to
25798Now that the code is written, it must be possible to add it at
25799runtime to the running @command{gawk} interpreter.  First, the
25800code must be compiled.  Assuming that the functions are in
25801a file named @file{filefuncs.c}, and @var{idir} is the location
25802of the @command{gawk} include files,
25803the following steps create
25804a GNU/Linux shared library:
25805
@example
$ gcc -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c
$ ld -o filefuncs.so -shared filefuncs.o
@end example
25810
25811@cindex @code{extension} function (@command{gawk})
Once the library exists, it is loaded by calling the @code{extension}
built-in function.
This function takes two arguments: the name of the
library to load and the name of an initialization function to call when
the library is first loaded.  The initialization function adds the new
functions to @command{gawk}.  The @code{extension} function returns the
value returned by the initialization function within the shared library:
25819
@example
# file testff.awk
BEGIN @{
    extension("./filefuncs.so", "dlload")

    chdir(".")  # no-op

    data[1] = 1 # force `data' to be an array
    print "Info for testff.awk"
    ret = stat("testff.awk", data)
    print "ret =", ret
    for (i in data)
        printf "data[\"%s\"] = %s\n", i, data[i]
    print "testff.awk modified:",
        strftime("%m %d %y %H:%M:%S", data["mtime"])
@}
@end example
25837
25838Here are the results of running the program:
25839
@example
$ gawk -f testff.awk
@print{} Info for testff.awk
@print{} ret = 0
@print{} data["blksize"] = 4096
@print{} data["mtime"] = 932361936
@print{} data["mode"] = 33188
@print{} data["type"] = file
@print{} data["dev"] = 2065
@print{} data["gid"] = 10
@print{} data["ino"] = 878597
@print{} data["ctime"] = 971431797
@print{} data["blocks"] = 2
@print{} data["nlink"] = 1
@print{} data["name"] = testff.awk
@print{} data["atime"] = 971608519
@print{} data["pmode"] = -rw-r--r--
@print{} data["size"] = 607
@print{} data["uid"] = 2076
@print{} testff.awk modified: 07 19 99 08:25:36
@end example
25861@c ENDOFRANGE filre
25862@c ENDOFRANGE dirch
25863@c ENDOFRANGE statg
25864@c ENDOFRANGE chdirg
25865@c ENDOFRANGE gladfgaw
25866@c ENDOFRANGE adfugaw
25867@c ENDOFRANGE fubadgaw
25868
25869@node Future Extensions
25870@appendixsec Probable Future Extensions
25871@ignore
25872From emory!scalpel.netlabs.com!lwall Tue Oct 31 12:43:17 1995
25873Return-Path: <emory!scalpel.netlabs.com!lwall>
25874Message-Id: <9510311732.AA28472@scalpel.netlabs.com>
25875To: arnold@skeeve.atl.ga.us (Arnold D. Robbins)
25876Subject: Re: May I quote you?
25877In-Reply-To: Your message of "Tue, 31 Oct 95 09:11:00 EST."
25878             <m0tAHPQ-00014MC@skeeve.atl.ga.us>
25879Date: Tue, 31 Oct 95 09:32:46 -0800
25880From: Larry Wall <emory!scalpel.netlabs.com!lwall>
25881
25882: Greetings. I am working on the release of gawk 3.0. Part of it will be a
25883: thoroughly updated manual. One of the sections deals with planned future
25884: extensions and enhancements.  I have the following at the beginning
25885: of it:
25886:
25887: @cindex PERL
25888: @cindex Wall, Larry
25889: @display
25890: @i{AWK is a language similar to PERL, only considerably more elegant.} @*
25891: Arnold Robbins
25892: @sp 1
25893: @i{Hey!} @*
25894: Larry Wall
25895: @end display
25896:
25897: Before I actually release this for publication, I wanted to get your
25898: permission to quote you.  (Hopefully, in the spirit of much of GNU, the
25899: implied humor is visible... :-)
25900
25901I think that would be fine.
25902
25903Larry
25904@end ignore
25905@cindex PERL
25906@cindex Wall, Larry
25907@cindex Robbins, Arnold
25908@quotation
25909@i{AWK is a language similar to PERL, only considerably more elegant.}@*
25910Arnold Robbins
25911
25912@i{Hey!}@*
25913Larry Wall
25914@end quotation
25915
25916This @value{SECTION} briefly lists extensions and possible improvements
25917that indicate the directions we are
25918currently considering for @command{gawk}.  The file @file{FUTURES} in the
25919@command{gawk} distribution lists these extensions as well.
25920
25921Following is a list of probable future changes visible at the
25922@command{awk} language level:
25923
25924@c these are ordered by likelihood
25925@table @asis
25926@item Loadable module interface
25927It is not clear that the @command{awk}-level interface to the
25928modules facility is as good as it should be.  The interface needs to be
25929redesigned, particularly taking namespace issues into account, as
25930well as possibly including issues such as library search path order
25931and versioning.
25932
25933@item @code{RECLEN} variable for fixed-length records
25934Along with @code{FIELDWIDTHS}, this would speed up the processing of
25935fixed-length records.
25936@code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"},
25937depending upon which kind of record processing is in effect.
25938
25939@item Additional @code{printf} specifiers
25940The 1999 ISO C standard added a number of additional @code{printf}
25941format specifiers.  These should be evaluated for possible inclusion
25942in @command{gawk}.
25943
25944@ignore
25945@item A @samp{%'d} flag
25946Add @samp{%'d} for putting in commas in formatting numeric values.
25947@end ignore
25948
25949@item Databases
25950It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array.
25951
25952@item Large character sets
25953It would be nice if @command{gawk} could handle UTF-8 and other
25954character sets that are larger than eight bits.
25955
25956@item More @code{lint} warnings
25957There are more things that could be checked for portability.
25958@end table
25959
25960Following is a list of probable improvements that will make @command{gawk}'s
25961source code easier to work with:
25962
25963@table @asis
25964@item Loadable module mechanics
25965The current extension mechanism works
25966(@pxref{Dynamic Extensions}),
25967but is rather primitive. It requires a fair amount of manual work
25968to create and integrate a loadable module.
25969Nor is the current mechanism as portable as might be desired.
25970The GNU @command{libtool} package provides a number of features that
25971would make using loadable modules much easier.
25972@command{gawk} should be changed to use @command{libtool}.
25973
25974@item Loadable module internals
25975The API to its internals that @command{gawk} ``exports'' should be revised.
25976Too many things are needlessly exposed.  A new API should be designed
25977and implemented to make module writing easier.
25978
25979@item Better array subscript management
25980@command{gawk}'s management of array subscript storage could use revamping,
25981so that using the same value to index multiple arrays only
25982stores one copy of the index value.
25983
25984@item Integrating the DBUG library
25985Integrating Fred Fish's DBUG library would be helpful during development,
25986but it's a lot of work to do.
25987@end table
25988
25989Following is a list of probable improvements that will make @command{gawk}
25990perform better:
25991
25992@table @asis
25993@c NEXT ED: remove this item. awka and mawk do these respectively
25994@item Compilation of @command{awk} programs
25995@command{gawk} uses a Bison (YACC-like)
25996parser to convert the script given it into a syntax tree; the syntax
25997tree is then executed by a simple recursive evaluator.  This method incurs
25998a lot of overhead, since the recursive evaluator performs many procedure
25999calls to do even the simplest things.
26000
26001It should be possible for @command{gawk} to convert the script's parse tree
26002into a C program which the user would then compile, using the normal
26003C compiler and a special @command{gawk} library to provide all the needed
26004functions (regexps, fields, associative arrays, type coercion, and so on).
26005
26006@c last comma is part of secondary
26007@cindex @command{gawk}, interpreter, adding code to
26008An easier possibility might be for an intermediate phase of @command{gawk} to
26009convert the parse tree into a linear byte code form like the one used
26010in GNU Emacs Lisp.  The recursive evaluator would then be replaced by
a straight-line byte code interpreter that would be intermediate in speed
26012between running a compiled program and doing what @command{gawk} does
26013now.
26014@end table
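
To make the byte code idea concrete, the following sketch shows what a
linear instruction sequence and a non-recursive interpreter loop could
look like for simple arithmetic.  Everything in it is hypothetical: the
opcodes, the @code{struct instr} layout, and the @code{run} function are
invented for illustration and do not describe any actual or planned
@command{gawk} design.  The sketch assumes well-formed code that never
needs more than 64 stack slots.

```c
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
    enum opcode op;
    double operand;     /* used only by OP_PUSH */
};

/*
 * run: execute a linear instruction sequence with a single loop and
 * an explicit value stack, instead of recursing over a parse tree.
 * Returns the value on top of the stack when OP_HALT is reached.
 */
double
run(const struct instr *code)
{
    double stack[64];   /* assumed to be deep enough; no overflow check */
    size_t sp = 0;      /* index of the next free stack slot */
    const struct instr *pc;

    for (pc = code; ; pc++) {
        switch (pc->op) {
        case OP_PUSH:
            stack[sp++] = pc->operand;
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : 0.0;
        }
    }
}
```

An expression such as @samp{1 + 2 * 3} would be flattened into the
sequence push 1, push 2, push 3, multiply, add, halt.  Each step costs
one pass through the loop rather than a chain of nested procedure calls.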
26015
26016Finally,
26017the programs in the test suite could use documenting in this @value{DOCUMENT}.
26018
26019@xref{Additions},
26020if you are interested in tackling any of these projects.
26021@c ENDOFRANGE impis
26022@c ENDOFRANGE gawii
26023
26024@node Basic Concepts
26025@appendix Basic Programming Concepts
26026@cindex programming, concepts
26027@c STARTOFRANGE procon
26028@cindex programming, concepts
26029
26030This @value{APPENDIX} attempts to define some of the basic concepts
26031and terms that are used throughout the rest of this @value{DOCUMENT}.
26032As this @value{DOCUMENT} is specifically about @command{awk},
26033and not about computer programming in general, the coverage here
26034is by necessity fairly cursory and simplistic.
26035(If you need more background, there are many
26036other introductory texts that you should refer to instead.)
26037
26038@menu
26039* Basic High Level::            The high level view.
26040* Basic Data Typing::           A very quick intro to data types.
26041* Floating Point Issues::       Stuff to know about floating-point numbers.
26042@end menu
26043
26044@node Basic High Level
26045@appendixsec What a Program Does
26046
26047@cindex processing data
26048At the most basic level, the job of a program is to process
26049some input data and produce results.
26050
26051@c NEXT ED: Use real images here
26052@iftex
26053@tex
26054\expandafter\ifx\csname graph\endcsname\relax \csname newbox\endcsname\graph\fi
26055\expandafter\ifx\csname graphtemp\endcsname\relax \csname newdimen\endcsname\graphtemp\fi
26056\setbox\graph=\vtop{\vskip 0pt\hbox{%
26057    \special{pn 20}%
26058    \special{pa 2425 200}%
26059    \special{pa 2850 200}%
26060    \special{fp}%
26061    \special{sh 1.000}%
26062    \special{pn 20}%
26063    \special{pa 2750 175}%
26064    \special{pa 2850 200}%
26065    \special{pa 2750 225}%
26066    \special{pa 2750 175}%
26067    \special{fp}%
26068    \special{pn 20}%
26069    \special{pa 850 200}%
26070    \special{pa 1250 200}%
26071    \special{fp}%
26072    \special{sh 1.000}%
26073    \special{pn 20}%
26074    \special{pa 1150 175}%
26075    \special{pa 1250 200}%
26076    \special{pa 1150 225}%
26077    \special{pa 1150 175}%
26078    \special{fp}%
26079    \special{pn 20}%
26080    \special{pa 2950 400}%
26081    \special{pa 3650 400}%
26082    \special{pa 3650 0}%
26083    \special{pa 2950 0}%
26084    \special{pa 2950 400}%
26085    \special{fp}%
26086    \special{pn 10}%
26087    \special{ar 1800 200 450 200 0 6.28319}%
26088    \graphtemp=.5ex\advance\graphtemp by 0.200in
26089    \rlap{\kern 3.300in\lower\graphtemp\hbox to 0pt{\hss Results\hss}}%
26090    \graphtemp=.5ex\advance\graphtemp by 0.200in
26091    \rlap{\kern 1.800in\lower\graphtemp\hbox to 0pt{\hss Program\hss}}%
26092    \special{pn 10}%
26093    \special{pa 0 400}%
26094    \special{pa 700 400}%
26095    \special{pa 700 0}%
26096    \special{pa 0 0}%
26097    \special{pa 0 400}%
26098    \special{fp}%
26099    \graphtemp=.5ex\advance\graphtemp by 0.200in
26100    \rlap{\kern 0.350in\lower\graphtemp\hbox to 0pt{\hss Data\hss}}%
26101    \hbox{\vrule depth0.400in width0pt height 0pt}%
26102    \kern 3.650in
26103  }%
26104}%
26105\centerline{\box\graph}
26106@end tex
26107@end iftex
26108@ifnottex
@example
                  _______
+------+         /       \         +---------+
| Data | -----> < Program > -----> | Results |
+------+         \_______/         +---------+
@end example
@end ifnottex
26116
26117@cindex compiled programs
26118@cindex interpreted programs
26119The ``program'' in the figure can be either a compiled
26120program@footnote{Compiled programs are typically written
26121in lower-level languages such as C, C++, Fortran, or Ada,
26122and then translated, or @dfn{compiled}, into a form that
26123the computer can execute directly.}
26124(such as @command{ls}),
26125or it may be @dfn{interpreted}.  In the latter case, a machine-executable
26126program such as @command{awk} reads your program, and then uses the
26127instructions in your program to process the data.
26128
26129@cindex programming, basic steps
26130When you write a program, it usually consists
26131of the following, very basic set of steps:
26132
26133@c NEXT ED: Use real images here
26134@iftex
26135@tex
26136\expandafter\ifx\csname graph\endcsname\relax \csname newbox\endcsname\graph\fi
26137\expandafter\ifx\csname graphtemp\endcsname\relax \csname newdimen\endcsname\graphtemp\fi
26138\setbox\graph=\vtop{\vskip 0pt\hbox{%
26139    \graphtemp=.5ex\advance\graphtemp by 0.600in
26140    \rlap{\kern 2.800in\lower\graphtemp\hbox to 0pt{\hss Yes\hss}}%
26141    \graphtemp=.5ex\advance\graphtemp by 0.100in
26142    \rlap{\kern 3.300in\lower\graphtemp\hbox to 0pt{\hss No\hss}}%
26143    \special{pn 8}%
26144    \special{pa 2100 1000}%
26145    \special{pa 1600 1000}%
26146    \special{pa 1600 1000}%
26147    \special{pa 1600 300}%
26148    \special{fp}%
26149    \special{sh 1.000}%
26150    \special{pn 8}%
26151    \special{pa 1575 400}%
26152    \special{pa 1600 300}%
26153    \special{pa 1625 400}%
26154    \special{pa 1575 400}%
26155    \special{fp}%
26156    \special{pn 8}%
26157    \special{pa 2600 500}%
26158    \special{pa 2600 900}%
26159    \special{fp}%
26160    \special{sh 1.000}%
26161    \special{pn 8}%
26162    \special{pa 2625 800}%
26163    \special{pa 2600 900}%
26164    \special{pa 2575 800}%
26165    \special{pa 2625 800}%
26166    \special{fp}%
26167    \special{pn 8}%
26168    \special{pa 3200 200}%
26169    \special{pa 4000 200}%
26170    \special{fp}%
26171    \special{sh 1.000}%
26172    \special{pn 8}%
26173    \special{pa 3900 175}%
26174    \special{pa 4000 200}%
26175    \special{pa 3900 225}%
26176    \special{pa 3900 175}%
26177    \special{fp}%
26178    \special{pn 8}%
26179    \special{pa 1400 200}%
26180    \special{pa 2100 200}%
26181    \special{fp}%
26182    \special{sh 1.000}%
26183    \special{pn 8}%
26184    \special{pa 2000 175}%
26185    \special{pa 2100 200}%
26186    \special{pa 2000 225}%
26187    \special{pa 2000 175}%
26188    \special{fp}%
26189    \special{pn 8}%
26190    \special{ar 2600 1000 400 100 0 6.28319}%
26191    \graphtemp=.5ex\advance\graphtemp by 1.000in
26192    \rlap{\kern 2.600in\lower\graphtemp\hbox to 0pt{\hss Process\hss}}%
26193    \special{pn 8}%
26194    \special{pa 2200 400}%
26195    \special{pa 3100 400}%
26196    \special{pa 3100 0}%
26197    \special{pa 2200 0}%
26198    \special{pa 2200 400}%
26199    \special{fp}%
26200    \graphtemp=.5ex\advance\graphtemp by 0.200in
26201    \rlap{\kern 2.688in\lower\graphtemp\hbox to 0pt{\hss More Data?\hss}}%
26202    \special{pn 8}%
26203    \special{ar 650 200 650 200 0 6.28319}%
26204    \graphtemp=.5ex\advance\graphtemp by 0.200in
26205    \rlap{\kern 0.613in\lower\graphtemp\hbox to 0pt{\hss Initialization\hss}}%
26206    \special{pn 8}%
26207    \special{ar 0 200 0 0 0 6.28319}%
26208    \special{pn 8}%
26209    \special{ar 4550 200 450 100 0 6.28319}%
26210    \graphtemp=.5ex\advance\graphtemp by 0.200in
26211    \rlap{\kern 4.600in\lower\graphtemp\hbox to 0pt{\hss Clean Up\hss}}%
26212    \hbox{\vrule depth1.100in width0pt height 0pt}%
26213    \kern 5.000in
26214  }%
26215}%
26216\centerline{\box\graph}
26217@end tex
26218@end iftex
26219@ifnottex
26220@example
26221                              ______
26222+----------------+           / More \  No       +----------+
26223| Initialization | -------> <  Data  > -------> | Clean Up |
26224+----------------+    ^      \   ?  /           +----------+
26225                      |       +--+-+
26226                      |          | Yes
26227                      |          |
26228                      |          V
26229                      |     +---------+
26230                      +-----+ Process |
26231                            +---------+
26232@end example
26233@end ifnottex
26234
26235@table @asis
26236@item Initialization
26237These are the things you do before actually starting to process
26238data, such as checking arguments, initializing any data you need
26239to work with, and so on.
26240This step corresponds to @command{awk}'s @code{BEGIN} rule
26241(@pxref{BEGIN/END}).
26242
26243If you were baking a cake, this might consist of laying out all the
26244mixing bowls and the baking pan, and making sure you have all the
26245ingredients that you need.
26246
26247@item Processing
26248This is where the actual work is done.  Your program reads data,
26249one logical chunk at a time, and processes it as appropriate.
26250
26251In most programming languages, you have to manually manage the reading
26252of data, checking to see if there is more each time you read a chunk.
26253@command{awk}'s pattern-action paradigm
26254(@pxref{Getting Started})
26255handles the mechanics of this for you.
26256
26257In baking a cake, the processing corresponds to the actual labor:
26258breaking eggs, mixing the flour, water, and other ingredients, and then putting the cake
26259into the oven.
26260
26261@item Clean Up
26262Once you've processed all the data, you may have things you need to
26263do before exiting.
26264This step corresponds to @command{awk}'s @code{END} rule
26265(@pxref{BEGIN/END}).
26266
26267After the cake comes out of the oven, you still have to wrap it in
26268plastic wrap to keep anyone from tasting it, as well as wash
26269the mixing bowls and utensils.
26270@end table
26271
26272@cindex algorithms
26273An @dfn{algorithm} is a detailed set of instructions necessary to accomplish
26274a task, or process data.  It is much the same as a recipe for baking
26275a cake.  Programs implement algorithms.  Often, it is up to you to design
26276the algorithm and implement it, simultaneously.
26277
26278@cindex records
26279@cindex fields
26280The ``logical chunks'' we talked about previously are called @dfn{records},
26281similar to the records a company keeps on employees, a school keeps for
26282students, or a doctor keeps for patients.
26283Each record has many component parts, such as first and last names,
26284date of birth, address, and so on.  The component parts are referred
26285to as the @dfn{fields} of the record.
26286
26287The act of reading data is termed @dfn{input}, and that of
26288generating results, not too surprisingly, is termed @dfn{output}.
26289They are often referred to together as ``input/output,''
26290and even more often, as ``I/O'' for short.
26291(You will also see ``input'' and ``output'' used as verbs.)
26292
26293@cindex data-driven languages
26294@c comma is part of primary
26295@cindex languages, data-driven
@command{awk} manages the reading of data for you, as well as
breaking it up into records and fields.  Your program's job is to
tell @command{awk} what to do with the data.  You do this by describing
26299@dfn{patterns} in the data to look for, and @dfn{actions} to execute
26300when those patterns are seen.  This @dfn{data-driven} nature of
26301@command{awk} programs usually makes them both easier to write
26302and easier to read.
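
As a rough sketch of how the three steps described earlier look in
practice, here is a small @command{awk} program run from the shell.
The sample input and the variable name @code{total} are made up for
this illustration:

@example
$ printf 'apple 3\npear 5\napple 4\n' | awk '
> # Initialization: runs once, before any input is read
> BEGIN   @{ print "start" @}
> # Processing: runs for each record that matches the pattern
> /apple/ @{ total += $2 @}
> # Clean up: runs once, after the last record
> END     @{ print "apples:", total @}'
@print{} start
@print{} apples: 7
@end example

@noindent
Note that the program never reads a line or tests for end of input
itself; @command{awk} does both.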
26303
26304@node Basic Data Typing
26305@appendixsec Data Values in a Computer
26306
26307@cindex variables
26308In a program,
26309you keep track of information and values in things called @dfn{variables}.
26310A variable is just a name for a given value, such as @code{first_name},
26311@code{last_name}, @code{address}, and so on.
26312@command{awk} has several predefined variables, and it has
26313special names to refer to the current input record
26314and the fields of the record.
26315You may also group multiple
26316associated values under one name, as an array.
26317
26318@cindex values, numeric
26319@cindex values, string
26320@cindex scalar values
26321Data, particularly in @command{awk}, consists of either numeric
26322values, such as 42 or 3.1415927, or string values.
26323String values are essentially anything that's not a number, such as a name.
26324Strings are sometimes referred to as @dfn{character data}, since they
26325store the individual characters that comprise them.
26326Individual variables, as well as numeric and string variables, are
26327referred to as @dfn{scalar} values.
26328Groups of values, such as arrays, are not scalars.
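
The following sketch shows two scalar variables and one array.
The names @code{count}, @code{name}, and @code{age}, and the values
stored in them, are invented for the example:

@example
$ awk 'BEGIN @{
>   count = 42          # a numeric scalar
>   name = "alice"      # a string scalar
>   age["alice"] = 30   # one element of an array
>   print count + 1
>   print name, age[name]
> @}'
@print{} 43
@print{} alice 30
@end example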
26329
26330@cindex integers
26331@cindex floating-point, numbers
26332@cindex numbers, floating-point
26333Within computers, there are two kinds of numeric values: @dfn{integers}
26334and @dfn{floating-point}.
26335In school, integer values were referred to as ``whole'' numbers---that is,
26336numbers without any fractional part, such as 1, 42, or @minus{}17.
26337The advantage to integer numbers is that they represent values exactly.
26338The disadvantage is that their range is limited.  On most modern systems,
26339this range is @minus{}2,147,483,648 to 2,147,483,647.
26340
26341@cindex unsigned integers
26342@cindex integers, unsigned
26343Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}.
26344Signed values may be negative or positive, with the range of values just
26345described.
Unsigned values are never negative.  On most modern systems,
26347the range is from 0 to 4,294,967,295.
26348
26349@cindex double-precision floating-point
26350@cindex single-precision floating-point
26351Floating-point numbers represent what are called ``real'' numbers; i.e.,
26352those that do have a fractional part, such as 3.1415927.
26353The advantage to floating-point numbers is that they
26354can represent a much larger range of values.
26355The disadvantage is that there are numbers that they cannot represent
26356exactly.
26357@command{awk} uses @dfn{double-precision} floating-point numbers, which
26358can hold more digits than @dfn{single-precision}
26359floating-point numbers.
26360Floating-point issues are discussed more fully in
26361@ref{Floating Point Issues}.
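
A few expressions are enough to see that @command{awk} does its
arithmetic in floating point.  This is only a sketch; it assumes a
system with IEEE 754 double-precision numbers, which is typical but
not guaranteed:

@example
$ awk 'BEGIN @{
>   print 7 / 2                # division keeps the fraction
>   print int(7 / 2)           # int() discards it
>   print (2^53 + 1 == 2^53)   # 1 means the two compare equal
> @}'
@print{} 3.5
@print{} 3
@print{} 1
@end example

@noindent
The last line prints 1 because, under the stated assumption, a double
cannot hold every whole number above 2 to the 53rd power, so adding 1
changes nothing.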
26362
26363At the very lowest level, computers store values as groups of binary digits,
26364or @dfn{bits}.  Modern computers group bits into groups of eight, called @dfn{bytes}.
26365Advanced applications sometimes have to manipulate bits directly,
26366and @command{gawk} provides functions for doing so.
26367
26368@cindex null strings
26369While you are probably used to the idea of a number without a value (i.e., zero),
26370it takes a bit more getting used to the idea of zero-length character data.
26371Nevertheless, such a thing exists.
26372It is called the @dfn{null string}.
26373The null string is character data that has no value.
26374In other words, it is empty.  It is written in @command{awk} programs
26375like this: @code{""}.
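
A short sketch shows how the null string behaves.  The variable
name @code{s} is arbitrary:

@example
$ awk 'BEGIN @{
>   s = ""
>   print length(s)      # the null string has length zero
>   print "[" s "]"      # it contributes nothing when concatenated
>   if (s == "")
>     print "empty"
> @}'
@print{} 0
@print{} []
@print{} empty
@end example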
26376
26377Humans are used to working in decimal; i.e., base 10.  In base 10,
26378numbers go from 0 to 9, and then ``roll over'' into the next
26379column.  (Remember grade school? 42 is 4 times 10 plus 2.)
26380
26381There are other number bases though.  Computers commonly use base 2
26382or @dfn{binary}, base 8 or @dfn{octal}, and base 16 or @dfn{hexadecimal}.
26383In binary, each column represents two times the value in the column to
26384its right. Each column may contain either a 0 or a 1.
26385Thus, binary 1010 represents 1 times 8, plus 0 times 4, plus 1 times 2,
26386plus 0 times 1, or decimal 10.
26387Octal and hexadecimal are discussed more in
26388@ref{Nondecimal-numbers}.
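
The column arithmetic just described can be sketched as a short
function.  The name @code{bin2dec} is hypothetical; it is not a
built-in function, and the code assumes its argument contains only
the characters @samp{0} and @samp{1}:

@example
$ awk 'function bin2dec(s,    i, n) @{
>   n = 0
>   for (i = 1; i <= length(s); i++)
>     # move every column one place left, then add the next digit
>     n = n * 2 + substr(s, i, 1)
>   return n
> @}
> BEGIN @{ print bin2dec("1010") @}'
@print{} 10
@end example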
26389
26390Programs are written in programming languages.
26391Hundreds, if not thousands, of programming languages exist.
26392One of the most popular is the C programming language.
26393The C language had a very strong influence on the design of
26394the @command{awk} language.
26395
26396@cindex Kernighan, Brian
26397@cindex Ritchie, Dennis
26398There have been several versions of C.  The first is often referred to
26399as ``K&R'' C, after the initials of Brian Kernighan and Dennis Ritchie,
26400the authors of the first book on C.  (Dennis Ritchie created the language,
26401and Brian Kernighan was one of the creators of @command{awk}.)
26402
26403In the mid-1980s, an effort began to produce an international standard
26404for C.  This work culminated in 1989, with the production of the ANSI
26405standard for C.  This standard became an ISO standard in 1990.
26406Where it makes sense, POSIX @command{awk} is compatible with 1990 ISO C.
26407
26408In 1999, a revised ISO C standard was approved and released.
26409Future versions of @command{gawk} will be as compatible as possible
26410with this standard.
26411
26412@node Floating Point Issues
26413@appendixsec Floating-Point Number Caveats
26414
26415As mentioned earlier, floating-point numbers represent what are called
26416``real'' numbers, i.e., those that have a fractional part.  @command{awk}
26417uses double-precision floating-point numbers to represent all
26418numeric values.  This @value{SECTION} describes some of the issues
26419involved in using floating-point numbers.
26420
26421There is a very nice paper on floating-point arithmetic by
26422David Goldberg, ``What Every
26423Computer Scientist Should Know About Floating-point Arithmetic,''
26424@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03),
5--48.@footnote{@uref{http://www.validlab.com/goldberg/paper.ps}.}
26426This is worth reading if you are interested in the details,
26427but it does require a background in computer science.
26428
26429Internally, @command{awk} keeps both the numeric value
26430(double-precision floating-point) and the string value for a variable.
26431Separately, @command{awk} keeps
26432track of what type the variable has
26433(@pxref{Typing and Comparison}),
26434which plays a role in how variables are used in comparisons.
26435
26436It is important to note that the string value for a number may not
26437reflect the full value (all the digits) that the numeric value
26438actually contains.
26439The following program (@file{values.awk}) illustrates this:
26440
@example
@{
   $1 = $2 + $3
   # see it for what it is
   printf("$1 = %.12g\n", $1)
   # use CONVFMT
   a = "<" $1 ">"
   print "a =", a
@group
   # use OFMT
   print "$1 =", $1
@end group
@}
@end example
26455
26456@noindent
26457This program shows the full value of the sum of @code{$2} and @code{$3}
26458using @code{printf}, and then prints the string values obtained
26459from both automatic conversion (via @code{CONVFMT}) and
26460from printing (via @code{OFMT}).
26461
26462Here is what happens when the program is run:
26463
@example
$ echo 2 3.654321 1.2345678 | awk -f values.awk
@print{} $1 = 4.8888888
@print{} a = <4.88889>
@print{} $1 = 4.88889
@end example
26470
26471This makes it clear that the full numeric value is different from
26472what the default string representations show.
26473
@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
at most six significant digits.  For some applications, you might want to
26476change it to specify more precision.
26477On most modern machines, most of the time,
2647817 digits is enough to capture a floating-point number's
26479value exactly.@footnote{Pathological cases can require up to
26480752 digits (!), but we doubt that you need to worry about this.}
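
One way to try a different @code{CONVFMT} without editing a program
is to assign it on the command line with the @option{-v} option.
This sketch reuses the two values added in the earlier example; the
variable name @code{sum} is arbitrary:

@example
$ echo 3.654321 1.2345678 |
> awk '@{ sum = $1 + $2; print "<" sum ">" @}'
@print{} <4.88889>
$ echo 3.654321 1.2345678 |
> awk -v CONVFMT=%.12g '@{ sum = $1 + $2; print "<" sum ">" @}'
@print{} <4.8888888>
@end example

@noindent
In both runs the number stored in @code{sum} is the same; only the
string produced when it is concatenated differs.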
26481
26482@cindex floating-point
26483Unlike numbers in the abstract sense (such as what you studied in high school
26484or college math), numbers stored in computers are limited in certain ways.
They cannot represent an infinite number of digits, and
floating-point numbers in particular cannot always represent
values exactly.  Here is an example:
26490
@example
$ awk '@{ printf("%010d\n", $1 * 100) @}'
515.79
@print{} 0000051579
515.80
@print{} 0000051579
515.81
@print{} 0000051580
515.82
@print{} 0000051582
@kbd{@value{CTL}-d}
@end example
26503
26504@noindent
26505This shows that some values can be represented exactly,
26506whereas others are only approximated.  This is not a ``bug''
26507in @command{awk}, but simply an artifact of how computers
26508represent numbers.
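
One practical consequence is that testing two computed values for
exact equality can give a surprising answer.  The following sketch,
which assumes IEEE 754 double-precision numbers, compares within a
small tolerance instead.  The tolerance @code{1e-9} is an arbitrary
choice for illustration; a suitable value depends on your data:

@example
$ awk 'BEGIN @{
>   x = 0.1 + 0.2
>   printf "%.17g\n", x      # show the stored value in full
>   if (x == 0.3)
>     print "equal"
>   else
>     print "not equal"
>   d = x - 0.3              # compute the absolute difference
>   if (d < 0)
>     d = -d
>   if (d < 1e-9)
>     print "close enough"
> @}'
@print{} 0.30000000000000004
@print{} not equal
@print{} close enough
@end example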
26509
26510@cindex negative zero
26511@cindex positive zero
26512@c comma is part of primary
26513@cindex zero, negative vs.@: positive
26514Another peculiarity of floating-point numbers on modern systems
26515is that they often have more than one representation for the number zero!
26516In particular, it is possible to represent ``minus zero'' as well as
26517regular, or ``positive'' zero.
26518
26519This example shows that negative and positive zero are distinct values
26520when stored internally, but that they are in fact equal to each other,
26521as well as to ``regular'' zero:
26522
@smallexample
$ gawk 'BEGIN @{ mz = -0 ; pz = 0
> printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz
> printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0
> @}'
@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1
@print{} mz == 0 -> 1, pz == 0 -> 1
@end smallexample
26531
26532It helps to keep this in mind should you process numeric data
26533that contains negative zero values; the fact that the zero is negative
26534is noted and can affect comparisons.
26535@c ENDOFRANGE procon
26536
26537@node Glossary
26538@unnumbered Glossary
26539
26540@table @asis
26541@item Action
26542A series of @command{awk} statements attached to a rule.  If the rule's
26543pattern matches an input record, @command{awk} executes the
26544rule's action.  Actions are always enclosed in curly braces.
26545(@xref{Action Overview}.)
26546
26547@cindex Spencer, Henry
26548@cindex @command{sed} utility
26549@cindex amazing @command{awk} assembler (@command{aaa})
26550@item Amazing @command{awk} Assembler
26551Henry Spencer at the University of Toronto wrote a retargetable assembler
26552completely as @command{sed} and @command{awk} scripts.  It is thousands
26553of lines long, including machine descriptions for several eight-bit
26554microcomputers.  It is a good example of a program that would have been
26555better written in another language.
26556You can get it from @uref{ftp://ftp.freefriends.org/arnold/Awkstuff/aaa.tgz}.
26557
26558@cindex amazingly workable formatter (@command{awf})
26559@cindex @command{awf} (amazingly workable formatter) program
26560@item Amazingly Workable Formatter (@command{awf})
26561Henry Spencer at the University of Toronto wrote a formatter that accepts
26562a large subset of the @samp{nroff -ms} and @samp{nroff -man} formatting
26563commands, using @command{awk} and @command{sh}.
26564It is available over the Internet
26565from @uref{ftp://ftp.freefriends.org/arnold/Awkstuff/awf.tgz}.
26566
26567@item Anchor
26568The regexp metacharacters @samp{^} and @samp{$}, which force the match
26569to the beginning or end of the string, respectively.
26570
26571@cindex ANSI
26572@item ANSI
26573The American National Standards Institute.  This organization produces
26574many standards, among them the standards for the C and C++ programming
26575languages.
26576These standards often become international standards as well. See also
26577``ISO.''
26578
26579@item Array
26580A grouping of multiple values under the same name.
26581Most languages just provide sequential arrays.
26582@command{awk} provides associative arrays.
26583
26584@item Assertion
26585A statement in a program that a condition is true at this point in the program.
26586Useful for reasoning about how a program is supposed to behave.
26587
26588@item Assignment
26589An @command{awk} expression that changes the value of some @command{awk}
26590variable or data object.  An object that you can assign to is called an
26591@dfn{lvalue}.  The assigned values are called @dfn{rvalues}.
26592@xref{Assignment Ops}.
26593
26594@item Associative Array
26595Arrays in which the indices may be numbers or strings, not just
26596sequential integers in a fixed range.
26597
26598@item @command{awk} Language
26599The language in which @command{awk} programs are written.
26600
26601@item @command{awk} Program
26602An @command{awk} program consists of a series of @dfn{patterns} and
26603@dfn{actions}, collectively known as @dfn{rules}.  For each input record
26604given to the program, the program's rules are all processed in turn.
26605@command{awk} programs may also contain function definitions.
26606
26607@item @command{awk} Script
26608Another name for an @command{awk} program.
26609
26610@item Bash
26611The GNU version of the standard shell
26612@ifnotinfo
26613(the @b{B}ourne-@b{A}gain @b{SH}ell).
26614@end ifnotinfo
26615@ifinfo
26616(the Bourne-Again SHell).
26617@end ifinfo
26618See also ``Bourne Shell.''
26619
26620@item BBS
26621See ``Bulletin Board System.''
26622
26623@item Bit
26624Short for ``Binary Digit.''
26625All values in computer memory ultimately reduce to binary digits: values
26626that are either zero or one.
26627Groups of bits may be interpreted differently---as integers,
26628floating-point numbers, character data, addresses of other
26629memory objects, or other data.
26630@command{awk} lets you work with floating-point numbers and strings.
26631@command{gawk} lets you manipulate bit values with the built-in
26632functions described in
26633@ref{Bitwise Functions}.
26634
26635Computers are often defined by how many bits they use to represent integer
26636values.  Typical systems are 32-bit systems, but 64-bit systems are
26637becoming increasingly popular, and 16-bit systems are waning in
26638popularity.
26639
26640@item Boolean Expression
26641Named after the English mathematician Boole. See also ``Logical Expression.''
26642
26643@item Bourne Shell
26644The standard shell (@file{/bin/sh}) on Unix and Unix-like systems,
originally written by Stephen R.@: Bourne.
26646Many shells (@command{bash}, @command{ksh}, @command{pdksh}, @command{zsh}) are
26647generally upwardly compatible with the Bourne shell.
26648
26649@item Built-in Function
26650The @command{awk} language provides built-in functions that perform various
26651numerical, I/O-related, and string computations.  Examples are
26652@code{sqrt} (for the square root of a number) and @code{substr} (for a
26653substring of a string).
26654@command{gawk} provides functions for timestamp management, bit manipulation,
26655and runtime string translation.
26656(@xref{Built-in}.)
26657
26658@item Built-in Variable
26659@code{ARGC},
26660@code{ARGV},
26661@code{CONVFMT},
26662@code{ENVIRON},
26663@code{FILENAME},
26664@code{FNR},
26665@code{FS},
26666@code{NF},
26667@code{NR},
26668@code{OFMT},
26669@code{OFS},
26670@code{ORS},
26671@code{RLENGTH},
26672@code{RSTART},
26673@code{RS},
26674and
26675@code{SUBSEP}
26676are the variables that have special meaning to @command{awk}.
26677In addition,
26678@code{ARGIND},
26679@code{BINMODE},
26680@code{ERRNO},
26681@code{FIELDWIDTHS},
26682@code{IGNORECASE},
26683@code{LINT},
26684@code{PROCINFO},
26685@code{RT},
26686and
26687@code{TEXTDOMAIN}
26688are the variables that have special meaning to @command{gawk}.
26689Changing some of them affects @command{awk}'s running environment.
26690(@xref{Built-in Variables}.)
26691
26692@item Braces
26693See ``Curly Braces.''
26694
26695@item Bulletin Board System
26696A computer system allowing users to log in and read and/or leave messages
26697for other users of the system, much like leaving paper notes on a bulletin
26698board.
26699
26700@item C
26701The system programming language that most GNU software is written in.  The
26702@command{awk} programming language has C-like syntax, and this @value{DOCUMENT}
26703points out similarities between @command{awk} and C when appropriate.
26704
26705In general, @command{gawk} attempts to be as similar to the 1990 version
26706of ISO C as makes sense.  Future versions of @command{gawk} may adopt features
26707from the newer 1999 standard, as appropriate.
26708
26709@item C++
26710A popular object-oriented programming language derived from C.
26711
26712@cindex ISO 8859-1
26713@cindex ISO Latin-1
26714@cindex character sets (machine character encodings)
26715@item Character Set
26716The set of numeric codes used by a computer system to represent the
26717characters (letters, numbers, punctuation, etc.) of a particular country
26718or place. The most common character set in use today is ASCII (American
26719Standard Code for Information Interchange).  Many European
26720countries use an extension of ASCII known as ISO-8859-1 (ISO Latin-1).
26721
26722@cindex @command{chem} utility
26723@item CHEM
26724A preprocessor for @command{pic} that reads descriptions of molecules
26725and produces @command{pic} input for drawing them.
26726It was written in @command{awk}
26727by Brian Kernighan and Jon Bentley, and is available from
26728@uref{http://cm.bell-labs.com/netlib/typesetting/chem.gz}.
26729
26730@item Coprocess
A subordinate program with which two-way communication is possible.
26732
26733@cindex compiled programs
26734@item Compiler
26735A program that translates human-readable source code into
26736machine-executable object code.  The object code is then executed
26737directly by the computer.
26738See also ``Interpreter.''
26739
26740@item Compound Statement
26741A series of @command{awk} statements, enclosed in curly braces.  Compound
26742statements may be nested.
26743(@xref{Statements}.)
26744
26745@item Concatenation
26746Concatenating two strings means sticking them together, one after another,
26747producing a new string.  For example, the string @samp{foo} concatenated with
26748the string @samp{bar} gives the string @samp{foobar}.
26749(@xref{Concatenation}.)
26750
26751@item Conditional Expression
26752An expression using the @samp{?:} ternary operator, such as
26753@samp{@var{expr1} ? @var{expr2} : @var{expr3}}.  The expression
26754@var{expr1} is evaluated; if the result is true, the value of the whole
26755expression is the value of @var{expr2}; otherwise the value is
26756@var{expr3}.  In either case, only one of @var{expr2} and @var{expr3}
26757is evaluated. (@xref{Conditional Exp}.)
26758
26759@item Comparison Expression
26760A relation that is either true or false, such as @samp{(a < b)}.
26761Comparison expressions are used in @code{if}, @code{while}, @code{do},
26762and @code{for}
26763statements, and in patterns to select which input records to process.
26764(@xref{Typing and Comparison}.)
26765
26766@item Curly Braces
26767The characters @samp{@{} and @samp{@}}.  Curly braces are used in
26768@command{awk} for delimiting actions, compound statements, and function
26769bodies.
26770
26771@cindex dark corner
26772@item Dark Corner
26773An area in the language where specifications often were (or still
26774are) not clear, leading to unexpected or undesirable behavior.
26775Such areas are marked in this @value{DOCUMENT} with
26776@iftex
26777the picture of a flashlight in the margin
26778@end iftex
26779@ifnottex
26780``(d.c.)'' in the text
26781@end ifnottex
26782and are indexed under the heading ``dark corner.''
26783
26784@item Data Driven
26785A description of @command{awk} programs, where you specify the data you
26786are interested in processing, and what to do when that data is seen.
26787
26788@item Data Objects
26789These are numbers and strings of characters.  Numbers are converted into
26790strings and vice versa, as needed.
26791(@xref{Conversion}.)
26792
26793@item Deadlock
26794The situation in which two communicating processes are each waiting
26795for the other to perform an action.
26796
26797@item Double-Precision
26798An internal representation of numbers that can have fractional parts.
26799Double-precision numbers keep track of more digits than do single-precision
26800numbers, but operations on them are sometimes more expensive.  This is the way
26801@command{awk} stores numeric values.  It is the C type @code{double}.
26802
26803@item Dynamic Regular Expression
26804A dynamic regular expression is a regular expression written as an
26805ordinary expression.  It could be a string constant, such as
26806@code{"foo"}, but it may also be an expression whose value can vary.
26807(@xref{Computed Regexps}.)
26808
26809@item Environment
26810A collection of strings, of the form @var{name@code{=}val}, that each
26811program has available to it. Users generally place values into the
26812environment in order to provide information to various programs. Typical
26813examples are the environment variables @env{HOME} and @env{PATH}.
26814
26815@item Empty String
26816See ``Null String.''
26817
26818@cindex epoch, definition of
26819@item Epoch
26820The date used as the ``beginning of time'' for timestamps.
26821Time values in Unix systems are represented as seconds since the epoch,
26822with library functions available for converting these values into
26823standard date and time formats.
26824
26825The epoch on Unix and POSIX systems is 1970-01-01 00:00:00 UTC.
26826See also ``GMT'' and ``UTC.''
26827
26828@item Escape Sequences
26829A special sequence of characters used for describing nonprinting
26830characters, such as @samp{\n} for newline or @samp{\033} for the ASCII
26831ESC (Escape) character. (@xref{Escape Sequences}.)
26832
26833@item FDL
26834See ``Free Documentation License.''
26835
26836@item Field
26837When @command{awk} reads an input record, it splits the record into pieces
26838separated by whitespace (or by a separator regexp that you can
26839change by setting the built-in variable @code{FS}).  Such pieces are
26840called fields.  If the pieces are of fixed length, you can use the built-in
26841variable @code{FIELDWIDTHS} to describe their lengths.
26842(@xref{Field Separators},
26843and
26844@ref{Constant Size}.)
26845
26846@item Flag
26847A variable whose truth value indicates the existence or nonexistence
26848of some condition.
26849
26850@item Floating-Point Number
26851Often referred to in mathematical terms as a ``rational'' or real number,
26852this is just a number that can have a fractional part.
26853See also ``Double-Precision'' and ``Single-Precision.''
26854
26855@item Format
26856Format strings are used to control the appearance of output in the
26857@code{strftime} and @code{sprintf} functions, and are used in the
26858@code{printf} statement as well.  Also, data conversions from numbers to strings
26859are controlled by the format string contained in the built-in variable
26860@code{CONVFMT}. (@xref{Control Letters}.)
26861
26862@item Free Documentation License
26863This document describes the terms under which this @value{DOCUMENT}
26864is published and may be copied. (@xref{GNU Free Documentation License}.)
26865
26866@item Function
26867A specialized group of statements used to encapsulate general
26868or program-specific tasks.  @command{awk} has a number of built-in
26869functions, and also allows you to define your own.
26870(@xref{Functions}.)
26871
26872@item FSF
26873See ``Free Software Foundation.''
26874
26875@cindex FSF (Free Software Foundation)
26876@cindex Free Software Foundation (FSF)
26877@cindex Stallman, Richard
26878@item Free Software Foundation
26879A nonprofit organization dedicated
26880to the production and distribution of freely distributable software.
26881It was founded by Richard M.@: Stallman, the author of the original
26882Emacs editor.  GNU Emacs is the most widely used version of Emacs today.
26883
26884@item @command{gawk}
26885The GNU implementation of @command{awk}.
26886
26887@cindex GPL (General Public License)
26888@cindex General Public License (GPL)
26889@cindex GNU General Public License
26890@item General Public License
26891This document describes the terms under which @command{gawk} and its source
26892code may be distributed. (@xref{Copying}.)
26893
26894@item GMT
26895``Greenwich Mean Time.''
26896This is the old term for UTC.
26897It is the time of day used as the epoch for Unix and POSIX systems.
26898See also ``Epoch'' and ``UTC.''
26899
26900@cindex FSF (Free Software Foundation)
26901@cindex Free Software Foundation (FSF)
26902@cindex GNU Project
26903@item GNU
``GNU's not Unix.''  An ongoing project of the Free Software Foundation
26905to create a complete, freely distributable, POSIX-compliant computing
26906environment.
26907
26908@item GNU/Linux
26909A variant of the GNU system using the Linux kernel, instead of the
26910Free Software Foundation's Hurd kernel.
26911Linux is a stable, efficient, full-featured clone of Unix that has
26912been ported to a variety of architectures.
26913It is most popular on PC-class systems, but runs well on a variety of
26914other systems too.
26915The Linux kernel source code is available under the terms of the GNU General
26916Public License, which is perhaps its most important aspect.
26917
26918@item GPL
26919See ``General Public License.''
26920
26921@item Hexadecimal
26922Base 16 notation, where the digits are @code{0}--@code{9} and
26923@code{A}--@code{F}, with @samp{A}
26924representing 10, @samp{B} representing 11, and so on, up to @samp{F} for 15.
26925Hexadecimal numbers are written in C using a leading @samp{0x},
26926to indicate their base.  Thus, @code{0x12} is 18 (1 times 16 plus 2).
26927
26928@item I/O
26929Abbreviation for ``Input/Output,'' the act of moving data into and/or
26930out of a running program.
26931
26932@item Input Record
26933A single chunk of data that is read in by @command{awk}.  Usually, an @command{awk} input
26934record consists of one line of text.
26935(@xref{Records}.)
26936
26937@item Integer
26938A whole number, i.e., a number that does not have a fractional part.
26939
26940@item Internationalization
26941The process of writing or modifying a program so
26942that it can use multiple languages without requiring
26943further source code changes.
26944
26945@cindex interpreted programs
26946@item Interpreter
26947A program that reads human-readable source code directly, and uses
26948the instructions in it to process data and produce results.
26949@command{awk} is typically (but not always) implemented as an interpreter.
26950See also ``Compiler.''
26951
26952@item Interval Expression
26953A component of a regular expression that lets you specify repeated matches of
26954some part of the regexp.  Interval expressions were not traditionally available
26955in @command{awk} programs.
26956
26957@cindex ISO
26958@item ISO
The International Organization for Standardization.
26960This organization produces international standards for many things, including
26961programming languages, such as C and C++.
26962In the computer arena, important standards like those for C, C++, and POSIX
26963become both American national and ISO international standards simultaneously.
26964This @value{DOCUMENT} refers to Standard C as ``ISO C'' throughout.
26965
26966@item Keyword
26967In the @command{awk} language, a keyword is a word that has special
26968meaning.  Keywords are reserved and may not be used as variable names.
26969
26970@command{gawk}'s keywords are:
26971@code{BEGIN},
26972@code{END},
26973@code{if},
26974@code{else},
26975@code{while},
26976@code{do@dots{}while},
26977@code{for},
26978@code{for@dots{}in},
26979@code{break},
26980@code{continue},
26981@code{delete},
26982@code{next},
26983@code{nextfile},
26984@code{function},
26985@code{func},
26986and
26987@code{exit}.
26988
26989@cindex LGPL (Lesser General Public License)
26990@cindex Lesser General Public License (LGPL)
26991@cindex GNU Lesser General Public License
26992@item Lesser General Public License
26993This document describes the terms under which binary library archives
26994or shared objects,
26995and their source code may be distributed.
26996
26997@item Linux
26998See ``GNU/Linux.''
26999
27000@item LGPL
27001See ``Lesser General Public License.''
27002
27003@item Localization
27004The process of providing the data necessary for an
27005internationalized program to work in a particular language.
27006
27007@item Logical Expression
27008An expression using the operators for logic, AND, OR, and NOT, written
27009@samp{&&}, @samp{||}, and @samp{!} in @command{awk}. Often called Boolean
27010expressions, after the mathematician who pioneered this kind of
27011mathematical logic.
27012
27013@item Lvalue
27014An expression that can appear on the left side of an assignment
27015operator.  In most languages, lvalues can be variables or array
27016elements.  In @command{awk}, a field designator can also be used as an
27017lvalue.
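
For illustration only, this small sketch assigns to the field
designator @code{$2}; the sample input line is made up for the example:

@example
$ echo 'a b c' | awk '@{ $2 = "X"; print @}'
@print{} a X c
@end example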
27018
27019@item Matching
27020The act of testing a string against a regular expression.  If the
27021regexp describes the contents of the string, it is said to @dfn{match} it.
27022
27023@item Metacharacters
27024Characters used within a regexp that do not stand for themselves.
27025Instead, they denote regular expression operations, such as repetition,
27026grouping, or alternation.
27027
27028@item Null String
27029A string with no characters in it.  It is represented explicitly in
27030@command{awk} programs by placing two double quote characters next to
27031each other (@code{""}).  It can appear in input data by having two successive
27032occurrences of the field separator appear next to each other.
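
The following sketch, which assumes a comma as the field separator and
a made-up input line, shows that the second field is the null string:
its length is zero and it compares equal to @code{""}:

@example
$ echo 'a,,c' | awk -F, '@{ print NF, length($2), ($2 == "") @}'
@print{} 3 0 1
@end example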
27033
27034@item Number
27035A numeric-valued data object.  Modern @command{awk} implementations use
27036double-precision floating-point to represent numbers.
27037Very old @command{awk} implementations use single-precision floating-point.
27038
27039@item Octal
27040Base-eight notation, where the digits are @code{0}--@code{7}.
27041Octal numbers are written in C using a leading @samp{0},
27042to indicate their base.  Thus, @code{013} is 11 (one times 8 plus 3).
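
As a minimal sketch, assuming @command{gawk} (@code{strtonum} is a
@command{gawk} extension), the following converts the octal string
@code{"013"} to a number and then prints 11 back in octal:

@example
$ gawk 'BEGIN @{ print strtonum("013"); printf "%o\n", 11 @}'
@print{} 11
@print{} 13
@end example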
27043
27044@cindex P1003.2 POSIX standard
27045@item P1003.2
27046See ``POSIX.''
27047
27048@item Pattern
27049Patterns tell @command{awk} which input records are interesting to which
27050rules.
27051
27052A pattern is an arbitrary conditional expression against which input is
27053tested.  If the condition is satisfied, the pattern is said to @dfn{match}
27054the input record.  A typical pattern might compare the input record against
27055a regular expression. (@xref{Pattern Overview}.)
27056
27057@item POSIX
27058The name for a series of standards
27059@c being developed by the IEEE
27060that specify a Portable Operating System interface.  The ``IX'' denotes
27061the Unix heritage of these standards.  The main standard of interest for
27062@command{awk} users is
27063@cite{IEEE Standard for Information Technology, Standard 1003.2-1992,
27064Portable Operating System Interface (POSIX) Part 2: Shell and Utilities}.
27065Informally, this standard is often referred to as simply ``P1003.2.''
27066
27067@item Precedence
27068The order in which operations are performed when operators are used
27069without explicit parentheses.
27070
27071@item Private
27072Variables and/or functions that are meant for use exclusively by library
27073functions and not for the main @command{awk} program. Special care must be
27074taken when naming such variables and functions.
27075(@xref{Library Names}.)
27076
27077@item Range (of input lines)
27078A sequence of consecutive lines from the input file(s).  A pattern
27079can specify ranges of input lines for @command{awk} to process or it can
27080specify single lines. (@xref{Pattern Overview}.)
27081
27082@item Recursion
27083When a function calls itself, either directly or indirectly.
27084If this isn't clear, refer to the entry for ``recursion.''
27085
27086@item Redirection
27087Redirection means performing input from something other than the standard input
27088stream, or performing output to something other than the standard output stream.
27089
27090You can redirect the output of the @code{print} and @code{printf} statements
27091to a file or a system command, using the @samp{>}, @samp{>>}, @samp{|}, and @samp{|&}
27092operators.  You can redirect input to the @code{getline} statement using
27093the @samp{<}, @samp{|}, and @samp{|&} operators.
27094(@xref{Redirection},
27095and @ref{Getline}.)
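
The following sketch writes one line to a file with @samp{>} and reads
it back with @code{getline} and @samp{<}.  The @value{FN}
@file{/tmp/demo.txt} is hypothetical; any writable file would do:

@example
$ awk 'BEGIN @{ file = "/tmp/demo.txt"   # hypothetical scratch file
>              print "hello" > file     # redirect output to the file
>              close(file)              # flush and close before reading
>              getline line < file      # redirect input from the file
>              print line @}'
@print{} hello
@end example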
27096
27097@item Regexp
27098Short for @dfn{regular expression}.  A regexp is a pattern that denotes a
27099set of strings, possibly an infinite set.  For example, the regexp
27100@samp{R.*xp} matches any string starting with the letter @samp{R}
27101and ending with the letters @samp{xp}.  In @command{awk}, regexps are
27102used in patterns and in conditional expressions.  Regexps may contain
27103escape sequences. (@xref{Regexp}.)
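
As a small illustration with made-up input, note that @command{awk}
regexp matching is not anchored by default, so the sketch below adds
@samp{^} and @samp{$} to require that the whole record start with
@samp{R} and end with @samp{xp}:

@example
$ printf 'Regexp\nRxp\nRelax\n' | awk '/^R.*xp$/'
@print{} Regexp
@print{} Rxp
@end example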
27104
27105@item Regular Expression
27106See ``regexp.''
27107
27108@item Regular Expression Constant
27109A regular expression constant is a regular expression written within
27110slashes, such as @code{/foo/}.  This regular expression is chosen
27111when you write the @command{awk} program and cannot be changed during
27112its execution. (@xref{Regexp Usage}.)
27113
27114@item Rule
27115A segment of an @command{awk} program that specifies how to process single
27116input records.  A rule consists of a @dfn{pattern} and an @dfn{action}.
27117@command{awk} reads an input record; then, for each rule, if the input record
27118satisfies the rule's pattern, @command{awk} executes the rule's action.
27119Otherwise, the rule does nothing for that input record.
27120
27121@item Rvalue
27122A value that can appear on the right side of an assignment operator.
27123In @command{awk}, essentially every expression has a value. These values
27124are rvalues.
27125
27126@item Scalar
27127A single value, be it a number or a string.
27128Regular variables are scalars; arrays and functions are not.
27129
27130@item Search Path
27131In @command{gawk}, a list of directories to search for @command{awk} program source files.
27132In the shell, a list of directories to search for executable programs.
27133
27134@item Seed
27135The initial value, or starting point, for a sequence of random numbers.
27136
27137@item @command{sed}
27138See ``Stream Editor.''
27139
27140@item Shell
27141The command interpreter for Unix and POSIX-compliant systems.
The shell works both interactively and as a programming language
for batch files (shell scripts).
27144
27145@item Short-Circuit
27146The nature of the @command{awk} logical operators @samp{&&} and @samp{||}.
27147If the value of the entire expression is determinable from evaluating just
27148the lefthand side of these operators, the righthand side is not
27149evaluated.
27150(@xref{Boolean Ops}.)
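
A minimal sketch of why this matters: in the following made-up test,
the division on the righthand side of @samp{&&} is never evaluated
because the lefthand side is already false, so no division-by-zero
error occurs:

@example
$ awk 'BEGIN @{ n = 0
>              # 10 / n is skipped because n != 0 is false
>              if (n != 0 && 10 / n > 1) print "big"
>              else print "skipped division" @}'
@print{} skipped division
@end example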
27151
27152@item Side Effect
27153A side effect occurs when an expression has an effect aside from merely
27154producing a value.  Assignment expressions, increment and decrement
27155expressions, and function calls have side effects.
27156(@xref{Assignment Ops}.)
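
For example, in this small sketch the expression @code{i++} produces
the old value of @code{i} and, as a side effect, also increments
@code{i}:

@example
$ awk 'BEGIN @{ i = 5; j = i++; print i, j @}'
@print{} 6 5
@end example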
27157
27158@item Single-Precision
27159An internal representation of numbers that can have fractional parts.
27160Single-precision numbers keep track of fewer digits than do double-precision
27161numbers, but operations on them are sometimes less expensive in terms of CPU time.
27162This is the type used by some very old versions of @command{awk} to store
27163numeric values.  It is the C type @code{float}.
27164
27165@item Space
27166The character generated by hitting the space bar on the keyboard.
27167
27168@item Special File
27169A @value{FN} interpreted internally by @command{gawk}, instead of being handed
27170directly to the underlying operating system---for example, @file{/dev/stderr}.
27171(@xref{Special Files}.)
27172
27173@item Stream Editor
27174A program that reads records from an input stream and processes them one
27175or more at a time.  This is in contrast with batch programs, which may
expect to read their input files in their entirety before starting to do
27177anything, as well as with interactive programs which require input from the
27178user.
27179
27180@item String
27181A datum consisting of a sequence of characters, such as @samp{I am a
27182string}.  Constant strings are written with double quotes in the
27183@command{awk} language and may contain escape sequences.
27184(@xref{Escape Sequences}.)
27185
27186@item Tab
27187The character generated by hitting the @kbd{TAB} key on the keyboard.
27188It usually expands to up to eight spaces upon output.
27189
27190@item Text Domain
27191A unique name that identifies an application.
27192Used for grouping messages that are translated at runtime
27193into the local language.
27194
27195@item Timestamp
27196A value in the ``seconds since the epoch'' format used by Unix
27197and POSIX systems.  Used for the @command{gawk} functions
27198@code{mktime}, @code{strftime}, and @code{systime}.
27199See also ``Epoch'' and ``UTC.''
27200
27201@cindex Linux
27202@cindex GNU/Linux
27203@cindex Unix
27204@cindex BSD-based operating systems
27205@cindex NetBSD
27206@cindex FreeBSD
27207@cindex OpenBSD
27208@item Unix
27209A computer operating system originally developed in the early 1970's at
27210AT&T Bell Laboratories.  It initially became popular in universities around
27211the world and later moved into commercial environments as a software
27212development system and network server system. There are many commercial
27213versions of Unix, as well as several work-alike systems whose source code
27214is freely available (such as GNU/Linux, NetBSD, FreeBSD, and OpenBSD).
27215
27216@item UTC
The accepted abbreviation for ``Coordinated Universal Time.''
27218This is standard time in Greenwich, England, which is used as a
27219reference time for day and date calculations.
27220See also ``Epoch'' and ``GMT.''
27221
27222@item Whitespace
27223A sequence of space, TAB, or newline characters occurring inside an input
27224record or a string.
27225@end table
27226
27227@node Copying
27228@unnumbered GNU General Public License
27229@center Version 2, June 1991
27230
27231@display
27232Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111, USA
27234
27235Everyone is permitted to copy and distribute verbatim copies
27236of this license document, but changing it is not allowed.
27237@end display
27238
27239@c fakenode --- for prepinfo
27240@unnumberedsec Preamble
27241
27242  The licenses for most software are designed to take away your
27243freedom to share and change it.  By contrast, the GNU General Public
27244License is intended to guarantee your freedom to share and change free
27245software---to make sure the software is free for all its users.  This
27246General Public License applies to most of the Free Software
27247Foundation's software and to any other program whose authors commit to
27248using it.  (Some other Free Software Foundation software is covered by
27249the GNU Library General Public License instead.)  You can apply it to
27250your programs, too.
27251
27252  When we speak of free software, we are referring to freedom, not
27253price.  Our General Public Licenses are designed to make sure that you
27254have the freedom to distribute copies of free software (and charge for
27255this service if you wish), that you receive source code or can get it
27256if you want it, that you can change the software or use pieces of it
27257in new free programs; and that you know you can do these things.
27258
27259  To protect your rights, we need to make restrictions that forbid
27260anyone to deny you these rights or to ask you to surrender the rights.
27261These restrictions translate to certain responsibilities for you if you
27262distribute copies of the software, or if you modify it.
27263
27264  For example, if you distribute copies of such a program, whether
27265gratis or for a fee, you must give the recipients all the rights that
27266you have.  You must make sure that they, too, receive or can get the
27267source code.  And you must show them these terms so they know their
27268rights.
27269
27270  We protect your rights with two steps: (1) copyright the software, and
27271(2) offer you this license which gives you legal permission to copy,
27272distribute and/or modify the software.
27273
27274  Also, for each author's protection and ours, we want to make certain
27275that everyone understands that there is no warranty for this free
27276software.  If the software is modified by someone else and passed on, we
27277want its recipients to know that what they have is not the original, so
27278that any problems introduced by others will not reflect on the original
27279authors' reputations.
27280
27281  Finally, any free program is threatened constantly by software
27282patents.  We wish to avoid the danger that redistributors of a free
27283program will individually obtain patent licenses, in effect making the
27284program proprietary.  To prevent this, we have made it clear that any
27285patent must be licensed for everyone's free use or not licensed at all.
27286
27287  The precise terms and conditions for copying, distribution and
27288modification follow.
27289
27290@ifnotinfo
27291@c fakenode --- for prepinfo
27292@unnumberedsec Terms and Conditions for Copying, Distribution and Modification
27293@end ifnotinfo
27294@ifinfo
27295@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
27296@end ifinfo
27297
27298@enumerate 0
27299@item
27300This License applies to any program or other work which contains
27301a notice placed by the copyright holder saying it may be distributed
27302under the terms of this General Public License.  The ``Program'', below,
27303refers to any such program or work, and a ``work based on the Program''
27304means either the Program or any derivative work under copyright law:
27305that is to say, a work containing the Program or a portion of it,
27306either verbatim or with modifications and/or translated into another
27307language.  (Hereinafter, translation is included without limitation in
27308the term ``modification''.)  Each licensee is addressed as ``you''.
27309
27310Activities other than copying, distribution and modification are not
27311covered by this License; they are outside its scope.  The act of
27312running the Program is not restricted, and the output from the Program
27313is covered only if its contents constitute a work based on the
27314Program (independent of having been made by running the Program).
27315Whether that is true depends on what the Program does.
27316
27317@item
27318You may copy and distribute verbatim copies of the Program's
27319source code as you receive it, in any medium, provided that you
27320conspicuously and appropriately publish on each copy an appropriate
27321copyright notice and disclaimer of warranty; keep intact all the
27322notices that refer to this License and to the absence of any warranty;
27323and give any other recipients of the Program a copy of this License
27324along with the Program.
27325
27326You may charge a fee for the physical act of transferring a copy, and
27327you may at your option offer warranty protection in exchange for a fee.
27328
27329@item
27330You may modify your copy or copies of the Program or any portion
27331of it, thus forming a work based on the Program, and copy and
27332distribute such modifications or work under the terms of Section 1
27333above, provided that you also meet all of these conditions:
27334
27335@enumerate a
27336@item
27337You must cause the modified files to carry prominent notices
27338stating that you changed the files and the date of any change.
27339
27340@item
27341You must cause any work that you distribute or publish, that in
27342whole or in part contains or is derived from the Program or any
27343part thereof, to be licensed as a whole at no charge to all third
27344parties under the terms of this License.
27345
27346@item
27347If the modified program normally reads commands interactively
27348when run, you must cause it, when started running for such
27349interactive use in the most ordinary way, to print or display an
27350announcement including an appropriate copyright notice and a
27351notice that there is no warranty (or else, saying that you provide
27352a warranty) and that users may redistribute the program under
27353these conditions, and telling the user how to view a copy of this
27354License.  (Exception: if the Program itself is interactive but
27355does not normally print such an announcement, your work based on
27356the Program is not required to print an announcement.)
27357@end enumerate
27358
27359These requirements apply to the modified work as a whole.  If
27360identifiable sections of that work are not derived from the Program,
27361and can be reasonably considered independent and separate works in
27362themselves, then this License, and its terms, do not apply to those
27363sections when you distribute them as separate works.  But when you
27364distribute the same sections as part of a whole which is a work based
27365on the Program, the distribution of the whole must be on the terms of
27366this License, whose permissions for other licensees extend to the
27367entire whole, and thus to each and every part regardless of who wrote it.
27368
27369Thus, it is not the intent of this section to claim rights or contest
27370your rights to work written entirely by you; rather, the intent is to
27371exercise the right to control the distribution of derivative or
27372collective works based on the Program.
27373
27374In addition, mere aggregation of another work not based on the Program
27375with the Program (or with a work based on the Program) on a volume of
27376a storage or distribution medium does not bring the other work under
27377the scope of this License.
27378
27379@item
27380You may copy and distribute the Program (or a work based on it,
27381under Section 2) in object code or executable form under the terms of
27382Sections 1 and 2 above provided that you also do one of the following:
27383
27384@enumerate a
27385@item
27386Accompany it with the complete corresponding machine-readable
27387source code, which must be distributed under the terms of Sections
273881 and 2 above on a medium customarily used for software interchange; or,
27389
27390@item
27391Accompany it with a written offer, valid for at least three
27392years, to give any third party, for a charge no more than your
27393cost of physically performing source distribution, a complete
27394machine-readable copy of the corresponding source code, to be
27395distributed under the terms of Sections 1 and 2 above on a medium
27396customarily used for software interchange; or,
27397
27398@item
27399Accompany it with the information you received as to the offer
27400to distribute corresponding source code.  (This alternative is
27401allowed only for noncommercial distribution and only if you
27402received the program in object code or executable form with such
27403an offer, in accord with Subsection b above.)
27404@end enumerate
27405
27406The source code for a work means the preferred form of the work for
27407making modifications to it.  For an executable work, complete source
27408code means all the source code for all modules it contains, plus any
27409associated interface definition files, plus the scripts used to
27410control compilation and installation of the executable.  However, as a
27411special exception, the source code distributed need not include
27412anything that is normally distributed (in either source or binary
27413form) with the major components (compiler, kernel, and so on) of the
27414operating system on which the executable runs, unless that component
27415itself accompanies the executable.
27416
27417If distribution of executable or object code is made by offering
27418access to copy from a designated place, then offering equivalent
27419access to copy the source code from the same place counts as
27420distribution of the source code, even though third parties are not
27421compelled to copy the source along with the object code.
27422
27423@item
27424You may not copy, modify, sublicense, or distribute the Program
27425except as expressly provided under this License.  Any attempt
27426otherwise to copy, modify, sublicense or distribute the Program is
27427void, and will automatically terminate your rights under this License.
27428However, parties who have received copies, or rights, from you under
27429this License will not have their licenses terminated so long as such
27430parties remain in full compliance.
27431
27432@item
27433You are not required to accept this License, since you have not
27434signed it.  However, nothing else grants you permission to modify or
27435distribute the Program or its derivative works.  These actions are
27436prohibited by law if you do not accept this License.  Therefore, by
27437modifying or distributing the Program (or any work based on the
27438Program), you indicate your acceptance of this License to do so, and
27439all its terms and conditions for copying, distributing or modifying
27440the Program or works based on it.
27441
27442@item
27443Each time you redistribute the Program (or any work based on the
27444Program), the recipient automatically receives a license from the
27445original licensor to copy, distribute or modify the Program subject to
27446these terms and conditions.  You may not impose any further
27447restrictions on the recipients' exercise of the rights granted herein.
27448You are not responsible for enforcing compliance by third parties to
27449this License.
27450
27451@item
27452If, as a consequence of a court judgment or allegation of patent
27453infringement or for any other reason (not limited to patent issues),
27454conditions are imposed on you (whether by court order, agreement or
27455otherwise) that contradict the conditions of this License, they do not
27456excuse you from the conditions of this License.  If you cannot
27457distribute so as to satisfy simultaneously your obligations under this
27458License and any other pertinent obligations, then as a consequence you
27459may not distribute the Program at all.  For example, if a patent
27460license would not permit royalty-free redistribution of the Program by
27461all those who receive copies directly or indirectly through you, then
27462the only way you could satisfy both it and this License would be to
27463refrain entirely from distribution of the Program.
27464
27465If any portion of this section is held invalid or unenforceable under
27466any particular circumstance, the balance of the section is intended to
27467apply and the section as a whole is intended to apply in other
27468circumstances.
27469
27470It is not the purpose of this section to induce you to infringe any
27471patents or other property right claims or to contest validity of any
27472such claims; this section has the sole purpose of protecting the
27473integrity of the free software distribution system, which is
27474implemented by public license practices.  Many people have made
27475generous contributions to the wide range of software distributed
27476through that system in reliance on consistent application of that
27477system; it is up to the author/donor to decide if he or she is willing
27478to distribute software through any other system and a licensee cannot
27479impose that choice.
27480
27481This section is intended to make thoroughly clear what is believed to
27482be a consequence of the rest of this License.
27483
27484@item
27485If the distribution and/or use of the Program is restricted in
27486certain countries either by patents or by copyrighted interfaces, the
27487original copyright holder who places the Program under this License
27488may add an explicit geographical distribution limitation excluding
27489those countries, so that distribution is permitted only in or among
27490countries not thus excluded.  In such case, this License incorporates
27491the limitation as if written in the body of this License.
27492
27493@item
27494The Free Software Foundation may publish revised and/or new versions
27495of the General Public License from time to time.  Such new versions will
27496be similar in spirit to the present version, but may differ in detail to
27497address new problems or concerns.
27498
27499Each version is given a distinguishing version number.  If the Program
27500specifies a version number of this License which applies to it and ``any
27501later version'', you have the option of following the terms and conditions
27502either of that version or of any later version published by the Free
27503Software Foundation.  If the Program does not specify a version number of
27504this License, you may choose any version ever published by the Free Software
27505Foundation.
27506
27507@item
27508If you wish to incorporate parts of the Program into other free
27509programs whose distribution conditions are different, write to the author
27510to ask for permission.  For software which is copyrighted by the Free
27511Software Foundation, write to the Free Software Foundation; we sometimes
27512make exceptions for this.  Our decision will be guided by the two goals
27513of preserving the free status of all derivatives of our free software and
27514of promoting the sharing and reuse of software generally.
27515
27516@ifnotinfo
27517@c fakenode --- for prepinfo
27518@heading NO WARRANTY
27519@end ifnotinfo
27520@ifinfo
27521@center NO WARRANTY
27522@end ifinfo
27523
27524@item
27525BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
27526FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW@.  EXCEPT WHEN
27527OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
27528PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
27529OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
27530MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE@.  THE ENTIRE RISK AS
27531TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU@.  SHOULD THE
27532PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
27533REPAIR OR CORRECTION.
27534
27535@item
27536IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
27537WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
27538REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
27539INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
27540OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
27541TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
27542YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
27543PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
27544POSSIBILITY OF SUCH DAMAGES.
27545@end enumerate
27546
27547@ifnotinfo
27548@c fakenode --- for prepinfo
27549@heading END OF TERMS AND CONDITIONS
27550@end ifnotinfo
27551@ifinfo
27552@center END OF TERMS AND CONDITIONS
27553@end ifinfo
27554
27555@page
27556@c fakenode --- for prepinfo
27557@unnumberedsec How to Apply These Terms to Your New Programs
27558
27559  If you develop a new program, and you want it to be of the greatest
27560possible use to the public, the best way to achieve this is to make it
27561free software which everyone can redistribute and change under these terms.
27562
  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the ``copyright'' line and a pointer to where the full notice is found.

@smallexample
@var{one line to give the program's name and an idea of what it does.}
Copyright (C) @var{year}  @var{name of author}

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE@.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, USA.
@end smallexample

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

@smallexample
Gnomovision version 69, Copyright (C) @var{year} @var{name of author}
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
type `show w'.  This is free software, and you are welcome
to redistribute it under certain conditions; type `show c'
for details.
@end smallexample

The hypothetical commands @samp{show w} and @samp{show c} should show
the appropriate parts of the General Public License.  Of course, the
commands you use may be called something other than @samp{show w} and
@samp{show c}; they could even be mouse-clicks or menu items---whatever
suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a ``copyright disclaimer'' for the program, if
necessary.  Here is a sample; alter the names:

@smallexample
@group
Yoyodyne, Inc., hereby disclaims all copyright
interest in the program `Gnomovision'
(which makes passes at compilers) written
by James Hacker.

@var{signature of Ty Coon}, 1 April 1989
Ty Coon, President of Vice
@end group
@end smallexample

This General Public License does not permit incorporating your program into
proprietary programs.  If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library.  If this is what you want to do, use the GNU Lesser General
Public License instead of this License.

@node GNU Free Documentation License
@unnumbered GNU Free Documentation License

@cindex FDL (Free Documentation License)
@cindex Free Documentation License (FDL)
@cindex GNU Free Documentation License
@center Version 1.2, November 2002

@display
Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA  02111-1307, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@enumerate 0
@item
PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document @dfn{free} in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of ``copyleft'', which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.

@item
APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License.  Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein.  The ``Document'', below,
refers to any such manual or work.  Any member of the public is a
licensee, and is addressed as ``you''.  You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A ``Modified Version'' of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A ``Secondary Section'' is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject.  (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.)  The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The ``Invariant Sections'' are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.  If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant.  The Document may contain zero
Invariant Sections.  If the Document does not identify any Invariant
Sections then there are none.

The ``Cover Texts'' are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.  A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A ``Transparent'' copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text.  A copy that is not ``Transparent'' is called ``Opaque''.

Examples of suitable formats for Transparent copies include plain
@sc{ascii} without markup, Texinfo input format, La@TeX{} input
format, @acronym{SGML} or @acronym{XML} using a publicly available
@acronym{DTD}, and standard-conforming simple @acronym{HTML},
PostScript or @acronym{PDF} designed for human modification.  Examples
of transparent image formats include @acronym{PNG}, @acronym{XCF} and
@acronym{JPG}.  Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, @acronym{SGML} or
@acronym{XML} for which the @acronym{DTD} and/or processing tools are
not generally available, and the machine-generated @acronym{HTML},
PostScript or @acronym{PDF} produced by some word processors for
output purposes only.

The ``Title Page'' means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, ``Title Page'' means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section ``Entitled XYZ'' means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language.  (Here XYZ stands for a
specific section name mentioned below, such as ``Acknowledgements'',
``Dedications'', ``Endorsements'', or ``History''.)  To ``Preserve the Title''
of such a section when you modify the Document means that it remains a
section ``Entitled XYZ'' according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document.  These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.

@item
VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

@item
COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.

@item
MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document).  You may use the same title as a previous version
if the original publisher of that version gives permission.

@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.

@item
Preserve all the copyright notices of the Document.

@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.

@item
Include an unaltered copy of this License.

@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page.  If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.

@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on.  These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.

@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles.  Section numbers
or the equivalent are not considered part of the section titles.

@item
Delete any section Entitled ``Endorsements''.  Such a section
may not be included in the Modified Version.

@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.

@item
Preserve any Warranty Disclaimers.
@end enumerate

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

@item
COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''.  You must delete all
sections Entitled ``Endorsements.''

@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers.  In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
@end enumerate

@c fakenode --- for prepinfo
@unnumberedsec ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
  Copyright (C)  @var{year}  @var{your name}.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with...Texts.'' line with this:

@smallexample
@group
    with the Invariant Sections being @var{list their titles}, with
    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
    being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@node Index
@unnumbered Index
@printindex cp

@bye

Unresolved Issues:
------------------
1. From ADR.

   Robert J. Chassell points out that awk programs should have some indication
   of how to use them.  It would be useful to perhaps have a "programming
   style" section of the manual that would include this and other tips.

2. The default AWKPATH search path should be configurable via `configure'.
   The default and how this changes need to be documented.

Consistency issues:
	/.../ regexps are in @code, not @samp
	".." strings are in @code, not @samp
	no @print before @dots
	values of expressions in the text (@code{x} has the value 15),
		should be in roman, not @code
	Use   TAB   and not   tab
	Use   ESC   and not   ESCAPE
	Use   space and not   blank	to describe the space bar's character
	The term "blank" is thus basically reserved for "blank lines" etc.
	To make dark corners work, the @value{DARKCORNER} has to be outside
		closing `.' of a sentence and after (pxref{...}).  This is
		a change from earlier versions.
	" " should have an @w{} around it
	Use "non-" only with language names or acronyms, or the words bug and option
	Use @command{ftp} when talking about anonymous ftp
	Use uppercase and lowercase, not "upper-case" and "lower-case"
		or "upper case" and "lower case"
	Use "single precision" and "double precision", not "single-precision" or "double-precision"
	Use alphanumeric, not alpha-numeric
	Use POSIX-compliant, not POSIX compliant
	Use --foo, not -Wfoo when describing long options
	Use "Bell Laboratories", but not "Bell Labs".
	Use "behavior" instead of "behaviour".
	Use "zeros" instead of "zeroes".
	Use "nonzero" not "non-zero".
	Use "runtime" not "run time" or "run-time".
	Use "command-line" not "command line".
	Use "online" not "on-line".
	Use "whitespace" not "white space".
	Use "Input/Output", not "input/output". Also "I/O", not "i/o".
	Use "lefthand"/"righthand", not "left-hand"/"right-hand".
	Use "workaround", not "work-around".
	Use "startup"/"cleanup", not "start-up"/"clean-up"
	Use @code{do}, and not @code{do}-@code{while}, except where
		actually discussing the do-while.
	Use "versus" in text and "vs." in index entries
	The words "a", "and", "as", "between", "for", "from", "in", "of",
		"on", "that", "the", "to", "with", and "without",
		should not be capitalized in @chapter, @section etc.
		"Into" and "How" should.
	Search for @dfn; make sure important items are also indexed.
	"e.g." should always be followed by a comma.
	"i.e." should always be followed by a comma.
	The numbers zero through ten should be spelled out, except when
		talking about file descriptor numbers. > 10 and < 0, it's
		ok to use numbers.
	In tables, put command-line options in @code, while in the text,
		put them in @option.
	When using @strong, use "Note:" or "Caution:" with colons and
		not exclamation points.  Do not surround the paragraphs
		with @quotation ... @end quotation.
	For most cases, do NOT put a comma before "and", "or" or "but".
		But exercise taste with this rule.
	Don't show the awk command with a program in quotes when it's
		just the program.  I.e.

			{
				....
			}

		not
			awk '{
				...
			}'

	Do show it when showing command-line arguments, data files, etc., even
		if there is no output shown.

	Use numbered lists only to show a sequential series of steps.

	Use @code{xxx} for the xxx operator in indexing statements, not @samp.

Date: Wed, 13 Apr 94 15:20:52 -0400
From: rms@gnu.org (Richard Stallman)
To: gnu-prog@gnu.org
Subject: A reminder: no pathnames in GNU

It's a GNU convention to use the term "file name" for the name of a
file, never "pathname".  We use the term "path" for search paths,
which are lists of file names.  Using it for a single file name as
well is potentially confusing to users.

So please check any documentation you maintain, if you think you might
have used "pathname".

Note that "file name" should be two words when it appears as ordinary
text.  It's ok as one word when it's a metasyntactic variable, though.

------------------------
ORA uses filename, thus the macro.

Suggestions:
------------
Enhance FIELDWIDTHS with some way to indicate "the rest of the record".
E.g., a length of 0 or -1 or something.  Maybe "n"?

Make FIELDWIDTHS be an array?

% Next edition:
%	1. Talk about common extensions, those in nawk, gawk, mawk
%	2. Use @code{foo} for variables and @code{foo()} for functions
%	3. Standardize the error messages from the functions and programs
%	   in Chapters 12 and 13.
%	4. Nuke the BBS stuff and use something that won't be obsolete
%	5. Reorg chapters 5 & 7 like so:
%Chapter 5:
% - Constants, Variables, and Conversions
%   + Constant Expressions
%   + Using Regular Expression Constants
%   + Variables
%   + Conversion of Strings and Numbers
% - Operators
%   + Arithmetic Operators
%   + String Concatenation
%   + Assignment Expressions
%   + Increment and Decrement Operators
% - Truth Values and Conditions
%   + True and False in Awk
%   + Boolean Expressions
%   + Conditional Expressions
% - Function Calls
% - Operator Precedence
%
%Chapter 7:
%  - Array Basics
%    + Introduction to Arrays
%    + Referring to an Array Element
%    + Assigning Array Elements
%    + Basic Array Example
%    + Scanning All Elements of an Array
%  - The delete Statement
%  - Using Numbers to Subscript Arrays
%  - Using Uninitialized Variables as Subscripts
%  - Multidimensional Arrays
%    + Scanning Multidimensional Arrays
%  - Sorting Array Values and Indices with gawk
28236