=head1 NAME

perlform - Perl formats

=head1 DESCRIPTION

Perl has a mechanism to help you generate simple reports and charts. To
facilitate this, Perl helps you code up your output page close to how it
will look when it's printed. It can keep track of things like how many
lines are on a page, what page you're on, when to print page headers,
etc. Keywords are borrowed from FORTRAN: format() to declare and write()
to execute; see their entries in L<perlfunc>. Fortunately, the layout is
much more legible, more like BASIC's PRINT USING statement. Think of it
as a poor man's nroff(1).

Formats, like packages and subroutines, are declared rather than
executed, so they may occur at any point in your program. (Usually it's
best to keep them all together though.) They have their own namespace
apart from all the other "types" in Perl. This means that if you have a
function named "Foo", it is not the same thing as having a format named
"Foo". However, the default name for the format associated with a given
filehandle is the same as the name of the filehandle. Thus, the default
format for STDOUT is named "STDOUT", and the default format for filehandle
TEMP is named "TEMP". They just look the same. They aren't.

Output record formats are declared as follows:

    format NAME =
    FORMLIST
    .

If NAME is omitted, format "STDOUT" is defined. FORMLIST consists of
a sequence of lines, each of which may be one of three types:

=over 4

=item 1.

A comment, indicated by putting a '#' in the first column.

=item 2.

A "picture" line giving the format for one output line.

=item 3.

An argument line supplying values to plug into the previous picture line.

=back

Picture lines are printed exactly as they look, except for certain fields
that substitute values into the line. Each field in a picture line starts
with either "@" (at) or "^" (caret).
These lines do not undergo any kind
of variable interpolation. The at field (not to be confused with the array
marker @) is the normal kind of field; the other kind, caret fields, are used
to do rudimentary multi-line text block filling. The length of the field
is supplied by padding out the field with multiple "E<lt>", "E<gt>", or "|"
characters to specify, respectively, left justification, right
justification, or centering. If the variable would exceed the width
specified, it is truncated.

As an alternate form of right justification, you may also use "#"
characters (with an optional ".") to specify a numeric field. This way
you can line up the decimal points. If any value supplied for these
fields contains a newline, only the text up to the newline is printed.
Finally, the special field "@*" can be used for printing multi-line,
nontruncated values; it should appear by itself on a line.

The values are specified on the following line in the same order as
the picture fields. The expressions providing the values should be
separated by commas. The expressions are all evaluated in a list context
before the line is processed, so a single list expression could produce
multiple list elements. The expressions may be spread out to more than
one line if enclosed in braces. If so, the opening brace must be the first
token on the first line. If an expression evaluates to a number with a
decimal part, and if the corresponding picture specifies that the decimal
part should appear in the output (that is, any picture except multiple "#"
characters B<without> an embedded "."), the character used for the decimal
point is B<always> determined by the current LC_NUMERIC locale. This
means that, if, for example, the run-time environment happens to specify a
German locale, "," will be used instead of the default ".". See
L<perllocale> and L<"WARNINGS"> for more information.

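The at field types described above can be tried out without declaring a
format at all, by handing a picture and its values to formline() and
reading the result back from C<$^A>. What follows is only a minimal
sketch: render_picture() is a hypothetical helper name, and the square
brackets in the pictures are ordinary literal text, added here just to
make the edges of each field visible.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): fill one picture with values
# and return the formatted text instead of writing it to a filehandle.
sub render_picture {
    my ($picture, @values) = @_;
    $^A = "";                       # start from an empty accumulator
    formline($picture, @values);    # values are consumed in field order
    my $result = $^A;
    $^A = "";                       # leave the accumulator clean
    return $result;
}

# Three fields, each five columns wide: left, centered, right justified.
my $justified = render_picture('[@<<<<] [@||||] [@>>>>]' . "\n",
                               "ab", "ab", "ab");

# A numeric field: six columns in total, two digits after the point.
my $numeric = render_picture('[@##.##]' . "\n", 3.14159);

# A value wider than its four-column field is cut off at the field width.
my $truncated = render_picture('[@<<<]' . "\n", "abcdefgh");

print $justified, $numeric, $truncated;
```

The pictures are single-quoted so that the "@" characters can never be
mistaken for array interpolation; the newline is appended separately.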

Picture fields that begin with ^ rather than @ are treated specially.
With a # field, the field is blanked out if the value is undefined. For
other field types, the caret enables a kind of fill mode. Instead of an
arbitrary expression, the value supplied must be a scalar variable name
that contains a text string. Perl puts as much text as it can into the
field, and then chops off the front of the string so that the next time
the variable is referenced, more of the text can be printed. (Yes, this
means that the variable itself is altered during execution of the write()
call, and is not returned.) Normally you would use a sequence of fields
in a vertical stack to print out a block of text. You might wish to end
the final field with the text "...", which will appear in the output if
the text was too long to appear in its entirety. You can change which
characters are legal to break on by changing the variable C<$:> (that's
$FORMAT_LINE_BREAK_CHARACTERS if you're using the English module) to a
list of the desired characters.

Using caret fields can produce variable length records. If the text
to be formatted is short, you can suppress blank lines by putting a
"~" (tilde) character anywhere in the line. The tilde will be translated
to a space upon output. If you put a second tilde contiguous to the
first, the line will be repeated until all the fields on the line are
exhausted. (If you use a field of the at variety, the expression you
supply had better not give the same value every time forever!)

Top-of-form processing is by default handled by a format with the
same name as the current filehandle with "_TOP" concatenated to it.
It's triggered at the top of each page. See L<perlfunc/write>.

Examples:

 # a report on the /etc/passwd file
 format STDOUT_TOP =
                         Passwd File
 Name                Login    Office   Uid   Gid Home
 ------------------------------------------------------------------
 .
 format STDOUT =
 @<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
 $name,              $login,  $office,$uid,$gid, $home
 .


 # a report from a bug report form
 format STDOUT_TOP =
                         Bug Reports
 @<<<<<<<<<<<<<<<<<<<<<<<     @|||         @>>>>>>>>>>>>>>>>>>>>>>>
 $system,                      $%,         $date
 ------------------------------------------------------------------
 .
 format STDOUT =
 Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
          $subject
 Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
        $index,                       $description
 Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
           $priority,        $date,   $description
 From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
       $from,                         $description
 Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
              $programmer,            $description
 ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                      $description
 ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                      $description
 ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                      $description
 ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                      $description
 ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<...
                                      $description
 .

It is possible to intermix print()s with write()s on the same output
channel, but you'll have to handle C<$-> (C<$FORMAT_LINES_LEFT>)
yourself.

=head2 Format Variables

The current format name is stored in the variable C<$~> (C<$FORMAT_NAME>),
and the current top of form format name is in C<$^> (C<$FORMAT_TOP_NAME>).
The current output page number is stored in C<$%> (C<$FORMAT_PAGE_NUMBER>),
and the number of lines on the page is in C<$=> (C<$FORMAT_LINES_PER_PAGE>).
Whether to autoflush output on this handle is stored in C<$|>
(C<$OUTPUT_AUTOFLUSH>). The string output before each top of page (except
the first) is stored in C<$^L> (C<$FORMAT_FORMFEED>).
These variables are
set on a per-filehandle basis, so you'll need to select() into a different
one to affect them:

    select((select(OUTF),
            $~ = "My_Other_Format",
            $^ = "My_Top_Format"
           )[0]);

Pretty ugly, eh? It's a common idiom though, so don't be too surprised
when you see it. You can at least use a temporary variable to hold
the previous filehandle. (This is a much better approach in general:
not only does legibility improve, but you also get an intermediate
stage in the expression to single-step the debugger through.)

    $ofh = select(OUTF);
    $~ = "My_Other_Format";
    $^ = "My_Top_Format";
    select($ofh);

If you use the English module, you can even read the variable names:

    use English;
    $ofh = select(OUTF);
    $FORMAT_NAME     = "My_Other_Format";
    $FORMAT_TOP_NAME = "My_Top_Format";
    select($ofh);

But you still have those funny select()s. So just use the FileHandle
module. Now, you can access these special variables using lowercase
method names instead:

    use FileHandle;
    format_name     OUTF "My_Other_Format";
    format_top_name OUTF "My_Top_Format";

Much better!

=head1 NOTES

Because the values line may contain arbitrary expressions (for at fields,
not caret fields), you can farm out more sophisticated processing
to other functions, like sprintf() or one of your own. For example:

    format Ident =
        @<<<<<<<<<<<<<<<
        &commify($n)
    .

To get a real at or caret into the field, do this:

    format Ident =
    I have an @ here.
            "@"
    .

To center a whole line of text, do something like this:

    format Ident =
    @|||||||||||||||||||||||||||||||||||||||||||||||
            "Some text line"
    .

There is no builtin way to say "float this to the right hand side
of the page, however wide it is." You have to specify where it goes.
The truly desperate can generate their own format on the fly, based
on the current number of columns, and then eval() it:

    $format  = "format STDOUT = \n"
             . '^' . '<' x $cols . "\n"
             . '$entry' . "\n"
             . "\t^" . "<" x ($cols-8) . "~~\n"
             . '$entry' . "\n"
             . ".\n";
    print $format if $Debugging;
    eval $format;
    die $@ if $@;

Which would generate a format looking something like this:

 format STDOUT =
 ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 $entry
         ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
 $entry
 .

Here's a little program that's somewhat like fmt(1):

 format =
 ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
 $_

 .

 $/ = '';
 while (<>) {
     s/\s*\n\s*/ /g;
     write;
 }

=head2 Footers

While $FORMAT_TOP_NAME contains the name of the current header format,
there is no corresponding mechanism to automatically do the same thing
for a footer. Not knowing how big a format is going to be until you
evaluate it is one of the major problems. It's on the TODO list.

Here's one strategy: If you have a fixed-size footer, you can get footers
by checking $FORMAT_LINES_LEFT before each write() and printing the footer
yourself if necessary.

Here's another strategy: Open a pipe to yourself, using C<open(MYSELF, "|-")>
(see L<perlfunc/open()>) and always write() to MYSELF instead of STDOUT.
Have your child process massage its STDIN to rearrange headers and footers
however you like. Not very convenient, but doable.

=head2 Accessing Formatting Internals

For low-level access to the formatting mechanism, you may use formline()
and access C<$^A> (the $ACCUMULATOR variable) directly.

For example:

    $str = formline <<'END', 1,2,3;
    @<<< @||| @>>>
    END

    print "Wow, I just stored `$^A' in the accumulator!\n";

Or to make an swrite() subroutine, which is to write() what sprintf()
is to printf(), do this:

    use Carp;
    sub swrite {
        croak "usage: swrite PICTURE ARGS" unless @_;
        my $format = shift;
        $^A = "";
        formline($format,@_);
        return $^A;
    }

    $string = swrite(<<'END', 1, 2, 3);
     Check me out
     @<<< @||| @>>>
    END
    print $string;

=head1 WARNINGS

The lone dot that ends a format can also prematurely end a mail
message passing through a misconfigured Internet mailer (and based on
experience, such misconfiguration is the rule, not the exception). So
when sending format code through mail, you should indent it so that
the format-ending dot is not on the left margin; this will prevent
SMTP cutoff.

Lexical variables (declared with "my") are not visible within a
format unless the format is declared within the scope of the lexical
variable. (They weren't visible at all before version 5.001.)

Formats are the only part of Perl that unconditionally uses information
from a program's locale; if a program's environment specifies an
LC_NUMERIC locale, it is always used to specify the decimal point
character in formatted output. Perl ignores all other aspects of locale
handling unless the C<use locale> pragma is in effect. Formatted output
cannot be controlled by C<use locale> because the pragma is tied to the
block structure of the program, and, for historical reasons, formats
exist outside that block structure. See L<perllocale> for further
discussion of locale handling.

Inside of an expression, the whitespace characters \n, \t and \f are
considered to be equivalent to a single space.
Thus, you could think
of this filter being applied to each value in the format:

    $value =~ tr/\n\t\f/ /;

The remaining whitespace character, \r, forces the printing of a new
line if allowed by the picture line.

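The whitespace rule above is easiest to observe with a caret field in
fill mode. The following is a small sketch rather than a definitive
demonstration: it drives one repeated ("~~") caret line directly through
formline(), and the variable names and sample text are made up for
illustration. The text contains an embedded \n and \t, which the fill
logic is expected to treat like ordinary spaces when choosing where to
break.

```perl
use strict;
use warnings;

# Sample text (an assumption for this sketch) with a newline and a tab
# hidden between some of the words.
my $text = "alpha beta\ngamma\tdelta epsilon zeta eta theta";

# One caret field, 15 columns wide, repeated until $text is used up.
my $picture = '^<<<<<<<<<<<<<< ~~' . "\n";

$^A = "";                    # start from an empty accumulator
formline($picture, $text);   # fill mode chops the front off $text
my $wrapped = $^A;
$^A = "";                    # leave the accumulator clean

print $wrapped;
print "text left over: '", (defined $text ? $text : ''), "'\n";
```

Because fill mode consumes the variable it is given, pass a copy if the
original string is still needed afterwards.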