=encoding utf8

=head1 NAME

perlpodspec - Plain Old Documentation: format specification and notes

=head1 DESCRIPTION

This document provides detailed notes on the Pod markup language. Most
people will only have to read L<perlpod|perlpod> to know how to write
in Pod, but this document may answer some incidental questions to do
with parsing and rendering Pod.

In this document, "must" / "must not", "should" /
"should not", and "may" have their conventional (cf. RFC 2119)
meanings: "X must do Y" means that if X doesn't do Y, it's against
this specification, and should really be fixed. "X should do Y"
means that it's recommended, but X may fail to do Y, if there's a
good reason. "X may do Y" is merely a note that X can do Y at
will (although it is up to the reader to detect any connotation of
"and I think it would be I<nice> if X did Y" versus "it wouldn't
really I<bother> me if X did Y").

Notably, when I say "the parser should do Y", the
parser may fail to do Y, if the calling application explicitly
requests that the parser I<not> do Y. I often phrase this as
"the parser should, by default, do Y." This doesn't I<require>
the parser to provide an option for turning off whatever
feature Y is (like expanding tabs in verbatim paragraphs), although
it implies that such an option I<may> be provided.

=head1 Pod Definitions

Pod is embedded in files, typically Perl source files, although you
can write a file that's nothing but Pod.

A B<line> in a file consists of zero or more non-newline characters,
terminated by either a newline or the end of the file.

A B<newline sequence> is usually a platform-dependent concept, but
Pod parsers should understand it to mean any of CR (ASCII 13), LF
(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
addition to any other system-specific meaning.
The first CR/CRLF/LF
sequence in the file may be used as the basis for identifying the
newline sequence for parsing the rest of the file.

A B<blank line> is a line consisting entirely of zero or more spaces
(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
A B<non-blank line> is a line containing one or more characters other
than space or tab (and terminated by a newline or end-of-file).

(I<Note:> Many older Pod parsers did not accept a line consisting of
spaces/tabs and then a newline as a blank line. The only lines they
considered blank were lines consisting of I<no characters at all>,
terminated by a newline.)

B<Whitespace> is used in this document as a blanket term for spaces,
tabs, and newline sequences. (By itself, this term usually refers
to literal whitespace. That is, sequences of whitespace characters
in Pod source, as opposed to "EE<lt>32>", which is a formatting
code that I<denotes> a whitespace character.)

A B<Pod parser> is a module meant for parsing Pod (regardless of
whether this involves calling callbacks or building a parse tree or
directly formatting it). A B<Pod formatter> (or B<Pod translator>)
is a module or program that converts Pod to some other format (HTML,
plaintext, TeX, PostScript, RTF). A B<Pod processor> might be a
formatter or translator, or might be a program that does something
else with the Pod (like counting words, scanning for index points,
etc.).

Pod content is contained in B<Pod blocks>. A Pod block starts with a
line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
that matches C<m/\A=cut/> or up to the end of the file if there is
no C<m/\A=cut/> line.
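
The definitions above can be sketched in a few lines of Perl. This is
only an illustration, not part of this specification: the names
C<split_lines>, C<is_blank>, and C<pod_blocks> are hypothetical, and
silently skipping a stray "=cut" line outside any block is a
simplifying assumption made for this sketch.

```perl
use strict;
use warnings;

# Split text into lines, accepting CR, LF, or CRLF as the newline sequence.
# CRLF is tried first so that it is not read as two separate newlines.
# (split drops trailing empty fields, so trailing empty lines are lost,
# which is harmless for this sketch.)
sub split_lines {
    my ($text) = @_;
    return split /\x0D\x0A|\x0D|\x0A/, $text;
}

# A blank line consists of nothing but spaces and tabs (possibly none).
sub is_blank {
    my ($line) = @_;
    return $line =~ /\A[ \t]*\z/ ? 1 : 0;
}

# Collect Pod blocks: each starts at a line matching m/\A=[a-zA-Z]/ and
# runs up to the next line matching m/\A=cut/, or to the end of the input.
sub pod_blocks {
    my (@lines) = @_;
    my (@blocks, $current);
    for my $line (@lines) {
        if ($current) {
            push @$current, $line;
            if ($line =~ /\A=cut/) {        # the =cut line closes the block
                push @blocks, $current;
                undef $current;
            }
        }
        elsif ($line =~ /\A=[a-zA-Z]/ and $line !~ /\A=cut/) {
            $current = [$line];             # a new block opens here
        }
    }
    push @blocks, $current if $current;     # block ran to the end of input
    return @blocks;
}
```

Calling C<pod_blocks(split_lines($source))> returns one array reference
per Pod block, each holding that block's lines, including the closing
"=cut" line when there is one.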

=for comment
 The current perlsyn says:
 [beginquote]
   Note that pod translators should look at only paragraphs beginning
   with a pod directive (it makes parsing easier), whereas the compiler
   actually knows to look for pod escapes even in the middle of a
   paragraph. This means that the following secret stuff will be ignored
   by both the compiler and the translators.
      $a=3;
      =secret stuff
       warn "Neither POD nor CODE!?"
      =cut back
      print "got $a\n";
   You probably shouldn't rely upon the warn() being podded out forever.
   Not all pod translators are well-behaved in this regard, and perhaps
   the compiler will become pickier.
 [endquote]
 I think that those paragraphs should just be removed; paragraph-based
 parsing seems to have been largely abandoned, because of the hassle
 with non-empty blank lines messing up what people meant by "paragraph".
 Even if the "it makes parsing easier" bit were especially true,
 it wouldn't be worth the confusion of having perl and pod2whatever
 actually disagree on what can constitute a Pod block.

Within a Pod block, there are B<Pod paragraphs>. A Pod paragraph
consists of non-blank lines of text, separated by one or more blank
lines.

For purposes of Pod processing, there are four types of paragraphs in
a Pod block:

=over

=item *

A command paragraph (also called a "directive"). The first line of
this paragraph must match C<m/\A=[a-zA-Z]/>. Command paragraphs are
typically one line, as in:

  =head1 NOTES

  =item *

But they may span several (non-blank) lines:

  =for comment
  Hm, I wonder what it would look like if
  you tried to write a BNF for Pod from this.

  =head3 Dr. Strangelove, or: How I Learned to
  Stop Worrying and Love the Bomb

I<Some> command paragraphs allow formatting codes in their content
(i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:

  =head1 Did You Remember to C<use strict;>?

In other words, the Pod processing handler for "head1" will apply the
same processing to "Did You Remember to CE<lt>use strict;>?" that it
would to an ordinary paragraph (i.e., formatting codes like
"CE<lt>...>" are parsed and presumably formatted appropriately, and
whitespace in the form of literal spaces and/or tabs is not
significant).

=item *

A B<verbatim paragraph>. The first line of this paragraph must be a
literal space or tab, and this paragraph must not be inside a "=begin
I<identifier>", ... "=end I<identifier>" sequence unless
"I<identifier>" begins with a colon (":"). That is, if a paragraph
starts with a literal space or tab, but I<is> inside a
"=begin I<identifier>", ... "=end I<identifier>" region, then it's
a data paragraph, unless "I<identifier>" begins with a colon.

Whitespace I<is> significant in verbatim paragraphs (although, in
processing, tabs are probably expanded).

=item *

An B<ordinary paragraph>. A paragraph is an ordinary paragraph
if its first line matches neither C<m/\A=[a-zA-Z]/> nor
C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
... "=end I<identifier>" sequence unless "I<identifier>" begins with
a colon (":").

=item *

A B<data paragraph>. This is a paragraph that I<is> inside a "=begin
I<identifier>" ... "=end I<identifier>" sequence where
"I<identifier>" does I<not> begin with a literal colon (":").
In
some sense, a data paragraph is not part of Pod at all (i.e.,
effectively it's "out-of-band"), since it's not subject to most kinds
of Pod parsing; but it is specified here, since Pod
parsers need to be able to call an event for it, or store it in some
form in a parse tree, or at least just parse I<around> it.

=back

For example, consider the following paragraphs:

  # <- that's the 0th column

  =head1 Foo

  Stuff

    $foo->bar

  =cut

Here, "=head1 Foo" and "=cut" are command paragraphs because the first
line of each matches C<m/\A=[a-zA-Z]/>. "Stuff" is an ordinary
paragraph, because its first line matches neither C<m/\A=[a-zA-Z]/>
nor C<m/\A[ \t]/>. "I<[space][space]>$foo->bar"
is a verbatim paragraph, because its first line starts with a literal
whitespace character (and there's no "=begin"..."=end" region around).

The "=begin I<identifier>" ... "=end I<identifier>" commands stop
paragraphs that they surround from being parsed as ordinary or verbatim
paragraphs, if I<identifier> doesn't begin with a colon. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=head1 Pod Commands

This section is intended to supplement and clarify the discussion in
L<perlpod/"Command Paragraph">. These are the currently recognized
Pod commands:

=over

=item "=head1", "=head2", "=head3", "=head4"

This command indicates that the text in the remainder of the paragraph
is a heading. That text may contain formatting codes. Examples:

  =head1 Object Attributes

  =head3 What B<Not> to Do!

=item "=pod"

This command indicates that this paragraph begins a Pod block. (If we
are already in the middle of a Pod block, this command has no effect at
all.) If there is any text in this command paragraph after "=pod",
it must be ignored. Examples:

  =pod

  This is a plain Pod paragraph.

  =pod This text is ignored.

=item "=cut"

This command indicates that this line is the end of this previously
started Pod block. If there is any text after "=cut" on the line, it must be
ignored. Examples:

  =cut

  =cut The documentation ends here.

  =cut
  # This is the first line of program text.
  sub foo { # This is the second.

It is an error to try to I<start> a Pod block with a "=cut" command. In
that case, the Pod processor must halt parsing of the input file, and
must by default emit a warning.

=item "=over"

This command indicates that this is the start of a list/indent
region. If there is any text following the "=over", it must consist
of only a nonzero positive numeral. The semantics of this numeral is
explained in the L</"About =over...=back Regions"> section, further
below. Formatting codes are not expanded. Examples:

  =over 3

  =over 3.5

  =over

=item "=item"

This command indicates that an item in a list begins here. Formatting
codes are processed. The semantics of the (optional) text in the
remainder of this paragraph are
explained in the L</"About =over...=back Regions"> section, further
below. Examples:

  =item

  =item *

  =item      *

  =item 14

  =item   3.

  =item C<< $thing->stuff(I<dodad>) >>

  =item For transporting us beyond seas to be tried for pretended
  offenses

  =item He is at this time transporting large armies of foreign
  mercenaries to complete the works of death, desolation and
  tyranny, already begun with circumstances of cruelty and perfidy
  scarcely paralleled in the most barbarous ages, and totally
  unworthy the head of a civilized nation.

=item "=back"

This command indicates that this is the end of the region begun
by the most recent "=over" command. It permits no text after the
"=back" command.

=item "=begin formatname"

=item "=begin formatname parameter"

This marks the following paragraphs (until the matching "=end
formatname") as being for some special kind of processing. Unless
"formatname" begins with a colon, the contained non-command
paragraphs are data paragraphs. But if "formatname" I<does> begin
with a colon, then non-command paragraphs are ordinary paragraphs
or verbatim paragraphs. This is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

It is advised that formatnames match the regexp
C<m/\A:?[-a-zA-Z0-9_]+\z/>. Everything following whitespace after the
formatname is a parameter that may be used by the formatter when dealing
with this region. This parameter must not be repeated in the "=end"
paragraph. Implementors should anticipate future expansion in the
semantics and syntax of the first parameter to "=begin"/"=end"/"=for".

=item "=end formatname"

This marks the end of the region opened by the matching
"=begin formatname" command. If "formatname" is not the formatname
of the most recent open "=begin formatname" region, then this
is an error, and must generate an error message. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=item "=for formatname text..."

This is synonymous with:

  =begin formatname

  text...

  =end formatname

That is, it creates a region consisting of a single paragraph; that
paragraph is to be treated as a normal paragraph if "formatname"
begins with a ":"; if "formatname" I<doesn't> begin with a colon,
then "text..." will constitute a data paragraph. There is no way
to use "=for formatname text..." to express "text..." as a verbatim
paragraph.
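
The equivalence can be illustrated with a small sketch. The function
name C<expand_for> is hypothetical, and returning non-"=for" paragraphs
unchanged is just a convenience of this example, not something this
specification requires.

```perl
use strict;
use warnings;

# Rewrite one "=for formatname text..." paragraph as the three
# paragraphs it is synonymous with.  Any other paragraph is returned
# unchanged as a one-element list.
sub expand_for {
    my ($paragraph) = @_;
    my ($format, $text) = $paragraph =~ /\A=for\s+(\S+)\s*(.*)\z/s
        or return $paragraph;
    return ("=begin $format", $text, "=end $format");
}
```

For instance, C<expand_for("=for html E<lt>hrE<gt>")> yields the three
paragraphs "=begin html", "E<lt>hrE<gt>", and "=end html".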

=item "=encoding encodingname"

This command, which should occur early in the document (at least
before any non-US-ASCII data!), declares that this document is
encoded in the encoding I<encodingname>, which must be
an encoding name that L<Encode> recognizes. (Encode's list
of supported encodings, in L<Encode::Supported>, is useful here.)
If the Pod parser cannot decode the declared encoding, it
should emit a warning and may abort parsing the document
altogether.

A document having more than one "=encoding" line should be
considered an error. Pod processors may silently tolerate this if
the not-first "=encoding" lines are just duplicates of the
first one (e.g., if there's a "=encoding utf8" line, and later on
another "=encoding utf8" line). But Pod processors should complain if
there are contradictory "=encoding" lines in the same document
(e.g., if there is a "=encoding utf8" early in the document and
"=encoding big5" later). Pod processors that recognize BOMs
may also complain if they see an "=encoding" line
that contradicts the BOM (e.g., if a document with a UTF-16LE
BOM has an "=encoding shiftjis" line).

=back

If a Pod processor sees any command other than the ones listed
above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
or "=w123"), that processor must by default treat this as an
error. It must not process the paragraph beginning with that
command, must by default warn of this as an error, and may
abort the parse. A Pod parser may allow a way for particular
applications to add to the above list of known commands, and to
stipulate, for each additional command, whether formatting
codes should be processed.

Future versions of this specification may add additional
commands.
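
As a rough sketch of how a parser might separate known commands from
unknown ones, consider the following. The name C<parse_command> and the
shape of the returned hash are assumptions made for this example only;
a real parser would also need a way to warn and to let applications
register additional commands.

```perl
use strict;
use warnings;

# The commands listed in this section.
my %KNOWN = map { $_ => 1 } qw(
    head1 head2 head3 head4 pod cut over item back begin end for encoding
);

# Split a command paragraph into its command name and remaining text,
# and note whether the command is one of the known ones.  Returns
# nothing if the paragraph is not a command paragraph at all.
sub parse_command {
    my ($paragraph) = @_;
    my ($name, $text) = $paragraph =~ /\A=([a-zA-Z]\S*)\s*(.*)\z/s
        or return;
    return {
        name  => $name,
        text  => $text,
        known => $KNOWN{$name} ? 1 : 0,   # 0 means: treat as an error
    };
}
```

Here "=haed1 Oops" comes back with C<known> set to 0, which is the
point at which a parser would, by default, warn and skip the paragraph.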



=head1 Pod Formatting Codes

(Note that in previous drafts of this document and of perlpod,
formatting codes were referred to as "interior sequences", and
this term may still be found in the documentation for Pod parsers,
and in error messages from Pod processors.)

There are two syntaxes for formatting codes:

=over

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by a "<", any number of characters, and ending with the first
matching ">". Examples:

    That's what I<you> think!

    What's C<dump()> for?

    X<C<chmod> and C<unlink()> Under Different Operating Systems>

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by two or more "<"'s, one or more whitespace characters,
any number of characters, one or more whitespace characters,
and ending with the first matching sequence of two or more ">"'s, where
the number of ">"'s equals the number of "<"'s in the opening of this
formatting code. Examples:

    That's what I<< you >> think!

    C<<< open(X, ">>thing.dat") || die $! >>>

    B<< $foo->bar(); >>

With this syntax, the whitespace character(s) after the "CE<lt><<"
and before the ">>" (or whatever letter) are I<not> renderable. They
do not signify whitespace, but are merely part of the formatting codes
themselves. That is, these are all synonymous:

    C<thing>
    C<< thing >>
    C<<           thing     >>
    C<<<   thing >>>
    C<<<<
    thing
               >>>>

and so on.

Finally, the multiple-angle-bracket form does I<not> alter the interpretation
of nested formatting codes, meaning that the following four example lines are
identical in meaning:

  B<example: C<$a E<lt>=E<gt> $b>>

  B<example: C<< $a <=> $b >>>

  B<example: C<< $a E<lt>=E<gt> $b >>>

  B<<< example: C<< $a E<lt>=E<gt> $b >> >>>

=back

In parsing Pod, a notably tricky part is the correct parsing of
(potentially nested!) formatting codes. Implementors should
consult the code in the C<parse_text> routine in Pod::Parser as an
example of a correct implementation.

=over

=item C<IE<lt>textE<gt>> -- italic text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<BE<lt>textE<gt>> -- bold text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<CE<lt>codeE<gt>> -- code text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<FE<lt>filenameE<gt>> -- style for filenames

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<XE<lt>topic nameE<gt>> -- an index entry

See the brief discussion in L<perlpod/"Formatting Codes">.

This code is unusual in that most formatters completely discard
this code and its content. Other formatters will render it with
invisible codes that can be used in building an index of
the current document.

=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code

Discussed briefly in L<perlpod/"Formatting Codes">.

This code is unusual in that it should have no content. That is,
a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether
or not it complains, the I<potatoes> text should be ignored.

=item C<LE<lt>nameE<gt>> -- a hyperlink

The complicated syntaxes of this code are discussed at length in
L<perlpod/"Formatting Codes">, and implementation details are
discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the
contents of LE<lt>content> is tricky. Notably, the content has to be
checked for whether it looks like a URL, or whether it has to be split
on literal "|" and/or "/" (in the right order!), and so on,
I<before> EE<lt>...> codes are resolved.

=item C<EE<lt>escapeE<gt>> -- a character escape

See L<perlpod/"Formatting Codes">, and several points in
L</Notes on Implementing Pod Processors>.

=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces

This formatting code is syntactically simple, but semantically
complex. What it means is that each space in the printable
content of this code signifies a non-breaking space.

Consider:

    C<$x ? $y    :  $z>

    S<C<$x ? $y     :  $z>>

Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, ":", one space, "$z". The
difference is that in the latter, with the S code, those spaces
are not "normal" spaces, but instead are non-breaking spaces.

=back


If a Pod processor sees any formatting code other than the ones
listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
processor must by default treat this as an error.
A Pod parser may allow a way for particular
applications to add to the above list of known formatting codes;
a Pod parser might even allow a way to stipulate, for each additional
code, whether it requires some form of special processing, as
LE<lt>...> does.

Future versions of this specification may add additional
formatting codes.

Historical note: A few older Pod processors would not see a ">" as
closing a "CE<lt>" code, if the ">" was immediately preceded by
a "-".
This was so that this:

    C<$foo->bar>

would parse as equivalent to this:

    C<$foo-E<gt>bar>

instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code. This
problem has since been solved by the addition of syntaxes like this:

    C<< $foo->bar >>

Compliant parsers must not treat "->" as special.

Formatting codes absolutely cannot span paragraphs. If a code is
opened in one paragraph, and no closing code is found by the end of
that paragraph, the Pod parser must close that formatting code,
and should complain (as in "Unterminated I code in the paragraph
starting at line 123: 'Time objects are not...'"). So these
two paragraphs:

  I<I told you not to do this!

  Don't make me say it again!>

...must I<not> be parsed as two paragraphs in italics (with the I
code starting in one paragraph and ending in another). Instead,
the first paragraph should generate a warning, but that aside, the
above code must parse as if it were:

  I<I told you not to do this!>

  Don't make me say it again!E<gt>

(In SGMLish jargon, all Pod commands are like block-level
elements, whereas all Pod formatting codes are like inline-level
elements.)



=head1 Notes on Implementing Pod Processors

The following is a long section of miscellaneous requirements
and suggestions to do with Pod processing.

=over

=item *

Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them (possibly several
times, for very long lines) to avoid text running off the side of the
page. Pod formatters may warn of such line-breaking. Such warnings
are particularly appropriate for lines that are over 100 characters long, which
are usually not intentional.
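
One simple way to break over-long verbatim lines is sketched below.
The function name C<break_verbatim_line>, the choice of a hard break
every C<$width> characters, and the optional C<$on_break> callback are
all assumptions of this example; a real formatter may prefer to break
at whitespace or to mark continuation lines.

```perl
use strict;
use warnings;

# Break one verbatim line into chunks of at most $width characters.
# If $on_break is given, it is called once with the original line
# whenever a break was needed, so the caller can emit a warning.
sub break_verbatim_line {
    my ($line, $width, $on_break) = @_;
    return ($line) if length($line) <= $width;
    $on_break->($line) if $on_break;
    return $line =~ /(.{1,$width})/g;   # successive chunks, left to right
}
```

A formatter could call this for every line of a verbatim paragraph and
pass a callback that records a warning for the lines that were broken.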

=item *

Pod parsers must recognize I<all> of the three well-known newline
formats: CR, LF, and CRLF. See L<perlport|perlport>.

=item *

Pod parsers should accept input lines that are of any length.

=item *

Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same. Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
valid as a UTF-8 sequence, or otherwise as Latin-1.

Future versions of this specification may specify
how Pod can accept other encodings. Presumably treatment of other
encodings in Pod parsing would be as in XML parsing: whatever the
encoding declared by a particular Pod file, content is to be
stored in memory as Unicode characters.

=item *

The well known Unicode Byte Order Marks are as follows: if the
file begins with the two literal byte values 0xFE 0xFF, this is
the BOM for big-endian UTF-16. If the file begins with the two
literal byte values 0xFF 0xFE, this is the BOM for little-endian
UTF-16. If the file begins with the three literal byte values
0xEF 0xBB 0xBF, this is the BOM for UTF-8.

=for comment
 use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
 0xEF 0xBB 0xBF

=for comment
 If toke.c is modified to support UTF-32, add mention of those here.

=item *

A naive but sufficient heuristic for testing the first highbit
byte-sequence in a BOM-less file (whether in code or in Pod!), to see
whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
the first byte in the sequence is in the range 0xC0 - 0xFD
I<and> whether the next byte is in the range
0x80 - 0xBF.
If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8. Otherwise the parser should treat the file as being
in Latin-1. In the unlikely circumstance that the first highbit
sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
can cater to our heuristic (as well as any more intelligent heuristic)
by prefacing that line with a comment line containing a highbit
sequence that is clearly I<not> valid as UTF-8. A line consisting
of simply "#", an e-acute, and any non-highbit byte,
is sufficient to establish this file's encoding.

=for comment
 If/WHEN some brave soul makes these heuristics into a generic
 text-file class (or PerlIO layer?), we can presumably delete
 mention of these icky details from this file, and can instead
 tell people to just use appropriate class/layer.
 Auto-recognition of newline sequences would be another desirable
 feature of such a class/layer.
 HINT HINT HINT.

=for comment
 "The probability that a string of characters
 in any other encoding appears as valid UTF-8 is low" - RFC2279

=item *

This document's requirements and suggestions about encodings
do not apply to Pod processors running on non-ASCII platforms,
notably EBCDIC platforms.

=item *

Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph. (The parser may conflate these two
constructs, or may leave them distinct, in the expectation that the
formatter will nevertheless treat them the same.)
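
Returning to the encoding rules given a few items above (the Byte
Order Mark check, followed by the naive test of the first highbit
sequence), here is a minimal sketch of how they might be combined. The
function name C<guess_encoding> and the returned encoding labels are
assumptions of this example, and the input is assumed to be a string
of raw, undecoded bytes.

```perl
use strict;
use warnings;

# Guess the encoding of a file from its raw bytes.
sub guess_encoding {
    my ($bytes) = @_;

    # Byte Order Marks come first.
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;

    # No BOM: inspect the first highbit byte and the byte after it.
    if ($bytes =~ /([\x80-\xFF])(.?)/s) {
        my ($first, $next) = (ord($1), $2);
        return 'UTF-8'
            if $first >= 0xC0 && $first <= 0xFD
            && length($next)
            && ord($next) >= 0x80 && ord($next) <= 0xBF;
    }

    # Otherwise (including files with no highbit bytes at all, where
    # the distinction does not matter) fall back to Latin-1.
    return 'ISO-8859-1';
}
```

Note that a comment line such as "#", an e-acute byte, and an ASCII
byte near the top of a file makes this function answer Latin-1, which
is exactly the trick described above for catering to the heuristic.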

=item *

When rendering Pod to a format that allows comments (i.e., to nearly
any format other than plaintext), a Pod formatter must insert comment
text identifying its name and version number, and the name and
version numbers of any modules it might be using to process the Pod.
Minimal examples:

  %% POD::Pod2PS v3.14159, using POD::Parser v1.92

  <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

  {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

  .\" Pod::Man version 3.14159, using POD::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the Pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of the input
file, the formatting options in effect, the version of Perl used, etc.

Formatters may also choose to note errors/warnings as comments,
besides or instead of emitting them otherwise (as in messages to
STDERR, or C<die>ing).

=item *

Pod parsers I<may> emit warnings or error messages ("Unknown E code
EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
suppressing all such STDERR output, and instead allow an option for
reporting errors/warnings
in some other way, whether by triggering a callback, or noting errors
in some attribute of the document object, or some similarly unobtrusive
mechanism -- or even by appending a "Pod Errors" section to the end of
the parsed form of the document.

=item *

In cases of exceptionally aberrant documents, Pod parsers may abort the
parse. Even then, using C<die>ing/C<croak>ing is to be avoided; where
possible, the parser library may simply close the input file
and add text like "*** Formatting Aborted ***" to the end of the
(partial) in-memory document.
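
One possible shape for the error-reporting options described in the
previous two items is sketched below. The package name
C<Local::PodErrorLog>, the method names, and the C<quiet> and
C<on_error> options are all hypothetical; they are not the interface
of any existing Pod parser.

```perl
use strict;
use warnings;

package Local::PodErrorLog;    # hypothetical name, for illustration only

sub new {
    my ($class, %opt) = @_;
    return bless {
        errors   => [],                     # every error is recorded here
        on_error => $opt{on_error},         # optional callback
        quiet    => $opt{quiet} ? 1 : 0,    # true: never write to STDERR
    }, $class;
}

# Record one error, notify the callback if any, and warn unless quiet.
sub whine {
    my ($self, $line, $message) = @_;
    my $entry = { line => $line, message => $message };
    push @{ $self->{errors} }, $entry;
    $self->{on_error}->($entry) if $self->{on_error};
    warn "Pod error near line $line: $message\n" unless $self->{quiet};
    return $entry;
}

# All errors recorded so far, e.g. for building a "Pod Errors" section.
sub errors {
    my ($self) = @_;
    return @{ $self->{errors} };
}

package main;
```

With C<< quiet => 1 >>, nothing reaches STDERR, yet the caller can
still inspect C<< $log->errors >> after the parse, or react to each
error as it happens through the callback.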

=item *

In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
are understood (i.e., I<not> verbatim paragraphs, but I<including>
ordinary paragraphs, and command paragraphs that produce renderable
text, like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as any
(nonzero) number of literal spaces, literal newlines, and literal tabs
(as long as this produces no blank lines, since those would terminate
the paragraph). Pod parsers should compact literal whitespace in each
processed paragraph, but may provide an option for overriding this
(since some processing tasks do not require it), or may follow
additional special rules (for example, specially treating
period-space-space or period-newline sequences).

=item *

Pod parsers should not, by default, try to coerce apostrophe (') and
quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
turn backtick (`) into anything else but a single backtick character
(distinct from an open quote character!), nor "--" into anything but
two minus signs. They I<must never> do any of those things to text
in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
paragraphs.

=item *

When rendering Pod to a format that has two kinds of hyphens (-), one
that's a non-breaking hyphen, and another that's a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
generally translate "-" to non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.

=item *

Pod formatters should make reasonable efforts to keep words of Perl
code from being broken across lines.
For example, "Foo::Bar" in some
formatting systems is seen as eligible for being broken across lines
as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should
be avoided where possible, either by disabling all line-breaking in
mid-word, or by wrapping particular words with internal punctuation
in "don't break this across lines" codes (which in some formats may
not be a single code, but might be a matter of inserting non-breaking
zero-width spaces between every pair of characters in a word.)

=item *

Pod parsers should, by default, expand tabs in verbatim paragraphs as
they are processed, before passing them to the formatter or other
processor. Parsers may also allow an option for overriding this.

=item *

Pod parsers should, by default, remove newlines from the end of
ordinary and verbatim paragraphs before passing them to the
formatter. For example, while the paragraph you're reading now
could be considered, in Pod source, to end with (and contain)
the newline(s) that end it, it should be processed as ending with
(and containing) the period character that ends this sentence.

=item *

Pod parsers, when reporting errors, should make some effort to report
an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where
this is problematic, the paragraph number should at least be
accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
the CE<lt>interest rate> attribute...'").

=item *

Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines.
I.e., these two
lines, which have a blank line between them:

	use Foo;

	print Foo->VERSION

should be unified into one paragraph ("\tuse Foo;\n\n\tprint
Foo->VERSION") before being passed to the formatter or other
processor. Parsers may also allow an option for overriding this.

While this might be too cumbersome to implement in event-based Pod
parsers, it is straightforward for parsers that return parse trees.

=item *

Pod formatters, where feasible, are advised to avoid splitting short
verbatim paragraphs (under twelve lines, say) across pages.

=item *

Pod parsers must treat a line with only spaces and/or tabs on it as a
"blank line" such as separates paragraphs. (Some older parsers
recognized only two adjacent newlines as a "blank line" but would not
recognize a newline, a space, and a newline, as a blank line. This
is noncompliant behavior.)

=item *

Authors of Pod formatters/processors should make every effort to
avoid writing their own Pod parser. There are already several in
CPAN, with a wide range of interface styles -- and one of them,
Pod::Parser, comes with modern versions of Perl.

=item *

Characters in Pod documents may be conveyed either as literals, or by
number in EE<lt>n> codes, or by an equivalent mnemonic, as in
EE<lt>eacute> which is exactly equivalent to EE<lt>233>.

Characters in the range 32-126 refer to those well known US-ASCII
characters (also defined there by Unicode, with the same meaning),
which all Pod formatters must render faithfully. Characters
in the ranges 0-31 and 127-159 should not be used (neither as
literals, nor as EE<lt>number> codes), except for the
literal byte-sequences for newline (13, 13 10, or 10), and tab (9).

Characters in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning).
Characters above 849255 should be understood to refer to Unicode characters. 850 851=item * 852 853Be warned 854that some formatters cannot reliably render characters outside 32-126; 855and many are able to handle 32-126 and 160-255, but nothing above 856255. 857 858=item * 859 860Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for 861less-than and greater-than, Pod parsers must understand "EE<lt>sol>" 862for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar, 863pipe). Pod parsers should also understand "EE<lt>lchevron>" and 864"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e., 865"left-pointing double angle quotation mark" = "left pointing 866guillemet" and "right-pointing double angle quotation mark" = "right 867pointing guillemet". (These look like little "<<" and ">>", and they 868are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>" 869and "EE<lt>raquo>".) 870 871=item * 872 873Pod parsers should understand all "EE<lt>html>" codes as defined 874in the entity declarations in the most recent XHTML specification at 875C<www.W3.org>. Pod parsers must understand at least the entities 876that define characters in the range 160-255 (Latin-1). Pod parsers, 877when faced with some unknown "EE<lt>I<identifier>>" code, 878shouldn't simply replace it with nullstring (by default, at least), 879but may pass it through as a string consisting of the literal characters 880E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the 881alternative option of processing such unknown 882"EE<lt>I<identifier>>" codes by firing an event especially 883for such codes, or by adding a special node-type to the in-memory 884document tree. Such "EE<lt>I<identifier>>" may have special meaning 885to some processors, or some processors may choose to add them to 886a special error report. 
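As a rough illustration of the rules above, the following Python sketch resolves the content of a single EE<lt>...> code. It is not taken from any existing Pod parser: the name C<resolve_escape> and the small table of Pod-specific names are assumptions made for this example, the HTML names come from Python's standard C<html.entities> table rather than from the XHTML entity files, and only decimal numbers are handled.

```python
import re
from html.entities import name2codepoint

# Names required by the text above that the HTML 4 entity table lacks.
POD_NAMES = {"sol": 0x2F, "verbar": 0x7C, "lchevron": 0xAB, "rchevron": 0xBB}


def resolve_escape(content):
    """Return the character for the content of an E<...> code, or the
    literal code text if the content is not recognized."""
    # Decimal only; other number bases are left out of this sketch.
    if re.fullmatch(r"[1-9][0-9]*", content):
        code = int(content)
        if code <= 0x10FFFF:
            return chr(code)  # always a Unicode codepoint
    if content in POD_NAMES:
        return chr(POD_NAMES[content])
    if content in name2codepoint:
        return chr(name2codepoint[content])
    # Unknown identifier: pass it through rather than dropping it.
    return "E<" + content + ">"
```

A parser built along these lines could replace the final C<return> with an event or a special node type, as the item above allows.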
887 888=item * 889 890Pod parsers must also support the XHTML codes "EE<lt>quot>" for 891character 34 (doublequote, "), "EE<lt>amp>" for character 38 892(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, '). 893 894=item * 895 896Note that in all cases of "EE<lt>whatever>", I<whatever> (whether 897an htmlname, or a number in any base) must consist only of 898alphanumeric characters -- that is, I<whatever> must match 899C<m/\A\w+\z/>. So "EE<lt> 0 1 2 3 >" is invalid, because 900it contains spaces, which aren't alphanumeric characters. This 901presumably does not I<need> special treatment by a Pod processor; 902" 0 1 2 3 " doesn't look like a number in any base, so it would 903presumably be looked up in the table of HTML-like names. Since 904there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ", 905this will be treated as an error. However, Pod processors may 906treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically> 907invalid, potentially earning a different error message than the 908error message (or warning, or event) generated by a merely unknown 909(but theoretically valid) htmlname, as in "EE<lt>qacute>" 910[sic]. However, Pod parsers are not required to make this 911distinction. 912 913=item * 914 915Note that EE<lt>number> I<must not> be interpreted as simply 916"codepoint I<number> in the current/native character set". It always 917means only "the character represented by codepoint I<number> in 918Unicode." (This is identical to the semantics of &#I<number>; in XML.) 919 920This will likely require many formatters to have tables mapping from 921treatable Unicode codepoints (such as the "\xE9" for the e-acute 922character) to the escape sequences or codes necessary for conveying 923such sequences in the target output format. A converter to *roff 924would, for example, know that "\xE9" (whether conveyed literally, or via 925a EE<lt>...> sequence) is to be conveyed as "e\\*'".
926Similarly, a program rendering Pod in a Mac OS application window, would 927presumably need to know that "\xE9" maps to codepoint 142 in MacRoman 928encoding that (at time of writing) is native for Mac OS. Such 929Unicode2whatever mappings are presumably already widely available for 930common output formats. (Such mappings may be incomplete! Implementers 931are not expected to bend over backwards in an attempt to render 932Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any 933of the other weird things that Unicode can encode.) And 934if a Pod document uses a character not found in such a mapping, the 935formatter should consider it an unrenderable character. 936 937=item * 938 939If, surprisingly, the implementor of a Pod formatter can't find a 940satisfactory pre-existing table mapping from Unicode characters to 941escapes in the target format (e.g., a decent table of Unicode 942characters to *roff escapes), it will be necessary to build such a 943table. If you are in this circumstance, you should begin with the 944characters in the range 0x00A0 - 0x00FF, which is mostly the heavily 945used accented characters. Then proceed (as patience permits and 946fastidiousness compels) through the characters that the (X)HTML 947standards groups judged important enough to merit mnemonics 948for. These are declared in the (X)HTML specifications at the 949www.W3.org site. At time of writing (September 2001), the most recent 950entity declaration files are: 951 952 http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent 953 http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent 954 http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent 955 956Then you can progress through any remaining notable Unicode characters 957in the range 0x2000-0x204D (consult the character tables at 958www.unicode.org), and whatever else strikes your fancy. 
For example, 959in F<xhtml-symbol.ent>, there is the entry: 960 961 <!ENTITY infin "&#8734;"> <!-- infinity, U+221E ISOtech --> 962 963While the mapping "infin" to the character "\x{221E}" will (hopefully) 964have been already handled by the Pod parser, the presence of the 965character in this file means that it's important enough to 966include in a formatter's table that maps from notable Unicode characters 967to the codes necessary for rendering them. So for a Unicode-to-*roff 968mapping, for example, this would merit the entry: 969 970 "\x{221E}" => '\(in', 971 972It is eagerly hoped that in the future, increasing numbers of formats 973(and formatters) will support Unicode characters directly (as (X)HTML 974does with C<&infin;>, C<&#8734;>, or C<&#x221E;>), reducing the need 975for idiosyncratic mappings of Unicode-to-I<my_escapes>. 976 977=item * 978 979It is up to individual Pod formatters to display good judgement when 980confronted with an unrenderable character (which is distinct from an 981unknown EE<lt>thing> sequence that the parser couldn't resolve to 982anything, renderable or not). It is good practice to map Latin letters 983with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding 984unaccented US-ASCII letters (like a simple character 101, "e"), but 985clearly this is often not feasible, and an unrenderable character may 986be represented as "?", or the like. In attempting a sane fallback 987(as from EE<lt>233> to "e"), Pod formatters may use the 988%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or 989L<Text::Unidecode|Text::Unidecode>, if available. 990 991For example, this Pod text: 992 993 magic is enabled if you set C<$Currency> to 'E<euro>'. 994 995may be rendered as: 996"magic is enabled if you set C<$Currency> to 'I<?>'" or as 997"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as 998"magic is enabled if you set C<$Currency> to '[x20AC]'", etc.
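The fallback behaviour described above can be sketched in Python as follows. This is only one possible approach, under stated assumptions: the name C<render_char> is made up for this example, Unicode decomposition (NFKD) stands in for a hand-built table such as %Latin1Code_to_fallback, and the "[xHHHH]" notation is borrowed from the rendering example above.

```python
import unicodedata


def render_char(ch, encoding="ascii"):
    """Return ch if the target encoding can represent it, else a fallback."""
    def fits(s):
        try:
            s.encode(encoding)
            return True
        except UnicodeEncodeError:
            return False

    if fits(ch):
        return ch
    # Drop combining marks left by decomposition: "\xE9" becomes "e".
    base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                   if not unicodedata.combining(c))
    if base and fits(base):
        return base
    # Nothing sensible left: show the codepoint instead of dropping it.
    return "[x%04X]" % ord(ch)
```

A formatter could also collect every character that reaches the last C<return> and report the list in a comment or warning.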
999 1000A Pod formatter may also note, in a comment or warning, a list of what 1001unrenderable characters were encountered. 1002 1003=item * 1004 1005EE<lt>...> may freely appear in any formatting code (other than 1006in another EE<lt>...> or in an ZE<lt>>). That is, "XE<lt>The 1007EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The 1008EE<lt>euro>1,000,000 Solution|Million::Euros>". 1009 1010=item * 1011 1012Some Pod formatters output to formats that implement non-breaking 1013spaces as an individual character (which I'll call "NBSP"), and 1014others output to formats that implement non-breaking spaces just as 1015spaces wrapped in a "don't break this across lines" code. Note that 1016at the level of Pod, both sorts of codes can occur: Pod can contain a 1017NBSP character (whether as a literal, or as a "EE<lt>160>" or 1018"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo 1019IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in 1020such codes are taken to represent non-breaking spaces. Pod 1021parsers should consider supporting the optional parsing of "SE<lt>foo 1022IE<lt>barE<gt> baz>" as if it were 1023"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the 1024optional parsing of groups of words joined by NBSP's as if each group 1025were in a SE<lt>...> code, so that formatters may use the 1026representation that maps best to what the output format demands. 1027 1028=item * 1029 1030Some processors may find that the C<SE<lt>...E<gt>> code is easiest to 1031implement by replacing each space in the parse tree under the content 1032of the S, with an NBSP. But note: the replacement should apply I<not> to 1033spaces in I<all> text, but I<only> to spaces in I<printable> text. (This 1034distinction may or may not be evident in the particular tree/event 1035model implemented by the Pod parser.) 
For example, consider this 1036unusual case: 1037 1038 S<L</Autoloaded Functions>> 1039 1040This means that the space in the middle of the visible link text must 1041not be broken across lines. In other words, it's the same as this: 1042 1043 L<"AutoloadedE<160>Functions"|/Autoloaded Functions> 1044 1045However, a misapplied space-to-NBSP replacement could (wrongly) 1046produce something equivalent to this: 1047 1048 L<"AutoloadedE<160>Functions"|/AutoloadedE<160>Functions> 1049 1050...which is almost definitely not going to work as a hyperlink (assuming 1051this formatter outputs a format supporting hypertext). 1052 1053Formatters may choose to just not support the S format code, 1054especially in cases where the output format simply has no NBSP 1055character/code and no code for "don't break this stuff across lines". 1056 1057=item * 1058 1059Besides the NBSP character discussed above, implementors are reminded 1060of the existence of the other "special" character in Latin-1, the 1061"soft hyphen" character, also known as "discretionary hyphen" 1062(i.e., C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> = 1063C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation 1064point. That is, it normally renders as nothing, but may render as a 1065"-" if a formatter breaks the word at that point. Pod formatters 1066should, as appropriate, do one of the following: 1) render this with 1067a code with the same meaning (e.g., "\-" in RTF), 2) pass it through 1068in the expectation that the formatter understands this character as 1069such, or 3) delete it. 1070 1071For example: 1072 1073 sigE<shy>action 1074 manuE<shy>script 1075 JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi 1076 1077These signal to a formatter that if it is to hyphenate "sigaction" 1078or "manuscript", then it should be done as 1079"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script" 1080(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't 1081show up at all).
And if it is 1082to hyphenate "Jarkko" and/or "Hietaniemi", it can do 1083so only at the points where there is a C<EE<lt>shyE<gt>> code. 1084 1085In practice, it is anticipated that this character will not be used 1086often, but formatters should either support it, or delete it. 1087 1088=item * 1089 1090If you think that you want to add a new command to Pod (like, say, a 1091"=biblio" command), consider whether you could get the same 1092effect with a for or begin/end sequence: "=for biblio ..." or "=begin 1093biblio" ... "=end biblio". Pod processors that don't understand 1094"=for biblio", etc, will simply ignore it, whereas they may complain 1095loudly if they see "=biblio". 1096 1097=item * 1098 1099Throughout this document, "Pod" has been the preferred spelling for 1100the name of the documentation format. One may also use "POD" or 1101"pod". For the documentation that is (typically) in the Pod 1102format, you may use "pod", or "Pod", or "POD". Understanding these 1103distinctions is useful; but obsessing over how to spell them, usually 1104is not. 1105 1106=back 1107 1108 1109 1110 1111 1112=head1 About LE<lt>...E<gt> Codes 1113 1114As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...> 1115code is the most complex of the Pod formatting codes. The points below 1116will hopefully clarify what it means and how processors should deal 1117with it. 1118 1119=over 1120 1121=item * 1122 1123In parsing an LE<lt>...> code, Pod parsers must distinguish at least 1124four attributes: 1125 1126=over 1127 1128=item First: 1129 1130The link-text. If there is none, this must be undef. (E.g., in 1131"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions". 1132In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no 1133link text. Note that link text may contain formatting.) 1134 1135=item Second: 1136 1137The possibly inferred link-text; i.e., if there was no real link 1138text, then this is the text that we'll infer in its place. 
(E.g., for 1139"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".) 1140 1141=item Third: 1142 1143The name or URL, or undef if none. (E.g., in "LE<lt>Perl 1144Functions|perlfunc>", the name (also sometimes called the page) 1145is "perlfunc". In "LE<lt>/CAVEATS>", the name is undef.) 1146 1147=item Fourth: 1148 1149The section (AKA "item" in older perlpods), or undef if none. E.g., 1150in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section. (Note 1151that this is not the same as a manpage section like the "5" in "man 5 1152crontab". "Section Foo" in the Pod sense means the part of the text 1153that's introduced by the heading or item whose text is "Foo".) 1154 1155=back 1156 1157Pod parsers may also note additional attributes including: 1158 1159=over 1160 1161=item Fifth: 1162 1163A flag for whether item 3 (if present) is a URL (like 1164"http://lists.perl.org" is), in which case there should be no section 1165attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or 1166possibly a man page name (like "crontab(5)" is). 1167 1168=item Sixth: 1169 1170The raw original LE<lt>...> content, before text is split on 1171"|", "/", etc, and before EE<lt>...> codes are expanded. 1172 1173=back 1174 1175(The above were numbered only for concise reference below. It is not 1176a requirement that these be passed as an actual list or array.) 1177 1178For example: 1179 1180 L<Foo::Bar> 1181 => undef, # link text 1182 "Foo::Bar", # possibly inferred link text 1183 "Foo::Bar", # name 1184 undef, # section 1185 'pod', # what sort of link 1186 "Foo::Bar" # original content 1187 1188 L<Perlport's section on NL's|perlport/Newlines> 1189 => "Perlport's section on NL's", # link text 1190 "Perlport's section on NL's", # possibly inferred link text 1191 "perlport", # name 1192 "Newlines", # section 1193 'pod', # what sort of link 1194 "Perlport's section on NL's|perlport/Newlines" # orig. 
content 1195 1196 L<perlport/Newlines> 1197 => undef, # link text 1198 '"Newlines" in perlport', # possibly inferred link text 1199 "perlport", # name 1200 "Newlines", # section 1201 'pod', # what sort of link 1202 "perlport/Newlines" # original content 1203 1204 L<crontab(5)/"DESCRIPTION"> 1205 => undef, # link text 1206 '"DESCRIPTION" in crontab(5)', # possibly inferred link text 1207 "crontab(5)", # name 1208 "DESCRIPTION", # section 1209 'man', # what sort of link 1210 'crontab(5)/"DESCRIPTION"' # original content 1211 1212 L</Object Attributes> 1213 => undef, # link text 1214 '"Object Attributes"', # possibly inferred link text 1215 undef, # name 1216 "Object Attributes", # section 1217 'pod', # what sort of link 1218 "/Object Attributes" # original content 1219 1220 L<http://www.perl.org/> 1221 => undef, # link text 1222 "http://www.perl.org/", # possibly inferred link text 1223 "http://www.perl.org/", # name 1224 undef, # section 1225 'url', # what sort of link 1226 "http://www.perl.org/" # original content 1227 1228 L<Perl.org|http://www.perl.org/> 1229 => "Perl.org", # link text 1230 "http://www.perl.org/", # possibly inferred link text 1231 "http://www.perl.org/", # name 1232 undef, # section 1233 'url', # what sort of link 1234 "Perl.org|http://www.perl.org/" # original content 1235 1236Note that you can distinguish URL-links from anything else by the 1237fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So 1238C<LE<lt>http://www.perl.comE<gt>> is a URL, but 1239C<LE<lt>HTTP::ResponseE<gt>> isn't. 1240 1241=item * 1242 1243In case of LE<lt>...> codes with no "text|" part in them, 1244older formatters have exhibited great variation in actually displaying 1245the link or cross reference. For example, LE<lt>crontab(5)> would render 1246as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage" 1247or just "C<crontab(5)>". 
1248 1249Pod processors must now treat "text|"-less links as follows: 1250 1251 L<name> => L<name|name> 1252 L</section> => L<"section"|/section> 1253 L<name/section> => L<"section" in name|name/section> 1254 1255=item * 1256 1257Note that section names might contain markup. I.e., if a section 1258starts with: 1259 1260 =head2 About the C<-M> Operator 1261 1262or with: 1263 1264 =item About the C<-M> Operator 1265 1266then a link to it would look like this: 1267 1268 L<somedoc/About the C<-M> Operator> 1269 1270Formatters may choose to ignore the markup for purposes of resolving 1271the link and use only the renderable characters in the section name, 1272as in: 1273 1274 <h1><a name="About_the_-M_Operator">About the <code>-M</code> 1275 Operator</a></h1> 1276 1277 ... 1278 1279 <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code> 1280 Operator" in somedoc</a> 1281 1282=item * 1283 1284Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>> 1285links from C<LE<lt>name/itemE<gt>> links (and their targets). These 1286have been merged syntactically and semantically in the current 1287specification, and I<section> can refer either to a "=headI<n> Heading 1288Content" command or to a "=item Item Content" command. This 1289specification does not specify what behavior should be in the case 1290of a given document having several things all seeming to produce the 1291same I<section> identifier (e.g., in HTML, several things all producing 1292the same I<anchorname> in <a name="I<anchorname>">...</a> 1293elements). Where Pod processors can control this behavior, they should 1294use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the 1295I<first> "Bar" section in Foo. 1296 1297But for some processors/formats this cannot be easily controlled; as 1298with the HTML example, the behavior of multiple ambiguous 1299<a name="I<anchorname>">...</a> is most easily just left up to 1300browsers to decide.
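The six link attributes and the rules for "text|"-less links given earlier in this section can be sketched in Python as below. This is a simplified illustration, not the behaviour of any particular parser: the name C<parse_link> is an assumption, formatting codes and EE<lt>...> escapes inside the link are not handled, and when explicit link text is present the sketch simply reuses it as the inferred text, as in the perlport example.

```python
import re

# Python spelling of the URL test m/\A\w+:[^:\s]\S*\z/ given above.
URL_RE = re.compile(r"\A\w+:[^:\s]\S*\Z")


def parse_link(content):
    """Split the raw content of an L<...> code into six attributes:
    (text, inferred_text, name, section, kind, raw)."""
    text, bar, target = content.partition("|")
    if not bar:                      # no "text|" part at all
        text, target = None, content
    elif text == "":                 # L<|Time::HiRes> has no link text either
        text = None

    if URL_RE.match(target):
        name, section, kind = target, None, "url"
    else:
        name, slash, section = target.partition("/")
        name = name or None
        section = section.strip('"') if slash else None
        # A parenthesized suffix, as in crontab(5), marks a man page.
        kind = "man" if name and re.search(r"\(\S*\)\Z", name) else "pod"

    if text is not None:
        inferred = text
    elif kind == "url" or section is None:
        inferred = name
    elif name is None:
        inferred = '"%s"' % section
    else:
        inferred = '"%s" in %s' % (section, name)
    return text, inferred, name, section, kind, content
```

Returning a plain tuple keeps the sketch short; as noted above, nothing requires the attributes to be passed as an actual list or array.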
1301 1302=item * 1303 1304In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes 1305for formatting or for EE<lt>...> escapes, as in: 1306 1307 L<B<ummE<234>stuff>|...> 1308 1309For C<LE<lt>...E<gt>> codes without a "text|" part, only 1310C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur. That is, 1311authors should not use "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>". 1312 1313Note, however, that formatting codes and ZE<lt>>'s can occur in any 1314and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>, 1315and I<url>). 1316 1317Authors must not nest LE<lt>...> codes. For example, "LE<lt>The 1318LE<lt>Foo::Bar> man page>" should be treated as an error. 1319 1320=item * 1321 1322Note that Pod authors may use formatting codes inside the "text" 1323part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">). 1324 1325In other words, this is valid: 1326 1327 Go read L<the docs on C<$.>|perlvar/"$."> 1328 1329Some output formats that do allow rendering "LE<lt>...>" codes as 1330hypertext, might not allow the link-text to be formatted; in 1331that case, formatters will have to just ignore that formatting. 1332 1333=item * 1334 1335At time of writing, C<LE<lt>nameE<gt>> values are of two types: 1336either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which 1337might be a real Perl module or program in an @INC / PATH 1338directory, or a .pod file in those places); or the name of a Unix 1339man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>> 1340is ambiguous between a Pod page called "chmod" and the Unix man page 1341"chmod" (in whatever man-section). However, the presence of a string 1342in parens, as in "crontab(5)", is sufficient to signal that what 1343is being discussed is not a Pod page, and so is presumably a 1344Unix man page.
The distinction is of no importance to many 1345Pod processors, but some processors that render to hypertext formats 1346may need to distinguish them in order to know how to render a 1347given C<LE<lt>fooE<gt>> code. 1348 1349=item * 1350 1351Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax (as in 1352C<LE<lt>Object AttributesE<gt>>), which was not easily distinguishable from 1353C<LE<lt>nameE<gt>> syntax and for C<LE<lt>"section"E<gt>> which was only 1354slightly less ambiguous. This syntax is no longer in the specification, and 1355has been replaced by the C<LE<lt>/sectionE<gt>> syntax (where the slash was 1356formerly optional). Pod parsers should tolerate the C<LE<lt>"section"E<gt>> 1357syntax, for a while at least. The suggested heuristic for distinguishing 1358C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> is that if it contains any 1359whitespace, it's a I<section>. Pod processors should warn about this being 1360deprecated syntax. 1361 1362=back 1363 1364=head1 About =over...=back Regions 1365 1366"=over"..."=back" regions are used for various kinds of list-like 1367structures. (I use the term "region" here simply as a collective 1368term for everything from the "=over" to the matching "=back".) 1369 1370=over 1371 1372=item * 1373 1374The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ... 1375"=back" is used for giving the formatter a clue as to how many 1376"spaces" (ems, or roughly equivalent units) it should tab over, 1377although many formatters will have to convert this to an absolute 1378measurement that may not exactly match with the size of spaces (or M's) 1379in the document's base font. Other formatters may have to completely 1380ignore the number. The lack of any explicit I<indentlevel> parameter is 1381equivalent to an I<indentlevel> value of 4. Pod processors may 1382complain if I<indentlevel> is present but is not a positive number 1383matching C<m/\A(\d*\.)?\d+\z/>. 
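The handling of I<indentlevel> described above might look like the following Python sketch. The name C<indent_level> and the choice to fall back to the default of 4 when the value is rejected are assumptions of this example; the text above only says that processors may complain.

```python
import re

# Python spelling of m/\A(\d*\.)?\d+\z/ from the text above.
INDENT_RE = re.compile(r"\A(\d*\.)?\d+\Z", re.ASCII)
DEFAULT_INDENT = 4.0


def indent_level(arg):
    """Return (level, complaint) for the argument of an "=over" command.

    complaint is None when the argument is absent or acceptable."""
    if arg is None or arg.strip() == "":
        return DEFAULT_INDENT, None       # no explicit value means 4
    arg = arg.strip()
    if INDENT_RE.match(arg) and float(arg) > 0:
        return float(arg), None
    # Assumption of this sketch: use the default and report the problem.
    return DEFAULT_INDENT, "bad indent level %r, using %g" % (arg, DEFAULT_INDENT)
```

A formatter would then convert the returned number into whatever absolute measurement its output format needs, or ignore it entirely.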
1384 1385=item * 1386 1387Authors of Pod formatters are reminded that "=over" ... "=back" may 1388map to several different constructs in your output format. For 1389example, in converting Pod to (X)HTML, it can map to any of 1390<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or 1391<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or 1392<dt>. 1393 1394=item * 1395 1396Each "=over" ... "=back" region should be one of the following: 1397 1398=over 1399 1400=item * 1401 1402An "=over" ... "=back" region containing only "=item *" commands, 1403each followed by some number of ordinary/verbatim paragraphs, other 1404nested "=over" ... "=back" regions, "=for..." paragraphs, and 1405"=begin"..."=end" regions. 1406 1407(Pod processors must tolerate a bare "=item" as if it were "=item 1408*".) Whether "*" is rendered as a literal asterisk, an "o", or as 1409some kind of real bullet character, is left up to the Pod formatter, 1410and may depend on the level of nesting. 1411 1412=item * 1413 1414An "=over" ... "=back" region containing only 1415C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them) 1416followed by some number of ordinary/verbatim paragraphs, other nested 1417"=over" ... "=back" regions, "=for..." paragraphs, and/or 1418"=begin"..."=end" codes. Note that the numbers must start at 1 1419in each section, and must proceed in order and without skipping 1420numbers. 1421 1422(Pod processors must tolerate lines like "=item 1" as if they were 1423"=item 1.", with the period.) 1424 1425=item * 1426 1427An "=over" ... "=back" region containing only "=item [text]" 1428commands, each one (or each group of them) followed by some number of 1429ordinary/verbatim paragraphs, other nested "=over" ... "=back" 1430regions, or "=for..." paragraphs, and "=begin"..."=end" regions. 1431 1432The "=item [text]" paragraph should not match 1433C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it 1434match just C<m/\A=item\s*\z/>. 
1435 1436=item * 1437 1438An "=over" ... "=back" region containing no "=item" paragraphs at 1439all, and containing only some number of 1440ordinary/verbatim paragraphs, and possibly also some nested "=over" 1441... "=back" regions, "=for..." paragraphs, and "=begin"..."=end" 1442regions. Such an itemless "=over" ... "=back" region in Pod is 1443equivalent in meaning to a "<blockquote>...</blockquote>" element in 1444HTML. 1445 1446=back 1447 1448Note that with all the above cases, you can determine which type of 1449"=over" ... "=back" you have, by examining the first (non-"=cut", 1450non-"=pod") Pod paragraph after the "=over" command. 1451 1452=item * 1453 1454Pod formatters I<must> tolerate arbitrarily large amounts of text 1455in the "=item I<text...>" paragraph. In practice, most such 1456paragraphs are short, as in: 1457 1458 =item For cutting off our trade with all parts of the world 1459 1460But they may be arbitrarily long: 1461 1462 =item For transporting us beyond seas to be tried for pretended 1463 offenses 1464 1465 =item He is at this time transporting large armies of foreign 1466 mercenaries to complete the works of death, desolation and 1467 tyranny, already begun with circumstances of cruelty and perfidy 1468 scarcely paralleled in the most barbarous ages, and totally 1469 unworthy the head of a civilized nation. 1470 1471=item * 1472 1473Pod processors should tolerate "=item *" / "=item I<number>" commands 1474with no accompanying paragraph. The middle item is an example: 1475 1476 =over 1477 1478 =item 1 1479 1480 Pick up dry cleaning. 1481 1482 =item 2 1483 1484 =item 3 1485 1486 Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs. 1487 1488 =back 1489 1490=item * 1491 1492No "=over" ... "=back" region can contain headings. Processors may 1493treat such a heading as an error. 1494 1495=item * 1496 1497Note that an "=over" ... "=back" region should have some 1498content. 
That is, authors should not have an empty region like this: 1499 1500 =over 1501 1502 =back 1503 1504Pod processors seeing such a contentless "=over" ... "=back" region, 1505may ignore it, or may report it as an error. 1506 1507=item * 1508 1509Processors must tolerate an "=over" list that goes off the end of the 1510document (i.e., which has no matching "=back"), but they may warn 1511about such a list. 1512 1513=item * 1514 1515Authors of Pod formatters should note that this construct: 1516 1517 =item Neque 1518 1519 =item Porro 1520 1521 =item Quisquam Est 1522 1523 Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci 1524 velit, sed quia non numquam eius modi tempora incidunt ut 1525 labore et dolore magnam aliquam quaerat voluptatem. 1526 1527 =item Ut Enim 1528 1529is semantically ambiguous, in a way that makes formatting decisions 1530a bit difficult. On the one hand, it could be mention of an item 1531"Neque", mention of another item "Porro", and mention of another 1532item "Quisquam Est", with just the last one requiring the explanatory 1533paragraph "Qui dolorem ipsum quia dolor..."; and then an item 1534"Ut Enim". In that case, you'd want to format it like so: 1535 1536 Neque 1537 1538 Porro 1539 1540 Quisquam Est 1541 Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci 1542 velit, sed quia non numquam eius modi tempora incidunt ut 1543 labore et dolore magnam aliquam quaerat voluptatem. 1544 1545 Ut Enim 1546 1547But it could equally well be a discussion of three (related or equivalent) 1548items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph 1549explaining them all, and then a new item "Ut Enim". In that case, you'd 1550probably want to format it like so: 1551 1552 Neque 1553 Porro 1554 Quisquam Est 1555 Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci 1556 velit, sed quia non numquam eius modi tempora incidunt ut 1557 labore et dolore magnam aliquam quaerat voluptatem. 
1558 1559 Ut Enim 1560 1561But (for the foreseeable future), Pod does not provide any way for Pod 1562authors to distinguish which grouping is meant by the above 1563"=item"-cluster structure. So formatters should format it like so: 1564 1565 Neque 1566 1567 Porro 1568 1569 Quisquam Est 1570 1571 Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci 1572 velit, sed quia non numquam eius modi tempora incidunt ut 1573 labore et dolore magnam aliquam quaerat voluptatem. 1574 1575 Ut Enim 1576 1577That is, there should be (at least roughly) equal spacing between 1578items as between paragraphs (although that spacing may well be less 1579than the full height of a line of text). This leaves it to the reader 1580to use (con)textual cues to figure out whether the "Qui dolorem 1581ipsum..." paragraph applies to the "Quisquam Est" item or to all three 1582items "Neque", "Porro", and "Quisquam Est". While not an ideal 1583situation, this is preferable to providing formatting cues that may 1584be actually contrary to the author's intent. 1585 1586=back 1587 1588 1589 1590=head1 About Data Paragraphs and "=begin/=end" Regions 1591 1592Data paragraphs are typically used for inlining non-Pod data that is 1593to be used (typically passed through) when rendering the document to 1594a specific format: 1595 1596 =begin rtf 1597 1598 \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par} 1599 1600 =end rtf 1601 1602The exact same effect could, incidentally, be achieved with a single 1603"=for" paragraph: 1604 1605 =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par} 1606 1607(Although that is not formally a data paragraph, it has the same 1608meaning as one, and Pod parsers may parse it as one.) 1609 1610Another example of a data paragraph: 1611 1612 =begin html 1613 1614 I like <em>PIE</em>! 1615 1616 <hr>Especially pecan pie! 
1617 1618 =end html 1619 1620If these were ordinary paragraphs, the Pod parser would try to 1621expand the "EE<lt>/em>" (in the first paragraph) as a formatting 1622code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this 1623is in a "=begin I<identifier>"..."=end I<identifier>" region I<and> 1624the identifier "html" doesn't have a ":" prefix, the contents 1625of this region are stored as data paragraphs, instead of being 1626processed as ordinary paragraphs (or if they began with spaces 1627and/or tabs, as verbatim paragraphs). 1628 1629As a further example: At time of writing, no "biblio" identifier is 1630supported, but suppose some processor were written to recognize it as 1631a way of (say) denoting a bibliographic reference (necessarily 1632containing formatting codes in ordinary paragraphs). The fact that 1633"biblio" paragraphs were meant for ordinary processing would be 1634indicated by prefacing each "biblio" identifier with a colon: 1635 1636 =begin :biblio 1637 1638 Wirth, Niklaus. 1976. I<Algorithms + Data Structures = 1639 Programs.> Prentice-Hall, Englewood Cliffs, NJ. 1640 1641 =end :biblio 1642 1643This would signal to the parser that paragraphs in this begin...end 1644region are subject to normal handling as ordinary/verbatim paragraphs 1645(while still tagged as meant only for processors that understand the 1646"biblio" identifier). The same effect could be had with: 1647 1648 =for :biblio 1649 Wirth, Niklaus. 1976. I<Algorithms + Data Structures = 1650 Programs.> Prentice-Hall, Englewood Cliffs, NJ. 1651 1652The ":" on these identifiers means simply "process this stuff 1653normally, even though the result will be for some special target". 1654I suggest that parser APIs report "biblio" as the target identifier, 1655but also report that it had a ":" prefix. (And similarly, with the 1656above "html", report "html" as the target identifier, and note the 1657I<lack> of a ":" prefix.)
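The suggestion above, to report the target identifier and the presence of the ":" prefix separately, can be sketched in Python like this. The name C<parse_target> and the shape of the returned dictionary are assumptions made for the example, not part of any existing parser API.

```python
import re

# Command, optional ":" prefix, then the target identifier.
TARGET_RE = re.compile(r"\A=(begin|end|for)[ \t]+(:?)(\S+)")


def parse_target(paragraph):
    """Describe the target of a "=begin", "=end" or "=for" paragraph.

    Returns None if the paragraph is not one of those commands."""
    m = TARGET_RE.match(paragraph)
    if m is None:
        return None
    command, colon, identifier = m.groups()
    return {
        "command": command,
        "target": identifier,      # reported without the ":" prefix
        "colon": colon == ":",     # True: process the contents as normal Pod
    }
```

With this shape, a handler for "html" can check that C<colon> is false before treating the contents as data paragraphs.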

Note that a "=begin I<identifier>"..."=end I<identifier>" region where
I<identifier> begins with a colon I<can> contain commands.  For example:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =for comment
   hm, check abebooks.com for how much used copies cost.

  =over

  =item

  Wirth, Niklaus.  1975.  I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart.  [Yes, it's in German.]

  =item

  Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
  Programs.>  Prentice-Hall, Englewood Cliffs, NJ.

  =back

  =end :biblio

Note, however, that a "=begin I<identifier>"..."=end I<identifier>"
region where I<identifier> does I<not> begin with a colon should not
directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
nor "=item".  For example, this may be considered invalid:

  =begin somedata

  This is a data paragraph.

  =head1 Don't do this!

  This is a data paragraph too.

  =end somedata

A Pod processor may signal that the above (specifically the "=head1"
paragraph) is an error.  Note, however, that the following should
I<not> be treated as an error:

  =begin somedata

  This is a data paragraph.

  =cut

  # Yup, this isn't Pod anymore.
  sub excl { (rand() > .5) ? "hoo!" : "hah!" }

  =pod

  This is a data paragraph too.

  =end somedata

And this too is valid:

  =begin someformat

    This is a data paragraph.

    And this is a data paragraph.

    =begin someotherformat

      This is a data paragraph too.

      And this is a data paragraph too.

      =begin :yetanotherformat

      =head2 This is a command paragraph!

      This is an ordinary paragraph!

        And this is a verbatim paragraph!

      =end :yetanotherformat

    =end someotherformat

    Another data paragraph!

  =end someformat

The contents of the above "=begin :yetanotherformat" ...
"=end :yetanotherformat" region I<aren't> data paragraphs, because
the immediately containing region's identifier (":yetanotherformat")
begins with a colon.  In practice, most regions that contain
data paragraphs will contain I<only> data paragraphs; however,
the above nesting is syntactically valid as Pod, even if it is
rare.  However, the handlers for some formats, like "html",
will accept only data paragraphs, not nested regions; and they may
complain if they see (targeted for them) nested regions, or commands,
other than "=end", "=pod", and "=cut".

Also consider this valid structure:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =over

  =item

  Wirth, Niklaus.  1975.  I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart.  [Yes, it's in German.]

  =item

  Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
  Programs.>  Prentice-Hall, Englewood Cliffs, NJ.

  =back

  Buy buy buy!

  =begin html

  <img src='wirth_spokesmodeling_book.png'>

  <hr>

  =end html

  Now now now!

  =end :biblio

There, the "=begin html"..."=end html" region is nested inside
the larger "=begin :biblio"..."=end :biblio" region.  Note that the
content of the "=begin html"..."=end html" region is data
paragraph(s), because the immediately containing region's identifier
("html") I<doesn't> begin with a colon.

Pod parsers, when processing a series of data paragraphs one
after another (within a single region), should consider them to
be one large data paragraph that happens to contain blank lines.
So the content of the above "=begin html"..."=end html" I<may> be stored
as two data paragraphs (one consisting of
"<img src='wirth_spokesmodeling_book.png'>\n"
and another consisting of "<hr>\n"), but I<should> be stored as
a single data paragraph (consisting of
"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").

Pod processors should tolerate empty
"=begin I<something>"..."=end I<something>" regions,
empty "=begin :I<something>"..."=end :I<something>" regions, and
contentless "=for I<something>" and "=for :I<something>"
paragraphs.  I.e., these should be tolerated:

  =for html

  =begin html

  =end html

  =begin :biblio

  =end :biblio

Incidentally, note that there's no easy way to express a data
paragraph starting with something that looks like a command.  Consider:

  =begin stuff

  =shazbot

  =end stuff

There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
paragraph "=shazbot\n".  However, you can express a data paragraph consisting
of "=shazbot\n" using this code:

  =for stuff =shazbot

Situations where this is necessary are presumably quite rare.

Note that "=end" commands must match the currently open "=begin" command.
That is, they must properly nest.  For example, this is valid:

  =begin outer

  X

  =begin inner

  Y

  =end inner

  Z

  =end outer

while this is invalid:

  =begin outer

  X

  =begin inner

  Y

  =end outer

  Z

  =end inner

This latter is improper because when the "=end outer" command is seen, the
currently open region has the formatname "inner", not "outer".  (It just
happens that "outer" is the format name of a higher-up region.)  This is
an error.
Processors must by default report this as an error, and may halt
processing the document containing that error.  A corollary of this is that
regions cannot "overlap".  That is, the latter block above does not represent
a region called "outer" which contains X and Y, overlapping a region called
"inner" which contains Y and Z.  But because it is invalid (as all
apparently overlapping regions would be), it doesn't represent that, or
anything at all.

Similarly, this is invalid:

  =begin thing

  =end hting

This is an error because the region is opened by "thing", and the "=end"
tries to close "hting" [sic].

This is also invalid:

  =begin thing

  =end

This is invalid because every "=end" command must have a formatname
parameter.

=head1 SEE ALSO

L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
L<podchecker>

=head1 AUTHOR

Sean M. Burke

=cut