=encoding utf8

=head1 NAME

perlpodspec - Plain Old Documentation: format specification and notes

=head1 DESCRIPTION

This document contains detailed notes on the Pod markup language.  Most
people will only have to read L<perlpod|perlpod> to know how to write
in Pod, but this document may answer some incidental questions to do
with parsing and rendering Pod.

In this document, "must" / "must not", "should" /
"should not", and "may" have their conventional (cf. RFC 2119)
meanings: "X must do Y" means that if X doesn't do Y, it's against
this specification, and should really be fixed.  "X should do Y"
means that it's recommended, but X may fail to do Y, if there's a
good reason.  "X may do Y" is merely a note that X can do Y at
will (although it is up to the reader to detect any connotation of
"and I think it would be I<nice> if X did Y" versus "it wouldn't
really I<bother> me if X did Y").

Notably, when I say "the parser should do Y", the
parser may fail to do Y, if the calling application explicitly
requests that the parser I<not> do Y.  I often phrase this as
"the parser should, by default, do Y."  This doesn't I<require>
the parser to provide an option for turning off whatever
feature Y is (like expanding tabs in verbatim paragraphs), although
it implies that such an option I<may> be provided.

=head1 Pod Definitions

Pod is embedded in files, typically Perl source files, although you
can write a file that's nothing but Pod.

A B<line> in a file consists of zero or more non-newline characters,
terminated by either a newline or the end of the file.

A B<newline sequence> is usually a platform-dependent concept, but
Pod parsers should understand it to mean any of CR (ASCII 13), LF
(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
addition to any other system-specific meaning.
The first CR/CRLF/LF
sequence in the file may be used as the basis for identifying the
newline sequence for parsing the rest of the file.

A B<blank line> is a line consisting entirely of zero or more spaces
(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
A B<non-blank line> is a line containing one or more characters other
than space or tab (and terminated by a newline or end-of-file).

(I<Note:> Many older Pod parsers did not accept a line consisting of
spaces/tabs and then a newline as a blank line.  The only lines they
considered blank were lines consisting of I<no characters at all>,
terminated by a newline.)

B<Whitespace> is used in this document as a blanket term for spaces,
tabs, and newline sequences.  (By itself, this term usually refers
to literal whitespace.  That is, sequences of whitespace characters
in Pod source, as opposed to "EE<lt>32>", which is a formatting
code that I<denotes> a whitespace character.)

A B<Pod parser> is a module meant for parsing Pod (regardless of
whether this involves calling callbacks or building a parse tree or
directly formatting it).  A B<Pod formatter> (or B<Pod translator>)
is a module or program that converts Pod to some other format (HTML,
plaintext, TeX, PostScript, RTF).  A B<Pod processor> might be a
formatter or translator, or might be a program that does something
else with the Pod (like counting words, scanning for index points,
etc.).

Pod content is contained in B<Pod blocks>.  A Pod block starts with a
line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
that matches C<m/\A=cut/> or up to the end of the file if there is
no C<m/\A=cut/> line.
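
The definitions above (newline sequences, blank lines, and Pod blocks)
can be sketched in a few lines of code.  The following is only an
illustration, written in Python for brevity; the names C<split_lines>,
C<is_blank>, and C<pod_blocks> are hypothetical and belong to no
existing Pod module.  It assumes the whole file is already in memory
as a string, and it simply skips a stray "=cut" found outside a block
rather than reporting it.

```python
import re

# CRLF is listed first so that it is not split as CR followed by LF.
NEWLINE = re.compile(r"\r\n|\r|\n")
POD_START = re.compile(r"\A=[a-zA-Z]")
POD_END = re.compile(r"\A=cut")
BLANK = re.compile(r"\A[ \t]*\Z")


def split_lines(text):
    """Split text on CR, LF, or CRLF; a last unterminated line still counts."""
    lines = NEWLINE.split(text)
    if lines and lines[-1] == "":
        lines.pop()  # the text ended with a newline sequence
    return lines


def is_blank(line):
    """A blank line holds only spaces and/or tabs (possibly nothing at all)."""
    return BLANK.match(line) is not None


def pod_blocks(text):
    """Return each Pod block as a list of lines, including its '=cut' line."""
    blocks, current = [], None
    for line in split_lines(text):
        if current is None:
            # Hypothetical simplification: a stray "=cut" outside a block
            # is skipped here; a real parser would report it.
            if POD_START.match(line) and not POD_END.match(line):
                current = [line]
        else:
            current.append(line)
            if POD_END.match(line):
                blocks.append(current)
                current = None
    if current is not None:
        blocks.append(current)  # the block ran to the end of the file
    return blocks
```

Splitting on all three newline forms at once is the simplest choice for
a sketch; a parser may instead fix on the first newline sequence it
sees, as noted above.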

=for comment
 The current perlsyn says:
 [beginquote]
   Note that pod translators should look at only paragraphs beginning
   with a pod directive (it makes parsing easier), whereas the compiler
   actually knows to look for pod escapes even in the middle of a
   paragraph.  This means that the following secret stuff will be ignored
   by both the compiler and the translators.
      $a=3;
      =secret stuff
       warn "Neither POD nor CODE!?"
      =cut back
      print "got $a\n";
   You probably shouldn't rely upon the warn() being podded out forever.
   Not all pod translators are well-behaved in this regard, and perhaps
   the compiler will become pickier.
 [endquote]
 I think that those paragraphs should just be removed; paragraph-based
 parsing seems to have been largely abandoned, because of the hassle
 with non-empty blank lines messing up what people meant by "paragraph".
 Even if the "it makes parsing easier" bit were especially true,
 it wouldn't be worth the confusion of having perl and pod2whatever
 actually disagree on what can constitute a Pod block.

Within a Pod block, there are B<Pod paragraphs>.  A Pod paragraph
consists of non-blank lines of text, separated by one or more blank
lines.

For purposes of Pod processing, there are four types of paragraphs in
a Pod block:

=over

=item *

A command paragraph (also called a "directive").  The first line of
this paragraph must match C<m/\A=[a-zA-Z]/>.  Command paragraphs are
typically one line, as in:

 =head1 NOTES

 =item *

But they may span several (non-blank) lines:

 =for comment
 Hm, I wonder what it would look like if
 you tried to write a BNF for Pod from this.

 =head3 Dr. Strangelove, or: How I Learned to
 Stop Worrying and Love the Bomb

I<Some> command paragraphs allow formatting codes in their content
(i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:

 =head1 Did You Remember to C<use strict;>?

In other words, the Pod processing handler for "head1" will apply the
same processing to "Did You Remember to CE<lt>use strict;>?" that it
would to an ordinary paragraph (i.e., formatting codes like
"CE<lt>...>" are parsed and presumably formatted appropriately, and
whitespace in the form of literal spaces and/or tabs is not
significant).

=item *

A B<verbatim paragraph>.  The first line of this paragraph must begin
with a literal space or tab, and this paragraph must not be inside a "=begin
I<identifier>", ... "=end I<identifier>" sequence unless
"I<identifier>" begins with a colon (":").  That is, if a paragraph
starts with a literal space or tab, but I<is> inside a
"=begin I<identifier>", ... "=end I<identifier>" region, then it's
a data paragraph, unless "I<identifier>" begins with a colon.

Whitespace I<is> significant in verbatim paragraphs (although, in
processing, tabs are probably expanded).

=item *

An B<ordinary paragraph>.  A paragraph is an ordinary paragraph
if its first line matches neither C<m/\A=[a-zA-Z]/> nor
C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
... "=end I<identifier>" sequence unless "I<identifier>" begins with
a colon (":").

=item *

A B<data paragraph>.  This is a paragraph that I<is> inside a "=begin
I<identifier>" ... "=end I<identifier>" sequence where
"I<identifier>" does I<not> begin with a literal colon (":").
In some sense, a data paragraph is not part of Pod at all (i.e.,
effectively it's "out-of-band"), since it's not subject to most kinds
of Pod parsing; but it is specified here, since Pod
parsers need to be able to call an event for it, or store it in some
form in a parse tree, or at least just parse I<around> it.

=back

For example: consider the following paragraphs:

  # <- that's the 0th column

  =head1 Foo

  Stuff

    $foo->bar

  =cut

Here, "=head1 Foo" and "=cut" are command paragraphs because the first
line of each matches C<m/\A=[a-zA-Z]/>.  "I<[space][space]>$foo->bar"
is a verbatim paragraph, because its first line starts with a literal
whitespace character (and there's no "=begin"..."=end" region around).

The "=begin I<identifier>" ... "=end I<identifier>" commands stop
paragraphs that they surround from being parsed as ordinary or verbatim
paragraphs, if I<identifier> doesn't begin with a colon.  This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=head1 Pod Commands

This section is intended to supplement and clarify the discussion in
L<perlpod/"Command Paragraph">.  These are the currently recognized
Pod commands:

=over

=item "=head1", "=head2", "=head3", "=head4"

This command indicates that the text in the remainder of the paragraph
is a heading.  That text may contain formatting codes.  Examples:

 =head1 Object Attributes

 =head3 What B<Not> to Do!

=item "=pod"

This command indicates that this paragraph begins a Pod block.  (If we
are already in the middle of a Pod block, this command has no effect at
all.)  If there is any text in this command paragraph after "=pod",
it must be ignored.  Examples:

 =pod

 This is a plain Pod paragraph.

 =pod This text is ignored.

=item "=cut"

This command indicates that this line is the end of this previously
started Pod block.  If there is any text after "=cut" on the line, it must be
ignored.  Examples:

 =cut

 =cut The documentation ends here.

 =cut
 # This is the first line of program text.
 sub foo { # This is the second.

It is an error to try to I<start> a Pod block with a "=cut" command.  In
that case, the Pod processor must halt parsing of the input file, and
must by default emit a warning.

=item "=over"

This command indicates that this is the start of a list/indent
region.  If there is any text following the "=over", it must consist
of only a nonzero positive numeral.  The semantics of this numeral is
explained in the L</"About =over...=back Regions"> section, further
below.  Formatting codes are not expanded.  Examples:

 =over 3

 =over 3.5

 =over

=item "=item"

This command indicates that an item in a list begins here.  Formatting
codes are processed.  The semantics of the (optional) text in the
remainder of this paragraph are
explained in the L</"About =over...=back Regions"> section, further
below.  Examples:

 =item

 =item *

 =item *

 =item 14

 =item 3.

 =item C<< $thing->stuff(I<dodad>) >>

 =item For transporting us beyond seas to be tried for pretended
 offenses

 =item He is at this time transporting large armies of foreign
 mercenaries to complete the works of death, desolation and
 tyranny, already begun with circumstances of cruelty and perfidy
 scarcely paralleled in the most barbarous ages, and totally
 unworthy the head of a civilized nation.

=item "=back"

This command indicates that this is the end of the region begun
by the most recent "=over" command.  It permits no text after the
"=back" command.

=item "=begin formatname"

=item "=begin formatname parameter"

This marks the following paragraphs (until the matching "=end
formatname") as being for some special kind of processing.  Unless
"formatname" begins with a colon, the contained non-command
paragraphs are data paragraphs.  But if "formatname" I<does> begin
with a colon, then non-command paragraphs are ordinary paragraphs
or verbatim paragraphs.  This is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

It is advised that formatnames match the regexp
C<m/\A:?[-a-zA-Z0-9_]+\z/>.  Everything following whitespace after the
formatname is a parameter that may be used by the formatter when dealing
with this region.  This parameter must not be repeated in the "=end"
paragraph.  Implementors should anticipate future expansion in the
semantics and syntax of the first parameter to "=begin"/"=end"/"=for".

=item "=end formatname"

This marks the end of the region opened by the matching
"=begin formatname" command.  If "formatname" is not the formatname
of the most recent open "=begin formatname" region, then this
is an error, and must generate an error message.  This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=item "=for formatname text..."

This is synonymous with:

 =begin formatname

 text...

 =end formatname

That is, it creates a region consisting of a single paragraph; that
paragraph is to be treated as a normal paragraph if "formatname"
begins with a ":"; if "formatname" I<doesn't> begin with a colon,
then "text..." will constitute a data paragraph.  There is no way
to use "=for formatname text..." to express "text..." as a verbatim
paragraph.

=item "=encoding encodingname"

This command, which should occur early in the document (at least
before any non-US-ASCII data!), declares that this document is
encoded in the encoding I<encodingname>, which must be
an encoding name that L<Encode> recognizes.  (Encode's list
of supported encodings, in L<Encode::Supported>, is useful here.)
If the Pod parser cannot decode the declared encoding, it
should emit a warning and may abort parsing the document
altogether.

A document having more than one "=encoding" line should be
considered an error.  Pod processors may silently tolerate this if
the not-first "=encoding" lines are just duplicates of the
first one (e.g., if there's a "=encoding utf8" line, and later on
another "=encoding utf8" line).  But Pod processors should complain if
there are contradictory "=encoding" lines in the same document
(e.g., if there is a "=encoding utf8" early in the document and
"=encoding big5" later).  Pod processors that recognize BOMs
may also complain if they see an "=encoding" line
that contradicts the BOM (e.g., if a document with a UTF-16LE
BOM has an "=encoding shiftjis" line).

=back

If a Pod processor sees any command other than the ones listed
above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
or "=w123"), that processor must by default treat this as an
error.  It must not process the paragraph beginning with that
command, must by default warn of this as an error, and may
abort the parse.  A Pod parser may allow a way for particular
applications to add to the above list of known commands, and to
stipulate, for each additional command, whether formatting
codes should be processed.

Future versions of this specification may add additional
commands.
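
The paragraph types and the command list described so far can be
sketched as follows.  This is only an illustration, written in Python
for brevity; C<classify>, C<split_command>, and C<KNOWN_COMMANDS> are
hypothetical names, not part of any existing Pod module.  It assumes
the caller has already split the Pod block into paragraphs and tracks
for itself whether the current paragraph lies inside a "=begin" region
whose identifier does not begin with a colon.

```python
import re

# The commands listed in this section; anything else is an error by default.
KNOWN_COMMANDS = {
    "head1", "head2", "head3", "head4", "pod", "cut", "over", "item",
    "back", "begin", "end", "for", "encoding",
}

COMMAND_START = re.compile(r"\A=[a-zA-Z]")
COMMAND_PREFIX = re.compile(r"\A=([a-zA-Z]\S*)\s*")


def classify(paragraph, in_data_region=False):
    """Return 'command', 'data', 'verbatim', or 'ordinary' for a paragraph.

    in_data_region is an assumption of this sketch: the caller sets it
    while inside "=begin identifier" where identifier has no leading colon.
    """
    if COMMAND_START.match(paragraph):
        return "command"
    if in_data_region:
        return "data"
    if paragraph[:1] in (" ", "\t"):
        return "verbatim"
    return "ordinary"


def split_command(paragraph):
    """Split a command paragraph into (name, content, is_known)."""
    match = COMMAND_PREFIX.match(paragraph)
    if match is None:
        raise ValueError("not a command paragraph")
    name = match.group(1)
    return name, paragraph[match.end():], name in KNOWN_COMMANDS
```

A caller would typically warn, and skip the paragraph, whenever
C<split_command> reports an unknown name such as "haed1".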



=head1 Pod Formatting Codes

(Note that in previous drafts of this document and of perlpod,
formatting codes were referred to as "interior sequences", and
this term may still be found in the documentation for Pod parsers,
and in error messages from Pod processors.)

There are two syntaxes for formatting codes:

=over

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by a "<", any number of characters, and ending with the first
matching ">".  Examples:

    That's what I<you> think!

    What's C<dump()> for?

    X<C<chmod> and C<unlink()> Under Different Operating Systems>

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by two or more "<"'s, one or more whitespace characters,
any number of characters, one or more whitespace characters,
and ending with the first matching sequence of two or more ">"'s, where
the number of ">"'s equals the number of "<"'s in the opening of this
formatting code.  Examples:

    That's what I<< you >> think!

    C<<< open(X, ">>thing.dat") || die $! >>>

    B<< $foo->bar(); >>

With this syntax, the whitespace character(s) after the "CE<lt><<"
and before the ">>" (or whatever letter) are I<not> renderable.  They
do not signify whitespace, but are merely part of the formatting codes
themselves.  That is, these are all synonymous:

    C<thing>
    C<< thing >>
    C<<           thing     >>
    C<<< thing >>>
    C<<<<
    thing
               >>>>

and so on.

Finally, the multiple-angle-bracket form does I<not> alter the interpretation
of nested formatting codes, meaning that the following four example lines are
identical in meaning:

  B<example: C<$a E<lt>=E<gt> $b>>

  B<example: C<< $a <=> $b >>>

  B<example: C<< $a E<lt>=E<gt> $b >>>

  B<<< example: C<< $a E<lt>=E<gt> $b >> >>>

=back

In parsing Pod, a notably tricky part is the correct parsing of
(potentially nested!) formatting codes.  Implementors should
consult the code in the C<parse_text> routine in Pod::Parser as an
example of a correct implementation.

=over

=item C<IE<lt>textE<gt>> -- italic text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<BE<lt>textE<gt>> -- bold text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<CE<lt>codeE<gt>> -- code text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<FE<lt>filenameE<gt>> -- style for filenames

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<XE<lt>topic nameE<gt>> -- an index entry

See the brief discussion in L<perlpod/"Formatting Codes">.

This code is unusual in that most formatters completely discard
this code and its content.  Other formatters will render it with
invisible codes that can be used in building an index of
the current document.

=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code

Discussed briefly in L<perlpod/"Formatting Codes">.

This code is unusual in that it should have no content.  That is,
a processor may complain if it sees C<ZE<lt>potatoesE<gt>>.  Whether
or not it complains, the I<potatoes> text should be ignored.

=item C<LE<lt>nameE<gt>> -- a hyperlink

The complicated syntaxes of this code are discussed at length in
L<perlpod/"Formatting Codes">, and implementation details are
discussed below, in L</"About LE<lt>...E<gt> Codes">.  Parsing the
contents of LE<lt>content> is tricky.  Notably, the content has to be
checked for whether it looks like a URL, or whether it has to be split
on literal "|" and/or "/" (in the right order!), and so on,
I<before> EE<lt>...> codes are resolved.

=item C<EE<lt>escapeE<gt>> -- a character escape

See L<perlpod/"Formatting Codes">, and several points in
L</Notes on Implementing Pod Processors>.

=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces

This formatting code is syntactically simple, but semantically
complex.  What it means is that each space in the printable
content of this code signifies a non-breaking space.

Consider:

    C<$x ? $y    :  $z>

    S<C<$x ? $y     :  $z>>

Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, "$y", one space, ":", one space,
"$z".  The difference is that in the latter, with the S code, those
spaces are not "normal" spaces, but instead are non-breaking spaces.

=back


If a Pod processor sees any formatting code other than the ones
listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
processor must by default treat this as an error.
A Pod parser may allow a way for particular
applications to add to the above list of known formatting codes;
a Pod parser might even allow a way to stipulate, for each additional
command, whether it requires some form of special processing, as
LE<lt>...> does.

Future versions of this specification may add additional
formatting codes.

Historical note:  A few older Pod processors would not see a ">" as
closing a "CE<lt>" code, if the ">" was immediately preceded by
a "-".
This was so that this:

    C<$foo->bar>

would parse as equivalent to this:

    C<$foo-E<gt>bar>

instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code.  This
problem has since been solved by the addition of syntaxes like this:

    C<< $foo->bar >>

Compliant parsers must not treat "->" as special.

Formatting codes absolutely cannot span paragraphs.  If a code is
opened in one paragraph, and no closing code is found by the end of
that paragraph, the Pod parser must close that formatting code,
and should complain (as in "Unterminated I code in the paragraph
starting at line 123: 'Time objects are not...'").  So these
two paragraphs:

  I<I told you not to do this!

  Don't make me say it again!>

...must I<not> be parsed as two paragraphs in italics (with the I
code starting in one paragraph and ending in another.)  Instead,
the first paragraph should generate a warning, but that aside, the
above code must parse as if it were:

  I<I told you not to do this!>

  Don't make me say it again!E<gt>

(In SGMLish jargon, all Pod commands are like block-level
elements, whereas all Pod formatting codes are like inline-level
elements.)



=head1 Notes on Implementing Pod Processors

The following is a long section of miscellaneous requirements
and suggestions to do with Pod processing.

=over

=item *

Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them (possibly several
times, for very long lines) to avoid text running off the side of the
page.  Pod formatters may warn of such line-breaking.  Such warnings
are particularly appropriate for lines that are over 100 characters
long, which are usually not intentional.

=item *

Pod parsers must recognize I<all> of the three well-known newline
formats: CR, LF, and CRLF.  See L<perlport|perlport>.

=item *

Pod parsers should accept input lines that are of any length.

=item *

Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same.  Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
valid as a UTF-8 sequence, or otherwise as Latin-1.

Future versions of this specification may specify
how Pod can accept other encodings.  Presumably treatment of other
encodings in Pod parsing would be as in XML parsing: whatever the
encoding declared by a particular Pod file, content is to be
stored in memory as Unicode characters.

=item *

The well known Unicode Byte Order Marks are as follows:  if the
file begins with the two literal byte values 0xFE 0xFF, this is
the BOM for big-endian UTF-16.  If the file begins with the two
literal byte values 0xFF 0xFE, this is the BOM for little-endian
UTF-16.  If the file begins with the three literal byte values
0xEF 0xBB 0xBF, this is the BOM for UTF-8.

=for comment
 use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
 0xEF 0xBB 0xBF

=for comment
 If toke.c is modified to support UTF-32, add mention of those here.

=item *

A naive but often sufficient heuristic for testing the first highbit
byte-sequence in a BOM-less file (whether in code or in Pod!), to see
whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
the first byte in the sequence is in the range 0xC2 - 0xFD
I<and> whether the next byte is in the range
0x80 - 0xBF.
If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8.  Otherwise the parser should treat the file as being
in Latin-1.  (A better check is to pass a copy of the sequence to
L<utf8::decode()|utf8> which performs a full validity check on the
sequence and returns TRUE if it is valid UTF-8, FALSE otherwise.  This
function is always pre-loaded, is fast because it is written in C, and
will only get called at most once, so you don't need to avoid it out of
performance concerns.)
In the unlikely circumstance that the first highbit
sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
can cater to our heuristic (as well as any more intelligent heuristic)
by prefacing that line with a comment line containing a highbit
sequence that is clearly I<not> valid as UTF-8.  A line consisting
of simply "#", an e-acute, and any non-highbit byte,
is sufficient to establish this file's encoding.

=for comment
 If/WHEN some brave soul makes these heuristics into a generic
 text-file class (or PerlIO layer?), we can presumably delete
 mention of these icky details from this file, and can instead
 tell people to just use appropriate class/layer.
 Auto-recognition of newline sequences would be another desirable
 feature of such a class/layer.
 HINT HINT HINT.

=for comment
 "The probability that a string of characters
 in any other encoding appears as valid UTF-8 is low" - RFC2279

=item *

This document's requirements and suggestions about encodings
do not apply to Pod processors running on non-ASCII platforms,
notably EBCDIC platforms.

=item *

Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph.
(The parser may conflate these two
constructs, or may leave them distinct, in the expectation that the
formatter will nevertheless treat them the same.)

=item *

When rendering Pod to a format that allows comments (i.e., to nearly
any format other than plaintext), a Pod formatter must insert comment
text identifying its name and version number, and the name and
version numbers of any modules it might be using to process the Pod.
Minimal examples:

 %% POD::Pod2PS v3.14159, using POD::Parser v1.92

 <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

 {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

 .\" Pod::Man version 3.14159, using POD::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the Pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of the input
file, the formatting options in effect, the version of Perl used, etc.

Formatters may also choose to note errors/warnings as comments,
besides or instead of emitting them otherwise (as in messages to
STDERR, or C<die>ing).

=item *

Pod parsers I<may> emit warnings or error messages ("Unknown E code
EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
suppressing all such STDERR output, and instead allow an option for
reporting errors/warnings
in some other way, whether by triggering a callback, or noting errors
in some attribute of the document object, or some similarly unobtrusive
mechanism -- or even by appending a "Pod Errors" section to the end of
the parsed form of the document.

=item *

In cases of exceptionally aberrant documents, Pod parsers may abort the
parse.
Even then, using C<die>ing/C<croak>ing is to be avoided; where
possible, the parser library may simply close the input file
and add text like "*** Formatting Aborted ***" to the end of the
(partial) in-memory document.

=item *

In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
are understood (i.e., I<not> verbatim paragraphs, but I<including>
ordinary paragraphs, and command paragraphs that produce renderable
text, like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as any
(nonzero) number of literal spaces, literal newlines, and literal tabs
(as long as this produces no blank lines, since those would terminate
the paragraph).  Pod parsers should compact literal whitespace in each
processed paragraph, but may provide an option for overriding this
(since some processing tasks do not require it), or may follow
additional special rules (for example, specially treating
period-space-space or period-newline sequences).

=item *

Pod parsers should not, by default, try to coerce apostrophe (') and
quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
turn backtick (`) into anything else but a single backtick character
(distinct from an open quote character!), nor "--" into anything but
two minus signs.  They I<must never> do any of those things to text
in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
paragraphs.

=item *

When rendering Pod to a format that has two kinds of hyphens (-), one
that's a non-breaking hyphen, and another that's a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
generally translate "-" to non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.

=item *

Pod formatters should make reasonable efforts to keep words of Perl
code from being broken across lines.  For example, "Foo::Bar" in some
formatting systems is seen as eligible for being broken across lines
as "Foo::" newline "Bar" or even "Foo::-" newline "Bar".  This should
be avoided where possible, either by disabling all line-breaking in
mid-word, or by wrapping particular words with internal punctuation
in "don't break this across lines" codes (which in some formats may
not be a single code, but might be a matter of inserting non-breaking
zero-width spaces between every pair of characters in a word.)

=item *

Pod parsers should, by default, expand tabs in verbatim paragraphs as
they are processed, before passing them to the formatter or other
processor.  Parsers may also allow an option for overriding this.

=item *

Pod parsers should, by default, remove newlines from the end of
ordinary and verbatim paragraphs before passing them to the
formatter.  For example, while the paragraph you're reading now
could be considered, in Pod source, to end with (and contain)
the newline(s) that end it, it should be processed as ending with
(and containing) the period character that ends this sentence.

=item *

Pod parsers, when reporting errors, should make some effort to report
an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!").  Where
this is problematic, the paragraph number should at least be
accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
the CE<lt>interest rate> attribute...'").

=item *

Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines.  I.e., these two
lines, which have a blank line between them:

	use Foo;

	print Foo->VERSION

should be unified into one paragraph ("\tuse Foo;\n\n\tprint
Foo->VERSION") before being passed to the formatter or other
processor.  Parsers may also allow an option for overriding this.

While this might be too cumbersome to implement in event-based Pod
parsers, it is straightforward for parsers that return parse trees.

=item *

Pod formatters, where feasible, are advised to avoid splitting short
verbatim paragraphs (under twelve lines, say) across pages.

=item *

Pod parsers must treat a line with only spaces and/or tabs on it as a
"blank line" such as separates paragraphs.  (Some older parsers
recognized only two adjacent newlines as a "blank line" but would not
recognize a newline, a space, and a newline, as a blank line.  This
is noncompliant behavior.)

=item *

Authors of Pod formatters/processors should make every effort to
avoid writing their own Pod parser.  There are already several in
CPAN, with a wide range of interface styles -- and one of them,
Pod::Parser, comes with modern versions of Perl.

=item *

Characters in Pod documents may be conveyed either as literals, or by
number in EE<lt>n> codes, or by an equivalent mnemonic, as in
EE<lt>eacute> which is exactly equivalent to EE<lt>233>.

Characters in the range 32-126 refer to those well known US-ASCII
characters (also defined there by Unicode, with the same meaning),
which all Pod formatters must render faithfully.
Characters in the ranges 0-31 and 127-159 should not be used (neither as
literals, nor as EE<lt>number> codes), except for the
literal byte-sequences for newline (13, 13 10, or 10), and tab (9).

Characters in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning). Characters above
255 should be understood to refer to Unicode characters.

=item *

Be warned
that some formatters cannot reliably render characters outside 32-126;
and many are able to handle 32-126 and 160-255, but nothing above
255.

=item *

Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for
less-than and greater-than, Pod parsers must understand "EE<lt>sol>"
for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
pipe). Pod parsers should also understand "EE<lt>lchevron>" and
"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
"left-pointing double angle quotation mark" = "left pointing
guillemet" and "right-pointing double angle quotation mark" = "right
pointing guillemet". (These look like little "<<" and ">>", and they
are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
and "EE<lt>raquo>".)

=item *

Pod parsers should understand all "EE<lt>html>" codes as defined
in the entity declarations in the most recent XHTML specification at
C<www.W3.org>. Pod parsers must understand at least the entities
that define characters in the range 160-255 (Latin-1). Pod parsers,
when faced with some unknown "EE<lt>I<identifier>>" code,
shouldn't simply replace it with the null string (by default, at least),
but may pass it through as a string consisting of the literal characters
E, less-than, I<identifier>, greater-than.
Or Pod parsers may offer the
alternative option of processing such unknown
"EE<lt>I<identifier>>" codes by firing an event especially
for such codes, or by adding a special node-type to the in-memory
document tree. Such "EE<lt>I<identifier>>" may have special meaning
to some processors, or some processors may choose to add them to
a special error report.

=item *

Pod parsers must also support the XHTML codes "EE<lt>quot>" for
character 34 (doublequote, "), "EE<lt>amp>" for character 38
(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').

=item *

Note that in all cases of "EE<lt>whatever>", I<whatever> (whether
an htmlname, or a number in any base) must consist only of
alphanumeric characters -- that is, I<whatever> must match
C<m/\A\w+\z/>. So "EE<lt> 0 1 2 3 >" is invalid, because
it contains spaces, which aren't alphanumeric characters. This
presumably does not I<need> special treatment by a Pod processor;
" 0 1 2 3 " doesn't look like a number in any base, so it would
presumably be looked up in the table of HTML-like names. Since
there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ",
this will be treated as an error. However, Pod processors may
treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically>
invalid, potentially earning a different error message than the
error message (or warning, or event) generated by a merely unknown
(but theoretically valid) htmlname, as in "EE<lt>qacute>"
[sic]. However, Pod parsers are not required to make this
distinction.

=item *

Note that EE<lt>number> I<must not> be interpreted as simply
"codepoint I<number> in the current/native character set". It always
means only "the character represented by codepoint I<number> in
Unicode." (This is identical to the semantics of &#I<number>; in XML.)
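
To make the rules above concrete, here is a minimal Python sketch of
resolving the inside of an EE<lt>...> code to a Unicode character. It is an
illustration under stated assumptions, not a reference implementation: the
function name C<resolve_escape> is hypothetical; the number bases follow
the convention described in L<perlpod|perlpod> (a leading "0x" for hex, a
leading "0" for octal, otherwise decimal); and Python's HTML 4 entity table
stands in for the XHTML entity declarations, with the few names it lacks
added by hand.

```python
import re
from html.entities import name2codepoint

# Names mentioned in the text above that the HTML 4 table does not define.
POD_EXTRAS = {"apos": 39, "sol": 47, "verbar": 124,
              "lchevron": 171, "rchevron": 187}


def resolve_escape(content):
    """Return the Unicode character for the inside of an E<...> code.

    Returns None for syntactically invalid or unknown content, leaving it
    to the caller to decide how to report or pass through the code.
    """
    if not re.fullmatch(r"\w+", content, flags=re.ASCII):
        return None                      # e.g. " 0 1 2 3 " or "e-acute"
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", content):
        codepoint = int(content[2:], 16)
    elif re.fullmatch(r"0[0-7]+", content):
        codepoint = int(content, 8)
    elif re.fullmatch(r"[0-9]+", content):
        codepoint = int(content)
    else:
        codepoint = POD_EXTRAS.get(content, name2codepoint.get(content))
    if codepoint is None or codepoint > 0x10FFFF:
        return None                      # unknown name, or beyond Unicode
    return chr(codepoint)                # always a Unicode codepoint
```

Because the result is always a Unicode character, "233", "0xE9", "0351"
and "eacute" all resolve to the same e-acute, whatever the native
character set of the platform happens to be.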

This will likely require many formatters to have tables mapping from
treatable Unicode codepoints (such as the "\xE9" for the e-acute
character) to the escape sequences or codes necessary for conveying
such characters in the target output format. A converter to *roff
would, for example, know that "\xE9" (whether conveyed literally, or via
an EE<lt>...> sequence) is to be conveyed as "e\\*'".
Similarly, a program rendering Pod in a Mac OS application window would
presumably need to know that "\xE9" maps to codepoint 142 in the MacRoman
encoding that (at time of writing) is native for Mac OS. Such
Unicode2whatever mappings are presumably already widely available for
common output formats. (Such mappings may be incomplete! Implementers
are not expected to bend over backwards in an attempt to render
Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
of the other weird things that Unicode can encode.) And
if a Pod document uses a character not found in such a mapping, the
formatter should consider it an unrenderable character.

=item *

If, surprisingly, the implementor of a Pod formatter can't find a
satisfactory pre-existing table mapping from Unicode characters to
escapes in the target format (e.g., a decent table of Unicode
characters to *roff escapes), it will be necessary to build such a
table. If you are in this circumstance, you should begin with the
characters in the range 0x00A0 - 0x00FF, which is mostly the heavily
used accented characters. Then proceed (as patience permits and
fastidiousness compels) through the characters that the (X)HTML
standards groups judged important enough to merit mnemonics
for. These are declared in the (X)HTML specifications at the
www.W3.org site.
At time of writing (September 2001), the most recent
entity declaration files are:

  http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
  http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
  http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

Then you can progress through any remaining notable Unicode characters
in the range 0x2000-0x204D (consult the character tables at
www.unicode.org), and whatever else strikes your fancy. For example,
in F<xhtml-symbol.ent>, there is the entry:

  <!ENTITY infin    "&#8734;"> <!-- infinity, U+221E ISOtech -->

While the mapping "infin" to the character "\x{221E}" will (hopefully)
have been already handled by the Pod parser, the presence of the
character in this file means that it's important enough to
include in a formatter's table that maps from notable Unicode characters
to the codes necessary for rendering them. So for a Unicode-to-*roff
mapping, for example, this would merit the entry:

  "\x{221E}" => '\(in',

It is eagerly hoped that in the future, increasing numbers of formats
(and formatters) will support Unicode characters directly (as (X)HTML
does with C<&infin;>, C<&#8734;>, or C<&#x221E;>), reducing the need
for idiosyncratic mappings of Unicode-to-I<my_escapes>.

=item *

It is up to individual Pod formatters to display good judgement when
confronted with an unrenderable character (which is distinct from an
unknown EE<lt>thing> sequence that the parser couldn't resolve to
anything, renderable or not). It is good practice to map Latin letters
with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
unaccented US-ASCII letters (like a simple character 101, "e"), but
clearly this is often not feasible, and an unrenderable character may
be represented as "?", or the like.
In attempting a sane fallback
(as from EE<lt>233> to "e"), Pod formatters may use the
%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
L<Text::Unidecode|Text::Unidecode>, if available.

For example, this Pod text:

  magic is enabled if you set C<$Currency> to 'E<euro>'.

may be rendered as:
"magic is enabled if you set C<$Currency> to 'I<?>'" or as
"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as
"magic is enabled if you set C<$Currency> to '[x20AC]'", etc.

A Pod formatter may also note, in a comment or warning, a list of what
unrenderable characters were encountered.

=item *

EE<lt>...> may freely appear in any formatting code (other than
in another EE<lt>...> or in a ZE<lt>>). That is, "XE<lt>The
EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The
EE<lt>euro>1,000,000 Solution|Million::Euros>".

=item *

Some Pod formatters output to formats that implement non-breaking
spaces as an individual character (which I'll call "NBSP"), and
others output to formats that implement non-breaking spaces just as
spaces wrapped in a "don't break this across lines" code. Note that
at the level of Pod, both sorts of codes can occur: Pod can contain a
NBSP character (whether as a literal, or as a "EE<lt>160>" or
"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
such codes are taken to represent non-breaking spaces. Pod
parsers should consider supporting the optional parsing of "SE<lt>foo
IE<lt>barE<gt> baz>" as if it were
"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
optional parsing of groups of words joined by NBSP's as if each group
were in a SE<lt>...> code, so that formatters may use the
representation that maps best to what the output format demands.
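
The first of those two optional conversions can be sketched in a few lines
of Python. This is only an illustration: the tree shape (a node is either a
string of text or a C<(code, children)> tuple) and the function name
C<expand_s_codes> are assumptions made for this sketch, not the interface
of any real Pod parser, and the tree here holds visible text only.

```python
NBSP = "\u00a0"  # character 160


def expand_s_codes(node, inside_s=False):
    """Return a list of nodes with each S<...> wrapper replaced by its
    content, where every character 32 in that content becomes NBSP.

    A node is either a str (text) or a (code, children) tuple; this
    tree shape is invented for the sketch.
    """
    if isinstance(node, str):
        return [node.replace(" ", NBSP) if inside_s else node]
    code, children = node
    inner = []
    for child in children:
        inner.extend(expand_s_codes(child, inside_s or code == "S"))
    # Drop the S wrapper itself; keep every other code around its content.
    return inner if code == "S" else [(code, inner)]
```

With this shape, the tree for "SE<lt>foo IE<lt>barE<gt> baz>" comes out as
the text "fooI<NBSP>", an I code containing "bar", and the text
"I<NBSP>baz". A tree that also stores non-printable attributes (such as
link targets) would need to leave those untouched.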

=item *

Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
implement by replacing each space in the parse tree under the content
of the S, with an NBSP. But note: the replacement should apply I<not> to
spaces in I<all> text, but I<only> to spaces in I<printable> text. (This
distinction may or may not be evident in the particular tree/event
model implemented by the Pod parser.) For example, consider this
unusual case:

  S<L</Autoloaded Functions>>

This means that the space in the middle of the visible link text must
not be broken across lines. In other words, it's the same as this:

  L<"AutoloadedE<160>Functions"/Autoloaded Functions>

However, a misapplied space-to-NBSP replacement could (wrongly)
produce something equivalent to this:

  L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>

...which is almost definitely not going to work as a hyperlink (assuming
this formatter outputs a format supporting hypertext).

Formatters may choose to just not support the S format code,
especially in cases where the output format simply has no NBSP
character/code and no code for "don't break this stuff across lines".

=item *

Besides the NBSP character discussed above, implementors are reminded
of the existence of the other "special" character in Latin-1, the
"soft hyphen" character, also known as "discretionary hyphen"
(i.e. C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation
point. That is, it normally renders as nothing, but may render as a
"-" if a formatter breaks the word at that point.
Pod formatters
should, as appropriate, do one of the following: 1) render this with
a code with the same meaning (e.g., "\-" in RTF), 2) pass it through
in the expectation that the formatter understands this character as
such, or 3) delete it.

For example:

  sigE<shy>action
  manuE<shy>script
  JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi

These signal to a formatter that if it is to hyphenate "sigaction"
or "manuscript", then it should be done as
"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script"
(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't
show up at all). And if it is
to hyphenate "Jarkko" and/or "Hietaniemi", it can do
so only at the points where there is a C<EE<lt>shyE<gt>> code.

In practice, it is anticipated that this character will not be used
often, but formatters should either support it, or delete it.

=item *

If you think that you want to add a new command to Pod (like, say, a
"=biblio" command), consider whether you could get the same
effect with a for or begin/end sequence: "=for biblio ..." or "=begin
biblio" ... "=end biblio". Pod processors that don't understand
"=for biblio", etc, will simply ignore it, whereas they may complain
loudly if they see "=biblio".

=item *

Throughout this document, "Pod" has been the preferred spelling for
the name of the documentation format. One may also use "POD" or
"pod". For the documentation that is (typically) in the Pod
format, you may use "pod", or "Pod", or "POD". Understanding these
distinctions is useful; but obsessing over how to spell them, usually
is not.

=back



=head1 About LE<lt>...E<gt> Codes

As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
code is the most complex of the Pod formatting codes.
The points below
will hopefully clarify what it means and how processors should deal
with it.

=over

=item *

In parsing an LE<lt>...> code, Pod parsers must distinguish at least
four attributes:

=over

=item First:

The link-text. If there is none, this must be undef. (E.g., in
"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
link text. Note that link text may contain formatting.)

=item Second:

The possibly inferred link-text; i.e., if there was no real link
text, then this is the text that we'll infer in its place. (E.g., for
"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".)

=item Third:

The name or URL, or undef if none. (E.g., in "LE<lt>Perl
Functions|perlfunc>", the name (also sometimes called the page)
is "perlfunc". In "LE<lt>/CAVEATS>", the name is undef.)

=item Fourth:

The section (AKA "item" in older perlpods), or undef if none. E.g.,
in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section. (Note
that this is not the same as a manpage section like the "5" in "man 5
crontab". "Section Foo" in the Pod sense means the part of the text
that's introduced by the heading or item whose text is "Foo".)

=back

Pod parsers may also note additional attributes including:

=over

=item Fifth:

A flag for whether item 3 (if present) is a URL (like
"http://lists.perl.org" is), in which case there should be no section
attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
possibly a man page name (like "crontab(5)" is).

=item Sixth:

The raw original LE<lt>...> content, before text is split on
"|", "/", etc, and before EE<lt>...> codes are expanded.

=back

(The above were numbered only for concise reference below. It is not
a requirement that these be passed as an actual list or array.)

For example:

  L<Foo::Bar>
    =>  undef,                         # link text
        "Foo::Bar",                    # possibly inferred link text
        "Foo::Bar",                    # name
        undef,                         # section
        'pod',                         # what sort of link
        "Foo::Bar"                     # original content

  L<Perlport's section on NL's|perlport/Newlines>
    =>  "Perlport's section on NL's",  # link text
        "Perlport's section on NL's",  # possibly inferred link text
        "perlport",                    # name
        "Newlines",                    # section
        'pod',                         # what sort of link
        "Perlport's section on NL's|perlport/Newlines"
                                       # original content

  L<perlport/Newlines>
    =>  undef,                         # link text
        '"Newlines" in perlport',      # possibly inferred link text
        "perlport",                    # name
        "Newlines",                    # section
        'pod',                         # what sort of link
        "perlport/Newlines"            # original content

  L<crontab(5)/"DESCRIPTION">
    =>  undef,                         # link text
        '"DESCRIPTION" in crontab(5)', # possibly inferred link text
        "crontab(5)",                  # name
        "DESCRIPTION",                 # section
        'man',                         # what sort of link
        'crontab(5)/"DESCRIPTION"'     # original content

  L</Object Attributes>
    =>  undef,                         # link text
        '"Object Attributes"',         # possibly inferred link text
        undef,                         # name
        "Object Attributes",           # section
        'pod',                         # what sort of link
        "/Object Attributes"           # original content

  L<http://www.perl.org/>
    =>  undef,                         # link text
        "http://www.perl.org/",        # possibly inferred link text
        "http://www.perl.org/",        # name
        undef,                         # section
        'url',                         # what sort of link
        "http://www.perl.org/"         # original content

  L<Perl.org|http://www.perl.org/>
    =>  "Perl.org",                    # link text
        "http://www.perl.org/",        # possibly inferred link text
        "http://www.perl.org/",        # name
        undef,                         # section
        'url',                         # what sort of link
        "Perl.org|http://www.perl.org/" # original content

Note that you can distinguish URL-links from anything else by the
fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So
C<LE<lt>http://www.perl.comE<gt>> is a URL, but
C<LE<lt>HTTP::ResponseE<gt>> isn't.

=item *

In case of LE<lt>...> codes with no "text|" part in them,
older formatters have exhibited great variation in actually displaying
the link or cross reference. For example, LE<lt>crontab(5)> would render
as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage"
or just "C<crontab(5)>".

Pod processors must now treat "text|"-less links as follows:

  L<name>         =>  L<name|name>
  L</section>     =>  L<"section"|/section>
  L<name/section> =>  L<"section" in name|name/section>

=item *

Note that section names might contain markup. I.e., if a section
starts with:

  =head2 About the C<-M> Operator

or with:

  =item About the C<-M> Operator

then a link to it would look like this:

  L<somedoc/About the C<-M> Operator>

Formatters may choose to ignore the markup for purposes of resolving
the link and use only the renderable characters in the section name,
as in:

  <h1><a name="About_the_-M_Operator">About the <code>-M</code>
  Operator</a></h1>

  ...

  <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
  Operator" in somedoc</a>

=item *

Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>>
links from C<LE<lt>name/itemE<gt>> links (and their targets). These
have been merged syntactically and semantically in the current
specification, and I<section> can refer either to a "=headI<n> Heading
Content" command or to a "=item Item Content" command.
This
specification does not specify what behavior should be in the case
of a given document having several things all seeming to produce the
same I<section> identifier (e.g., in HTML, several things all producing
the same I<anchorname> in <a name="I<anchorname>">...</a>
elements). Where Pod processors can control this behavior, they should
use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the
I<first> "Bar" section in Foo.

But for some processors/formats this cannot be easily controlled; as
with the HTML example, the behavior of multiple ambiguous
<a name="I<anchorname>">...</a> is most easily just left up to
browsers to decide.

=item *

In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes
for formatting or for EE<lt>...> escapes, as in:

  L<B<ummE<234>stuff>|...>

For C<LE<lt>...E<gt>> codes without a "text|" part, only
C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur. That is,
authors should not use "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>".

Note, however, that formatting codes and ZE<lt>>'s can occur in any
and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>,
and I<url>).

Authors must not nest LE<lt>...> codes. For example, "LE<lt>The
LE<lt>Foo::Bar> man page>" should be treated as an error.

=item *

Note that Pod authors may use formatting codes inside the "text"
part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).

In other words, this is valid:

  Go read L<the docs on C<$.>|perlvar/"$.">

Some output formats that do allow rendering "LE<lt>...>" codes as
hypertext, might not allow the link-text to be formatted; in
that case, formatters will have to just ignore that formatting.
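
As a rough sketch of how the six link attributes and the rules for
"text|"-less links fit together, the following Python function splits the
raw content of an LE<lt>...> code. It is deliberately simplified and makes
several assumptions: the name C<parse_link> and the man-page pattern are
hypothetical, the content is split naively on the first "|" and "/" (a real
parser must split only outside nested formatting codes), and EE<lt>...>
codes are not expanded.

```python
import re

URL_RE = re.compile(r"\w+:[^:\s]\S*")    # the URL pattern given above
MAN_RE = re.compile(r"[^\s(]+\(\w+\)")   # e.g. crontab(5); an assumption


def parse_link(raw):
    """Return (text, inferred_text, name, section, kind, raw)."""
    text, sep, target = raw.partition("|")
    if not sep:                          # no "text|" part at all
        text, target = None, raw
    elif text == "":                     # L<|Time::HiRes> has no link text
        text = None
    if URL_RE.fullmatch(target):
        name, section, kind = target, None, "url"
    else:
        name, _, section = target.partition("/")
        name = name or None
        section = section.strip('"') or None
        kind = "man" if name and MAN_RE.fullmatch(name) else "pod"
    if text is not None:
        inferred = text
    elif name and section:
        inferred = f'"{section}" in {name}'
    elif section:
        inferred = f'"{section}"'
    else:
        inferred = name
    return (text, inferred, name, section, kind, raw)
```

For instance, C<parse_link("perlport/Newlines")> yields no link text, the
inferred text '"Newlines" in perlport', the name "perlport", the section
"Newlines", and the kind "pod". Returning a tuple is merely convenient
here; the attributes could equally be passed as events or node fields.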

=item *

At time of writing, C<LE<lt>nameE<gt>> values are of two types:
either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which
might be a real Perl module or program in an @INC / PATH
directory, or a .pod file in those places); or the name of a Unix
man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>>
is ambiguous between a Pod page called "chmod", or the Unix man page
"chmod" (in whatever man-section). However, the presence of a string
in parens, as in "crontab(5)", is sufficient to signal that what
is being discussed is not a Pod page, and so is presumably a
Unix man page. The distinction is of no importance to many
Pod processors, but some processors that render to hypertext formats
may need to distinguish them in order to know how to render a
given C<LE<lt>fooE<gt>> code.

=item *

Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax (as in
C<LE<lt>Object AttributesE<gt>>), which was not easily distinguishable from
C<LE<lt>nameE<gt>> syntax and for C<LE<lt>"section"E<gt>> which was only
slightly less ambiguous. This syntax is no longer in the specification, and
has been replaced by the C<LE<lt>/sectionE<gt>> syntax (where the slash was
formerly optional). Pod parsers should tolerate the C<LE<lt>"section"E<gt>>
syntax, for a while at least. The suggested heuristic for distinguishing
C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> is that if it contains any
whitespace, it's a I<section>. Pod processors should warn about this being
deprecated syntax.

=back

=head1 About =over...=back Regions

"=over"..."=back" regions are used for various kinds of list-like
structures. (I use the term "region" here simply as a collective
term for everything from the "=over" to the matching "=back".)
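
The points below sort these regions into four kinds and observe that the
kind can be determined from the first paragraph after the "=over". As a
minimal Python sketch of that check, using the "=item" patterns given in
this section: the function name C<classify_over_region> and the labels it
returns are assumptions of mine, and the input is taken to be the list of
paragraphs found inside one region.

```python
import re

NUMBERED = re.compile(r"\A=item\s+\d+\.?\s*\Z")
BULLET = re.compile(r"\A=item\s+\*\s*\Z")
BARE = re.compile(r"\A=item\s*\Z")       # tolerated as if it were "=item *"


def classify_over_region(paragraphs):
    """Guess the kind of an "=over" ... "=back" region from the first
    paragraph inside it that is neither "=cut" nor "=pod"."""
    for para in paragraphs:
        if re.match(r"=(cut|pod)\b", para):
            continue
        if BULLET.match(para) or BARE.match(para):
            return "bullet"
        if NUMBERED.match(para):
            return "number"
        if re.match(r"=item\s+\S", para):
            return "text"
        return "block"                   # itemless, blockquote-like region
    return "empty"
```

A formatter producing (X)HTML might then map "bullet" to a ul element,
"number" to ol, "text" to dl, and "block" to blockquote.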

=over

=item *

The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ...
"=back" is used for giving the formatter a clue as to how many
"spaces" (ems, or roughly equivalent units) it should tab over,
although many formatters will have to convert this to an absolute
measurement that may not exactly match with the size of spaces (or M's)
in the document's base font. Other formatters may have to completely
ignore the number. The lack of any explicit I<indentlevel> parameter is
equivalent to an I<indentlevel> value of 4. Pod processors may
complain if I<indentlevel> is present but is not a positive number
matching C<m/\A(\d*\.)?\d+\z/>.

=item *

Authors of Pod formatters are reminded that "=over" ... "=back" may
map to several different constructs in your output format. For
example, in converting Pod to (X)HTML, it can map to any of
<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or
<dt>.

=item *

Each "=over" ... "=back" region should be one of the following:

=over

=item *

An "=over" ... "=back" region containing only "=item *" commands,
each followed by some number of ordinary/verbatim paragraphs, other
nested "=over" ... "=back" regions, "=for..." paragraphs, and
"=begin"..."=end" regions.

(Pod processors must tolerate a bare "=item" as if it were "=item
*".) Whether "*" is rendered as a literal asterisk, an "o", or as
some kind of real bullet character, is left up to the Pod formatter,
and may depend on the level of nesting.

=item *

An "=over" ... "=back" region containing only
C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them)
followed by some number of ordinary/verbatim paragraphs, other nested
"=over" ... "=back" regions, "=for..." paragraphs, and/or
"=begin"..."=end" regions. Note that the numbers must start at 1
in each section, and must proceed in order and without skipping
numbers.

(Pod processors must tolerate lines like "=item 1" as if they were
"=item 1.", with the period.)

=item *

An "=over" ... "=back" region containing only "=item [text]"
commands, each one (or each group of them) followed by some number of
ordinary/verbatim paragraphs, other nested "=over" ... "=back"
regions, or "=for..." paragraphs, and "=begin"..."=end" regions.

The "=item [text]" paragraph should not match
C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it
match just C<m/\A=item\s*\z/>.

=item *

An "=over" ... "=back" region containing no "=item" paragraphs at
all, and containing only some number of
ordinary/verbatim paragraphs, and possibly also some nested "=over"
... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
regions. Such an itemless "=over" ... "=back" region in Pod is
equivalent in meaning to a "<blockquote>...</blockquote>" element in
HTML.

=back

Note that with all the above cases, you can determine which type of
"=over" ... "=back" you have, by examining the first (non-"=cut",
non-"=pod") Pod paragraph after the "=over" command.

=item *

Pod formatters I<must> tolerate arbitrarily large amounts of text
in the "=item I<text...>" paragraph.
In practice, most such
paragraphs are short, as in:

  =item For cutting off our trade with all parts of the world

But they may be arbitrarily long:

  =item For transporting us beyond seas to be tried for pretended
  offenses

  =item He is at this time transporting large armies of foreign
  mercenaries to complete the works of death, desolation and
  tyranny, already begun with circumstances of cruelty and perfidy
  scarcely paralleled in the most barbarous ages, and totally
  unworthy the head of a civilized nation.

=item *

Pod processors should tolerate "=item *" / "=item I<number>" commands
with no accompanying paragraph. The middle item is an example:

  =over

  =item 1

  Pick up dry cleaning.

  =item 2

  =item 3

  Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.

  =back

=item *

No "=over" ... "=back" region can contain headings. Processors may
treat such a heading as an error.

=item *

Note that an "=over" ... "=back" region should have some
content. That is, authors should not have an empty region like this:

  =over

  =back

Pod processors seeing such a contentless "=over" ... "=back" region,
may ignore it, or may report it as an error.

=item *

Processors must tolerate an "=over" list that goes off the end of the
document (i.e., which has no matching "=back"), but they may warn
about such a list.

=item *

Authors of Pod formatters should note that this construct:

  =item Neque

  =item Porro

  =item Quisquam Est

  Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
  velit, sed quia non numquam eius modi tempora incidunt ut
  labore et dolore magnam aliquam quaerat voluptatem.

  =item Ut Enim

is semantically ambiguous, in a way that makes formatting decisions
a bit difficult. On the one hand, it could be mention of an item
"Neque", mention of another item "Porro", and mention of another
item "Quisquam Est", with just the last one requiring the explanatory
paragraph "Qui dolorem ipsum quia dolor..."; and then an item
"Ut Enim". In that case, you'd want to format it like so:

  Neque

  Porro

  Quisquam Est
    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

But it could equally well be a discussion of three (related or equivalent)
items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
explaining them all, and then a new item "Ut Enim". In that case, you'd
probably want to format it like so:

  Neque
  Porro
  Quisquam Est
    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

But (for the foreseeable future), Pod does not provide any way for Pod
authors to distinguish which grouping is meant by the above
"=item"-cluster structure. So formatters should format it like so:

  Neque

  Porro

  Quisquam Est

    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

That is, there should be (at least roughly) equal spacing between
items as between paragraphs (although that spacing may well be less
than the full height of a line of text).
This leaves it to the reader
to use (con)textual cues to figure out whether the "Qui dolorem
ipsum..." paragraph applies to the "Quisquam Est" item or to all three
items "Neque", "Porro", and "Quisquam Est". While not an ideal
situation, this is preferable to providing formatting cues that may
be actually contrary to the author's intent.

=back



=head1 About Data Paragraphs and "=begin/=end" Regions

Data paragraphs are typically used for inlining non-Pod data that is
to be used (typically passed through) when rendering the document to
a specific format:

  =begin rtf

  \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

  =end rtf

The exact same effect could, incidentally, be achieved with a single
"=for" paragraph:

  =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

(Although that is not formally a data paragraph, it has the same
meaning as one, and Pod parsers may parse it as one.)

Another example of a data paragraph:

  =begin html

  I like <em>PIE</em>!

  <hr>Especially pecan pie!

  =end html

If these were ordinary paragraphs, the Pod parser would try to
expand the "EE<lt>/em>" (in the first paragraph) as a formatting
code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this
is in a "=begin I<identifier>"..."=end I<identifier>" region I<and>
the identifier "html" doesn't have a ":" prefix, the contents
of this region are stored as data paragraphs, instead of being
processed as ordinary paragraphs (or if they began with spaces
and/or tabs, as verbatim paragraphs).

As a further example: At time of writing, no "biblio" identifier is
supported, but suppose some processor were written to recognize it as
a way of (say) denoting a bibliographic reference (necessarily
containing formatting codes in ordinary paragraphs).
The fact that
"biblio" paragraphs were meant for ordinary processing would be
indicated by prefacing each "biblio" identifier with a colon:

  =begin :biblio

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =end :biblio

This would signal to the parser that paragraphs in this begin...end
region are subject to normal handling as ordinary/verbatim paragraphs
(while still tagged as meant only for processors that understand the
"biblio" identifier). The same effect could be had with:

  =for :biblio
  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

The ":" on these identifiers means simply "process this stuff
normally, even though the result will be for some special target".
I suggest that parser APIs report "biblio" as the target identifier,
but also report that it had a ":" prefix. (And similarly, with the
above "html", report "html" as the target identifier, and note the
I<lack> of a ":" prefix.)

Note that a "=begin I<identifier>"..."=end I<identifier>" region where
I<identifier> begins with a colon, I<can> contain commands. For example:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =for comment
   hm, check abebooks.com for how much used copies cost.

  =over

  =item

  Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart. [Yes, it's in German.]

  =item

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =back

  =end :biblio

Note, however, a "=begin I<identifier>"..."=end I<identifier>"
region where I<identifier> does I<not> begin with a colon, should not
directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
nor "=item". For example, this may be considered invalid:

  =begin somedata

  This is a data paragraph.

  =head1 Don't do this!

  This is a data paragraph too.

  =end somedata

A Pod processor may signal that the above (specifically the "=head1"
paragraph) is an error. Note, however, that the following should
I<not> be treated as an error:

  =begin somedata

  This is a data paragraph.

  =cut

  # Yup, this isn't Pod anymore.
  sub excl { (rand() > .5) ? "hoo!" : "hah!" }

  =pod

  This is a data paragraph too.

  =end somedata

And this too is valid:

  =begin someformat

  This is a data paragraph.

  And this is a data paragraph.

  =begin someotherformat

  This is a data paragraph too.

  And this is a data paragraph too.

  =begin :yetanotherformat

  =head2 This is a command paragraph!

  This is an ordinary paragraph!

    And this is a verbatim paragraph!

  =end :yetanotherformat

  =end someotherformat

  Another data paragraph!

  =end someformat

The contents of the above "=begin :yetanotherformat" ...
"=end :yetanotherformat" region I<aren't> data paragraphs, because
the immediately containing region's identifier (":yetanotherformat")
begins with a colon. In practice, most regions that contain
data paragraphs will contain I<only> data paragraphs; however,
the above nesting is syntactically valid as Pod, even if it is
rare. However, the handlers for some formats, like "html",
will accept only data paragraphs, not nested regions; and they may
complain if they see (targeted for them) nested regions, or commands,
other than "=end", "=pod", and "=cut".

Also consider this valid structure:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =over

  =item

  Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart. [Yes, it's in German.]

  =item

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =back

  Buy buy buy!

  =begin html

  <img src='wirth_spokesmodeling_book.png'>

  <hr>

  =end html

  Now now now!

  =end :biblio

There, the "=begin html"..."=end html" region is nested inside
the larger "=begin :biblio"..."=end :biblio" region. Note that the
content of the "=begin html"..."=end html" region is data
paragraph(s), because the immediately containing region's identifier
("html") I<doesn't> begin with a colon.

Pod parsers, when processing a series of data paragraphs one
after another (within a single region), should consider them to
be one large data paragraph that happens to contain blank lines. So
the content of the above "=begin html"..."=end html" I<may> be stored
as two data paragraphs (one consisting of
"<img src='wirth_spokesmodeling_book.png'>\n"
and another consisting of "<hr>\n"), but I<should> be stored as
a single data paragraph (consisting of
"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").

Pod processors should tolerate empty
"=begin I<something>"..."=end I<something>" regions,
empty "=begin :I<something>"..."=end :I<something>" regions, and
contentless "=for I<something>" and "=for :I<something>"
paragraphs.
I.e., these should be tolerated:

  =for html

  =begin html

  =end html

  =begin :biblio

  =end :biblio

Incidentally, note that there's no easy way to express a data
paragraph starting with something that looks like a command. Consider:

  =begin stuff

  =shazbot

  =end stuff

There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
paragraph "=shazbot\n". However, you can express a data paragraph consisting
of "=shazbot\n" using this code:

  =for stuff =shazbot

The situation where this is necessary is presumably quite rare.

Note that =end commands must match the currently open =begin command. That
is, they must properly nest. For example, this is valid:

  =begin outer

  X

  =begin inner

  Y

  =end inner

  Z

  =end outer

while this is invalid:

  =begin outer

  X

  =begin inner

  Y

  =end outer

  Z

  =end inner

This latter is improper because when the "=end outer" command is seen, the
currently open region has the formatname "inner", not "outer". (It just
happens that "outer" is the format name of a higher-up region.) This is
an error. Processors must by default report this as an error, and may halt
processing the document containing that error. A corollary of this is that
regions cannot "overlap". That is, the latter block above does not represent
a region called "outer" which contains X and Y, overlapping a region called
"inner" which contains Y and Z. Because it is invalid (as all
apparently overlapping regions would be), it doesn't represent that, or
anything at all.
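
The matching rule described above amounts to keeping a stack of open
regions and comparing each "=end" against the top of it. The following
is a minimal sketch in Python, not taken from any existing Pod
processor: the function name C<find_nesting_error> and its message
format are hypothetical. It assumes identifiers are compared exactly as
written (including any ":" prefix), that commands start in the first
column, and it does not check for regions left open at the end of input.

```python
import re


def find_nesting_error(lines):
    """Return a message for the first =begin/=end mismatch, or None.

    `lines` is a list of source lines without trailing newlines.
    Hypothetical helper, for illustration only.
    """
    open_regions = []  # identifiers of regions opened but not yet closed
    for number, line in enumerate(lines, start=1):
        match = re.match(r"=(begin|end)\b[ \t]*(\S*)", line)
        if match is None:
            continue
        command, name = match.groups()
        if command == "begin":
            open_regions.append(name)
        elif not open_regions:
            # Assumption: an =end with nothing open is also reported.
            return f"line {number}: =end {name} has no open region"
        elif open_regions[-1] != name:
            innermost = open_regions[-1]
            return f"line {number}: =end {name} does not close {innermost}"
        else:
            open_regions.pop()
    return None
```

For the invalid example above, the function reports the line holding
"=end outer", since "inner" is the region open at that point. Whether
to halt or to keep going after the first error is left to the caller;
this sketch simply stops.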

Similarly, this is invalid:

  =begin thing

  =end hting

This is an error because the region is opened by "thing", and the "=end"
tries to close "hting" [sic].

This is also invalid:

  =begin thing

  =end

This is invalid because every "=end" command must have a formatname
parameter.

=head1 SEE ALSO

L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
L<podchecker>

=head1 AUTHOR

Sean M. Burke

=cut