=head1 NAME

perlpodspec - Plain Old Documentation: format specification and notes

=head1 DESCRIPTION

This document contains detailed notes on the Pod markup language. Most
people will only have to read L<perlpod|perlpod> to know how to write
in Pod, but this document may answer some incidental questions to do
with parsing and rendering Pod.

In this document, "must" / "must not", "should" /
"should not", and "may" have their conventional (cf. RFC 2119)
meanings: "X must do Y" means that if X doesn't do Y, it's against
this specification, and should really be fixed. "X should do Y"
means that it's recommended, but X may fail to do Y, if there's a
good reason. "X may do Y" is merely a note that X can do Y at
will (although it is up to the reader to detect any connotation of
"and I think it would be I<nice> if X did Y" versus "it wouldn't
really I<bother> me if X did Y").

Notably, when I say "the parser should do Y", the
parser may fail to do Y, if the calling application explicitly
requests that the parser I<not> do Y. I often phrase this as
"the parser should, by default, do Y." This doesn't I<require>
the parser to provide an option for turning off whatever
feature Y is (like expanding tabs in verbatim paragraphs), although
it implies that such an option I<may> be provided.

=head1 Pod Definitions

Pod is embedded in files, typically Perl source files, although you
can write a file that's nothing but Pod.

A B<line> in a file consists of zero or more non-newline characters,
terminated by either a newline or the end of the file.

A B<newline sequence> is usually a platform-dependent concept, but
Pod parsers should understand it to mean any of CR (ASCII 13), LF
(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
addition to any other system-specific meaning.
The first CR/CRLF/LF
sequence in the file may be used as the basis for identifying the
newline sequence for parsing the rest of the file.

A B<blank line> is a line consisting entirely of zero or more spaces
(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
A B<non-blank line> is a line containing one or more characters other
than space or tab (and terminated by a newline or end-of-file).

(I<Note:> Many older Pod parsers did not accept a line consisting of
spaces/tabs and then a newline as a blank line. The only lines they
considered blank were lines consisting of I<no characters at all>,
terminated by a newline.)

B<Whitespace> is used in this document as a blanket term for spaces,
tabs, and newline sequences. (By itself, this term usually refers
to literal whitespace. That is, sequences of whitespace characters
in Pod source, as opposed to "EE<lt>32>", which is a formatting
code that I<denotes> a whitespace character.)

A B<Pod parser> is a module meant for parsing Pod (regardless of
whether this involves calling callbacks or building a parse tree or
directly formatting it). A B<Pod formatter> (or B<Pod translator>)
is a module or program that converts Pod to some other format (HTML,
plaintext, TeX, PostScript, RTF). A B<Pod processor> might be a
formatter or translator, or might be a program that does something
else with the Pod (like counting words, scanning for index points,
etc.).

Pod content is contained in B<Pod blocks>. A Pod block starts with a
line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
that matches C<m/\A=cut/> or up to the end of the file if there is
no C<m/\A=cut/> line.
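
As an illustration of the definitions above, here is a minimal Python
sketch (it is not taken from any existing Pod parser) that splits input
on any of the three newline sequences, tests for blank lines, and
collects Pod blocks. The names C<NEWLINE>, C<is_blank>, and
C<pod_blocks> are hypothetical. The sketch assumes that the "=cut" line
is kept as the last line of its block, and it simply raises an
exception when a block would begin with "=cut".

```python
import re

# CRLF must be tried before a lone CR or LF.
NEWLINE = re.compile(r"\r\n|\r|\n")
BLOCK_START = re.compile(r"\A=[a-zA-Z]")
BLOCK_END = re.compile(r"\A=cut")


def is_blank(line):
    """A blank line holds only spaces (ASCII 32) and tabs (ASCII 9)."""
    return line.strip(" \t") == ""


def pod_blocks(source):
    """Return the Pod blocks found in source, each as a list of lines."""
    blocks = []
    current = None  # None while we are outside any Pod block
    for line in NEWLINE.split(source):
        if current is None:
            if BLOCK_END.match(line):
                # Assumption of this sketch: halt instead of warning.
                raise ValueError('a Pod block may not start with "=cut"')
            if BLOCK_START.match(line):
                current = [line]
        else:
            current.append(line)
            if BLOCK_END.match(line):
                blocks.append(current)
                current = None
    if current is not None:
        # The block ran to the end of the file without a "=cut" line.
        blocks.append(current)
    return blocks
```

For instance, C<pod_blocks("print 1;\n=pod\n\nHi\n=cut\nprint 2;\n")>
yields a single block holding the four lines from "=pod" to "=cut".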

=for comment
 The current perlsyn says:
 [beginquote]
   Note that pod translators should look at only paragraphs beginning
   with a pod directive (it makes parsing easier), whereas the compiler
   actually knows to look for pod escapes even in the middle of a
   paragraph. This means that the following secret stuff will be ignored
   by both the compiler and the translators.
      $a=3;
      =secret stuff
       warn "Neither POD nor CODE!?"
      =cut back
      print "got $a\n";
   You probably shouldn't rely upon the warn() being podded out forever.
   Not all pod translators are well-behaved in this regard, and perhaps
   the compiler will become pickier.
 [endquote]
 I think that those paragraphs should just be removed; paragraph-based
 parsing seems to have been largely abandoned, because of the hassle
 with non-empty blank lines messing up what people meant by "paragraph".
 Even if the "it makes parsing easier" bit were especially true,
 it wouldn't be worth the confusion of having perl and pod2whatever
 actually disagree on what can constitute a Pod block.

Within a Pod block, there are B<Pod paragraphs>. A Pod paragraph
consists of non-blank lines of text, separated by one or more blank
lines.

For purposes of Pod processing, there are four types of paragraphs in
a Pod block:

=over

=item *

A command paragraph (also called a "directive"). The first line of
this paragraph must match C<m/\A=[a-zA-Z]/>. Command paragraphs are
typically one line, as in:

  =head1 NOTES

  =item *

But they may span several (non-blank) lines:

  =for comment
  Hm, I wonder what it would look like if
  you tried to write a BNF for Pod from this.

  =head3 Dr. Strangelove, or: How I Learned to
  Stop Worrying and Love the Bomb

I<Some> command paragraphs allow formatting codes in their content
(i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:

  =head1 Did You Remember to C<use strict;>?

In other words, the Pod processing handler for "head1" will apply the
same processing to "Did You Remember to CE<lt>use strict;>?" that it
would to an ordinary paragraph (i.e., formatting codes like
"CE<lt>...>" are parsed and presumably formatted appropriately, and
whitespace in the form of literal spaces and/or tabs is not
significant).

=item *

A B<verbatim paragraph>. The first line of this paragraph must begin
with a literal space or tab, and this paragraph must not be inside a "=begin
I<identifier>", ... "=end I<identifier>" sequence unless
"I<identifier>" begins with a colon (":"). That is, if a paragraph
starts with a literal space or tab, but I<is> inside a
"=begin I<identifier>", ... "=end I<identifier>" region, then it's
a data paragraph, unless "I<identifier>" begins with a colon.

Whitespace I<is> significant in verbatim paragraphs (although, in
processing, tabs are probably expanded).

=item *

An B<ordinary paragraph>. A paragraph is an ordinary paragraph
if its first line matches neither C<m/\A=[a-zA-Z]/> nor
C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
... "=end I<identifier>" sequence unless "I<identifier>" begins with
a colon (":").

=item *

A B<data paragraph>. This is a paragraph that I<is> inside a "=begin
I<identifier>" ... "=end I<identifier>" sequence where
"I<identifier>" does I<not> begin with a literal colon (":").
In some sense, a data paragraph is not part of Pod at all (i.e.,
effectively it's "out-of-band"), since it's not subject to most kinds
of Pod parsing; but it is specified here, since Pod
parsers need to be able to call an event for it, or store it in some
form in a parse tree, or at least just parse I<around> it.

=back

For example, consider the following paragraphs:

  # <- that's the 0th column

  =head1 Foo

  Stuff

    $foo->bar

  =cut

Here, "=head1 Foo" and "=cut" are command paragraphs because the first
line of each matches C<m/\A=[a-zA-Z]/>. "Stuff" is an ordinary
paragraph, because its first line matches neither C<m/\A=[a-zA-Z]/>
nor C<m/\A[ \t]/>. "I<[space][space]>$foo->bar"
is a verbatim paragraph, because its first line starts with a literal
whitespace character (and there's no "=begin"..."=end" region around).

The "=begin I<identifier>" ... "=end I<identifier>" commands stop
paragraphs that they surround from being parsed as ordinary or verbatim
paragraphs, if I<identifier> doesn't begin with a colon. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=head1 Pod Commands

This section is intended to supplement and clarify the discussion in
L<perlpod/"Command Paragraph">. These are the currently recognized
Pod commands:

=over

=item "=head1", "=head2", "=head3", "=head4"

This command indicates that the text in the remainder of the paragraph
is a heading. That text may contain formatting codes. Examples:

  =head1 Object Attributes

  =head3 What B<Not> to Do!

=item "=pod"

This command indicates that this paragraph begins a Pod block. (If we
are already in the middle of a Pod block, this command has no effect at
all.) If there is any text in this command paragraph after "=pod",
it must be ignored. Examples:

  =pod

  This is a plain Pod paragraph.

  =pod This text is ignored.

=item "=cut"

This command indicates that this line is the end of this previously
started Pod block. If there is any text after "=cut" on the line, it must be
ignored. Examples:

  =cut

  =cut The documentation ends here.

  =cut
  # This is the first line of program text.
  sub foo { # This is the second.

It is an error to try to I<start> a Pod block with a "=cut" command. In
that case, the Pod processor must halt parsing of the input file, and
must by default emit a warning.

=item "=over"

This command indicates that this is the start of a list/indent
region. If there is any text following the "=over", it must consist
of only a nonzero positive numeral. The semantics of this numeral are
explained in the L</"About =over...=back Regions"> section, further
below. Formatting codes are not expanded. Examples:

  =over 3

  =over 3.5

  =over

=item "=item"

This command indicates that an item in a list begins here. Formatting
codes are processed. The semantics of the (optional) text in the
remainder of this paragraph are
explained in the L</"About =over...=back Regions"> section, further
below. Examples:

  =item

  =item *

  =item      *

  =item 14

  =item 3.

  =item C<< $thing->stuff(I<dodad>) >>

  =item For transporting us beyond seas to be tried for pretended
  offenses

  =item He is at this time transporting large armies of foreign
  mercenaries to complete the works of death, desolation and
  tyranny, already begun with circumstances of cruelty and perfidy
  scarcely paralleled in the most barbarous ages, and totally
  unworthy the head of a civilized nation.

=item "=back"

This command indicates that this is the end of the region begun
by the most recent "=over" command. It permits no text after the
"=back" command.

=item "=begin formatname"

=item "=begin formatname parameter"

This marks the following paragraphs (until the matching "=end
formatname") as being for some special kind of processing. Unless
"formatname" begins with a colon, the contained non-command
paragraphs are data paragraphs. But if "formatname" I<does> begin
with a colon, then non-command paragraphs are ordinary paragraphs
or verbatim paragraphs. This is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

It is advised that formatnames match the regexp
C<m/\A:?[-a-zA-Z0-9_]+\z/>. Everything following whitespace after the
formatname is a parameter that may be used by the formatter when dealing
with this region. This parameter must not be repeated in the "=end"
paragraph. Implementors should anticipate future expansion in the
semantics and syntax of the first parameter to "=begin"/"=end"/"=for".

=item "=end formatname"

This marks the end of the region opened by the matching
"=begin formatname" command. If "formatname" is not the formatname
of the most recent open "=begin formatname" region, then this
is an error, and must generate an error message. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=item "=for formatname text..."

This is synonymous with:

  =begin formatname

  text...

  =end formatname

That is, it creates a region consisting of a single paragraph; that
paragraph is to be treated as an ordinary paragraph if "formatname"
begins with a ":"; if "formatname" I<doesn't> begin with a colon,
then "text..." will constitute a data paragraph. There is no way
to use "=for formatname text..." to express "text..." as a verbatim
paragraph.

=item "=encoding encodingname"

This command, which should occur early in the document (at least
before any non-US-ASCII data!), declares that this document is
encoded in the encoding I<encodingname>, which must be
an encoding name that L<Encode> recognizes. (Encode's list
of supported encodings, in L<Encode::Supported>, is useful here.)
If the Pod parser cannot decode the declared encoding, it
should emit a warning and may abort parsing the document
altogether.

A document having more than one "=encoding" line should be
considered an error. Pod processors may silently tolerate this if
the not-first "=encoding" lines are just duplicates of the
first one (e.g., if there's a "=encoding utf8" line, and later on
another "=encoding utf8" line). But Pod processors should complain if
there are contradictory "=encoding" lines in the same document
(e.g., if there is a "=encoding utf8" early in the document and
"=encoding big5" later). Pod processors that recognize BOMs
may also complain if they see an "=encoding" line
that contradicts the BOM (e.g., if a document with a UTF-16LE
BOM has an "=encoding shiftjis" line).

=back

If a Pod processor sees any command other than the ones listed
above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
or "=w123"), that processor must by default treat this as an
error. It must not process the paragraph beginning with that
command, must by default warn of this as an error, and may
abort the parse. A Pod parser may allow a way for particular
applications to add to the above list of known commands, and to
stipulate, for each additional command, whether formatting
codes should be processed.

Future versions of this specification may add additional
commands.
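
The handling of known and unknown commands, and of repeated
"=encoding" lines, might be sketched in Python as follows. This is only
an illustration: the names C<KNOWN_COMMANDS>, C<parse_command>, and
C<check_encodings> are hypothetical, and comparing encoding names as
exact strings is an assumption of the sketch, not a rule of this
specification.

```python
import re

# The commands listed in this section.
KNOWN_COMMANDS = {
    "head1", "head2", "head3", "head4", "pod", "cut", "over", "item",
    "back", "begin", "end", "for", "encoding",
}

# Same pattern as m/\A=[a-zA-Z]\S*\s*/, with the command name captured.
COMMAND = re.compile(r"\A=([a-zA-Z]\S*)\s*")


def parse_command(paragraph):
    """Split a command paragraph into (name, remaining text, known?)."""
    match = COMMAND.match(paragraph)
    if match is None:
        raise ValueError("not a command paragraph")
    name = match.group(1)
    return name, paragraph[match.end():], name in KNOWN_COMMANDS


def check_encodings(names):
    """Classify a document's "=encoding" declarations, given in order."""
    if len(names) <= 1:
        return "ok"
    if all(name == names[0] for name in names[1:]):
        return "duplicate"      # may be tolerated silently
    return "contradictory"      # the processor should complain
```

A caller would warn (by default) whenever the third element returned by
C<parse_command> is false, as for "=cuttlefish", and would skip that
paragraph.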

=head1 Pod Formatting Codes

(Note that in previous drafts of this document and of perlpod,
formatting codes were referred to as "interior sequences", and
this term may still be found in the documentation for Pod parsers,
and in error messages from Pod processors.)

There are two syntaxes for formatting codes:

=over

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by a "<", any number of characters, and ending with the first
matching ">". Examples:

  That's what I<you> think!

  What's C<dump()> for?

  X<C<chmod> and C<unlink()> Under Different Operating Systems>

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by two or more "<"'s, one or more whitespace characters,
any number of characters, one or more whitespace characters,
and ending with the first matching sequence of two or more ">"'s, where
the number of ">"'s equals the number of "<"'s in the opening of this
formatting code. Examples:

  That's what I<< you >> think!

  C<<< open(X, ">>thing.dat") || die $! >>>

  B<< $foo->bar(); >>

With this syntax, the whitespace character(s) after the "CE<lt><<"
and before the ">>" (or whatever letter) are I<not> renderable. They
do not signify whitespace, but are merely part of the formatting codes
themselves. That is, these are all synonymous:

  C<thing>
  C<< thing >>
  C<<           thing     >>
  C<<< thing >>>
  C<<<<
  thing
  >>>>

and so on.

Finally, the multiple-angle-bracket form does I<not> alter the interpretation
of nested formatting codes, meaning that the following four example lines are
identical in meaning:

  B<example: C<$a E<lt>=E<gt> $b>>

  B<example: C<< $a <=> $b >>>

  B<example: C<< $a E<lt>=E<gt> $b >>>

  B<<< example: C<< $a E<lt>=E<gt> $b >> >>>

=back

In parsing Pod, a notably tricky part is the correct parsing of
(potentially nested!) formatting codes. Implementors should
consult the code in the C<parse_text> routine in Pod::Parser as an
example of a correct implementation.

=over

=item C<IE<lt>textE<gt>> -- italic text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<BE<lt>textE<gt>> -- bold text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<CE<lt>codeE<gt>> -- code text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<FE<lt>filenameE<gt>> -- style for filenames

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<XE<lt>topic nameE<gt>> -- an index entry

See the brief discussion in L<perlpod/"Formatting Codes">.

This code is unusual in that most formatters completely discard
this code and its content. Other formatters will render it with
invisible codes that can be used in building an index of
the current document.

=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code

Discussed briefly in L<perlpod/"Formatting Codes">.

This code is unusual in that it should have no content. That is,
a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether
or not it complains, the I<potatoes> text should be ignored.

=item C<LE<lt>nameE<gt>> -- a hyperlink

The complicated syntaxes of this code are discussed at length in
L<perlpod/"Formatting Codes">, and implementation details are
discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the
contents of LE<lt>content> is tricky. Notably, the content has to be
checked for whether it looks like a URL, or whether it has to be split
on literal "|" and/or "/" (in the right order!), and so on,
I<before> EE<lt>...> codes are resolved.

=item C<EE<lt>escapeE<gt>> -- a character escape

See L<perlpod/"Formatting Codes">, and several points in
L</Notes on Implementing Pod Processors>.

=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces

This formatting code is syntactically simple, but semantically
complex. What it means is that each space in the printable
content of this code signifies a non-breaking space.

Consider:

  C<$x ? $y : $z>

  S<C<$x ? $y : $z>>

Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, "$y", one space, ":", one space,
"$z". The
difference is that in the latter, with the S code, those spaces
are not "normal" spaces, but instead are non-breaking spaces.

=back


If a Pod processor sees any formatting code other than the ones
listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
processor must by default treat this as an error.
A Pod parser may allow a way for particular
applications to add to the above list of known formatting codes;
a Pod parser might even allow a way to stipulate, for each additional
formatting code, whether it requires some form of special processing, as
LE<lt>...> does.

Future versions of this specification may add additional
formatting codes.

Historical note: A few older Pod processors would not see a ">" as
closing a "CE<lt>" code, if the ">" was immediately preceded by
a "-".
This was so that this:

  C<$foo->bar>

would parse as equivalent to this:

  C<$foo-E<gt>bar>

instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code. This
problem has since been solved by the addition of syntaxes like this:

  C<< $foo->bar >>

Compliant parsers must not treat "->" as special.

Formatting codes absolutely cannot span paragraphs. If a code is
opened in one paragraph, and no closing code is found by the end of
that paragraph, the Pod parser must close that formatting code,
and should complain (as in "Unterminated I code in the paragraph
starting at line 123: 'Time objects are not...'"). So these
two paragraphs:

  I<I told you not to do this!

  Don't make me say it again!>

...must I<not> be parsed as two paragraphs in italics (with the I
code starting in one paragraph and ending in another.) Instead,
the first paragraph should generate a warning, but that aside, the
above code must parse as if it were:

  I<I told you not to do this!>

  Don't make me say it again!E<gt>

(In SGMLish jargon, all Pod commands are like block-level
elements, whereas all Pod formatting codes are like inline-level
elements.)

=head1 Notes on Implementing Pod Processors

The following is a long section of miscellaneous requirements
and suggestions to do with Pod processing.

=over

=item *

Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them (possibly several
times, for very long lines) to avoid text running off the side of the
page. Pod formatters may warn of such line-breaking. Such warnings
are particularly appropriate for lines that are over 100 characters long,
which are usually not intentional.

=item *

Pod parsers must recognize I<all> of the three well-known newline
formats: CR, LF, and CRLF. See L<perlport|perlport>.

=item *

Pod parsers should accept input lines that are of any length.

=item *

Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same. Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
valid as a UTF-8 sequence, or otherwise as Latin-1.

Future versions of this specification may specify
how Pod can accept other encodings. Presumably treatment of other
encodings in Pod parsing would be as in XML parsing: whatever the
encoding declared by a particular Pod file, content is to be
stored in memory as Unicode characters.

=item *

The well known Unicode Byte Order Marks are as follows: if the
file begins with the two literal byte values 0xFE 0xFF, this is
the BOM for big-endian UTF-16. If the file begins with the two
literal byte values 0xFF 0xFE, this is the BOM for little-endian
UTF-16. If the file begins with the three literal byte values
0xEF 0xBB 0xBF, this is the BOM for UTF-8.

=for comment
 use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
 0xEF 0xBB 0xBF

=for comment
 If toke.c is modified to support UTF-32, add mention of those here.

=item *

A naive but sufficient heuristic for testing the first highbit
byte-sequence in a BOM-less file (whether in code or in Pod!), to see
whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
the first byte in the sequence is in the range 0xC2 - 0xFD
(0xC0 and 0xC1 never begin a well-formed UTF-8 sequence)
I<and> whether the next byte is in the range
0x80 - 0xBF.
If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8. Otherwise the parser should treat the file as being
in Latin-1. In the unlikely circumstance that the first highbit
sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
can cater to our heuristic (as well as any more intelligent heuristic)
by prefacing that line with a comment line containing a highbit
sequence that is clearly I<not> valid as UTF-8. A line consisting
of simply "#", an e-acute, and any non-highbit byte,
is sufficient to establish this file's encoding.

=for comment
 If/WHEN some brave soul makes these heuristics into a generic
 text-file class (or PerlIO layer?), we can presumably delete
 mention of these icky details from this file, and can instead
 tell people to just use appropriate class/layer.
 Auto-recognition of newline sequences would be another desirable
 feature of such a class/layer.
 HINT HINT HINT.

=for comment
 "The probability that a string of characters
 in any other encoding appears as valid UTF-8 is low" - RFC2279

=item *

This document's requirements and suggestions about encodings
do not apply to Pod processors running on non-ASCII platforms,
notably EBCDIC platforms.

=item *

Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph. (The parser may conflate these two
constructs, or may leave them distinct, in the expectation that the
formatter will nevertheless treat them the same.)
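
The Byte Order Mark table and the naive highbit heuristic described in
the items above can be sketched in Python as follows. This is a rough
illustration only: the function name C<guess_encoding> and the returned
labels are hypothetical, and the sketch uses 0xC2 as the lower bound
for the first byte, since 0xC0 and 0xC1 never begin a well-formed UTF-8
sequence. A file with no highbit bytes at all is reported as Latin-1,
which is harmless because such a file decodes identically either way.

```python
import re

# Byte Order Marks, checked at the very start of the file.
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# Any byte with its high bit set.
HIGHBIT = re.compile(rb"[\x80-\xff]")


def guess_encoding(data):
    """Guess the encoding of raw file contents (a bytes object)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    match = HIGHBIT.search(data)
    if match is None:
        return "Latin-1"  # pure US-ASCII; the choice makes no difference
    i = match.start()
    first = data[i]
    following = data[i + 1] if i + 1 < len(data) else None
    if (0xC2 <= first <= 0xFD
            and following is not None
            and 0x80 <= following <= 0xBF):
        return "UTF-8"
    return "Latin-1"
```

Note that a leading comment line made of "#", a Latin-1 e-acute (byte
0xE9), and a newline makes this function report Latin-1 regardless of
what follows, which is the trick described above.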

=item *

When rendering Pod to a format that allows comments (i.e., to nearly
any format other than plaintext), a Pod formatter must insert comment
text identifying its name and version number, and the name and
version numbers of any modules it might be using to process the Pod.
Minimal examples:

  %% POD::Pod2PS v3.14159, using POD::Parser v1.92

  <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

  {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

  .\" Pod::Man version 3.14159, using POD::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the Pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of the input
file, the formatting options in effect, version of Perl used, etc.

Formatters may also choose to note errors/warnings as comments,
besides or instead of emitting them otherwise (as in messages to
STDERR, or C<die>ing).

=item *

Pod parsers I<may> emit warnings or error messages ("Unknown E code
EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
suppressing all such STDERR output, and instead allow an option for
reporting errors/warnings
in some other way, whether by triggering a callback, or noting errors
in some attribute of the document object, or some similarly unobtrusive
mechanism -- or even by appending a "Pod Errors" section to the end of
the parsed form of the document.

=item *

In cases of exceptionally aberrant documents, Pod parsers may abort the
parse. Even then, using C<die>ing/C<croak>ing is to be avoided; where
possible, the parser library may simply close the input file
and add text like "*** Formatting Aborted ***" to the end of the
(partial) in-memory document.

=item *

In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
are understood (i.e., I<not> verbatim paragraphs, but I<including>
ordinary paragraphs, and command paragraphs that produce renderable
text, like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as any
(nonzero) number of literal spaces, literal newlines, and literal tabs
(as long as this produces no blank lines, since those would terminate
the paragraph). Pod parsers should compact literal whitespace in each
processed paragraph, but may provide an option for overriding this
(since some processing tasks do not require it), or may follow
additional special rules (for example, specially treating
period-space-space or period-newline sequences).

=item *

Pod parsers should not, by default, try to coerce apostrophe (') and
quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
turn backtick (`) into anything else but a single backtick character
(distinct from an open quote character!), nor "--" into anything but
two minus signs. They I<must never> do any of those things to text
in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
paragraphs.

=item *

When rendering Pod to a format that has two kinds of hyphens (-), one
that's a non-breaking hyphen, and another that's a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
generally translate "-" to non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.

=item *

Pod formatters should make reasonable efforts to keep words of Perl
code from being broken across lines.
For example, "Foo::Bar" in some
formatting systems is seen as eligible for being broken across lines
as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should
be avoided where possible, either by disabling all line-breaking in
mid-word, or by wrapping particular words with internal punctuation
in "don't break this across lines" codes (which in some formats may
not be a single code, but might be a matter of inserting non-breaking
zero-width spaces between every pair of characters in a word.)

=item *

Pod parsers should, by default, expand tabs in verbatim paragraphs as
they are processed, before passing them to the formatter or other
processor. Parsers may also allow an option for overriding this.

=item *

Pod parsers should, by default, remove newlines from the end of
ordinary and verbatim paragraphs before passing them to the
formatter. For example, while the paragraph you're reading now
could be considered, in Pod source, to end with (and contain)
the newline(s) that end it, it should be processed as ending with
(and containing) the period character that ends this sentence.

=item *

Pod parsers, when reporting errors, should make some effort to report
an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where
this is problematic, the paragraph number should at least be
accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
the CE<lt>interest rate> attribute...'").

=item *

Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines.
I.e., these two
lines, which have a blank line between them:

	use Foo;

	print Foo->VERSION

should be unified into one paragraph ("\tuse Foo;\n\n\tprint
Foo->VERSION") before being passed to the formatter or other
processor. Parsers may also allow an option for overriding this.

While this might be too cumbersome to implement in event-based Pod
parsers, it is straightforward for parsers that return parse trees.

=item *

Pod formatters, where feasible, are advised to avoid splitting short
verbatim paragraphs (under twelve lines, say) across pages.

=item *

Pod parsers must treat a line with only spaces and/or tabs on it as a
"blank line" such as separates paragraphs. (Some older parsers
recognized only two adjacent newlines as a "blank line" but would not
recognize a newline, a space, and a newline, as a blank line. This
is noncompliant behavior.)

=item *

Authors of Pod formatters/processors should make every effort to
avoid writing their own Pod parser. There are already several in
CPAN, with a wide range of interface styles -- and one of them,
Pod::Parser, comes with modern versions of Perl.

=item *

Characters in Pod documents may be conveyed either as literals, or by
number in EE<lt>n> codes, or by an equivalent mnemonic, as in
EE<lt>eacute> which is exactly equivalent to EE<lt>233>.

Characters in the range 32-126 refer to those well known US-ASCII
characters (also defined there by Unicode, with the same meaning),
which all Pod formatters must render faithfully. Characters
in the ranges 0-31 and 127-159 should not be used (neither as
literals, nor as EE<lt>number> codes), except for the
literal byte-sequences for newline (13, 13 10, or 10), and tab (9).

Characters in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning).
Characters above
255 should be understood to refer to Unicode characters.

=item *

Be warned
that some formatters cannot reliably render characters outside 32-126;
and many are able to handle 32-126 and 160-255, but nothing above
255.

=item *

Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for
less-than and greater-than, Pod parsers must understand "EE<lt>sol>"
for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
pipe). Pod parsers should also understand "EE<lt>lchevron>" and
"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
"left-pointing double angle quotation mark" = "left pointing
guillemet" and "right-pointing double angle quotation mark" = "right
pointing guillemet". (These look like little "<<" and ">>", and they
are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
and "EE<lt>raquo>".)

=item *

Pod parsers should understand all "EE<lt>html>" codes as defined
in the entity declarations in the most recent XHTML specification at
C<www.W3.org>. Pod parsers must understand at least the entities
that define characters in the range 160-255 (Latin-1). Pod parsers,
when faced with some unknown "EE<lt>I<identifier>>" code,
shouldn't simply replace it with nullstring (by default, at least),
but may pass it through as a string consisting of the literal characters
E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the
alternative option of processing such unknown
"EE<lt>I<identifier>>" codes by firing an event especially
for such codes, or by adding a special node-type to the in-memory
document tree. Such "EE<lt>I<identifier>>" may have special meaning
to some processors, or some processors may choose to add them to
a special error report.

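
As a rough illustration of the preceding items (a sketch under stated
assumptions, not a reference implementation), the lookup for the content
of an EE<lt>...> code might go as follows. The name C<resolve_e_code>,
the three status strings, and the deliberately tiny name table are all
made up for this example; a real parser would carry the full XHTML
entity set. The check against C<m/\A\w+\z/> and the rule that numbers
are Unicode codepoints come from elsewhere in this document; the octal
and hex number syntax is the one described in L<perlpod|perlpod>.

```perl

  use strict;
  use warnings;

  # Hypothetical, far-from-complete table of names to Unicode codepoints.
  my %NAME_TO_CODEPOINT = (
      lt => 60, gt => 62, sol => 47, verbar => 124,
      quot => 34, amp => 38, apos => 39,
      nbsp => 160, shy => 173, eacute => 233,
      laquo => 171, raquo => 187,
      lchevron => 171, rchevron => 187,   # legacy names
      euro => 0x20AC, infin => 0x221E,
  );

  # resolve_e_code (hypothetical helper) takes the text between "E<" and ">".
  # It returns a status and a value:
  #   'char'    => the character denoted by the code
  #   'unknown' => the literal source text, passed through unchanged
  #   'invalid' => the literal source text of a syntactically bad code
  sub resolve_e_code {
      my ($content) = @_;
      return ('invalid', "E<$content>") unless $content =~ m/\A\w+\z/;

      my $codepoint;
      if    ($content =~ m/\A0[xX]([0-9a-fA-F]+)\z/) { $codepoint = hex $1 }
      elsif ($content =~ m/\A0([0-7]+)\z/)           { $codepoint = oct $1 }
      elsif ($content =~ m/\A[0-9]+\z/)              { $codepoint = $content + 0 }
      elsif (exists $NAME_TO_CODEPOINT{$content}) {
          $codepoint = $NAME_TO_CODEPOINT{$content};
      }
      else {
          return ('unknown', "E<$content>");
      }

      # The number is always a Unicode codepoint, never a native-charset one.
      return ('char', chr $codepoint);
  }

```

Here an unknown name such as "qacute" is handed back as its literal
characters rather than being replaced with nullstring, which is one of
the options the item above permits; firing a dedicated event would be
an equally valid choice.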

=item *

Pod parsers must also support the XHTML codes "EE<lt>quot>" for
character 34 (doublequote, "), "EE<lt>amp>" for character 38
(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').

=item *

Note that in all cases of "EE<lt>whatever>", I<whatever> (whether
an htmlname, or a number in any base) must consist only of
alphanumeric characters -- that is, I<whatever> must match
C<m/\A\w+\z/>. So "EE<lt> 0 1 2 3 >" is invalid, because
it contains spaces, which aren't alphanumeric characters. This
presumably does not I<need> special treatment by a Pod processor;
" 0 1 2 3 " doesn't look like a number in any base, so it would
presumably be looked up in the table of HTML-like names. Since
there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ",
this will be treated as an error. However, Pod processors may
treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically>
invalid, potentially earning a different error message than the
error message (or warning, or event) generated by a merely unknown
(but theoretically valid) htmlname, as in "EE<lt>qacute>"
[sic]. However, Pod parsers are not required to make this
distinction.

=item *

Note that EE<lt>number> I<must not> be interpreted as simply
"codepoint I<number> in the current/native character set". It always
means only "the character represented by codepoint I<number> in
Unicode." (This is identical to the semantics of &#I<number>; in XML.)

This will likely require many formatters to have tables mapping from
treatable Unicode codepoints (such as the "\xE9" for the e-acute
character) to the escape sequences or codes necessary for conveying
such sequences in the target output format. A converter to *roff
would, for example, know that "\xE9" (whether conveyed literally, or via
a EE<lt>...> sequence) is to be conveyed as "e\\*'".
Similarly, a program rendering Pod in a Mac OS application window would
presumably need to know that "\xE9" maps to codepoint 142 in the MacRoman
encoding that (at time of writing) is native for Mac OS. Such
Unicode2whatever mappings are presumably already widely available for
common output formats. (Such mappings may be incomplete! Implementors
are not expected to bend over backwards in an attempt to render
Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
of the other weird things that Unicode can encode.) And
if a Pod document uses a character not found in such a mapping, the
formatter should consider it an unrenderable character.

=item *

If, surprisingly, the implementor of a Pod formatter can't find a
satisfactory pre-existing table mapping from Unicode characters to
escapes in the target format (e.g., a decent table of Unicode
characters to *roff escapes), it will be necessary to build such a
table. If you are in this circumstance, you should begin with the
characters in the range 0x00A0 - 0x00FF, which are mostly the heavily
used accented characters. Then proceed (as patience permits and
fastidiousness compels) through the characters that the (X)HTML
standards groups judged important enough to merit mnemonics
for. These are declared in the (X)HTML specifications at the
www.W3.org site. At time of writing (September 2001), the most recent
entity declaration files are:

  http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
  http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
  http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

Then you can progress through any remaining notable Unicode characters
in the range 0x2000-0x204D (consult the character tables at
www.unicode.org), and whatever else strikes your fancy.
For example,
in F<xhtml-symbol.ent>, there is the entry:

  <!ENTITY infin    "&#8734;"> <!-- infinity, U+221E ISOtech -->

While the mapping "infin" to the character "\x{221E}" will (hopefully)
have been already handled by the Pod parser, the presence of the
character in this file means that it's important enough to
include in a formatter's table that maps from notable Unicode characters
to the codes necessary for rendering them. So for a Unicode-to-*roff
mapping, for example, this would merit the entry:

  "\x{221E}" => '\(in',

It is eagerly hoped that in the future, increasing numbers of formats
(and formatters) will support Unicode characters directly (as (X)HTML
does with C<&infin;>, C<&#8734;>, or C<&#x221E;>), reducing the need
for idiosyncratic mappings of Unicode-to-I<my_escapes>.

=item *

It is up to each individual Pod formatter to display good judgement when
confronted with an unrenderable character (which is distinct from an
unknown EE<lt>thing> sequence that the parser couldn't resolve to
anything, renderable or not). It is good practice to map Latin letters
with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
unaccented US-ASCII letters (like a simple character 101, "e"), but
clearly this is often not feasible, and an unrenderable character may
be represented as "?", or the like. In attempting a sane fallback
(as from EE<lt>233> to "e"), Pod formatters may use the
%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
L<Text::Unidecode|Text::Unidecode>, if available.

For example, this Pod text:

  magic is enabled if you set C<$Currency> to 'E<euro>'.

may be rendered as:
"magic is enabled if you set C<$Currency> to 'I<?>'" or as
"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as
"magic is enabled if you set C<$Currency> to '[x20AC]'", etc.

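
To make the fallback idea concrete, here is a minimal sketch of what a
formatter targeting a US-ASCII-only output might do with already-parsed
text. Everything named here is an assumption made for the example: the
C<render_ascii> function, the hand-written C<%FALLBACK> table (a real
formatter might consult the %Latin1Code_to_fallback table mentioned
above instead), and the choice of the "[x20AC]" notation for whatever is
left over.

```perl

  use strict;
  use warnings;

  # Hypothetical hand-written fallbacks, keyed by Unicode codepoint.
  my %FALLBACK = (
      0xA0 => ' ',    # NBSP: fall back to a plain space
      0xAD => '',     # soft hyphen: simply delete it
      0xE8 => 'e',    # e-grave
      0xE9 => 'e',    # e-acute
      0xF1 => 'n',    # n-tilde
  );

  # Returns the ASCII rendering of $text, plus a reference to a sorted
  # list of the codepoints that had no fallback at all.
  sub render_ascii {
      my ($text) = @_;
      my %unrenderable;
      (my $out = $text) =~ s{([^\x09\x0A\x0D\x20-\x7E])}{
          my $codepoint = ord $1;
          exists $FALLBACK{$codepoint}
            ? $FALLBACK{$codepoint}
            : do { $unrenderable{$codepoint}++; sprintf '[x%04X]', $codepoint };
      }ge;
      return ($out, [ sort { $a <=> $b } keys %unrenderable ]);
  }

  my ($out, $missing) = render_ascii("caf\x{E9} costs \x{20AC}5");
  # $out is now "cafe costs [x20AC]5", and @$missing holds only 0x20AC.

```

The second return value is there so that the caller can report, once
per document, which characters could not be rendered.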

A Pod formatter may also note, in a comment or warning, a list of what
unrenderable characters were encountered.

=item *

EE<lt>...> may freely appear in any formatting code (other than
in another EE<lt>...> or in a ZE<lt>>). That is, "XE<lt>The
EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The
EE<lt>euro>1,000,000 Solution|Million::Euros>".

=item *

Some Pod formatters output to formats that implement non-breaking
spaces as an individual character (which I'll call "NBSP"), and
others output to formats that implement non-breaking spaces just as
spaces wrapped in a "don't break this across lines" code. Note that
at the level of Pod, both sorts of codes can occur: Pod can contain a
NBSP character (whether as a literal, or as a "EE<lt>160>" or
"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
such codes are taken to represent non-breaking spaces. Pod
parsers should consider supporting the optional parsing of "SE<lt>foo
IE<lt>barE<gt> baz>" as if it were
"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
optional parsing of groups of words joined by NBSP's as if each group
were in a SE<lt>...> code, so that formatters may use the
representation that maps best to what the output format demands.

=item *

Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
implement by replacing each space in the parse tree under the content
of the S, with an NBSP. But note: the replacement should apply I<not> to
spaces in I<all> text, but I<only> to spaces in I<printable> text. (This
distinction may or may not be evident in the particular tree/event
model implemented by the Pod parser.)
For example, consider this
unusual case:

  S<L</Autoloaded Functions>>

This means that the space in the middle of the visible link text must
not be broken across lines. In other words, it's the same as this:

  L<"AutoloadedE<160>Functions"/Autoloaded Functions>

However, a misapplied space-to-NBSP replacement could (wrongly)
produce something equivalent to this:

  L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>

...which is almost definitely not going to work as a hyperlink (assuming
this formatter outputs a format supporting hypertext).

Formatters may choose to just not support the S format code,
especially in cases where the output format simply has no NBSP
character/code and no code for "don't break this stuff across lines".

=item *

Besides the NBSP character discussed above, implementors are reminded
of the existence of the other "special" character in Latin-1, the
"soft hyphen" character, also known as "discretionary hyphen"
(i.e., C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation
point. That is, it normally renders as nothing, but may render as a
"-" if a formatter breaks the word at that point. Pod formatters
should, as appropriate, do one of the following: 1) render this with
a code with the same meaning (e.g., "\-" in RTF), 2) pass it through
in the expectation that the formatter understands this character as
such, or 3) delete it.

For example:

  sigE<shy>action
  manuE<shy>script
  JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi

These signal to a formatter that if it is to hyphenate "sigaction"
or "manuscript", then it should be done as
"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script"
(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't
show up at all).
And if it is
to hyphenate "Jarkko" and/or "Hietaniemi", it can do
so only at the points where there is a C<EE<lt>shyE<gt>> code.

In practice, it is anticipated that this character will not be used
often, but formatters should either support it, or delete it.

=item *

If you think that you want to add a new command to Pod (like, say, a
"=biblio" command), consider whether you could get the same
effect with a "=for" or "=begin"/"=end" sequence: "=for biblio ..." or
"=begin biblio" ... "=end biblio". Pod processors that don't understand
"=for biblio", etc., will simply ignore it, whereas they may complain
loudly if they see "=biblio".

=item *

Throughout this document, "Pod" has been the preferred spelling for
the name of the documentation format. One may also use "POD" or
"pod". For the documentation that is (typically) in the Pod
format, you may use "pod", or "Pod", or "POD". Understanding these
distinctions is useful; but obsessing over how to spell them usually
is not.

=back


=head1 About LE<lt>...E<gt> Codes

As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
code is the most complex of the Pod formatting codes. The points below
will hopefully clarify what it means and how processors should deal
with it.

=over

=item *

In parsing an LE<lt>...> code, Pod parsers must distinguish at least
four attributes:

=over

=item First:

The link-text. If there is none, this must be undef. (E.g., in
"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
link text. Note that link text may contain formatting.)

=item Second:

The possibly inferred link-text; i.e., if there was no real link
text, then this is the text that we'll infer in its place.
(E.g., for
"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".)

=item Third:

The name or URL, or undef if none. (E.g., in "LE<lt>Perl
Functions|perlfunc>", the name (also sometimes called the page)
is "perlfunc". In "LE<lt>/CAVEATS>", the name is undef.)

=item Fourth:

The section (AKA "item" in older perlpods), or undef if none. E.g.,
in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section. (Note
that this is not the same as a manpage section like the "5" in "man 5
crontab". "Section Foo" in the Pod sense means the part of the text
that's introduced by the heading or item whose text is "Foo".)

=back

Pod parsers may also note additional attributes including:

=over

=item Fifth:

A flag for whether item 3 (if present) is a URL (like
"http://lists.perl.org" is), in which case there should be no section
attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
possibly a man page name (like "crontab(5)" is).

=item Sixth:

The raw original LE<lt>...> content, before text is split on
"|", "/", etc, and before EE<lt>...> codes are expanded.

=back

(The above were numbered only for concise reference below. It is not
a requirement that these be passed as an actual list or array.)

For example:

  L<Foo::Bar>
    =>  undef,                          # link text
        "Foo::Bar",                     # possibly inferred link text
        "Foo::Bar",                     # name
        undef,                          # section
        'pod',                          # what sort of link
        "Foo::Bar"                      # original content

  L<Perlport's section on NL's|perlport/Newlines>
    =>  "Perlport's section on NL's",   # link text
        "Perlport's section on NL's",   # possibly inferred link text
        "perlport",                     # name
        "Newlines",                     # section
        'pod',                          # what sort of link
        "Perlport's section on NL's|perlport/Newlines" # orig. content

  L<perlport/Newlines>
    =>  undef,                          # link text
        '"Newlines" in perlport',       # possibly inferred link text
        "perlport",                     # name
        "Newlines",                     # section
        'pod',                          # what sort of link
        "perlport/Newlines"             # original content

  L<crontab(5)/"DESCRIPTION">
    =>  undef,                          # link text
        '"DESCRIPTION" in crontab(5)',  # possibly inferred link text
        "crontab(5)",                   # name
        "DESCRIPTION",                  # section
        'man',                          # what sort of link
        'crontab(5)/"DESCRIPTION"'      # original content

  L</Object Attributes>
    =>  undef,                          # link text
        '"Object Attributes"',          # possibly inferred link text
        undef,                          # name
        "Object Attributes",            # section
        'pod',                          # what sort of link
        "/Object Attributes"            # original content

  L<http://www.perl.org/>
    =>  undef,                          # link text
        "http://www.perl.org/",         # possibly inferred link text
        "http://www.perl.org/",         # name
        undef,                          # section
        'url',                          # what sort of link
        "http://www.perl.org/"          # original content

  L<Perl.org|http://www.perl.org/>
    =>  "Perl.org",                     # link text
        "http://www.perl.org/",         # possibly inferred link text
        "http://www.perl.org/",         # name
        undef,                          # section
        'url',                          # what sort of link
        "Perl.org|http://www.perl.org/" # original content

Note that you can distinguish URL-links from anything else by the
fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So
C<LE<lt>http://www.perl.comE<gt>> is a URL, but
C<LE<lt>HTTP::ResponseE<gt>> isn't.

=item *

In case of LE<lt>...> codes with no "text|" part in them,
older formatters have exhibited great variation in actually displaying
the link or cross reference. For example, LE<lt>crontab(5)> would render
as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage"
or just "C<crontab(5)>".


Pod processors must now treat "text|"-less links as follows:

  L<name>         =>  L<name|name>
  L</section>     =>  L<"section"|/section>
  L<name/section> =>  L<"section" in name|name/section>

=item *

Note that section names might contain markup. I.e., if a section
starts with:

  =head2 About the C<-M> Operator

or with:

  =item About the C<-M> Operator

then a link to it would look like this:

  L<somedoc/About the C<-M> Operator>

Formatters may choose to ignore the markup for purposes of resolving
the link and use only the renderable characters in the section name,
as in:

  <h1><a name="About_the_-M_Operator">About the <code>-M</code>
  Operator</a></h1>

  ...

  <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
  Operator" in somedoc</a>

=item *

Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>>
links from C<LE<lt>name/itemE<gt>> links (and their targets). These
have been merged syntactically and semantically in the current
specification, and I<section> can refer either to a "=headI<n> Heading
Content" command or to a "=item Item Content" command. This
specification does not specify what behavior should be in the case
of a given document having several things all seeming to produce the
same I<section> identifier (e.g., in HTML, several things all producing
the same I<anchorname> in <a name="I<anchorname>">...</a>
elements). Where Pod processors can control this behavior, they should
use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the
I<first> "Bar" section in Foo.

But for some processors/formats this cannot be easily controlled; as
with the HTML example, the behavior of multiple ambiguous
<a name="I<anchorname>">...</a> is most easily just left up to
browsers to decide.

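
The attribute breakdown and the rules for "text|"-less links given in
the items above can be sketched roughly as follows. This is only an
illustration under simplifying assumptions, not how any particular
parser does it: the name C<parse_link> and the hash keys are invented
for the example, the input is the raw content between "LE<lt>" and ">",
nested formatting codes are not handled (a "|" or "/" inside one would
be split on wrongly), EE<lt>...> codes are left unexpanded, and the
older slash-less section syntax is ignored.

```perl

  use strict;
  use warnings;

  # parse_link (hypothetical helper) returns a hash reference holding the
  # six attributes discussed above, under made-up key names.
  sub parse_link {
      my ($raw) = @_;

      # Everything before the first "|" is the link text, if there is one.
      my ($text, $target) =
          $raw =~ m/\A([^|]*)\|(.*)\z/s ? ($1, $2) : (undef, $raw);
      $text = undef if defined $text && $text eq '';   # as in L<|Time::HiRes>

      my ($name, $section, $type);
      if ($target =~ m/\A\w+:[^:\s]\S*\z/) {
          # The URL test given in this document; a URL has no section.
          ($name, $type) = ($target, 'url');
      }
      else {
          ($name, $section) = split m{/}, $target, 2;
          $name = undef if defined $name && $name eq '';   # as in L</CAVEATS>
          $section =~ s/\A"(.*)"\z/$1/s if defined $section;
          # A parenthesized suffix, as in "crontab(5)", signals a man page.
          $type = (defined $name && $name =~ m/\(\S*\)\z/) ? 'man' : 'pod';
      }

      # Explicit link text is used as given; otherwise infer it.
      my $inferred = defined $text     ? $text
                   : !defined $section ? $name
                   : defined $name     ? qq{"$section" in $name}
                   :                     qq{"$section"};

      return {
          text    => $text,    inferred => $inferred,
          name    => $name,    section  => $section,
          type    => $type,    raw      => $raw,
      };
  }

```

Called on C<'crontab(5)/"DESCRIPTION"'>, this yields the name
"crontab(5)", the section "DESCRIPTION", the type 'man', and the
inferred text '"DESCRIPTION" in crontab(5)', in line with the worked
examples earlier in this section.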

=item *

Authors wanting to link to a particular (absolute) URL, must do so
only with "LE<lt>scheme:...>" codes (like
LE<lt>http://www.perl.org>), and must not attempt "LE<lt>Some Site
Name|scheme:...>" codes. This restriction avoids many problems
in parsing and rendering LE<lt>...> codes.

=item *

In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes
for formatting or for EE<lt>...> escapes, as in:

  L<B<ummE<234>stuff>|...>

For C<LE<lt>...E<gt>> codes without a "text|" part, only
C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur. That is,
authors should not use "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>".

Note, however, that formatting codes and ZE<lt>>'s can occur in any
and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>,
and I<url>).

Authors must not nest LE<lt>...> codes. For example, "LE<lt>The
LE<lt>Foo::Bar> man page>" should be treated as an error.

=item *

Note that Pod authors may use formatting codes inside the "text"
part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).

In other words, this is valid:

  Go read L<the docs on C<$.>|perlvar/"$.">

Some output formats that do allow rendering "LE<lt>...>" codes as
hypertext, might not allow the link-text to be formatted; in
that case, formatters will have to just ignore that formatting.

=item *

At time of writing, C<LE<lt>nameE<gt>> values are of two types:
either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which
might be a real Perl module or program in an @INC / PATH
directory, or a .pod file in those places); or the name of a Unix
man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>>
is ambiguous between a Pod page called "chmod", or the Unix man page
"chmod" (in whatever man-section).
However, the presence of a string
in parens, as in "crontab(5)", is sufficient to signal that what
is being discussed is not a Pod page, and so is presumably a
Unix man page. The distinction is of no importance to many
Pod processors, but some processors that render to hypertext formats
may need to distinguish them in order to know how to render a
given C<LE<lt>fooE<gt>> code.

=item *

Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax (as in
C<LE<lt>Object AttributesE<gt>>), which was not easily distinguishable from
C<LE<lt>nameE<gt>> syntax and for C<LE<lt>"section"E<gt>> which was only
slightly less ambiguous. This syntax is no longer in the specification, and
has been replaced by the C<LE<lt>/sectionE<gt>> syntax (where the slash was
formerly optional). Pod parsers should tolerate the C<LE<lt>"section"E<gt>>
syntax, for a while at least. The suggested heuristic for distinguishing
C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> is that if it contains any
whitespace, it's a I<section>. Pod processors should warn about this being
deprecated syntax.

=back

=head1 About =over...=back Regions

"=over"..."=back" regions are used for various kinds of list-like
structures. (I use the term "region" here simply as a collective
term for everything from the "=over" to the matching "=back".)

=over

=item *

The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ...
"=back" is used for giving the formatter a clue as to how many
"spaces" (ems, or roughly equivalent units) it should tab over,
although many formatters will have to convert this to an absolute
measurement that may not exactly match with the size of spaces (or M's)
in the document's base font. Other formatters may have to completely
ignore the number.
The lack of any explicit I<indentlevel> parameter is
equivalent to an I<indentlevel> value of 4. Pod processors may
complain if I<indentlevel> is present but is not a positive number
matching C<m/\A(\d*\.)?\d+\z/>.

=item *

Authors of Pod formatters are reminded that "=over" ... "=back" may
map to several different constructs in your output format. For
example, in converting Pod to (X)HTML, it can map to any of
<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or
<dt>.

=item *

Each "=over" ... "=back" region should be one of the following:

=over

=item *

An "=over" ... "=back" region containing only "=item *" commands,
each followed by some number of ordinary/verbatim paragraphs, other
nested "=over" ... "=back" regions, "=for..." paragraphs, and
"=begin"..."=end" regions.

(Pod processors must tolerate a bare "=item" as if it were "=item
*".) Whether "*" is rendered as a literal asterisk, an "o", or as
some kind of real bullet character, is left up to the Pod formatter,
and may depend on the level of nesting.

=item *

An "=over" ... "=back" region containing only
C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them)
followed by some number of ordinary/verbatim paragraphs, other nested
"=over" ... "=back" regions, "=for..." paragraphs, and/or
"=begin"..."=end" regions. Note that the numbers must start at 1
in each section, and must proceed in order and without skipping
numbers.

(Pod processors must tolerate lines like "=item 1" as if they were
"=item 1.", with the period.)

=item *

An "=over" ... "=back" region containing only "=item [text]"
commands, each one (or each group of them) followed by some number of
ordinary/verbatim paragraphs, other nested "=over" ... "=back"
regions, or "=for..." paragraphs, and "=begin"..."=end" regions.

The "=item [text]" paragraph should not match
C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it
match just C<m/\A=item\s*\z/>.

=item *

An "=over" ... "=back" region containing no "=item" paragraphs at
all, and containing only some number of
ordinary/verbatim paragraphs, and possibly also some nested "=over"
... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
regions. Such an itemless "=over" ... "=back" region in Pod is
equivalent in meaning to a "<blockquote>...</blockquote>" element in
HTML.

=back

Note that with all the above cases, you can determine which type of
"=over" ... "=back" you have, by examining the first (non-"=cut",
non-"=pod") Pod paragraph after the "=over" command.

=item *

Pod formatters I<must> tolerate arbitrarily large amounts of text
in the "=item I<text...>" paragraph. In practice, most such
paragraphs are short, as in:

  =item For cutting off our trade with all parts of the world

But they may be arbitrarily long:

  =item For transporting us beyond seas to be tried for pretended
  offenses

  =item He is at this time transporting large armies of foreign
  mercenaries to complete the works of death, desolation and
  tyranny, already begun with circumstances of cruelty and perfidy
  scarcely paralleled in the most barbarous ages, and totally
  unworthy the head of a civilized nation.

=item *

Pod processors should tolerate "=item *" / "=item I<number>" commands
with no accompanying paragraph. The middle item is an example:

  =over

  =item 1

  Pick up dry cleaning.

  =item 2

  =item 3

  Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.

  =back

=item *

No "=over" ... "=back" region can contain headings. Processors may
treat such a heading as an error.

=item *

Note that an "=over" ... "=back" region should have some
content. That is, authors should not have an empty region like this:

  =over

  =back

Pod processors seeing such a contentless "=over" ... "=back" region,
may ignore it, or may report it as an error.

=item *

Processors must tolerate an "=over" list that goes off the end of the
document (i.e., which has no matching "=back"), but they may warn
about such a list.

=item *

Authors of Pod formatters should note that this construct:

  =item Neque

  =item Porro

  =item Quisquam Est

  Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
  velit, sed quia non numquam eius modi tempora incidunt ut
  labore et dolore magnam aliquam quaerat voluptatem.

  =item Ut Enim

is semantically ambiguous, in a way that makes formatting decisions
a bit difficult. On the one hand, it could be mention of an item
"Neque", mention of another item "Porro", and mention of another
item "Quisquam Est", with just the last one requiring the explanatory
paragraph "Qui dolorem ipsum quia dolor..."; and then an item
"Ut Enim". In that case, you'd want to format it like so:

  Neque

  Porro

  Quisquam Est
    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

But it could equally well be a discussion of three (related or equivalent)
items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
explaining them all, and then a new item "Ut Enim".
In that case, you'd
probably want to format it like so:

  Neque
  Porro
  Quisquam Est
    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

But (for the foreseeable future), Pod does not provide any way for Pod
authors to distinguish which grouping is meant by the above
"=item"-cluster structure. So formatters should format it like so:

  Neque

  Porro

  Quisquam Est

    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

That is, there should be (at least roughly) equal spacing between
items as between paragraphs (although that spacing may well be less
than the full height of a line of text). This leaves it to the reader
to use (con)textual cues to figure out whether the "Qui dolorem
ipsum..." paragraph applies to the "Quisquam Est" item or to all three
items "Neque", "Porro", and "Quisquam Est". While not an ideal
situation, this is preferable to providing formatting cues that may
be actually contrary to the author's intent.

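
Returning to the rule stated earlier in this section, that the kind of
"=over" ... "=back" region can be determined from the first paragraph
after the "=over" command, here is a minimal sketch of that test. The
function name C<over_region_type> and the four labels it returns are
invented for this example, and the caller is assumed to have already
skipped any "=cut" and "=pod" paragraphs; the regular expressions are
the ones given above.

```perl

  use strict;
  use warnings;

  # Classify a region from the first Pod paragraph after its "=over".
  # Returns 'bullet', 'number', 'text', or 'block' (the itemless,
  # blockquote-like kind). All four labels are hypothetical.
  sub over_region_type {
      my ($first_paragraph) = @_;

      # No "=item" first (or nothing at all): an itemless region.
      return 'block'
          unless defined $first_paragraph
              && $first_paragraph =~ m/\A=item\b/;

      # A bare "=item" is tolerated as if it were "=item *".
      return 'bullet' if $first_paragraph =~ m/\A=item\s*\z/
                      || $first_paragraph =~ m/\A=item\s+\*\s*\z/;

      # "=item 1" is tolerated as if it were "=item 1.".
      return 'number' if $first_paragraph =~ m/\A=item\s+\d+\.?\s*\z/;

      return 'text';
  }

```

A formatter producing (X)HTML could then pick <ul>, <ol>, <dl>, or
<blockquote> from the returned label before it emits anything for the
region.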

=back


=head1 About Data Paragraphs and "=begin/=end" Regions

Data paragraphs are typically used for inlining non-Pod data that is
to be used (typically passed through) when rendering the document to
a specific format:

  =begin rtf

  \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

  =end rtf

The exact same effect could, incidentally, be achieved with a single
"=for" paragraph:

  =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

(Although that is not formally a data paragraph, it has the same
meaning as one, and Pod parsers may parse it as one.)

Another example of a data paragraph:

  =begin html

  I like <em>PIE</em>!

  <hr>Especially pecan pie!

  =end html

If these were ordinary paragraphs, the Pod parser would try to
expand the "EE<lt>/em>" (in the first paragraph) as a formatting
code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this
is in a "=begin I<identifier>"..."=end I<identifier>" region I<and>
the identifier "html" doesn't begin with a ":" prefix, the contents
of this region are stored as data paragraphs, instead of being
processed as ordinary paragraphs (or if they began with spaces
and/or tabs, as verbatim paragraphs).

As a further example: At time of writing, no "biblio" identifier is
supported, but suppose some processor were written to recognize it as
a way of (say) denoting a bibliographic reference (necessarily
containing formatting codes in ordinary paragraphs). The fact that
"biblio" paragraphs were meant for ordinary processing would be
indicated by prefacing each "biblio" identifier with a colon:

  =begin :biblio

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =end :biblio

This would signal to the parser that paragraphs in this begin...end
region are subject to normal handling as ordinary/verbatim paragraphs
(while still tagged as meant only for processors that understand the
"biblio" identifier). The same effect could be had with:

  =for :biblio
  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

The ":" on these identifiers means simply "process this stuff
normally, even though the result will be for some special target".
I suggest that parser APIs report "biblio" as the target identifier,
but also report that it had a ":" prefix. (And similarly, with the
above "html", report "html" as the target identifier, and note the
I<lack> of a ":" prefix.)

Note that a "=begin I<identifier>"..."=end I<identifier>" region where
I<identifier> begins with a colon, I<can> contain commands. For example:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =for comment
   hm, check abebooks.com for how much used copies cost.

  =over

  =item

  Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart. [Yes, it's in German.]

  =item

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =back

  =end :biblio

Note, however, a "=begin I<identifier>"..."=end I<identifier>"
region where I<identifier> does I<not> begin with a colon, should not
directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
nor "=item". For example, this may be considered invalid:

  =begin somedata

  This is a data paragraph.

  =head1 Don't do this!

  This is a data paragraph too.

  =end somedata

A Pod processor may signal that the above (specifically the "=head1"
paragraph) is an error. Note, however, that the following should
I<not> be treated as an error:

  =begin somedata

  This is a data paragraph.

  =cut

  # Yup, this isn't Pod anymore.
  sub excl { (rand() > .5) ? "hoo!" : "hah!" }

  =pod

  This is a data paragraph too.

  =end somedata

And this too is valid:

  =begin someformat

  This is a data paragraph.

  And this is a data paragraph.

  =begin someotherformat

  This is a data paragraph too.

  And this is a data paragraph too.

  =begin :yetanotherformat

  =head2 This is a command paragraph!

  This is an ordinary paragraph!

    And this is a verbatim paragraph!

  =end :yetanotherformat

  =end someotherformat

  Another data paragraph!

  =end someformat

The contents of the above "=begin :yetanotherformat" ...
"=end :yetanotherformat" region I<aren't> data paragraphs, because
the immediately containing region's identifier (":yetanotherformat")
begins with a colon. In practice, most regions that contain
data paragraphs will contain I<only> data paragraphs; however,
the above nesting is syntactically valid as Pod, even if it is
rare. However, the handlers for some formats, like "html",
will accept only data paragraphs, not nested regions; and they may
complain if they see (targeted for them) nested regions, or commands,
other than "=end", "=pod", and "=cut".

Also consider this valid structure:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =over

  =item

  Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart. [Yes, it's in German.]

  =item

  Wirth, Niklaus. 1976.
I<Algorithms + Data Structures = 1783 Programs.> Prentice-Hall, Englewood Cliffs, NJ. 1784 1785 =back 1786 1787 Buy buy buy! 1788 1789 =begin html 1790 1791 <img src='wirth_spokesmodeling_book.png'> 1792 1793 <hr> 1794 1795 =end html 1796 1797 Now now now! 1798 1799 =end :biblio 1800 1801There, the "=begin html"..."=end html" region is nested inside 1802the larger "=begin :biblio"..."=end :biblio" region. Note that the 1803content of the "=begin html"..."=end html" region is data 1804paragraph(s), because the immediately containing region's identifier 1805("html") I<doesn't> begin with a colon. 1806 1807Pod parsers, when processing a series of data paragraphs one 1808after another (within a single region), should consider them to 1809be one large data paragraph that happens to contain blank lines. So 1810the content of the above "=begin html"..."=end html" I<may> be stored 1811as two data paragraphs (one consisting of 1812"<img src='wirth_spokesmodeling_book.png'>\n" 1813and another consisting of "<hr>\n"), but I<should> be stored as 1814a single data paragraph (consisting of 1815"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n"). 1816 1817Pod processors should tolerate empty 1818"=begin I<something>"..."=end I<something>" regions, 1819empty "=begin :I<something>"..."=end :I<something>" regions, and 1820contentless "=for I<something>" and "=for :I<something>" 1821paragraphs. I.e., these should be tolerated: 1822 1823 =for html 1824 1825 =begin html 1826 1827 =end html 1828 1829 =begin :biblio 1830 1831 =end :biblio 1832 1833Incidentally, note that there's no easy way to express a data 1834paragraph starting with something that looks like a command. Consider: 1835 1836 =begin stuff 1837 1838 =shazbot 1839 1840 =end stuff 1841 1842There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data 1843paragraph "=shazbot\n". 
However, you can express a data paragraph consisting 1844of "=shazbot\n" using this code: 1845 1846 =for stuff =shazbot 1847 1848The situation where this is necessary, is presumably quite rare. 1849 1850Note that =end commands must match the currently open =begin command. That 1851is, they must properly nest. For example, this is valid: 1852 1853 =begin outer 1854 1855 X 1856 1857 =begin inner 1858 1859 Y 1860 1861 =end inner 1862 1863 Z 1864 1865 =end outer 1866 1867while this is invalid: 1868 1869 =begin outer 1870 1871 X 1872 1873 =begin inner 1874 1875 Y 1876 1877 =end outer 1878 1879 Z 1880 1881 =end inner 1882 1883This latter is improper because when the "=end outer" command is seen, the 1884currently open region has the formatname "inner", not "outer". (It just 1885happens that "outer" is the format name of a higher-up region.) This is 1886an error. Processors must by default report this as an error, and may halt 1887processing the document containing that error. A corollary of this is that 1888regions cannot "overlap". That is, the latter block above does not represent 1889a region called "outer" which contains X and Y, overlapping a region called 1890"inner" which contains Y and Z. But because it is invalid (as all 1891apparently overlapping regions would be), it doesn't represent that, or 1892anything at all. 1893 1894Similarly, this is invalid: 1895 1896 =begin thing 1897 1898 =end hting 1899 1900This is an error because the region is opened by "thing", and the "=end" 1901tries to close "hting" [sic]. 1902 1903This is also invalid: 1904 1905 =begin thing 1906 1907 =end 1908 1909This is invalid because every "=end" command must have a formatname 1910parameter. 1911 1912=head1 SEE ALSO 1913 1914L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">, 1915L<podchecker> 1916 1917=head1 AUTHOR 1918 1919Sean M. Burke 1920 1921=cut 1922 1923 1924