=head1 NAME

perldebguts - Guts of Perl debugging

=head1 DESCRIPTION

This is not the perldebug(1) manpage, which tells you how to use
the debugger. This manpage describes low-level details concerning
the debugger's internals, which range from difficult to impossible
to understand for anyone who isn't incredibly intimate with Perl's guts.
Caveat lector.

=head1 Debugger Internals

Perl has special debugging hooks at compile-time and run-time used
to create debugging environments. These hooks are not to be confused
with the I<perl -Dxxx> command described in L<perlrun>, which is
usable only if a special Perl is built per the instructions in the
F<INSTALL> podpage in the Perl source tree.

For example, whenever you call Perl's built-in C<caller> function
from the package C<DB>, the arguments that the corresponding stack
frame was called with are copied to the C<@DB::args> array. These
mechanisms are enabled by calling Perl with the B<-d> switch.
Specifically, the following additional features are enabled
(cf. L<perlvar/$^P>):

=over 4

=item *

Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require
'perl5db.pl'}> if not present) before the first line of your program.

=item *

Each array C<@{"_<$filename"}> holds the lines of $filename for a
file compiled by Perl. The same is also true for C<eval>ed strings
that contain subroutines, or which are currently being executed.
The $filename for C<eval>ed strings looks like C<(eval 34)>.
Code assertions in regexes look like C<(re_eval 19)>.

Values in this array are magical in numeric context: they compare
equal to zero only if the line is not breakable.

=item *

Each hash C<%{"_<$filename"}> contains breakpoints and actions keyed
by line number. Individual entries (as opposed to the whole hash)
are settable.
Perl only cares about Boolean true here, although
the values used by F<perl5db.pl> have the form
C<"$break_condition\0$action">.

The same holds for evaluated strings that contain subroutines, or
which are currently being executed. The $filename for C<eval>ed strings
looks like C<(eval 34)> or C<(re_eval 19)>.

=item *

Each scalar C<${"_<$filename"}> contains C<"_<$filename">. This is
also the case for evaluated strings that contain subroutines, or
which are currently being executed. The $filename for C<eval>ed
strings looks like C<(eval 34)> or C<(re_eval 19)>.

=item *

After each C<require>d file is compiled, but before it is executed,
C<DB::postponed(*{"_<$filename"})> is called if the subroutine
C<DB::postponed> exists. Here, the $filename is the expanded name of
the C<require>d file, as found in the values of %INC.

=item *

After each subroutine C<subname> is compiled, the existence of
C<$DB::postponed{subname}> is checked. If this key exists,
C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine
also exists.

=item *

A hash C<%DB::sub> is maintained, whose keys are subroutine names
and whose values have the form C<filename:startline-endline>.
C<filename> has the form C<(eval 34)> for subroutines defined inside
C<eval>s, or C<(re_eval 19)> for those within regex code assertions.

=item *

When the execution of your program reaches a point that can hold a
breakpoint, the C<DB::DB()> subroutine is called if any of the variables
C<$DB::trace>, C<$DB::single>, or C<$DB::signal> is true. These variables
are not C<local>izable. This feature is disabled when executing
inside C<DB::DB()>, including functions called from it
unless C<< $^D & (1<<30) >> is true.

=item *

When execution of the program reaches a subroutine call, a call to
C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the
name of the called subroutine.
(This doesn't happen if the subroutine
was compiled in the C<DB> package.)

=back

Note that if C<&DB::sub> needs external data for it to work, no
subroutine call is possible without it. As an example, the standard
debugger's C<&DB::sub> depends on the C<$DB::deep> variable
(it defines how many levels of recursion deep into the debugger you can go
before a mandatory break). If C<$DB::deep> is not defined, subroutine
calls are not possible, even though C<&DB::sub> exists.

=head2 Writing Your Own Debugger

=head3 Environment Variables

The C<PERL5DB> environment variable can be used to define a debugger.
For example, the minimal "working" debugger (it actually doesn't do anything)
consists of one line:

  sub DB::DB {}

It can easily be defined like this:

  $ PERL5DB="sub DB::DB {}" perl -d your-script

Another brief debugger, slightly more useful, can be created
with only the line:

  sub DB::DB {print ++$i; scalar <STDIN>}

This debugger prints a number which increments for each statement
encountered and waits for you to hit a newline before continuing
to the next statement.

The following debugger is actually useful:

  {
    package DB;
    sub DB  {}
    sub sub {print ++$i, " $sub\n"; &$sub}
  }

It prints the sequence number of each subroutine call and the name of the
called subroutine. Note that C<&DB::sub> is being compiled into the
package C<DB> through the use of the C<package> directive.

When it starts, the debugger reads your rc file (F<./.perldb> or
F<~/.perldb> under Unix), which can set important options.
(A subroutine (C<&afterinit>) can be defined here as well; it is executed
after the debugger completes its own initialization.)

After the rc file is read, the debugger reads the PERLDB_OPTS
environment variable and uses it to set debugger options.
The contents of this variable are treated as if they were the argument
of an C<o ...> debugger command (q.v. in L<perldebug/Options>).

=head3 Debugger internal variables

In addition to the file and subroutine-related variables mentioned above,
the debugger also maintains various magical internal variables.

=over 4

=item *

C<@DB::dbline> is an alias for C<@{"::_<current_file"}>, which
holds the lines of the currently-selected file (compiled by Perl), either
explicitly chosen with the debugger's C<f> command, or implicitly by flow
of execution.

Values in this array are magical in numeric context: they compare
equal to zero only if the line is not breakable.

=item *

C<%DB::dbline> is an alias for C<%{"::_<current_file"}>, which
contains breakpoints and actions keyed by line number in
the currently-selected file, either explicitly chosen with the
debugger's C<f> command, or implicitly by flow of execution.

As previously noted, individual entries (as opposed to the whole hash)
are settable. Perl only cares about Boolean true here, although
the values used by F<perl5db.pl> have the form
C<"$break_condition\0$action">.

=back

=head3 Debugger customization functions

Some functions are provided to simplify customization.

=over 4

=item *

C<DB::parse_options(string)> parses debugger options; see
L<perldebug/Options> for a description of the options recognized.

=item *

C<DB::dump_trace(skip[,count])> skips the specified number of frames
and returns a list containing information about the calling frames (all
of them, if C<count> is missing). Each entry is a reference to a hash
with keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine
name, or info about C<eval>), C<args> (C<undef> or a reference to
an array), C<file>, and C<line>.

=item *

C<DB::print_trace(FH, skip[, count[, short]])> prints
formatted info about caller frames. The last two functions may be
convenient as arguments to C<< < >>, C<< << >> commands.

=back

Note that any variables and functions that are not documented in
this manpage (or in L<perldebug>) are considered for internal
use only, and as such are subject to change without notice.

=head1 Frame Listing Output Examples

The C<frame> option can be used to control the output of frame
information. For example, contrast this expression trace:

 $ perl -de 42
 Stack dump during die enabled outside of evals.

 Loading DB routines from perl5db.pl patch level 0.94
 Emacs support available.

 Enter h or `h h' for help.

 main::(-e:1):   0
   DB<1> sub foo { 14 }

   DB<2> sub bar { 3 }

   DB<3> t print foo() * bar()
 main::((eval 172):3):   print foo() * bar();
 main::foo((eval 168):2):
 main::bar((eval 170):2):
 42

with this one, once the C<o>ption C<frame=2> has been set:

   DB<4> o f=2
                frame = '2'
   DB<5> t print foo() * bar()
 3:      foo() * bar()
 entering main::foo
  2:     sub foo { 14 };
 exited main::foo
 entering main::bar
  2:     sub bar { 3 };
 exited main::bar
 42

By way of demonstration, we present below a laborious listing
resulting from setting your C<PERLDB_OPTS> environment variable to
the value C<f=n N>, and running I<perl -d -V> from the command line.
Examples using various values of C<n> are shown to give you a feel
for the difference between settings. Long though it may be, this
is not a complete listing, but only excerpts.

=over 4

=item 1

  entering main::BEGIN
   entering Config::BEGIN
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   Package lib/Config.pm.
   entering Config::TIEHASH
   entering Exporter::import
    entering Exporter::export
  entering Config::myconfig
   entering Config::FETCH
   entering Config::FETCH
   entering Config::FETCH
   entering Config::FETCH

=item 2

  entering main::BEGIN
   entering Config::BEGIN
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   exited Config::BEGIN
   Package lib/Config.pm.
   entering Config::TIEHASH
   exited Config::TIEHASH
   entering Exporter::import
    entering Exporter::export
    exited Exporter::export
   exited Exporter::import
  exited main::BEGIN
  entering Config::myconfig
   entering Config::FETCH
   exited Config::FETCH
   entering Config::FETCH
   exited Config::FETCH
   entering Config::FETCH

=item 4

  in  $=main::BEGIN() from /dev/null:0
   in  $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   Package lib/Config.pm.
   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
  in  @=Config::myconfig() from /dev/null:0
   in  $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574

=item 6

  in  $=main::BEGIN() from /dev/null:0
   in  $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   out $=Config::BEGIN() from lib/Config.pm:0
   Package lib/Config.pm.
   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
   out $=Config::TIEHASH('Config') from lib/Config.pm:644
   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
  out $=main::BEGIN() from /dev/null:0
  in  @=Config::myconfig() from /dev/null:0
   in  $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574

=item 14

  in  $=main::BEGIN() from /dev/null:0
   in  $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   out $=Config::BEGIN() from lib/Config.pm:0
   Package lib/Config.pm.
   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
   out $=Config::TIEHASH('Config') from lib/Config.pm:644
   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
  out $=main::BEGIN() from /dev/null:0
  in  @=Config::myconfig() from /dev/null:0
   in  $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
   out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
   in  $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
   out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574

=item 30

  in  $=CODE(0x15eca4)() from /dev/null:0
   in  $=CODE(0x182528)() from lib/Config.pm:2
    Package lib/Exporter.pm.
   out $=CODE(0x182528)() from lib/Config.pm:0
   scalar context return from CODE(0x182528): undef
   Package lib/Config.pm.
   in  $=Config::TIEHASH('Config') from lib/Config.pm:628
   out $=Config::TIEHASH('Config') from lib/Config.pm:628
   scalar context return from Config::TIEHASH:   empty hash
   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
    scalar context return from Exporter::export:   ''
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
   scalar context return from Exporter::import:   ''

=back

In all cases shown above, the line indentation shows the call tree.
If bit 2 of C<frame> is set, a line is printed on exit from a
subroutine as well.
If bit 4 is set, the arguments are printed
along with the caller info. If bit 8 is set, the arguments are
printed even if they are tied or references. If bit 16 is set, the
return value is printed, too.

When a package is compiled, a line like this

    Package lib/Carp.pm.

is printed with proper indentation.

=head1 Debugging regular expressions

There are two ways to enable debugging output for regular expressions.

If your perl is compiled with C<-DDEBUGGING>, you may use the
B<-Dr> flag on the command line.

Otherwise, one can C<use re 'debug'>, which has effects at
compile time and run time. It is not lexically scoped.

=head2 Compile-time output

The debugging output at compile time looks like this:

  Compiling REx `[bc]d(ef*g)+h[ij]k$'
  size 45 Got 364 bytes for offset annotations.
  first at 1
  rarest char g at 0
  rarest char d at 0
     1: ANYOF[bc](12)
    12: EXACT <d>(14)
    14: CURLYX[0] {1,32767}(28)
    16:   OPEN1(18)
    18:     EXACT <e>(20)
    20:     STAR(23)
    21:       EXACT <f>(0)
    23:     EXACT <g>(25)
    25:   CLOSE1(27)
    27:   WHILEM[1/1](0)
    28: NOTHING(29)
    29: EXACT <h>(31)
    31: ANYOF[ij](42)
    42: EXACT <k>(44)
    44: EOL(45)
    45: END(0)
  anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
        stclass `ANYOF[bc]' minlen 7
  Offsets: [45]
        1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
        0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
        11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
        0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
  Omitting $` $& $' support.

The first line shows the pre-compiled form of the regex. The second
shows the size of the compiled form (in arbitrary units, usually
4-byte words) and the total number of bytes allocated for the
offset/length table, usually 4+C<size>*8.
The next line shows the
label I<id> of the first node that does a match.

The

  anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
        stclass `ANYOF[bc]' minlen 7

line (split into two lines above) contains optimizer
information. In the example shown, the optimizer found that the match
should contain a substring C<de> at offset 1, plus substring C<gh>
at some offset between 3 and infinity. Moreover, when checking for
these substrings (to abandon impossible matches quickly), Perl will check
for the substring C<gh> before checking for the substring C<de>. The
optimizer may also use the knowledge that the match starts (at the
C<first> I<id>) with a character class, and no string
shorter than 7 characters can possibly match.

The fields of interest which may appear in this line are

=over 4

=item C<anchored> I<STRING> C<at> I<POS>

=item C<floating> I<STRING> C<at> I<POS1..POS2>

See above.

=item C<matching floating/anchored>

Which substring to check first.

=item C<minlen>

The minimal length of the match.

=item C<stclass> I<TYPE>

Type of first matching node.

=item C<noscan>

Don't scan for the found substrings.

=item C<isall>

Means that the optimizer information is all that the regular
expression contains, and thus one does not need to enter the regex engine at
all.

=item C<GPOS>

Set if the pattern contains C<\G>.

=item C<plus>

Set if the pattern starts with a repeated char (as in C<x+y>).

=item C<implicit>

Set if the pattern starts with C<.*>.

=item C<with eval>

Set if the pattern contains eval-groups, such as C<(?{ code })> and
C<(??{ code })>.

=item C<anchored(TYPE)>

Set if the pattern may match only at a handful of places, with C<TYPE>
being C<BOL>, C<MBOL>, or C<GPOS>. See the table below.

=back

If a substring is known to match at end-of-line only, it may be
followed by C<$>, as in C<floating `k'$>.

The optimizer-specific information is used to avoid entering (a slow) regex
engine on strings that will definitely not match. If the C<isall> flag
is set, a call to the regex engine may be avoided even when the optimizer
found an appropriate place for the match.

Above the optimizer section is the list of I<nodes> of the compiled
form of the regex. Each line has the format

C<   >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)

=head2 Types of nodes

Here are the possible types, with short descriptions:

 # TYPE arg-description [num-args] [longjump-len] DESCRIPTION

 # Exit points
 END         no        End of program.
 SUCCEED     no        Return from a subroutine, basically.

 # Anchors:
 BOL         no        Match "" at beginning of line.
 MBOL        no        Same, assuming multiline.
 SBOL        no        Same, assuming singleline.
 EOS         no        Match "" at end of string.
 EOL         no        Match "" at end of line.
 MEOL        no        Same, assuming multiline.
 SEOL        no        Same, assuming singleline.
 BOUND       no        Match "" at any word boundary
 BOUNDL      no        Match "" at any word boundary
 NBOUND      no        Match "" at any word non-boundary
 NBOUNDL     no        Match "" at any word non-boundary
 GPOS        no        Matches where last m//g left off.

 # [Special] alternatives
 ANY         no        Match any one character (except newline).
 SANY        no        Match any one character.
 ANYOF       sv        Match character in (or not in) this class.
 ALNUM       no        Match any alphanumeric character
 ALNUML      no        Match any alphanumeric char in locale
 NALNUM      no        Match any non-alphanumeric character
 NALNUML     no        Match any non-alphanumeric char in locale
 SPACE       no        Match any whitespace character
 SPACEL      no        Match any whitespace char in locale
 NSPACE      no        Match any non-whitespace character
 NSPACEL     no        Match any non-whitespace char in locale
 DIGIT       no        Match any numeric character
 NDIGIT      no        Match any non-numeric character

 # BRANCH    The set of branches constituting a single choice are hooked
 #           together with their "next" pointers, since precedence prevents
 #           anything being concatenated to any individual branch. The
 #           "next" pointer of the last BRANCH in a choice points to the
 #           thing following the whole choice. This is also where the
 #           final "next" pointer of each individual branch points; each
 #           branch starts with the operand node of a BRANCH node.
 #
 BRANCH      node      Match this alternative, or the next...

 # BACK      Normal "next" pointers all implicitly point forward; BACK
 #           exists to make loop structures possible.
 # not used
 BACK        no        Match "", "next" ptr points backward.

 # Literals
 EXACT       sv        Match this string (preceded by length).
 EXACTF      sv        Match this string, folded (prec. by length).
 EXACTFL     sv        Match this string, folded in locale (w/len).

 # Do nothing
 NOTHING     no        Match empty string.
 # A variant of above which delimits a group, thus stops optimizations
 TAIL        no        Match empty string. Can jump here from outside.

 # STAR,PLUS '?', and complex '*' and '+', are implemented as circular
 #           BRANCH structures using BACK. Simple cases (one character
 #           per match) are implemented with STAR and PLUS for speed
 #           and to minimize recursive plunges.
 #
 STAR        node      Match this (simple) thing 0 or more times.
 PLUS        node      Match this (simple) thing 1 or more times.

 CURLY       sv 2      Match this simple thing {n,m} times.
 CURLYN      no 2      Match next-after-this simple thing
 #                     {n,m} times, set parens.
 CURLYM      no 2      Match this medium-complex thing {n,m} times.
 CURLYX      sv 2      Match this complex thing {n,m} times.

 # This terminator creates a loop structure for CURLYX
 WHILEM      no        Do curly processing and see if rest matches.

 # OPEN,CLOSE,GROUPP     ...are numbered at compile time.
 OPEN        num 1     Mark this point in input as start of #n.
 CLOSE       num 1     Analogous to OPEN.

 REF         num 1     Match some already matched string
 REFF        num 1     Match already matched string, folded
 REFFL       num 1     Match already matched string, folded in loc.

 # grouping assertions
 IFMATCH     off 1 2   Succeeds if the following matches.
 UNLESSM     off 1 2   Fails if the following matches.
 SUSPEND     off 1 1   "Independent" sub-regex.
 IFTHEN      off 1 1   Switch, should be preceded by switcher.
 GROUPP      num 1     Whether the group matched.

 # Support for long regex
 LONGJMP     off 1 1   Jump far away.
 BRANCHJ     off 1 1   BRANCH with long offset.

 # The heavy worker
 EVAL        evl 1     Execute some Perl code.

 # Modifiers
 MINMOD      no        Next operator is not greedy.
 LOGICAL     no        Next opcode should set the flag only.

 # This is not used yet
 RENUM       off 1 1   Group with independently numbered parens.

 # This is not really a node, but an optimized away piece of a "long" node.
 # To simplify debugging output, we mark it as if it were a node
 OPTIMIZED   off       Placeholder for dump.

=for unprinted-credits
Next section M-J.
Dominus (mjd-perl-patch+@plover.com) 20010421

Following the optimizer information is a dump of the offset/length
table, here split across several lines:

  Offsets: [45]
        1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
        0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
        11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
        0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]

The first line here indicates that the offset/length table contains 45
entries. Each entry is a pair of integers, denoted by C<offset[length]>.
Entries are numbered starting with 1, so entry #1 here is C<1[4]> and
entry #12 is C<5[1]>. C<1[4]> indicates that the node labeled C<1:>
(the C<1: ANYOF[bc]>) begins at character position 1 in the
pre-compiled form of the regex, and has a length of 4 characters.
C<5[1]> in position 12
indicates that the node labeled C<12:>
(the C<< 12: EXACT <d> >>) begins at character position 5 in the
pre-compiled form of the regex, and has a length of 1 character.
C<12[1]> in position 14
indicates that the node labeled C<14:>
(the C<< 14: CURLYX[0] {1,32767} >>) begins at character position 12 in the
pre-compiled form of the regex, and has a length of 1 character---that
is, it corresponds to the C<+> symbol in the precompiled regex.

C<0[0]> items indicate that there is no corresponding node.

=head2 Run-time output

First of all, when doing a match, one may get no run-time output even
if debugging is enabled. This means that the regex engine was never
entered and that all of the job was therefore done by the optimizer.
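
As a rough illustration of the point above, the following sketch enables
regex debugging with C<use re 'debug'> and matches the sample pattern
against a string that contains neither C<de> nor C<gh>. The target string
and the variable names are assumptions made up for this example. The
diagnostics go to STDERR and their exact wording differs between Perl
releases, so none of it is reproduced here.

```perl
use strict;
use warnings;

# Turn on regex debugging; diagnostics are written to STDERR, not STDOUT.
use re 'debug';

# Hypothetical target string: it lacks the substrings "de" and "gh"
# that the optimizer looks for, so the match cannot succeed.
my $target = 'no such letters here';

my $matched = ($target =~ /[bc]d(ef*g)+h[ij]k$/) ? 1 : 0;
print "matched: $matched\n";
```

Comparing the STDERR output of this run with that of a string such as
C<abcdefg__gh__> is one way to see whether the regex engine was entered.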

If the regex engine was entered, the output may look like this:

  Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
    Setting an EVAL scope, savestack=3
     2 <ab> <cdefg__gh_>    |  1: ANYOF
     3 <abc> <defg__gh_>    | 11: EXACT <d>
     4 <abcd> <efg__gh_>    | 13: CURLYX {1,32767}
     4 <abcd> <efg__gh_>    | 26:   WHILEM
                                0 out of 1..32767  cc=effff31c
     4 <abcd> <efg__gh_>    | 15:     OPEN1
     4 <abcd> <efg__gh_>    | 17:     EXACT <e>
     5 <abcde> <fg__gh_>    | 19:     STAR
                             EXACT <f> can match 1 times out of 32767...
    Setting an EVAL scope, savestack=3
     6 <bcdef> <g__gh__>    | 22:       EXACT <g>
     7 <bcdefg> <__gh__>    | 24:       CLOSE1
     7 <bcdefg> <__gh__>    | 26:       WHILEM
                                    1 out of 1..32767  cc=effff31c
    Setting an EVAL scope, savestack=12
     7 <bcdefg> <__gh__>    | 15:         OPEN1
     7 <bcdefg> <__gh__>    | 17:         EXACT <e>
       restoring \1 to 4(4)..7
                                    failed, try continuation...
     7 <bcdefg> <__gh__>    | 27:         NOTHING
     7 <bcdefg> <__gh__>    | 28:         EXACT <h>
                                    failed...
                                failed...

The most significant information in the output is about the particular I<node>
of the compiled regex that is currently being tested against the target string.
The format of these lines is

C<    >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>>   |I<ID>:  I<TYPE>

The I<TYPE> info is indented with respect to the backtracking level.
Other incidental information appears interspersed within.

=head1 Debugging Perl memory usage

Perl is a profligate wastrel when it comes to memory use. There
is a saying that to estimate memory usage of Perl, assume a reasonable
algorithm for memory allocation, multiply that estimate by 10, and
while you still may miss the mark, at least you won't be quite so
astonished. This is not absolutely true, but may provide a good
grasp of what happens.

Assume that an integer cannot take less than 20 bytes of memory, a
float cannot take less than 24 bytes, a string cannot take less
than 32 bytes (all these figures assume 32-bit architectures; the
results are quite a bit worse on 64-bit architectures). If a variable
is accessed in two of three different ways (which require an integer,
a float, or a string), the memory footprint may increase by yet another
20 bytes. A sloppy malloc(3) implementation can inflate these
numbers dramatically.

On the opposite end of the scale, a declaration like

  sub foo;

may take up to 500 bytes of memory, depending on which release of Perl
you're running.

Anecdotal estimates of source-to-compiled code bloat suggest an
eightfold increase. This means that the compiled form of reasonable
(normally commented, properly indented etc.) code will take
about eight times more space in memory than the code took
on disk.

The B<-DL> command-line switch is obsolete since circa Perl 5.6.0
(it was available only if Perl was built with C<-DDEBUGGING>).
The switch was used to track Perl's memory allocations and possible
memory leaks. These days the use of malloc debugging tools like
F<Purify> or F<valgrind> is suggested instead.

One way to find out how much memory is being used by Perl data
structures is to install the Devel::Size module from CPAN: it gives
you the minimum number of bytes required to store a particular data
structure. Please be mindful of the difference between size()
and total_size().

If Perl has been compiled using Perl's malloc you can analyze Perl
memory usage by setting the environment variable C<PERL_DEBUG_MSTATS>.
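
The difference between size() and total_size() mentioned above can be
sketched as follows. This assumes that the Devel::Size module from CPAN
is installed; the sample array and the variable names are invented for
illustration. The byte counts depend on the Perl build, so none are
shown.

```perl
use strict;
use warnings;

# Devel::Size comes from CPAN and is not part of the core distribution,
# so load it at run time and cope with its absence.
my $have_devel_size = eval { require Devel::Size; 1 } ? 1 : 0;

# Hypothetical sample data: an array holding three strings.
my @names = ('alpha', 'beta', 'gamma');

my ($shallow, $deep) = (0, 0);
if ($have_devel_size) {
    # As documented by Devel::Size, size() measures the array itself
    # without the scalars stored in it, while total_size() also
    # follows the elements.
    $shallow = Devel::Size::size(\@names);
    $deep    = Devel::Size::total_size(\@names);
    print "size:       $shallow bytes\n";
    print "total_size: $deep bytes\n";
}
else {
    print "Devel::Size is not installed; nothing to measure\n";
}
```

For nested structures the gap between the two figures grows, because
only total_size() descends into the contained data.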

=head2 Using C<$ENV{PERL_DEBUG_MSTATS}>

If your perl is using Perl's malloc() and was compiled with the
necessary switches (this is the default), then it will print memory
usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS}
> 1 >>, and before termination of the program when C<<
$ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to
the following example:

  $ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
  Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
     14216 free:   130   117    28     7     9   0   2     2   1 0 0
                437    61    36     0     5
     60924 used:   125   137   161    55     7   8   6    16   2 0 1
                 74   109   304    84    20
  Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
  Memory allocation statistics after execution:   (buckets 4(4)..8188(8192)
     30888 free:   245    78    85    13     6   2   1     3   2 0 1
                315   162    39    42    11
    175816 used:   265   176  1112   111    26  22  11    27   2 1 1
                196   178  1066   798    39
  Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.

It is possible to ask for such a statistic at arbitrary points in
your execution using the mstat() function out of the standard
Devel::Peek module.

Here is some explanation of that format:

=over 4

=item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)>

Perl's malloc() uses bucketed allocations. Every request is rounded
up to the closest bucket size available, and a bucket is taken from
the pool of buckets of that size.

The line above describes the limits of buckets currently in use.
Each bucket has two sizes: memory footprint and the maximal size
of user data that can fit into this bucket. Suppose in the above
example that the smallest bucket were size 4. The biggest bucket
would have usable size 8188, and the memory footprint would be 8192.

In a Perl built for debugging, some buckets may have negative usable
size.
This means that these buckets cannot (and will not) be used.
For larger buckets, the memory footprint may be one page greater
than a power of 2. If so, the corresponding power of two is
printed in the C<APPROX> field above.

=item Free/Used

The 1 or 2 rows of numbers following that correspond to the number
of buckets of each size between C<SMALLEST> and C<GREATEST>. In
the first row, the sizes (memory footprints) of buckets are powers
of two--or possibly one page greater. In the second row, if present,
the memory footprints of the buckets are between the memory footprints
of two buckets "above".

For example, suppose under the previous example, the memory footprints
were

     free:    8     16    32    64    128  256 512 1024 2048 4096 8192
           4     12    24    48    80

With non-C<DEBUGGING> perl, the buckets starting from C<128> have
a 4-byte overhead, and thus an 8192-long bucket may take up to
8188-byte allocations.

=item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS>

The first two fields give the total amount of memory perl sbrk(2)ed
(ess-broken? :-) and number of sbrk(2)s used. The third number is
what perl thinks about continuity of returned chunks. So long as
this number is positive, malloc() will assume that it is probable
that sbrk(2) will provide continuous memory.

Memory allocated by external libraries is not counted.

=item C<pad: 0>

The amount of sbrk(2)ed memory needed to keep buckets aligned.

=item C<heads: 2192>

Although memory overhead of bigger buckets is kept inside the bucket, for
smaller buckets, it is kept in separate areas. This field gives the
total size of these areas.

=item C<chain: 0>

malloc() may want to subdivide a bigger bucket into smaller buckets.
If only a part of the deceased bucket is left unsubdivided, the rest
is kept as an element of a linked list. This field gives the total
size of these chunks.

=item C<tail: 6144>

To minimize the number of sbrk(2)s, malloc() asks for more memory. This
field gives the size of the yet unused part, which is sbrk(2)ed, but
never touched.

=back

=head2 Example of using B<-DL> switch

(Note that -DL is obsolete since circa 5.6.0, and even before that
Perl needed to be compiled with -DDEBUGGING.)

Below we show how to analyse memory usage by

  do 'lib/auto/POSIX/autosplit.ix';

The file in question contains a header and 146 lines similar to

  sub getcwd;

B<WARNING>: The discussion below supposes 32-bit architecture. In
newer releases of Perl, memory usage of the constructs discussed
here is greatly improved, but the story discussed below is a real-life
story. This story is mercilessly terse, and assumes rather more than cursory
knowledge of Perl internals. Type space to continue, `q' to quit.
(Actually, you just want to skip to the next section.)

Here is the itemized list of Perl allocations performed during parsing
of this file:

  !!! "after" at test.pl line 3.
    Id  subtot   4   8  12  16  20  24  28  32  36  40  48  56  64  72  80 80+
  0 02   13752   .   .   .   . 294   .   .   .   .   .   .   .   .   .   .   4
  0 54    5545   .   .   8 124  16   .   .   .   1   1   .   .   .   .   .   3
  5 05      32   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .
  6 02    7152   .   .   .   .   .   .   .   .   .   . 149   .   .   .   .   .
  7 02    3600   .   .   .   .   . 150   .   .   .   .   .   .   .   .   .   .
  7 03      64   .  -1   .   1   .   .   2   .   .   .   .   .   .   .   .   .
  7 04    7056   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   7
  7 17   38404   .   .   .   .   .   .   .   1   .   . 442 149   .   . 147   .
  9 03    2078  17 249  32   .   .   .   .   2   .   .   .   .   .   .   .   .

To see this list, insert two C<warn('!...')> statements around the call:

  warn('!');
  do 'lib/auto/POSIX/autosplit.ix';
  warn('!!! "after"');

and run it with Perl's B<-DL> option. The first warn() will print
memory allocation info before parsing the file and will memorize
the statistics at this point (we ignore what it prints).
The second warn() prints increments with respect to these memorized
data. This is the printout shown above.

Different I<Id>s on the left correspond to different subsystems of
the perl interpreter. They are just the first argument given to
the perl memory allocation API named New(). To find what C<9 03>
means, just B<grep> the perl source for C<903>. You'll find it in
F<util.c>, function savepvn(). (I know, you wonder why we told you
to B<grep> and then gave away the answer. That's because grepping
the source is good for the soul.) This function is used to store
a copy of an existing chunk of memory. Using a C debugger, one can
see that the function was called either directly from gv_init() or
via sv_magic(), and that gv_init() is called from gv_fetchpv()--which
was itself called from newSUB(). Please stop to catch your breath now.

B<NOTE>: To reach this point in the debugger and skip the calls to
savepvn() during the compilation of the main program, you should
set a C breakpoint in Perl_warn(), continue until this point is
reached, and I<then> set a C breakpoint in Perl_savepvn(). Note that
you may need to skip a handful of Perl_savepvn() calls that do not
correspond to mass production of CVs (there are more C<903>
allocations than 146 similar lines of
F<lib/auto/POSIX/autosplit.ix>). Note also that C<Perl_> prefixes are
added by macroization code in perl header files to avoid conflicts
with external libraries.

Anyway, we see that C<903> ids correspond to creation of globs, twice
per glob - for glob name, and glob stringification magic.

Here are explanations for other I<Id>s above:

=over 4

=item C<717>

Creates bigger C<XPV*> structures.
In the case above, it
creates 3 C<AV>s per subroutine, one for a list of lexical variable
names, one for a scratchpad (which contains lexical variables and
C<targets>), and one for the array of scratchpads needed for
recursion.

It also creates a C<GV> and a C<CV> per subroutine, all called from
start_subparse().

=item C<002>

Creates a C array corresponding to the C<AV> of scratchpads and the
scratchpad itself. The first fake entry of this scratchpad is
created though the subroutine itself is not defined yet.

It also creates C arrays to keep data for the stash. This is one HV,
but it grows; thus, there are 4 big allocations: the big chunks are not
freed, but are kept as additional arenas for C<SV> allocations.

=item C<054>

Creates a C<HEK> for the name of the glob for the subroutine. This
name is a key in a I<stash>.

Big allocations with this I<Id> correspond to allocations of new
arenas to keep C<HE>.

=item C<602>

Creates a C<GP> for the glob for the subroutine.

=item C<702>

Creates the C<MAGIC> for the glob for the subroutine.

=item C<704>

Creates I<arenas> which keep SVs.

=back

=head2 B<-DL> details

If Perl is run with B<-DL> option, then warn()s that start with `!'
behave specially. They print a list of I<categories> of memory
allocations, and statistics of allocations of different sizes for
these categories.

If warn() string starts with

=over 4

=item C<!!!>

print changed categories only, print the differences in counts of allocations.

=item C<!!>

print grown categories only; print the absolute values of counts, and totals.

=item C<!>

print nonempty categories, print the absolute values of counts and totals.

=back

=head2 Limitations of B<-DL> statistics

If an extension or external library does not use the Perl API to
allocate memory, such allocations are not counted.

=head1 SEE ALSO

L<perldebug>,
L<perlguts>,
L<perlrun>,
L<re>,
and
L<Devel::DProf>.