=head1 NAME

perlhack - How to hack at the Perl internals

=head1 DESCRIPTION

This document attempts to explain how Perl development takes place,
and ends with some suggestions for people wanting to become bona fide
porters.

The perl5-porters mailing list is where the Perl standard distribution
is maintained and developed.  The list can get anywhere from 10 to 150
messages a day, depending on the heatedness of the debate.  Most days
there are two or three patches, extensions, features, or bugs being
discussed at a time.

A searchable archive of the list is at either:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

or

    http://archive.develooper.com/perl5-porters@perl.org/

List subscribers (the porters themselves) come in several flavours.
Some are quiet curious lurkers, who rarely pitch in and instead watch
the ongoing development to ensure they're forewarned of new changes or
features in Perl.  Some are representatives of vendors, who are there
to make sure that Perl continues to compile and work on their
platforms.  Some patch any reported bug that they know how to fix,
some are actively patching their pet area (threads, Win32, the regexp
engine), while others seem to do nothing but complain.  In other
words, it's your usual mix of technical people.

Over this group of porters presides Larry Wall.  He has the final word
in what does and does not change in the Perl language.  Various
releases of Perl are shepherded by a ``pumpking'', a porter
responsible for gathering patches and deciding, on a patch-by-patch,
feature-by-feature basis, what will and will not go into the release.
For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
Perl, Jarkko Hietaniemi is the pumpking for the 5.8 release, and
Hugo van der Sanden will be the pumpking for the 5.10 release.

In addition, various people are pumpkings for different things.
For instance, Andy Dougherty and Jarkko Hietaniemi share the I<Configure>
pumpkin.

Larry sees Perl development along the lines of the US government:
there's the Legislature (the porters), the Executive branch (the
pumpkings), and the Supreme Court (Larry).  The legislature can
discuss and submit patches to the executive branch all they like, but
the executive branch is free to veto them.  Rarely, the Supreme Court
will side with the executive branch over the legislature, or the
legislature over the executive branch.  Mostly, however, the
legislature and the executive branch are supposed to get along and
work out their differences without impeachment or court cases.

You might sometimes see reference to Rule 1 and Rule 2.  Larry's power
as Supreme Court is expressed in The Rules:

=over 4

=item 1

Larry is always by definition right about how Perl should behave.
This means he has final veto power on the core functionality.

=item 2

Larry is allowed to change his mind about any matter at a later date,
regardless of whether he previously invoked Rule 1.

=back

Got that?  Larry is always right, even when he was wrong.  It's rare
to see either Rule exercised, but they are often alluded to.

New features and extensions to the language are contentious, because
the criteria used by the pumpkings, Larry, and other porters to decide
which features should be implemented and incorporated are not codified
in a few small design goals as with some other languages.  Instead,
the heuristics are flexible and often difficult to fathom.  Here is
one person's list, roughly in decreasing order of importance, of
heuristics that new features have to be weighed against:

=over 4

=item Does concept match the general goals of Perl?

These haven't been written anywhere in stone, but one approximation
is:

    1. Keep it fast, simple, and useful.
    2. Keep features/concepts as orthogonal as possible.
    3. No arbitrary limits (platforms, data sizes, cultures).
    4. Keep it open and exciting to use/patch/advocate Perl everywhere.
    5. Either assimilate new technologies, or build bridges to them.

=item Where is the implementation?

All the talk in the world is useless without an implementation.  In
almost every case, the person or people who argue for a new feature
will be expected to be the ones who implement it.  Porters capable
of coding new features have their own agendas, and are not available
to implement your (possibly good) idea.

=item Backwards compatibility

It's a cardinal sin to break existing Perl programs.  New warnings are
contentious--some say that a program that emits warnings is not
broken, while others say it is.  Adding keywords has the potential to
break programs, as does changing the meaning of existing token
sequences or functions.

=item Could it be a module instead?

Perl 5 has extension mechanisms, modules and XS, specifically to avoid
the need to keep changing the Perl interpreter.  You can write modules
that export functions, you can give those functions prototypes so they
can be called like built-in functions, you can even write XS code to
mess with the runtime data structures of the Perl interpreter if you
want to implement really complicated things.  If it can be done in a
module instead of in the core, it's highly unlikely to be added.

=item Is the feature generic enough?

Is this something that only the submitter wants added to the language,
or would it be broadly useful?  Sometimes, instead of adding a feature
with a tight focus, the porters might decide to wait until someone
implements the more generalized feature.  For instance, instead of
implementing a ``delayed evaluation'' feature, the porters are waiting
for a macro system that would permit delayed evaluation and much more.

=item Does it potentially introduce new bugs?

Radical rewrites of large chunks of the Perl interpreter have the
potential to introduce new bugs.  The smaller and more localized the
change, the better.

=item Does it preclude other desirable features?

A patch is likely to be rejected if it closes off future avenues of
development.  For instance, a patch that placed a true and final
interpretation on prototypes is likely to be rejected because there
are still options for the future of prototypes that haven't been
addressed.

=item Is the implementation robust?

Good patches (tight code, complete, correct) stand more chance of
going in.  Sloppy or incorrect patches might be placed on the back
burner until the pumpking has time to fix them, or might be discarded
altogether without further notice.

=item Is the implementation generic enough to be portable?

The worst patches make use of system-specific features.  It's highly
unlikely that nonportable additions to the Perl language will be
accepted.

=item Is the implementation tested?

Patches which change behaviour (fixing bugs or introducing new features)
must include regression tests to verify that everything works as expected.
Without tests provided by the original author, how can anyone else changing
perl in the future be sure that they haven't unwittingly broken the behaviour
the patch implements?  And without tests, how can the patch's author be
confident that his/her hard work put into the patch won't be accidentally
thrown away by someone in the future?

=item Is there enough documentation?

Patches without documentation are probably ill thought out or
incomplete.  Nothing can be added without documentation, so submitting
a patch for the appropriate manpages as well as the source code is
always a good idea.

=item Is there another way to do it?

Larry said ``Although the Perl Slogan is I<There's More Than One Way
to Do It>, I hesitate to make 10 ways to do something''.  This is a
tricky heuristic to navigate, though--one man's essential addition is
another man's pointless cruft.

=item Does it create too much work?

Work for the pumpking, work for Perl programmers, work for module
authors, ...  Perl is supposed to be easy.

=item Patches speak louder than words

Working code is always preferred to pie-in-the-sky ideas.  A patch to
add a feature stands a much higher chance of making it to the language
than does a random feature request, no matter how fervently argued the
request might be.  This ties into ``Will it be useful?'', as the fact
that someone took the time to make the patch demonstrates a strong
desire for the feature.

=back

If you're on the list, you might hear the word ``core'' bandied
around.  It refers to the standard distribution.  ``Hacking on the
core'' means you're changing the C source code to the Perl
interpreter.  ``A core module'' is one that ships with Perl.

=head2 Keeping in sync

The source code to the Perl interpreter, in its different versions, is
kept in a repository managed by a revision control system (which is
currently the Perforce program, see http://perforce.com/ ).  The
pumpkings and a few others have access to the repository to check in
changes.  Periodically the pumpking for the development version of Perl
will release a new version, so the rest of the porters can see what's
changed.
The current state of the main trunk of the repository, and patches
that describe the individual changes that have happened since the last
public release, are available at these locations:

    http://public.activestate.com/gsar/APC/
    ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/

If you're looking for a particular change, or a change that affected
a particular set of files, you may find the B<Perl Repository Browser>
useful:

    http://public.activestate.com/cgi-bin/perlbrowse

You may also want to subscribe to the perl5-changes mailing list to
receive a copy of each patch that gets submitted to the maintenance
and development "branches" of the perl repository.  See
http://lists.perl.org/ for subscription information.

If you are a member of the perl5-porters mailing list, it is a good
idea to keep in touch with the most recent changes, if only to
verify that what you would have posted as a bug report isn't already
solved in the most recent available perl development branch, also
known as perl-current, bleading edge perl, bleedperl or bleadperl.

Needless to say, the source code in perl-current is usually in a perpetual
state of evolution.  You should expect it to be very buggy.  Do B<not> use
it for any purpose other than testing and development.

Keeping in sync with the most recent branch can be done in several ways,
but the most convenient and reliable way is using B<rsync>, available at
ftp://rsync.samba.org/pub/rsync/ .  (You can also get the most recent
branch by FTP.)

If you choose to keep in sync using rsync, there are two approaches
to doing so:

=over 4

=item rsync'ing the source tree

Presuming you are in the directory where your perl source resides
and you have rsync installed and available, you can `upgrade' to
the bleadperl using:

    # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ .

This takes care of updating every single item in the source tree to
the latest applied patch level, creating files that are new (to your
distribution) and setting date/time stamps of existing files to
reflect the bleadperl status.

Note that this will not delete any files that were in '.' before
the rsync.  Once you are sure that the rsync is running correctly,
run it with the --delete and the --dry-run options like this:

    # rsync -avz --delete --dry-run rsync://ftp.linux.activestate.com/perl-current/ .

This will I<simulate> an rsync run that also deletes files not
present in the bleadperl master copy.  Observe the results from
this run closely.  If you are sure that the actual run would delete
no files precious to you, you could remove the '--dry-run' option.

You can then check which patch was the latest to be applied by
looking in the file B<.patch>, which will show the number of the
latest patch.

If you have more than one machine to keep in sync, and not all of
them have access to the WAN (so you are not able to rsync all the
source trees to the real source), there are some ways to get around
this problem.

=over 4

=item Using rsync over the LAN

Set up a local rsync server which makes the rsynced source tree
available to the LAN and sync the other machines against this
directory.

From http://rsync.samba.org/README.html :

   "Rsync uses rsh or ssh for communication.  It does not need to be
    setuid and requires no special privileges for installation.  It
    does not require an inetd entry or a daemon.  You must, however,
    have a working rsh or ssh system.  Using ssh is recommended for
    its security features."

=item Pushing over NFS

With the other systems mounted over NFS, you can actively push
changes by checking the just-updated tree against the other
not-yet-synced trees.
An example would be

    #!/usr/bin/perl -w

    use strict;
    use File::Copy;

    # Map each file named in MANIFEST to its local mode, size and mtime.
    my %MF = map {
        m/(\S+)/;
        $1 => [ (stat $1)[2, 7, 9] ];    # mode, size, mtime
        } `cat MANIFEST`;

    # The remote source trees, reached through their NFS mount points.
    my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);

    foreach my $host (keys %remote) {
        unless (-d $remote{$host}) {
            print STDERR "Cannot Xsync for host $host\n";
            next;
            }
        foreach my $file (keys %MF) {
            my $rfile = "$remote{$host}/$file";
            my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
            defined $size or ($mode, $size, $mtime) = (0, 0, 0);
            # Skip files whose size and mtime already match.
            $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
            printf "%4s %-34s %8d %9d %8d %9d\n",
                $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
            unlink $rfile;
            copy ($file, $rfile)
                or warn "Cannot copy $file to $rfile: $!\n";
            utime time, $MF{$file}[2], $rfile;
            chmod $MF{$file}[0], $rfile;
            }
        }

This is not perfect, though: it could be improved by checking
file checksums before updating, and not all NFS systems support
utime reliably.

=back

=item rsync'ing the patches

The source tree is maintained by the pumpking who applies patches to
the files in the tree.  These patches are either created by the
pumpking himself using C<diff -c> after updating the file manually, or
sent in by posters on the perl5-porters list.
These patches are also saved and rsync'able, so you can apply them
yourself to the source files.

Presuming you are in a directory where your patches reside, you can
get them in sync with

    # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .

This makes sure the latest available patch is downloaded to your
patch directory.

It's then up to you to apply these patches, using something like

    # last=`ls -t *.gz | sed q`
    # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
    # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
    # cd ../perl-current
    # patch -p1 -N <../perl-current-diffs/blead.patch

or, since this is only a hint towards how it works, use CPAN-patchaperl
from Andreas König to have better control over the patching process.

=back

=head2 Why rsync the source tree

=over 4

=item It's easier to rsync the source tree

Since you don't have to apply the patches yourself, you are sure all
files in the source tree are in the right state.

=item It's more reliable

While both the rsync-able source and patch areas are automatically
updated every few minutes, keep in mind that applying patches may
sometimes mean careful hand-holding, especially if your version of
the C<patch> program does not understand how to deal with new files,
files with 8-bit characters, or files without trailing newlines.

=back

=head2 Why rsync the patches

=over 4

=item It's easier to rsync the patches

If you have more than one machine that you want to keep in step with
bleadperl, it's easier to rsync the patches only once and then apply
them to all the source trees on the different machines.

If you try to keep pace on 5 different machines, of which only one
has access to the WAN, rsync'ing all the source trees would then
have to be done 5 times over NFS.  Having rsync'ed the patches
only once, you can apply them to all the source trees automatically.
Need I say more ;-)

=item It's a good reference

If you not only want the most recent development branch, but also
want to B<fix> bugs or extend features, you'll want to dive
into the sources.  If you are a seasoned perl core diver, you don't
need no manuals, tips, roadmaps, perlguts.pod or other aids to find
your way around.
But if you are a starter, the patches may help you
in finding where you should start and how to change the bits that
bug you.

The file B<Changes> is updated on occasions the pumpking sees as his
own little sync points.  On those occasions, he releases a tar-ball of
the current source tree (e.g. perl@7582.tar.gz), which will be an
excellent point to start with when choosing to use the 'rsync the
patches' scheme.  Starting with perl@7582, which means a set of source
files on which the latest applied patch is number 7582, you apply all
succeeding patches available from then on (7583, 7584, ...).

You can use the patches later as a kind of search archive.

=over 4

=item Finding a start point

If you want to fix/change the behaviour of function/feature Foo, just
scan the patches for patches that mention Foo either in the subject,
the comments, or the body of the fix.  There's a good chance the patch
shows you the files it affects, which are very likely
to be the starting point of your journey into the guts of perl.

=item Finding how to fix a bug

If you've found I<where> the function/feature Foo misbehaves, but you
don't know how to fix it (but you do know the change you want to
make), you can, again, peruse the patches for similar changes and
see how others applied the fix.

=item Finding the source of misbehaviour

When you keep in sync with bleadperl, the pumpking would love to
I<see> that the community efforts really work.  So after each of his
sync points, you are to 'make test' to check if everything is still
in working order.  If it is, you do 'make ok', which will send an OK
report to perlbug@perl.org.  (If you do not have access to a mailer
from the system on which you just successfully ran 'make test', you
can do 'make okfile', which creates the file C<perl.ok>, which you can
then take to your favourite mailer and mail yourself).

But of course, things will not always go well, and one or more
tests may fail during 'make test'.  Before
sending in a bug report (using 'make nok' or 'make nokfile'), check
the mailing list if someone else has reported the bug already and if
so, confirm it by replying to that message.  If not, you might want to
trace the source of that misbehaviour B<before> sending in the bug,
which will help all the other porters in finding the solution.

Here the saved patches come in very handy.  You can check the list of
patches to see which patch changed what file and what change caused
the misbehaviour.  If you note that in the bug report, it saves
whoever tries to solve it from having to look for that point.

=back

If searching the patches is too bothersome, you might consider using
perl's bugtron to find more information about discussions and
ramblings on posted bugs.

If you want to get the best of both worlds, rsync both the source
tree for convenience, reliability and ease and rsync the patches
for reference.

=back


=head2 Perlbug administration

There is a single remote administrative interface for modifying bug status,
category, open issues etc. using the B<RT> I<bugtracker> system, maintained
by I<Robert Spier>.  Become an administrator, and close any bugs you can get
your sticky mitts on:

    http://rt.perl.org

The bugtracker mechanism for B<perl5> bugs in particular is at:

    http://bugs6.perl.org/perlbug

To email the bug system administrators:

    "perlbug-admin" <perlbug-admin@perl.org>


=head2 Submitting patches

Always submit patches to I<perl5-porters@perl.org>.  If you're
patching a core module and there's an author listed, send the author a
copy (see L<Patching a core module>).  This lets other porters review
your patch, which catches a surprising number of errors in patches.
Either use the diff program (available in source code form from
ftp://ftp.gnu.org/pub/gnu/ ), or use Johan Vromans' I<makepatch>
(available from I<CPAN/authors/id/JV/>).  Unified diffs are preferred,
but context diffs are accepted.  Do not send RCS-style diffs or diffs
without context lines.  More information is given in the
I<Porting/patching.pod> file in the Perl source distribution.  Please
patch against the latest B<development> version (e.g., if you're
fixing a bug in the 5.005 track, patch against the latest 5.005_5x
version).  Only patches that survive the heat of the development
branch get applied to maintenance versions.

Your patch should update the documentation and test suite.  See
L<Writing a test>.

To report a bug in Perl, use the program I<perlbug> which comes with
Perl (if you can't get Perl to work, send mail to the address
I<perlbug@perl.org> or I<perlbug@perl.com>).  Reporting bugs through
I<perlbug> feeds into the automated bug-tracking system, access to
which is provided through the web at http://bugs.perl.org/ .  It
often pays to check the archives of the perl5-porters mailing list to
see whether the bug you're reporting has been reported before, and if
so whether it was considered a bug.  See above for the location of
the searchable archives.

The CPAN testers ( http://testers.cpan.org/ ) are a group of
volunteers who test CPAN modules on a variety of platforms.  Perl
Smokers ( http://archives.develooper.com/daily-build@perl.org/ )
automatically test Perl source releases on platforms with various
configurations.  Both efforts welcome volunteers.

It's a good idea to read and lurk for a while before chipping in.
That way you'll get to see the dynamic of the conversations, learn the
personalities of the players, and hopefully be better prepared to make
a useful contribution when you do speak up.

If after all this you still think you want to join the perl5-porters
mailing list, send mail to I<perl5-porters-subscribe@perl.org>.  To
unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>.

To hack on the Perl guts, you'll need to read the following things:

=over 3

=item L<perlguts>

This is of paramount importance, since it's the documentation of what
goes where in the Perl source.  Read it over a couple of times and it
might start to make sense - don't worry if it doesn't yet, because the
best way to study it is to read it in conjunction with poking at Perl
source, and we'll do that later on.

You might also want to look at Gisle Aas's illustrated perlguts -
there's no guarantee that this will be absolutely up-to-date with the
latest documentation in the Perl core, but the fundamentals will be
right.  ( http://gisle.aas.no/perl/illguts/ )

=item L<perlxstut> and L<perlxs>

A working knowledge of XSUB programming is incredibly useful for core
hacking; XSUBs use techniques drawn from the PP code, the portion of the
guts that actually executes a Perl program.  It's a lot gentler to learn
those techniques from simple examples and explanation than from the core
itself.

=item L<perlapi>

The documentation for the Perl API explains what some of the internal
functions do, as well as the many macros used in the source.

=item F<Porting/pumpkin.pod>

This is a collection of words of wisdom for a Perl porter; some of it is
only useful to the pumpkin holder, but most of it applies to anyone
wanting to go about Perl development.

=item The perl5-porters FAQ

This should be available from http://simon-cozens.org/writings/p5p-faq ;
alternatively, you can get the FAQ emailed to you by sending mail to
C<perl5-porters-faq@perl.org>.
It contains hints on reading perl5-porters,
information on how perl5-porters works and how Perl development in general
works.

=back

=head2 Finding Your Way Around

Perl maintenance can be split into a number of areas, and certain people
(pumpkins) will have responsibility for each area.  These areas sometimes
correspond to files or directories in the source kit.  Among the areas are:

=over 3

=item Core modules

Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
contains the core XS modules.

=item Tests

There are tests for nearly all the modules, built-ins and major bits
of functionality.  Test files all have a .t suffix.  Module tests live
in the F<lib/> and F<ext/> directories next to the module being
tested.  Others live in F<t/>.  See L<Writing a test>.

=item Documentation

Documentation maintenance includes looking after everything in the
F<pod/> directory (as well as contributing new documentation) and
the documentation to the modules in core.

=item Configure

The configure process is the way we make Perl portable across the
myriad of operating systems it supports.  Responsibility for the
configure, build and installation process, as well as the overall
portability of the core code rests with the configure pumpkin - others
help out with individual operating systems.

The files involved are the operating system directories (F<win32/>,
F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h>
and F<Makefile>, as well as the metaconfig files which generate
F<Configure>.  (metaconfig isn't included in the core distribution.)

=item Interpreter

And of course, there's the core of the Perl interpreter itself.  Let's
have a look at that in a little more detail.

=back

Before we leave looking at the layout, though, don't forget that
F<MANIFEST> contains not only the file names in the Perl distribution,
but short descriptions of what's in them, too.  For an overview of the
important files, try this:

    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST

=head2 Elements of the interpreter

The work of the interpreter has two main stages: compiling the code
into the internal representation, or bytecode, and then executing it.
L<perlguts/Compiled code> explains exactly how the compilation stage
happens.

Here is a short breakdown of perl's operation:

=over 3

=item Startup

The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.

First, F<perlmain.c> allocates some memory and constructs a Perl
interpreter:

    1 PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3 if (!PL_do_undump) {
    4     my_perl = perl_alloc();
    5     if (!my_perl)
    6         exit(1);
    7     perl_construct(my_perl);
    8     PL_perl_destruct_level = 0;
    9 }

Line 1 is a macro, and its definition is dependent on your operating
system.  Line 3 references C<PL_do_undump>, a global variable - all
global variables in Perl start with C<PL_>.  This tells you whether the
current running program was created with the C<-u> flag to perl and then
F<undump>, which means it's going to be false in any sane context.

Line 4 calls a function in F<perl.c> to allocate memory for a Perl
interpreter.
It's quite a simple function, and the guts of it look like
this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll look
at later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
own C<malloc> as defined in F<malloc.c> if you selected that option at
configure time.

Next, in line 7, we construct the interpreter; this sets up all the
special variables that Perl needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus) {
        exitstatus = perl_run(my_perl);
    }


C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
in F<perl.c>, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls C<yyparse> to
parse it.

=item Parsing

The aim of this stage is to take the Perl source, and turn it into an op
tree.  We'll see what one of those looks like later.  Strictly speaking,
there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>.  (Yes, Virginia, there
B<is> a YACC grammar for Perl!)  The job of the parser is to take your
code and `understand' it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so on.
The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>.
Perl isn't much like
other computer languages; it's highly context sensitive at times, and it
can be tricky to work out what sort of token something is, or where a token
ends.  As such, there's a lot of interplay between the tokeniser and the
parser, which can get pretty frightening if you're not used to it.

As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution.  The routines
which construct and link together the various operations are to be found
in F<op.c>, and will be examined later.

=item Optimization

Now the parsing stage is complete, and the finished tree represents
the operations that the Perl interpreter needs to perform to execute our
program.  Next, Perl does a dry run over the tree looking for
optimizations: constant expressions such as C<3 + 4> will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one.  For instance, to fetch the variable C<$foo>,
instead of grabbing the glob C<*foo> and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question.  The main optimizer is C<peep>
in F<op.c>, and many ops have their own optimizing functions.

=item Running

Now we're finally ready to go: we have compiled Perl byte code, and all
that's left to do is run it.  The actual execution is done by the
C<runops_standard> function in F<run.c>; more specifically, it's done by
these three innocent looking lines:

    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not.  Anyway, each op contains a function pointer, which
stipulates the function which will actually carry out the operation.
767This function will return the next op in the sequence - this allows for 768things like C<if> which choose the next op dynamically at run time. 769The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt 770execution if required. 771 772The actual functions called are known as PP code, and they're spread 773between four files: F<pp_hot.c> contains the `hot' code, which is most 774often used and highly optimized, F<pp_sys.c> contains all the 775system-specific functions, F<pp_ctl.c> contains the functions which 776implement control structures (C<if>, C<while> and the like) and F<pp.c> 777contains everything else. These are, if you like, the C code for Perl's 778built-in functions and operators. 779 780=back 781 782=head2 Internal Variable Types 783 784You should by now have had a look at L<perlguts>, which tells you about 785Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do 786that now. 787 788These variables are used not only to represent Perl-space variables, but 789also any constants in the code, as well as some structures completely 790internal to Perl. The symbol table, for instance, is an ordinary Perl 791hash. Your code is represented by an SV as it's read into the parser; 792any program files you call are opened via ordinary Perl filehandles, and 793so on. 794 795The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a 796Perl program. Let's see, for instance, how Perl treats the constant 797C<"hello">. 798 799 % perl -MDevel::Peek -e 'Dump("hello")' 800 1 SV = PV(0xa041450) at 0xa04ecbc 801 2 REFCNT = 1 802 3 FLAGS = (POK,READONLY,pPOK) 803 4 PV = 0xa0484e0 "hello"\0 804 5 CUR = 5 805 6 LEN = 6 806 807Reading C<Devel::Peek> output takes a bit of practise, so let's go 808through it line by line. 809 810Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in 811memory. SVs themselves are very simple structures, but they contain a 812pointer to a more complex structure. 
In this case, it's a PV, a
structure which holds a string value, at location C<0xa041450>. Line 2
is the reference count; there are no other references to this data, so
it's 1.

Line 3 holds the flags for this SV - it's OK to use it as a PV, it's a
read-only SV (because it's a constant) and the data is a PV internally.
Next we've got the contents of the string, starting at location
C<0xa0484e0>.

Line 5 gives us the current length of the string - note that this does
B<not> include the null terminator. Line 6 is not the length of the
string, but the length of the currently allocated buffer; as the string
grows, Perl automatically extends the available storage via a routine
called C<SvGROW>.

You can get at any of these quantities from C very easily; just add
C<Sv> to the name of the field shown in the snippet, and you've got a
macro which will return the value: C<SvCUR(sv)> returns the current
length of the string, C<SvREFCNT(sv)> returns the reference count,
C<SvPV(sv, len)> returns the string itself with its length, and so on.
More macros to manipulate these properties can be found in L<perlguts>.

Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>:

     1  void
     2  Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
     3  {
     4      STRLEN tlen;
     5      char *junk;

     6      junk = SvPV_force(sv, tlen);
     7      SvGROW(sv, tlen + len + 1);
     8      if (ptr == junk)
     9          ptr = SvPVX(sv);
    10      Move(ptr,SvPVX(sv)+tlen,len,char);
    11      SvCUR(sv) += len;
    12      *SvEND(sv) = '\0';
    13      (void)SvPOK_only_UTF8(sv);          /* validate pointer */
    14      SvTAINT(sv);
    15  }

This is a function which adds a string, C<ptr>, of length C<len> onto
the end of the PV stored in C<sv>. The first thing we do in line 6 is
make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
macro to force a PV.
As a side effect, C<tlen> gets set to the current
length of the PV, and the PV itself is returned to C<junk>.

In line 7, we make sure that the SV will have enough room to accommodate
the old string, the new string and the null terminator. If C<LEN> isn't
big enough, C<SvGROW> will reallocate space for us.

Now, if C<junk> is the same as the string we're trying to add, we can
grab the string directly from the SV; C<SvPVX> is the address of the PV
in the SV.

Line 10 does the actual catenation: the C<Move> macro moves a chunk of
memory around: we move the string C<ptr> to the end of the PV - that's
the start of the PV plus its current length. We're moving C<len> bytes
of type C<char>. After doing so, we need to tell Perl we've extended the
string, by altering C<CUR> to reflect the new length. C<SvEND> is a
macro which gives us the end of the string, so that needs to be a
C<"\0">.

Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF-8-aware
version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
and turns on POK. The final C<SvTAINT> is a macro which marks the SV as
tainted if taint mode is turned on and tainted data went into it.

AVs and HVs are more complicated, but SVs are by far the most common
variable type being thrown around. Having seen something of how we
manipulate these, let's go on and look at how the op tree is
constructed.

=head2 Op Trees

First, what is the op tree, anyway? The op tree is the parsed
representation of your program, as we saw in our section on parsing, and
it's the sequence of operations that Perl goes through to execute your
program, as we saw in L</Running>.
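
To make the "sequence of operations" idea concrete, here is a toy model of
the run loop in plain C. It is only a sketch: C<toy_op>, C<toy_runops>,
C<toy_count_up> and the two C<toy_pp_> functions are names invented for this
illustration and are not part of the Perl source. The one thing it borrows
from F<run.c> is the shape of the loop: each op carries a function pointer,
and that function returns the next op to run.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not Perl's. */
typedef struct toy_op toy_op;
typedef toy_op *(*toy_ppaddr)(toy_op *self, long *acc);

struct toy_op {
    toy_ppaddr ppaddr;  /* function that carries out this op */
    toy_op    *next;    /* next op in execution order */
    toy_op    *other;   /* alternative next op, used by the conditional */
    long       value;   /* operand for this op */
};

/* Add a constant to the accumulator, then continue with the next op. */
static toy_op *toy_pp_add_const(toy_op *self, long *acc)
{
    *acc += self->value;
    return self->next;
}

/* Conditional: choose the following op at run time. */
static toy_op *toy_pp_if_less(toy_op *self, long *acc)
{
    return (*acc < self->value) ? self->other : self->next;
}

/* The run loop: call each op's function until one returns NULL. */
long toy_runops(toy_op *start, long initial)
{
    long acc = initial;
    toy_op *op = start;

    assert(start != NULL);
    while ((op = op->ppaddr(op, &acc))) {
        /* the real loop would check for pending signals here */
    }
    return acc;
}

/* Two ops wired into a loop: keep adding 3 while the total is below 10. */
long toy_count_up(long initial)
{
    toy_op add, test;

    add.ppaddr  = toy_pp_add_const;
    add.value   = 3;
    add.next    = &test;
    add.other   = NULL;

    test.ppaddr = toy_pp_if_less;
    test.value  = 10;
    test.next   = NULL;     /* condition false: stop */
    test.other  = &add;     /* condition true: go round again */

    return toy_runops(&add, initial);
}
```

Calling C<toy_count_up(0)> walks add, test, add, test and so on until the
total reaches 12. The conditional never needs a special case in the loop
itself; it simply returns a different op.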
894 895An op is a fundamental operation that Perl can perform: all the built-in 896functions and operators are ops, and there are a series of ops which 897deal with concepts the interpreter needs internally - entering and 898leaving a block, ending a statement, fetching a variable, and so on. 899 900The op tree is connected in two ways: you can imagine that there are two 901"routes" through it, two orders in which you can traverse the tree. 902First, parse order reflects how the parser understood the code, and 903secondly, execution order tells perl what order to perform the 904operations in. 905 906The easiest way to examine the op tree is to stop Perl after it has 907finished parsing, and get it to dump out the tree. This is exactly what 908the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise> 909and L<B::Debug|B::Debug> do. 910 911Let's have a look at how Perl sees C<$a = $b + $c>: 912 913 % perl -MO=Terse -e '$a=$b+$c' 914 1 LISTOP (0x8179888) leave 915 2 OP (0x81798b0) enter 916 3 COP (0x8179850) nextstate 917 4 BINOP (0x8179828) sassign 918 5 BINOP (0x8179800) add [1] 919 6 UNOP (0x81796e0) null [15] 920 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b 921 8 UNOP (0x81797e0) null [15] 922 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c 923 10 UNOP (0x816b4f0) null [15] 924 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a 925 926Let's start in the middle, at line 4. This is a BINOP, a binary 927operator, which is at location C<0x8179828>. The specific operator in 928question is C<sassign> - scalar assignment - and you can find the code 929which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a 930binary operator, it has two children: the add operator, providing the 931result of C<$b+$c>, is uppermost on line 5, and the left hand side is on 932line 10. 933 934Line 10 is the null op: this does exactly nothing. What is that doing 935there? If you see the null op, it's a sign that something has been 936optimized away after parsing. 
As we mentioned in L</Optimization>,
the optimization stage sometimes converts two operations into one, for
example when fetching a scalar variable. When this happens, instead of
rewriting the op tree and cleaning up the dangling pointers, it's easier
just to replace the redundant operation with the null op. Originally,
the tree would have looked like this:

    10         SVOP (0x816b4f0) rv2sv [15]
    11             SVOP (0x816dcf0) gv  GV (0x80fa460) *a

That is, fetch the C<a> entry from the main symbol table, and then look
at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
happens to do both these things.

The right hand side, starting at line 5, is similar to what we've just
seen: we have the C<add> op (C<pp_add>, also in F<pp_hot.c>) add together
two C<gvsv>s.

Now, what's this about?

     1 LISTOP (0x8179888) leave
     2     OP (0x81798b0) enter
     3     COP (0x8179850) nextstate

C<enter> and C<leave> are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: C<leave> is a list, and its
children are all the statements in the block. Statements are delimited
by C<nextstate>, so a block is a collection of C<nextstate> ops, with
the ops to be performed for each statement being the children of
C<nextstate>. C<enter> is a single op which functions as a marker.

That's how Perl parsed the program, from top to bottom:

                        Program
                           |
                       Statement
                           |
                           =
                          / \
                         /   \
                        $a   +
                            / \
                          $b   $c

However, it's impossible to B<perform> the operations in this order:
you have to find the values of C<$b> and C<$c> before you add them
together, for instance.
So, the other thread that runs through the op 985tree is the execution order: each op has a field C<op_next> which points 986to the next op to be run, so following these pointers tells us how perl 987executes the code. We can traverse the tree in this order using 988the C<exec> option to C<B::Terse>: 989 990 % perl -MO=Terse,exec -e '$a=$b+$c' 991 1 OP (0x8179928) enter 992 2 COP (0x81798c8) nextstate 993 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b 994 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c 995 5 BINOP (0x8179878) add [1] 996 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a 997 7 BINOP (0x81798a0) sassign 998 8 LISTOP (0x8179900) leave 999 1000This probably makes more sense for a human: enter a block, start a 1001statement. Get the values of C<$b> and C<$c>, and add them together. 1002Find C<$a>, and assign one to the other. Then leave. 1003 1004The way Perl builds up these op trees in the parsing process can be 1005unravelled by examining F<perly.y>, the YACC grammar. Let's take the 1006piece we need to construct the tree for C<$a = $b + $c> 1007 1008 1 term : term ASSIGNOP term 1009 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); } 1010 3 | term ADDOP term 1011 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } 1012 1013If you're not used to reading BNF grammars, this is how it works: You're 1014fed certain things by the tokeniser, which generally end up in upper 1015case. Here, C<ADDOP>, is provided when the tokeniser sees C<+> in your 1016code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are 1017`terminal symbols', because you can't get any simpler than them. 1018 1019The grammar, lines one and three of the snippet above, tells you how to 1020build up more complex forms. These complex forms, `non-terminal symbols' 1021are generally placed in lower case. C<term> here is a non-terminal 1022symbol, representing a single expression. 

The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, C<term> followed by C<=>
followed by C<term> makes a C<term>, and C<term> followed by C<+>
followed by C<term> can also make a C<term>.

So, if you see two terms with an C<=> or C<+> between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see C<=>, you'll do the code
in line 2. If you see C<+>, you'll do the code in line 4. It's this code
which contributes to the op tree.

            |   term ADDOP term
            { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

What this does is create a new binary op, and feed it a number of
variables. The variables refer to the tokens: C<$1> is the first token in
the input, C<$2> the second, and so on - think regular expression
backreferences. C<$$> is the op returned from this reduction. So, we
call C<newBINOP> to create a new binary operator. The first parameter to
C<newBINOP>, a function in F<op.c>, is the op type. It's an addition
operator, so we want the type to be C<ADDOP>. We could specify this
directly, but it's right there as the second token in the input, so we
use C<$2>. The second parameter is the op's flags: 0 means `nothing
special'. Then the things to add: the left and right hand side of our
expression, in scalar context.

=head2 Stacks

When perl executes something like C<addop>, how does it pass on its
results to the next op? The answer is, through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.
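
Before looking at the real thing, it may help to see the idea in
miniature. The following is a toy model in plain C, not Perl's stack
API: C<toy_stack>, C<toy_xpush>, C<toy_pop>, C<toy_pp_add>,
C<toy_pp_negate> and C<toy_neg_sum> are all names made up for this sketch,
and the values are bare C<double>s rather than SVs. It shows two "ops" that
never call each other; one leaves its result on a shared stack and the
next picks it up from there.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not Perl's real stack macros. */
typedef struct {
    double *base;   /* bottom of the stack storage */
    size_t  len;    /* number of values currently on the stack */
    size_t  cap;    /* number of values the storage can hold */
} toy_stack;

/* Push a value, growing the storage first if it is full. */
static void toy_xpush(toy_stack *st, double v)
{
    if (st->len == st->cap) {
        size_t  ncap = st->cap ? st->cap * 2 : 4;
        double *nb   = realloc(st->base, ncap * sizeof *nb);
        assert(nb != NULL);
        st->base = nb;
        st->cap  = ncap;
    }
    st->base[st->len++] = v;
}

/* Remove and return the top value. */
static double toy_pop(toy_stack *st)
{
    assert(st->len > 0);
    return st->base[--st->len];
}

/* "add" op: pop two operands, push their sum. */
static void toy_pp_add(toy_stack *st)
{
    double right = toy_pop(st);
    double left  = toy_pop(st);
    toy_xpush(st, left + right);
}

/* "negate" op: rewrite the top value in place, without popping it. */
static void toy_pp_negate(toy_stack *st)
{
    assert(st->len > 0);
    st->base[st->len - 1] = -st->base[st->len - 1];
}

/* Evaluate -(b + c) by running the two ops one after the other. */
double toy_neg_sum(double b, double c)
{
    toy_stack st = { NULL, 0, 0 };
    double result;

    toy_xpush(&st, b);
    toy_xpush(&st, c);
    toy_pp_add(&st);       /* leaves b + c on the stack */
    toy_pp_negate(&st);    /* finds it there and negates it */
    result = toy_pop(&st);

    free(st.base);
    return result;
}
```

Note that C<toy_pp_add> pops in the opposite order to the pushes: the right
operand was pushed last, so it comes off first.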
1059 1060=over 3 1061 1062=item Argument stack 1063 1064Arguments are passed to PP code and returned from PP code using the 1065argument stack, C<ST>. The typical way to handle arguments is to pop 1066them off the stack, deal with them how you wish, and then push the result 1067back onto the stack. This is how, for instance, the cosine operator 1068works: 1069 1070 NV value; 1071 value = POPn; 1072 value = Perl_cos(value); 1073 XPUSHn(value); 1074 1075We'll see a more tricky example of this when we consider Perl's macros 1076below. C<POPn> gives you the NV (floating point value) of the top SV on 1077the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push 1078the result back as an NV. The C<X> in C<XPUSHn> means that the stack 1079should be extended if necessary - it can't be necessary here, because we 1080know there's room for one more item on the stack, since we've just 1081removed one! The C<XPUSH*> macros at least guarantee safety. 1082 1083Alternatively, you can fiddle with the stack directly: C<SP> gives you 1084the first element in your portion of the stack, and C<TOP*> gives you 1085the top SV/IV/NV/etc. on the stack. So, for instance, to do unary 1086negation of an integer: 1087 1088 SETi(-TOPi); 1089 1090Just set the integer value of the top stack entry to its negation. 1091 1092Argument stack manipulation in the core is exactly the same as it is in 1093XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer 1094description of the macros used in stack manipulation. 1095 1096=item Mark stack 1097 1098I say `your portion of the stack' above because PP code doesn't 1099necessarily get the whole stack to itself: if your function calls 1100another function, you'll only want to expose the arguments aimed for the 1101called function, and not (necessarily) let it get at your own data. The 1102way we do this is to have a `virtual' bottom-of-stack, exposed to each 1103function. 
The mark stack keeps bookmarks to locations in the argument
stack usable by each function. For instance, when dealing with a tied
variable, (internally, something with `P' magic) Perl has to call
methods for accesses to the tied variables. However, we need to separate
the arguments exposed to the method from the arguments exposed to the
original function - the store or fetch or whatever it may be. Here's how
the tied C<push> is implemented; see C<av_push> in F<av.c>:

     1	PUSHMARK(SP);
     2	EXTEND(SP,2);
     3	PUSHs(SvTIED_obj((SV*)av, mg));
     4	PUSHs(val);
     5	PUTBACK;
     6	ENTER;
     7	call_method("PUSH", G_SCALAR|G_DISCARD);
     8	LEAVE;
     9	POPSTACK;

The lines which concern the mark stack are the first, fifth and last
lines: they save away, restore and remove the current position of the
argument stack.

Let's examine the whole implementation, for practice:

     1	PUSHMARK(SP);

Push the current state of the stack pointer onto the mark stack. This is
so that when we've finished adding items to the argument stack, Perl
knows how many things we've added recently.

     2	EXTEND(SP,2);
     3	PUSHs(SvTIED_obj((SV*)av, mg));
     4	PUSHs(val);

We're going to add two more items onto the argument stack: when you have
a tied array, the C<PUSH> subroutine receives the object and the value
to be pushed, and that's exactly what we have here - the tied object,
retrieved with C<SvTIED_obj>, and the value, the SV C<val>.

     5	PUTBACK;

Next we tell Perl to make the change to the global stack pointer: C<dSP>
only gave us a local copy, not a reference to the global.

     6	ENTER;
     7	call_method("PUSH", G_SCALAR|G_DISCARD);
     8	LEAVE;

C<ENTER> and C<LEAVE> localise a block of code - they make sure that all
variables are tidied up, everything that has been localised gets
its previous value returned, and so on.
Think of them as the C<{> and 1154C<}> of a Perl block. 1155 1156To actually do the magic method call, we have to call a subroutine in 1157Perl space: C<call_method> takes care of that, and it's described in 1158L<perlcall>. We call the C<PUSH> method in scalar context, and we're 1159going to discard its return value. 1160 1161 9 POPSTACK; 1162 1163Finally, we remove the value we placed on the mark stack, since we 1164don't need it any more. 1165 1166=item Save stack 1167 1168C doesn't have a concept of local scope, so perl provides one. We've 1169seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save 1170stack implements the C equivalent of, for example: 1171 1172 { 1173 local $foo = 42; 1174 ... 1175 } 1176 1177See L<perlguts/Localising Changes> for how to use the save stack. 1178 1179=back 1180 1181=head2 Millions of Macros 1182 1183One thing you'll notice about the Perl source is that it's full of 1184macros. Some have called the pervasive use of macros the hardest thing 1185to understand, others find it adds to clarity. Let's take an example, 1186the code which implements the addition operator: 1187 1188 1 PP(pp_add) 1189 2 { 1190 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); 1191 4 { 1192 5 dPOPTOPnnrl_ul; 1193 6 SETn( left + right ); 1194 7 RETURN; 1195 8 } 1196 9 } 1197 1198Every line here (apart from the braces, of course) contains a macro. The 1199first line sets up the function declaration as Perl expects for PP code; 1200line 3 sets up variable declarations for the argument stack and the 1201target, the return value of the operation. Finally, it tries to see if 1202the addition operation is overloaded; if so, the appropriate subroutine 1203is called. 1204 1205Line 5 is another variable declaration - all variable declarations start 1206with C<d> - which pops from the top of the argument stack two NVs (hence 1207C<nn>) and puts them into the variables C<right> and C<left>, hence the 1208C<rl>. 
These are the two operands to the addition operator. Next, we 1209call C<SETn> to set the NV of the return value to the result of adding 1210the two values. This done, we return - the C<RETURN> macro makes sure 1211that our return value is properly handled, and we pass the next operator 1212to run back to the main run loop. 1213 1214Most of these macros are explained in L<perlapi>, and some of the more 1215important ones are explained in L<perlxs> as well. Pay special attention 1216to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on 1217the C<[pad]THX_?> macros. 1218 1219=head2 The .i Targets 1220 1221You can expand the macros in a F<foo.c> file by saying 1222 1223 make foo.i 1224 1225which will expand the macros using cpp. Don't be scared by the results. 1226 1227=head2 Poking at Perl 1228 1229To really poke around with Perl, you'll probably want to build Perl for 1230debugging, like this: 1231 1232 ./Configure -d -D optimize=-g 1233 make 1234 1235C<-g> is a flag to the C compiler to have it produce debugging 1236information which will allow us to step through a running program. 1237F<Configure> will also turn on the C<DEBUGGING> compilation symbol which 1238enables all the internal debugging code in Perl. There are a whole bunch 1239of things you can debug with this: L<perlrun> lists them all, and the 1240best way to find out about them is to play about with them. The most 1241useful options are probably 1242 1243 l Context (loop) stack processing 1244 t Trace execution 1245 o Method and overloading resolution 1246 c String/numeric conversions 1247 1248Some of the functionality of the debugging code can be achieved using XS 1249modules. 1250 1251 -Dr => use re 'debug' 1252 -Dx => use O 'Debug' 1253 1254=head2 Using a source-level debugger 1255 1256If the debugging output of C<-D> doesn't help you, it's time to step 1257through perl's execution with a source-level debugger. 

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to any
debugger, but check the manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

You'll want to do that in your Perl source tree so the debugger can read
the source code. You should see the copyright message, followed by the
prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item run [args]

Run the program with the given arguments.

=item break function_name

=item break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or the given
line in the named source file.

=item step

Steps through the program a line at a time.

=item next

Steps through the program a line at a time, without descending into
functions.

=item continue

Run until the next breakpoint.

=item finish

Run until the end of the current function, then stop again.

=item 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
(see later L</"gdb macro support">). You'll have to substitute them
yourself, or invoke cpp on the source code files
(see L</"The .i Targets">).
So, for instance, you can't say

    print SvPV_nolen(sv)

but you have to say

    print Perl_sv_2pv_nolen(sv)

=back

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>.
Even then, F<cpp> won't 1336recursively apply those macros for you. 1337 1338=head2 gdb macro support 1339 1340Recent versions of F<gdb> have fairly good macro support, but 1341in order to use it you'll need to compile perl with macro definitions 1342included in the debugging information. Using F<gcc> version 3.1, this 1343means configuring with C<-Doptimize=-g3>. Other compilers might use a 1344different switch (if they support debugging macros at all). 1345 1346=head2 Dumping Perl Data Structures 1347 1348One way to get around this macro hell is to use the dumping functions in 1349F<dump.c>; these work a little like an internal 1350L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures 1351that you can't get at from Perl. Let's take an example. We'll use the 1352C<$a = $b + $c> we used before, but give it a bit of context: 1353C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around? 1354 1355What about C<pp_add>, the function we examined earlier to implement the 1356C<+> operator: 1357 1358 (gdb) break Perl_pp_add 1359 Breakpoint 1 at 0x46249f: file pp_hot.c, line 309. 1360 1361Notice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>. 1362With the breakpoint in place, we can run our program: 1363 1364 (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c' 1365 1366Lots of junk will go past as gdb reads in the relevant source files and 1367libraries, and then: 1368 1369 Breakpoint 1, Perl_pp_add () at pp_hot.c:309 1370 309 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); 1371 (gdb) step 1372 311 dPOPTOPnnrl_ul; 1373 (gdb) 1374 1375We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul> 1376arranges for two C<NV>s to be placed into C<left> and C<right> - let's 1377slightly expand it: 1378 1379 #define dPOPTOPnnrl_ul NV right = POPn; \ 1380 SV *leftsv = TOPs; \ 1381 NV left = USE_LEFT(leftsv) ? 
SvNV(leftsv) : 0.0 1382 1383C<POPn> takes the SV from the top of the stack and obtains its NV either 1384directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function. 1385C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses 1386C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from 1387C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>. 1388 1389Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to 1390convert it. If we step again, we'll find ourselves there: 1391 1392 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669 1393 1669 if (!sv) 1394 (gdb) 1395 1396We can now use C<Perl_sv_dump> to investigate the SV: 1397 1398 SV = PV(0xa057cc0) at 0xa0675d0 1399 REFCNT = 1 1400 FLAGS = (POK,pPOK) 1401 PV = 0xa06a510 "6XXXX"\0 1402 CUR = 5 1403 LEN = 6 1404 $1 = void 1405 1406We know we're going to get C<6> from this, so let's finish the 1407subroutine: 1408 1409 (gdb) finish 1410 Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671 1411 0x462669 in Perl_pp_add () at pp_hot.c:311 1412 311 dPOPTOPnnrl_ul; 1413 1414We can also dump out this op: the current op is always stored in 1415C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us 1416similar output to L<B::Debug|B::Debug>. 1417 1418 { 1419 13 TYPE = add ===> 14 1420 TARG = 1 1421 FLAGS = (SCALAR,KIDS) 1422 { 1423 TYPE = null ===> (12) 1424 (was rv2sv) 1425 FLAGS = (SCALAR,KIDS) 1426 { 1427 11 TYPE = gvsv ===> 12 1428 FLAGS = (SCALAR) 1429 GV = main::b 1430 } 1431 } 1432 1433# finish this later # 1434 1435=head2 Patching 1436 1437All right, we've now had a look at how to navigate the Perl sources and 1438some things you'll need to know when fiddling with them. Let's now get 1439on and create a simple patch. Here's something Larry suggested: if a 1440C<U> is the first active format during a C<pack>, (for example, 1441C<pack "U3C8", @stuff>) then the resulting string should be treated as 1442UTF-8 encoded. 

How do we prepare to fix this up? First we locate the code in question -
the C<pack> happens at runtime, so it's going to be in one of the F<pp>
files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be
altering this file, let's copy it to F<pp.c~>.

[Well, it was in F<pp.c> when this tutorial was written. It has now been
split off with C<pp_unpack> to its own file, F<pp_pack.c>]

Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
loop over the pattern, taking each format character in turn into
C<datumtype>. Then for each possible format character, we swallow up
the other arguments in the pattern (a field width, an asterisk, and so
on) and convert the next chunk of input into the specified format, adding
it onto the output SV C<cat>.

How do we know if the C<U> is the first format in the C<pat>? Well, if
we have a pointer to the start of C<pat> then, if we see a C<U> we can
test whether we're still at the start of the string.
So, here's where
C<pat> is set up:

      STRLEN fromlen;
      register char *pat = SvPVx(*++MARK, fromlen);
      register char *patend = pat + fromlen;
      register I32 len;
      I32 datumtype;
      SV *fromstr;

We'll have another string pointer in there:

      STRLEN fromlen;
      register char *pat = SvPVx(*++MARK, fromlen);
      register char *patend = pat + fromlen;
 +    char *patcopy;
      register I32 len;
      I32 datumtype;
      SV *fromstr;

And just before we start the loop, we'll set C<patcopy> to be the start
of C<pat>:

      items = SP - MARK;
      MARK++;
      sv_setpvn(cat, "", 0);
 +    patcopy = pat;
      while (pat < patend) {

Now if we see a C<U> which was at the start of the string, we turn on
the C<UTF8> flag for the output SV, C<cat>:

 +    if (datumtype == 'U' && pat==patcopy+1)
 +        SvUTF8_on(cat);
      if (datumtype == '#') {
          while (pat < patend && *pat != '\n')
              pat++;

Remember that it has to be C<patcopy+1> because the first character of
the string is the C<U> which has been swallowed into C<datumtype>!

Oops, we forgot one thing: what if there are spaces at the start of the
pattern? C<pack("  U*", @stuff)> will have C<U> as the first active
character, even though it's not the first thing in the pattern. In this
case, we have to advance C<patcopy> along with C<pat> when we see spaces:

    if (isSPACE(datumtype))
        continue;

needs to become

    if (isSPACE(datumtype)) {
        patcopy++;
        continue;
    }

OK. That's the C part done. Now we must do two additional things before
this patch is ready to go: we've changed the behaviour of Perl, and so
we must document that change. We must also provide some more regression
tests to make sure our patch works and doesn't create a bug somewhere
else along the line.
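
If you want to convince yourself that the C<patcopy> bookkeeping is right
before rebuilding perl, the pointer logic can be tried out on its own.
The function below is a stand-alone sketch, not code from F<pp.c>:
C<toy_first_format_is_U> is an invented name, it uses the C library's
C<isspace> where the core uses C<isSPACE>, and it walks over every
character instead of consuming counts and arguments the way C<pp_pack>
does. What it keeps is the test from the patch: a C<U> counts only when
C<pat> is exactly one past C<patcopy>.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the Perl source: returns 1 if 'U'
 * is the first non-space character of the pattern, 0 otherwise.
 */
int toy_first_format_is_U(const char *pat)
{
    const char *patend;
    const char *patcopy;
    int utf8 = 0;

    assert(pat != NULL);
    patend  = pat + strlen(pat);
    patcopy = pat;                  /* start of the "active" pattern */

    while (pat < patend) {
        int datumtype = (unsigned char)*pat++;

        if (isspace(datumtype)) {
            patcopy++;              /* keep patcopy level with skipped spaces */
            continue;
        }
        /* pat has already moved past datumtype, hence the +1 */
        if (datumtype == 'U' && pat == patcopy + 1)
            utf8 = 1;
    }
    return utf8;
}
```

Since C<patcopy> only advances on spaces, C<pat - patcopy> is the number
of non-space characters seen so far, and it equals 1 only for the first of
them. That is why C<"C0U*"> and C<"  C U"> are rejected while C<" U*"> is
accepted.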
1522 1523The regression tests for each operator live in F<t/op/>, and so we 1524make a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our 1525tests to the end. First, we'll test that the C<U> does indeed create 1526Unicode strings. 1527 1528t/op/pack.t has a sensible ok() function, but if it didn't we could 1529use the one from t/test.pl. 1530 1531 require './test.pl'; 1532 plan( tests => 159 ); 1533 1534so instead of this: 1535 1536 print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000); 1537 print "ok $test\n"; $test++; 1538 1539we can write the more sensible (see L<Test::More> for a full 1540explanation of is() and other testing functions). 1541 1542 is( "1.20.300.4000", sprintf "%vd", pack("U*",1,20,300,4000), 1543 "U* produces unicode" ); 1544 1545Now we'll test that we got that space-at-the-beginning business right: 1546 1547 is( "1.20.300.4000", sprintf "%vd", pack(" U*",1,20,300,4000), 1548 " with spaces at the beginning" ); 1549 1550And finally we'll test that we don't make Unicode strings if C<U> is B<not> 1551the first active format: 1552 1553 isnt( v1.20.300.4000, sprintf "%vd", pack("C0U*",1,20,300,4000), 1554 "U* not first isn't unicode" ); 1555 1556Mustn't forget to change the number of tests which appears at the top, 1557or else the automated tester will get confused. This will either look 1558like this: 1559 1560 print "1..156\n"; 1561 1562or this: 1563 1564 plan( tests => 156 ); 1565 1566We now compile up Perl, and run it through the test suite. Our new 1567tests pass, hooray! 1568 1569Finally, the documentation. The job is never done until the paperwork is 1570over, so let's describe the change we've just made. The relevant place 1571is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert 1572this text in the description of C<pack>: 1573 1574 =item * 1575 1576 If the pattern begins with a C<U>, the resulting string will be treated 1577 as UTF-8-encoded Unicode. 
    You can force UTF-8 encoding on in a string
    with an initial C<U0>, and the bytes that follow will be interpreted as
    Unicode characters. If you don't want this to happen, you can begin your
    pattern with C<C0> (or anything else) to force Perl not to UTF-8 encode your
    string, and then follow this with a C<U*> somewhere in your pattern.

All done. Now let's create the patch. F<Porting/patching.pod> tells us
that if we're making major changes, we should copy the entire directory
to somewhere safe before we begin fiddling, and then do

    diff -ruN old new > patch

However, we know which files we've changed, and we can simply do this:

    diff -u pp.c~ pp.c > patch
    diff -u t/op/pack.t~ t/op/pack.t >> patch
    diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch

We end up with a patch looking a little like this:

    --- pp.c~       Fri Jun 02 04:34:10 2000
    +++ pp.c        Fri Jun 16 11:37:25 2000
    @@ -4375,6 +4375,7 @@
         register I32 items;
         STRLEN fromlen;
         register char *pat = SvPVx(*++MARK, fromlen);
    +    char *patcopy;
         register char *patend = pat + fromlen;
         register I32 len;
         I32 datumtype;
    @@ -4405,6 +4406,7 @@
    ...

And finally, we submit it, with our rationale, to perl5-porters. Job
done!

=head2 Patching a core module

This works just like patching anything else, with an extra
consideration. Many core modules also live on CPAN. If this is so,
patch the CPAN version instead of the core and send the patch off to
the module maintainer (with a copy to p5p). This will help the module
maintainer keep the CPAN version in sync with the core version without
constantly scanning p5p.

=head2 Adding a new function to the core

If, as part of a patch to fix a bug, or just because you have an
especially good idea, you decide to add a new function to the core,
discuss your ideas on p5p well before you start work. It may be that
someone else has already attempted to do what you are considering and
can give lots of good advice or even provide you with bits of code
that they already started (but never finished).

You have to follow all of the advice given above for patching. It is
extremely important to test any addition thoroughly and add new tests
to explore all boundary conditions that your new function is expected
to handle. If your new function is used only by one module (e.g. toke),
then it should probably be named S_your_function (for static); on the
other hand, if you expect it to be accessible from other functions in
Perl, you should name it Perl_your_function. See L<perlguts/Internal Functions>
for more details.

The location of any new code is also an important consideration. Don't
just create a new top level .c file and put your code there; you would
have to make changes to Configure (so the Makefile is created properly),
as well as possibly lots of include files. This is strictly pumpking
business.

It is better to add your function to one of the existing top level
source code files, but your choice is complicated by the nature of
the Perl distribution. Only the files that are marked as compiled
static are located in the perl executable. Everything else is located
in the shared library (or DLL if you are running under WIN32). So,
for example, if a function is only used by functions located in
toke.c, then your code can go in toke.c. If, however, you want to call
the function from universal.c, then you should put your code in another
location, for example util.c.
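
The naming convention described above maps directly onto C linkage. The
following is only a sketch of that idea: both function names are
hypothetical and exist nowhere in the Perl source, and the interpreter
context arguments and the embed.pl entry that a real core function needs
are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; nothing here exists in the Perl source. */

/* File-local helper: declared "static", so it takes the S_ prefix and
 * cannot be called from any other source file. */
static size_t S_count_spaces(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == ' ')
            n++;
    return n;
}

/* Externally visible function: other source files may call it, so it
 * takes the Perl_ prefix. */
int Perl_has_spaces(const char *s)
{
    return S_count_spaces(s) > 0;
}
```

In this sketch only C<Perl_has_spaces> could be called from another file;
C<S_count_spaces> stays private to the file that defines it, which is why
a helper used by a single file can live next to its only callers.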

In addition to writing your C code, you will need to create an
appropriate entry in embed.pl describing your function, then run
'make regen_headers' to create the entries in the numerous header
files that perl needs to compile correctly. See L<perlguts/Internal Functions>
for information on the various options that you can set in embed.pl.
You will forget to do this a few (or many) times and you will get
warnings during the compilation phase. Make sure that you mention
the embed.pl change when you post your patch to P5P; the pumpking needs
to know about it.

When you write your new code, please be conscious of existing code
conventions used in the perl source files. See L<perlstyle> for
details. Although most of the guidelines discussed seem to focus on
Perl code, rather than C, they all apply (except when they don't ;).
See also the F<Porting/patching.pod> file in the Perl source distribution
for lots of details about both formatting and submitting patches of
your changes.

Lastly, TEST TEST TEST TEST TEST any code before posting to p5p.
Test on as many platforms as you can find. Test as many perl
Configure options as you can (e.g. MULTIPLICITY). If you have
profiling or memory tools, see L<EXTERNAL TOOLS FOR DEBUGGING PERL>
below for how to use them to further test your code. Remember that
most of the people on P5P are doing this on their own time and
don't have the time to debug your code.

=head2 Writing a test

Every module and built-in function has an associated test file (or
should...). If you add or change functionality, you have to write a
test. If you fix a bug, you have to write a test so that bug never
comes back. If you alter the docs, it would be nice to test what the
new documentation says.

In short, if you submit a patch you probably also have to patch the
tests.

For modules, the test file is right next to the module itself.
F<lib/strict.t> tests F<lib/strict.pm>. This is a recent innovation,
so there are some snags (and it would be wonderful for you to brush
them out), but it basically works that way. Everything else lives in
F<t/>.

=over 3

=item F<t/base/>

Testing of the absolute basic functionality of Perl. Things like
C<if>, basic file reads and writes, simple regexes, etc. These are
run first in the test suite and if any of them fail, something is
I<really> broken.

=item F<t/cmd/>

These test the basic control structures, C<if/else>, C<while>,
subroutines, etc.

=item F<t/comp/>

Tests basic issues of how Perl parses and compiles itself.

=item F<t/io/>

Tests for built-in IO functions, including command line arguments.

=item F<t/lib/>

The old home for the module tests; you shouldn't put anything new in
here. There are still some bits and pieces hanging around in here
that need to be moved. Perhaps you could move them? Thanks!

=item F<t/op/>

Tests for perl's built-in functions that don't fit into any of the
other directories.

=item F<t/pod/>

Tests for POD directives. There are still some tests for the Pod
modules hanging around in here that need to be moved out into F<lib/>.

=item F<t/run/>

Testing features of how perl actually runs, including exit codes and
handling of PERL* environment variables.

=item F<t/uni/>

Tests for the core support of Unicode.

=item F<t/win32/>

Windows-specific tests.

=item F<t/x2p>

A test suite for the s2p converter.

=back

The core uses the same testing style as the rest of Perl, a simple
"ok/not ok" run through Test::Harness, but there are a few special
considerations.

There are three ways to write a test in the core: Test::More,
t/test.pl and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. The
decision of which to use depends on what part of the test suite you're
working on. This is a measure to prevent a high-level failure (such
as Config.pm breaking) from causing basic functionality tests to fail.

=over 4

=item t/base t/comp

Since we don't know if require works, or even subroutines, use ad hoc
tests for these two. Step carefully to avoid using the feature being
tested.

=item t/cmd t/run t/io t/op

Now that basic require() and subroutines are tested, you can use the
t/test.pl library which emulates the important features of Test::More
while using a minimum of core features.

You can also conditionally use certain libraries like Config, but be
sure to skip the test gracefully if it's not there.

=item t/lib ext lib

Now that the core of Perl is tested, Test::More can be used. You can
also use the full suite of core modules in the tests.

=back

When you say "make test" Perl uses the F<t/TEST> program to run the
test suite. All tests are run from the F<t/> directory, B<not> the
directory which contains the test. This causes some problems with the
tests in F<lib/>, so here's some opportunity for some patching.

You must be triply conscious of cross-platform concerns. This usually
boils down to using File::Spec and avoiding things like C<fork()> and
C<system()> unless absolutely necessary.

=head2 Special Make Test Targets

There are various special make targets that can be used to test Perl
slightly differently than the standard "test" target. Not all of them
are expected to give a 100% success rate. Many of them have several
aliases.

=over 4

=item coretest

Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests).

=item test.deparse

Run all the tests through B::Deparse. Not all tests will succeed.

=item test.taintwarn

Run all tests with the B<-t> command-line switch. Not all tests
are expected to succeed (until they're specifically fixed, of course).

=item minitest

Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>,
F<t/op>, and F<t/uni> tests.

=item test.valgrind check.valgrind utest.valgrind ucheck.valgrind

(Only in Linux) Run all the tests using the memory leak + naughty
memory access tool "valgrind". The log files will be named
F<testname.valgrind>.

=item test.third check.third utest.third ucheck.third

(Only in Tru64) Run all the tests using the memory leak + naughty
memory access tool "Third Degree". The log files will be named
F<perl3.log.testname>.

=item test.torture torturetest

Run all the usual tests and some extra tests. As of Perl 5.8.0 the
only extra tests are Abigail's JAPHs, F<t/japh/abigail.t>.

You can also run the torture test with F<t/harness> by giving the
C<-torture> argument to F<t/harness>.

=item utest ucheck test.utf8 check.utf8

Run all the tests with -Mutf8. Not all tests will succeed.

=item test_harness

Run the test suite with the F<t/harness> controlling program, instead of
F<t/TEST>. F<t/harness> is more sophisticated, and uses the
L<Test::Harness> module, thus using this test target supposes that perl
mostly works. The main advantage for our purposes is that it prints a
detailed summary of failed tests at the end. Also, unlike F<t/TEST>, it
doesn't redirect stderr to stdout.

=back

=head2 Running tests by hand

You can run part of the test suite by hand by using one of the following
commands from the F<t/> directory:

    ./perl -I../lib TEST list-of-.t-files

or

    ./perl -I../lib harness list-of-.t-files

(If you don't specify test scripts, the whole test suite will be run.)

You can run an individual test by a command similar to

    ./perl -I../lib path/to/foo.t

except that the harnesses set up some environment variables that may
affect the execution of the test:

=over 4

=item PERL_CORE=1

indicates that we're running this test as part of the perl core test suite.
This is useful for modules that have a dual life on CPAN.

=item PERL_DESTRUCT_LEVEL=2

is set to 2 if it isn't set already (see L</PERL_DESTRUCT_LEVEL>)

=item PERL

(used only by F<t/TEST>) if set, overrides the path to the perl executable
that should be used to run the tests (the default being F<./perl>).

=item PERL_SKIP_TTY_TEST

if set, causes the tests that need a terminal to be skipped. It's actually
set automatically by the Makefile, but can also be forced artificially by
running 'make test_notty'.

=back

=head1 EXTERNAL TOOLS FOR DEBUGGING PERL

Sometimes it helps to use external tools while debugging and
testing Perl. This section tries to guide you through using
some common testing and debugging tools with Perl. This is
meant as a guide to interfacing these tools with Perl, not
as any kind of guide to the use of the tools themselves.

B<NOTE 1>: Running under memory debuggers such as Purify, valgrind, or
Third Degree greatly slows down the execution: seconds become minutes,
minutes become hours. For example, as of Perl 5.8.1,
ext/Encode/t/Unicode.t takes extraordinarily long to complete under
e.g. Purify, Third Degree, and valgrind.
Under valgrind it takes more
than six hours, even on a snappy computer; the said test must be
doing something that is quite unfriendly for memory debuggers. If you
don't feel like waiting, you can simply kill the perl
process.

B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to have the
environment variable PERL_DESTRUCT_LEVEL set to 2. The F<TEST>
and harness scripts do that automatically. But if you are running
some of the tests manually, you have to set it yourself. For csh-like
shells:

    setenv PERL_DESTRUCT_LEVEL 2

and for Bourne-type shells:

    PERL_DESTRUCT_LEVEL=2
    export PERL_DESTRUCT_LEVEL

or in UNIXy environments you can also use the C<env> command:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...

B<NOTE 3>: There are known memory leaks when there are compile-time
errors within eval or require; seeing C<S_doeval> in the call stack
is a good sign of these. Fixing these leaks is non-trivial,
unfortunately, but they must be fixed eventually.

=head2 Rational Software's Purify

Purify is a commercial tool that is helpful in identifying
memory overruns, wild pointers, memory leaks and other such
badness. Perl must be compiled in a specific way for
optimal testing with Purify. Purify is available under
Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.

=head2 Purify on Unix

On Unix, Purify creates a new Perl binary. To get the most
benefit out of Purify, you should create the perl to Purify
using:

    sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
     -Uusemymalloc -Dusemultiplicity

where these arguments mean:

=over 4

=item -Accflags=-DPURIFY

Disables Perl's arena memory allocation functions, as well as
forcing use of memory allocation functions derived from the
system malloc.

=item -Doptimize='-g'

Adds debugging information so that you see the exact source
statements where the problem occurs. Without this flag, all
you will see is the source filename of where the error occurred.

=item -Uusemymalloc

Disable Perl's malloc so that Purify can more closely monitor
allocations and leaks. Using Perl's malloc will make Purify
report most leaks in the "potential" leaks category.

=item -Dusemultiplicity

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=back

Once you've compiled a perl suitable for Purify'ing, then you
can just:

    make pureperl

which creates a binary named 'pureperl' that has been Purify'ed.
This binary is used in place of the standard 'perl' binary
when you want to debug Perl memory problems.

As an example, to show any memory leaks produced during the
standard Perl testset you would create and run the Purify'ed
perl as:

    make pureperl
    cd t
    ../pureperl -I../lib harness

which would run Perl on test.pl and report any memory problems.

Purify outputs messages in "Viewer" windows by default. If
you don't have a windowing environment or if you simply
want the Purify output to unobtrusively go to a log file
instead of to the interactive window, use the following
options to output to the log file "perl.log":

    setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
     -log-file=perl.log -append-logfile=yes"

If you plan to use the "Viewer" windows, then you only need this option:

    setenv PURIFYOPTIONS "-chain-length=25"

In Bourne-type shells:

    PURIFYOPTIONS="..."
    export PURIFYOPTIONS

or if you have the "env" utility:

    env PURIFYOPTIONS="..." ../pureperl ...

=head2 Purify on NT

Purify on Windows NT instruments the Perl binary 'perl.exe'
on the fly. There are several options in the makefile you
should change to get the most use out of Purify:

=over 4

=item DEFINES

You should add -DPURIFY to the DEFINES line so the DEFINES
line looks something like:

    DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1

to disable Perl's arena memory allocation functions, as
well as to force use of memory allocation functions derived
from the system malloc.

=item USE_MULTI = define

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=item #PERL_MALLOC = define

Disable Perl's malloc so that Purify can more closely monitor
allocations and leaks. Using Perl's malloc will make Purify
report most leaks in the "potential" leaks category.

=item CFG = Debug

Adds debugging information so that you see the exact source
statements where the problem occurs. Without this flag, all
you will see is the source filename of where the error occurred.

=back

As an example, to show any memory leaks produced during the
standard Perl testset you would create and run Purify as:

    cd win32
    make
    cd ../t
    purify ../perl -I../lib harness

which would instrument Perl in memory, run Perl on test.pl,
then finally report any memory problems.

=head2 valgrind

The excellent valgrind tool can be used to find both memory leaks
and illegal memory accesses. As of August 2003 it unfortunately works
only on x86 (ELF) Linux. The special "test.valgrind" target can be used
to run the tests under valgrind. Found errors and memory leaks are
logged in files named F<test.valgrind>.

As system libraries (most notably glibc) also trigger errors,
valgrind allows you to suppress such errors using suppression files. The
default suppression file that comes with valgrind already catches a lot
of them. Some additional suppressions are defined in F<t/perl.supp>.

To get valgrind and for more information see

    http://developer.kde.org/~sewardj/

=head2 Compaq's/Digital's/HP's Third Degree

Third Degree is a tool for memory leak detection and memory access checks.
It is one of the many tools in the ATOM toolkit. The toolkit is only
available on Tru64 (formerly known as Digital UNIX formerly known as
DEC OSF/1).

When building Perl, you must first run Configure with the -Doptimize=-g
and -Uusemymalloc flags; after that you can use the make targets
"perl.third" and "test.third". (What is required is that Perl must be
compiled using the C<-g> flag; you may need to re-Configure.)

The short story is that with "atom" you can instrument the Perl
executable to create a new executable called F<perl.third>. When the
instrumented executable is run, it creates a log of dubious memory
traffic in a file called F<perl.3log>. See the manual pages of atom and
third for more information. The most extensive Third Degree
documentation is available in the Compaq "Tru64 UNIX Programmer's
Guide", chapter "Debugging Programs with Third Degree".

The "test.third" target leaves a lot of files named F<foo_bar.3log> in
the t/ subdirectory. There is a problem with these files: Third Degree is
so effective that it also finds problems in the system libraries.
Therefore you should use the Porting/thirdclean script to clean up
the F<*.3log> files.

There are also leaks that, for a certain definition of a leak,
aren't. See L</PERL_DESTRUCT_LEVEL> for more information.
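
A "leak that isn't" can be pictured with a small C sketch. This is a toy
model under stated assumptions, not perl's implementation: every name in
it is invented for illustration, and a plain integer argument stands in
for the destruct level that perl reads from its environment (see
L</PERL_DESTRUCT_LEVEL>).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: these names are invented for illustration and do not
 * correspond to anything in the Perl source. */

static char *toy_arena = NULL;     /* stands in for a global memory arena */

/* Allocate the global arena once; returns 0 on success, -1 on failure. */
static int toy_init(size_t size)
{
    assert(toy_arena == NULL);
    toy_arena = malloc(size);
    return toy_arena != NULL ? 0 : -1;
}

/* Shut down. With destruct_level == 0 the arena is left for exit() to
 * reclaim: a leak checker will report it, although nothing is really
 * lost. With a non-zero level the arena is freed explicitly.
 * Returns 1 if the arena was freed here, 0 otherwise. */
static int toy_shutdown(int destruct_level)
{
    if (destruct_level == 0)
        return 0;
    free(toy_arena);
    toy_arena = NULL;
    return 1;
}
```

A program that calls C<toy_init()> and then C<toy_shutdown(0)> just
before exiting would be flagged by a memory debugger, while the same
program calling C<toy_shutdown(2)> would come out clean. That is the
distinction to keep in mind when reading leak reports.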

=head2 PERL_DESTRUCT_LEVEL

If you want to run any of the tests yourself manually using e.g.
valgrind, or the pureperl or perl.third executables, please note that
by default perl B<does not> explicitly clean up all the memory it has
allocated (such as global memory arenas) but instead lets the exit()
of the whole program "take care" of such allocations, also known as
"global destruction of objects".

There is a way to tell perl to do complete cleanup: set the
environment variable PERL_DESTRUCT_LEVEL to a non-zero value.
The t/TEST wrapper does set this to 2, and this is what you
need to do too, if you don't want to see the "global leaks".
For example, for "third-degreed" Perl:

    env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t

(Note: the mod_perl apache module also uses this environment variable
for its own purposes and extends its semantics. Refer to the mod_perl
documentation for more information. Also, spawned threads do the
equivalent of setting this variable to the value 1.)

If, at the end of a run, you get the message I<N scalars leaked>, you can
recompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause
the addresses of all those leaked SVs to be dumped; it also converts
C<new_SV()> from a macro into a real function, so you can use your
favourite debugger to discover where those pesky SVs were allocated.

=head2 Profiling

Depending on your platform there are various ways of profiling Perl.

There are two commonly used techniques of profiling executables:
I<statistical time-sampling> and I<basic-block counting>.

The first method periodically takes samples of the CPU program
counter, and since the program counter can be correlated with the code
generated for functions, we get a statistical view of which
functions the program is spending its time in.
The caveats are that very
small/fast functions have a lower probability of showing up in the
profile, and that periodically interrupting the program (this is
usually done rather frequently, on the scale of milliseconds) imposes
an additional overhead that may skew the results. The first problem
can be alleviated by running the code for longer (in general this is a
good idea for profiling); the second problem is usually kept in check
by the profiling tools themselves.

The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end. For example, a conditional jump
ends one basic block, and a new one starts right after it. Basic block
profiling usually works by
I<instrumenting> the code by adding I<enter basic block #nnnn>
book-keeping code to the generated code. During the execution of the
code the basic block counters are then updated appropriately. The
caveat is that the added extra code can skew the results: again, the
profiling tools usually try to factor their own effects out of the
results.

=head2 Gprof Profiling

gprof is a profiling tool available on many UNIX platforms;
it uses I<statistical time-sampling>.

You can build a profiled version of perl called "perl.gprof" by
invoking the make target "perl.gprof". (What is required is that Perl
must be compiled using the C<-pg> flag; you may need to re-Configure.)
Running the profiled version of Perl will create an output file called
F<gmon.out> which contains the profiling data collected
during the execution.

The gprof tool can then display the collected data in various ways.
Usually gprof understands the following options:

=over 4

=item -a

Suppress statically defined functions from the profile.

=item -b

Suppress the verbose descriptions in the profile.

=item -e routine

Exclude the given routine and its descendants from the profile.

=item -f routine

Display only the given routine and its descendants in the profile.

=item -s

Generate a summary file called F<gmon.sum> which then may be given
to subsequent gprof runs to accumulate data over several runs.

=item -z

Display routines that have zero usage.

=back

For a more detailed explanation of the available commands and output
formats, see your own local documentation of gprof.

=head2 GCC gcov Profiling

Starting from GCC 3.0 I<basic block profiling> is officially available
for the GNU CC.

You can build a profiled version of perl called F<perl.gcov> by
invoking the make target "perl.gcov". (What is required is that Perl must
be compiled using gcc with the flags C<-fprofile-arcs
-ftest-coverage>; you may need to re-Configure.)

Running the profiled version of Perl will cause profile output to be
generated. For each source file an accompanying ".da" file will be
created.

To display the results you use the "gcov" utility (which should
be installed if you have gcc 3.0 or newer installed). F<gcov> is
run on source code files, like this

    gcov sv.c

which will cause F<sv.c.gcov> to be created. The F<.gcov> files
contain the source code annotated with relative frequencies of
execution indicated by "#" markers.

Useful options of F<gcov> include C<-b> which will summarise the
basic block, branch, and function call coverage, and C<-c> which
instead of relative frequencies will use the actual counts. For
more information on the use of F<gcov> and basic block profiling
with gcc, see the latest GNU CC manual, as of GCC 3.0 see

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html

and its section titled "8.
gcov: a Test Coverage Program"

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132

=head2 Pixie Profiling

Pixie is a profiling tool available on IRIX and Tru64 (aka Digital
UNIX aka DEC OSF/1) platforms. Pixie does its profiling using
I<basic-block counting>.

You can build a profiled version of perl called F<perl.pixie> by
invoking the make target "perl.pixie". (What is required is that Perl
must be compiled using the C<-g> flag; you may need to re-Configure.)

In Tru64 a file called F<perl.Addrs> will also be silently created;
this file contains the addresses of the basic blocks. Running the
profiled version of Perl will create a new file called "perl.Counts"
which contains the counts for the basic blocks for that particular
program execution.

To display the results you use the F<prof> utility. The exact
incantation depends on your operating system: "prof perl.Counts" in
IRIX, and "prof -pixie -all -L. perl" in Tru64.

In IRIX the following prof options are available:

=over 4

=item -h

Reports the most heavily used lines in descending order of use.
Useful for finding the hotspot lines.

=item -l

Groups lines by procedure, with procedures sorted in descending order of use.
Within a procedure, lines are listed in source order.
Useful for finding the hotspots of procedures.

=back

In Tru64 the following options are available:

=over 4

=item -p[rocedures]

Procedures sorted in descending order by the number of cycles executed
in each procedure. Useful for finding the hotspot procedures.
(This is the default option.)

=item -h[eavy]

Lines sorted in descending order by the number of cycles executed in
each line. Useful for finding the hotspot lines.

=item -i[nvocations]

The called procedures are sorted in descending order by number of calls
made to the procedures. Useful for finding the most used procedures.

=item -l[ines]

Grouped by procedure, sorted by cycles executed per procedure.
Useful for finding the hotspots of procedures.

=item -testcoverage

The compiler emitted code for these lines, but the code was unexecuted.

=item -z[ero]

Unexecuted procedures.

=back

For further information, see your system's manual pages for pixie and prof.

=head2 Miscellaneous tricks

=over 4

=item *

Those debugging perl with the DDD frontend over gdb may find the
following useful:

You can extend the data conversion shortcuts menu, so for example you
can display an SV's IV value with one click, without doing any typing.
To do that simply edit the F<~/.ddd/init> file and add, after:

  ! Display shortcuts.
  Ddd*gdbDisplayShortcuts: \
  /t ()   // Convert to Bin\n\
  /d ()   // Convert to Dec\n\
  /x ()   // Convert to Hex\n\
  /o ()   // Convert to Oct\n\

the following two lines:

  ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
  ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

so now you can do ivx and pvx lookups, or you can plug in the
sv_peek "conversion" there:

  Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(The my_perl is for threaded builds.)
Just remember that every line except the last one should end with \n\

Alternatively, edit the init file interactively via:
3rd mouse button -> New Display -> Edit Menu

Note: you can define up to 20 conversion shortcuts in the gdb
section.

=item *

If you see in a debugger a memory area mysteriously full of 0xabababab,
you may be seeing the effect of the Poison() macro; see L<perlclib>.

=back

=head2 CONCLUSION

We've had a brief look around the Perl source, an overview of the stages
F<perl> goes through when it's running your code, and how to use a
debugger to poke at the Perl guts. We took a very simple problem and
demonstrated how to solve it fully - with documentation, regression
tests, and finally a patch for submission to p5p. Finally, we talked
about how to use external tools to debug and test Perl.

I'd now suggest you read over those references again, and then, as soon
as possible, get your hands dirty. The best way to learn is by doing,
so:

=over 3

=item *

Subscribe to perl5-porters, follow the patches and try and understand
them; don't be afraid to ask if there's a portion you're not clear on -
who knows, you may unearth a bug in the patch...

=item *

Keep up to date with the bleeding edge Perl distributions and get
familiar with the changes. Try and get an idea of what areas people are
working on and the changes they're making.

=item *

Do read the README associated with your operating system, e.g. README.aix
on the IBM AIX OS. Don't hesitate to supply patches to that README if
you find anything missing or changed over a new OS release.

=item *

Find an area of Perl that seems interesting to you, and see if you can
work out how it works. Scan through the source, and step over it in the
debugger. Play, poke, investigate, fiddle! You'll probably get to
understand not just your chosen area but a much wider range of F<perl>'s
activity as well, and probably sooner than you'd think.

=back

=over 3

=item I<The Road goes ever on and on, down from the door where it began.>

=back

If you can do these things, you've started on the long road to Perl porting.
Thanks for wanting to help make Perl better - and happy hacking!

=head1 AUTHOR

This document was written by Nathan Torkington, and is maintained by
the perl5-porters mailing list.