=encoding utf8

=for comment
Consistent formatting of this file is achieved with:
  perl ./Porting/podtidy pod/perlhacktips.pod

=head1 NAME

perlhacktips - Tips for Perl core C code hacking

=head1 DESCRIPTION

This document will help you learn the best way to go about hacking on
the Perl core C code. It covers common problems, debugging, profiling,
and more.

If you haven't read L<perlhack> and L<perlhacktut> yet, you might want
to do that first.

=head1 COMMON PROBLEMS

Perl source now permits some specific C99 features which we know are
supported by all platforms, but mostly plays by ANSI C89 rules. You
don't care about some particular platform having broken Perl? I hear
there is still a strong demand for J2EE programmers.

=head2 Perl environment problems

=over 4

=item *

Not compiling with threading

Compiling with threading (-Duseithreads) completely rewrites the
function prototypes of Perl. You had better try your changes with that.
Related to this is the difference between "Perl_-less" and "Perl_-ly"
APIs, for example:

    Perl_sv_setiv(aTHX_ ...);
    sv_setiv(...);

The first one explicitly passes in the context, which is needed for
e.g. threaded builds. The second one does that implicitly; do not get
them mixed.
If you are not passing in an aTHX_, you will need to do a
dTHX as the first thing in the function.

See L<perlguts/"How multiple interpreters and concurrency are
supported"> for further discussion about context.

=item *

Not compiling with -DDEBUGGING

The DEBUGGING define exposes more code to the compiler, therefore more
ways for things to go wrong. You should try it.

=item *

Introducing (non-read-only) globals

Do not introduce any modifiable globals, truly global or file static.
They are bad form and complicate multithreading and other forms of
concurrency. The right way is to introduce them as new interpreter
variables, see F<intrpvar.h> (at the very end for binary
compatibility).

Introducing read-only (const) globals is okay, as long as you verify
with e.g. C<nm libperl.a|egrep -v ' [TURtr] '> (if your C<nm> has
BSD-style output) that the data you added really is read-only. (If it
is, it shouldn't show up in the output of that command.)


If you want to have static strings, make them constant:

    static const char etc[] = "...";

If you want to have arrays of constant strings, note carefully the
right combination of C<const>s:

    static const char * const yippee[] =
        {"hi", "ho", "silver"};

=item *

Not exporting your new function

Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
function that is part of the public API (the shared Perl library) to be
explicitly marked as exported. See the discussion about F<embed.pl> in
L<perlguts>.

=item *

Exporting your new function

The new shiny result of either genuine new functionality or your
arduous refactoring is now ready and correctly exported. So what could
possibly go wrong?

Maybe simply that your function did not need to be exported in the
first place. Perl has a long and not so glorious history of exporting
functions that it should not have.

If the function is used only inside one source code file, make it
static. See the discussion about F<embed.pl> in L<perlguts>.

If the function is used across several files, but intended only for
Perl's internal use (and this should be the common case), do not export
it to the public API. See the discussion about F<embed.pl> in
L<perlguts>.


=back

=head2 C99

Starting from 5.35.5 we now permit some C99 features in the core C
source. However, code in dual life extensions still needs to be C89
only, because it needs to compile against earlier versions of Perl
running on older platforms. Also note that our headers need to be
valid as C++ too, because XS extensions written in C++ need to include
them, hence I<member structure initialisers> can't be used in headers.

C99 support is still far from complete on all platforms we currently
support. As a baseline we can only assume C89 semantics with the
specific C99 features described below, which we've verified work
everywhere. It's fine to probe for additional C99 features and use
them where available, providing there is also a fallback for compilers
that don't support the feature. For example, we use C11 thread local
storage when available, but fall back to POSIX thread specific APIs
otherwise, and we use C<char> for booleans if C<< <stdbool.h> >> isn't
available.

Code can use (and rely on) the following C99 features being present:

=over

=item *

mixed declarations and code

=item *

64 bit integer types

For consistency with the existing source code, use the typedefs C<I64>
and C<U64>, instead of using C<long long> and C<unsigned long long>
directly.


=item *

variadic macros

    void greet(char *file, unsigned int line, char *format, ...);
    #define logged_greet(...) greet(__FILE__, __LINE__, __VA_ARGS__);

Note that C<__VA_OPT__> is standardized as of C23 and C++20. Before
that it was a gcc extension.

=item *

declarations in for loops

    for (const char *p = message; *p; ++p) {
        putchar(*p);
    }

=item *

member structure initialisers

But not in headers, as support was only added to C++ relatively
recently.

Hence this is fine in C and XS code, but not headers:

    struct message {
        char *action;
        char *target;
    };

    struct message mcguffin = {
        .target = "member structure initialisers",
        .action = "Built"
    };

You cannot use the similar syntax for compound literals, since we also
build perl using C++ compilers:

    /* this is fine */
    struct message m = {
        .target = "some target",
        .action = "some action"
    };
    /* this is not valid in C++ */
    m = (struct message){
        .target = "some target",
        .action = "some action"
    };

While structure designators are usable, the related array designators
are not, since they aren't supported by C++ at all.

=item *

flexible array members

This is standards conformant:

    struct greeting {
        unsigned int len;
        char message[];
    };

However, the source code already uses the "unwarranted chumminess with
the compiler" hack in many places:

    struct greeting {
        unsigned int len;
        char message[1];
    };

Strictly it B<is> undefined behaviour accessing beyond C<message[0]>,
but this has been a commonly used hack since K&R times, and using it
hasn't been a practical issue anywhere (in the perl source or any other
common C code). Hence it's unclear what we would gain from actively
changing to the C99 approach.

=item *

C<//> comments

All compilers we tested support their use. Not all humans we tested
support their use.

=back

Code explicitly should not use any other C99 features. For example

=over 4

=item *

variable length arrays

Not supported by B<any> MSVC, and this is not going to change.

Even "variable" length arrays where the variable is a constant
expression are syntax errors under MSVC.

=item *

C99 types in C<< <stdint.h> >>

Use C<PERL_INT_FAST8_T> etc as defined in F<handy.h>

=item *

C99 format strings in C<< <inttypes.h> >>

C<snprintf> in the VMS libc only added support for C<PRIdN> etc very
recently, meaning that there are live supported installations without
this, or formats such as C<%zu>.

(perl's C<sv_catpvf> etc use parser code in F<sv.c>, which
supports the C<z> modifier, along with perl-specific formats such as
C<SVf>.)

=back

If you want to use a C99 feature not listed above then you need to do
one of

=over 4

=item *

Probe for it in F<Configure>, set a variable in F<config.sh>, and add
fallback logic in the headers for platforms which don't have it.

=item *

Write test code and verify that it works on platforms we need to
support, before relying on it unconditionally.

=back

Likely you want to repeat the same plan as we used to get the current
C99 feature set. See the message at
L<https://markmail.org/thread/odr4fjrn72u2fkpz> for the C99 probes we
used before.
Note that the two most "fussy" compilers appear to be MSVC
and the vendor compiler on VMS. To date all the *nix compilers have
been far more flexible in what they support.

On *nix platforms, F<Configure> attempts to set compiler flags
appropriately. All vendor compilers that we tested defaulted to C99 (or
C11) support. However, older versions of gcc default to C89, or permit
I<most> C99 (with warnings), but forbid I<declarations in for loops>
unless C<-std=gnu99> is added. The alternative C<-std=c99> B<might>
seem better, but using it on some platforms can prevent
C<< <unistd.h> >> from declaring some prototypes, which breaks the build.
gcc's C<-ansi> flag implies C<-std=c89> so we can no longer set that,
hence the Configure option C<-gccansipedantic> now only adds
C<-pedantic>.

The Perl core source code files (the ones at the top level of the
source code distribution) are automatically compiled with as many as
possible of the C<-std=gnu99>, C<-pedantic>, and a selection of C<-W>
flags (see F<cflags.SH>). Files in F<ext/>, F<dist/>, F<cpan/> etc are
compiled with the same flags as the installed perl would use to compile
XS extensions.

Basically, it's safe to assume that F<Configure> and F<cflags.SH> have
picked the best combination of flags for the version of gcc on the
platform, and attempting to add more flags related to enforcing a C
dialect will cause problems either locally, or on other systems that
the code is shipped to.

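
The probe-and-fallback approach described earlier in this section might
look roughly like the sketch below when it reaches a header. The symbol
C<DEMO_HAS_STDBOOL> and the names C<demo_bool> and C<demo_is_positive>
are hypothetical, chosen only for illustration; they stand in for
whatever a real F<Configure> probe would define and are not taken from
the perl source.

```c
/* DEMO_HAS_STDBOOL is a hypothetical stand-in for a symbol that a
 * Configure probe would define when <stdbool.h> is usable.
 * Compile with -DDEMO_HAS_STDBOOL to mimic a successful probe. */
#ifdef DEMO_HAS_STDBOOL
#  include <stdbool.h>
typedef bool demo_bool;
#else
/* Fallback for compilers without <stdbool.h>: a plain char. */
typedef char demo_bool;
#endif

/* Callers only ever see demo_bool, so they compile unchanged
 * whichever branch above was taken. */
static demo_bool demo_is_positive(int n)
{
    return (demo_bool)(n > 0);
}
```

The point of the design is that the conditional lives in one place, and
the rest of the code uses only the wrapper type.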

We believe that the C99 support in gcc 3.1 is good enough for us, but
we don't have a 19 year old gcc handy to check this :-) If you have
ancient vendor compilers that don't default to C99, the flags you might
want to try are

=over 4

=item AIX

C<-qlanglvl=stdc99>

=item HP/UX

C<-AC99>

=item Solaris

C<-xc99>

=back

=head2 Symbol Names and Namespace Pollution

=head3 Choosing legal symbol names

C reserves for its implementation any symbol whose name begins with an
underscore followed immediately by either an uppercase letter C<[A-Z]>
or another underscore. C++ further reserves any symbol containing two
consecutive underscores, and further reserves in the global name space
any symbol beginning with an underscore, not just ones followed by a
capital. We care about C++ because header files (F<*.h>) need to be
compilable by it, and some people do all their development using a C++
compiler.

The consequences of using a reserved name are probably none. Unless you
stumble on a name that the implementation uses, things will work.
Indeed, the perl core has more than a few instances of using
implementation-reserved symbols. (These are gradually being changed.)
But your code might stop working any time that the implementation
decides to use a name you already had chosen, potentially many years
before.

It's best then to:

=over

=item B<Don't begin a symbol name with an underscore>; (I<e.g.>, don't
use: C<_FOOBAR>)

=item B<Don't use two consecutive underscores in a symbol name>;
(I<e.g.>, don't use C<FOO__BAR>)

=back

POSIX also reserves many symbols. See Section 2.2.2 in
L<https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html>.
Perl also has conflicts with that.

Perl reserves for its use any symbol beginning with C<Perl>, C<perl>,
or C<PL_>. Any time you introduce a macro into a header file that
doesn't follow that convention, you are creating the possibility of a
namespace clash with an existing XS module, unless you restrict it by,
say,

    #ifdef PERL_CORE
    #  define my_symbol
    #endif

There are many symbols in header files that aren't of this form, and
which are accessible from XS namespace, intentionally or not, just
about anything in F<config.h>, for example.

Having to use one of these prefixes detracts from the readability of
the code, and hasn't been an actual issue for non-trivial names.
Things
like perl defining its own C<MAX> macro have been problematic, but they
were quickly discovered, and a S<C<#ifdef PERL_CORE>> guard added.

So there's no rule imposed about using such symbols, just be aware of
the issues.

=head3 Choosing good symbol names

Ideally, a symbol name should correctly and precisely describe its
intended purpose. But there is a tension between that and getting
names that are overly long and hence awkward to type and read.
Metaphors could be helpful (a poetic name), but those tend to be
culturally specific, and may not translate for someone whose native
language isn't English, or who comes from a different cultural
background. Besides, the talent of writing poetry seems to be rare in
programmers.

Certain symbol names don't reflect their purpose, but are nonetheless
fine to use because of long-standing conventions. These often
originated in the field of Mathematics, where C<i> and C<j> are
frequently used as subscripts, and C<n> as a population count. Since
at least the 1950's, computer programs have used C<i>, I<etc.> as loop
variables.

Our guidance is to choose a name that reasonably describes the purpose,
and to comment its declaration more precisely.

One certainly shouldn't use misleading or ambiguous names.
C<last_foo>
could mean either the final C<foo> or the previous C<foo>, and so could
be confusing to the reader, or even to the writer coming back to the
code after a few months of working on something else. Sometimes the
programmer has a particular line of thought in mind, and it doesn't
occur to them that ambiguity is present.

There are probably still many off-by-1 bugs around because the name
L<perlapi/C<av_len>> doesn't correspond to what other I<-len>
constructs mean, such as L<perlapi/C<sv_len>>. Awkward (and
controversial) synonyms were created to use instead that conveyed its
true meaning (L<perlapi/C<av_top_index>>). Eventually, though, someone
had the better idea to create a new name to signify what most people
think C<-len> signifies. So L<perlapi/C<av_count>> was born. And we
wish it had been thought up much earlier.

=head2 Writing safer macros

Macros are used extensively in the Perl core for such things as hiding
internal details from the caller, so that it doesn't have to be
concerned about them. For example, most lines of code don't need to
know if they are running on a threaded versus unthreaded perl. That
detail is automatically mostly hidden.

It is often better to use an inline function instead of a macro.
Inline functions are immune to name collisions with the caller, and
don't magnify problems when called with parameters that are expressions
with side effects.
There was a time when one might choose a macro over an inline
function because compiler support for inline functions was quite
limited. Some would actually only inline the first two or three
encountered in a compilation. But those days are long gone, and inline
functions are fully supported in modern compilers.

Nevertheless, there are situations where a function won't do, and a
macro is required. One example is when a parameter can be any of
several types. A function has to be declared with a single explicit
type for each parameter.

Or maybe the code involved is so trivial that a function would be just
complicating overkill, such as when the macro simply creates a mnemonic
name for some constant value.

If you do choose to use a non-trivial macro, be aware that there are
several avoidable pitfalls that can occur. Keep in mind that a macro
is expanded within the lexical context of each place in the source it
is called. If you have a token C<foo> in the macro and the source
happens also to have C<foo>, the meaning of the macro's C<foo> will
become that of the caller's. Sometimes that is exactly the behavior
you want, but be aware that this tends to be confusing later on. It
effectively turns C<foo> into a reserved word for any code that calls
the macro, and this fact is usually not documented nor considered. It
is safer to pass C<foo> as a parameter, so that C<foo> remains freely
available to the caller and the macro interface is explicitly
specified.

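
As a minimal sketch of the difference between relying on a caller's
name and passing it explicitly, consider the two macros below. Both
macros, and the function C<demo_bump>, are hypothetical and exist only
for illustration; nothing here is taken from the perl source.

```c
/* Hypothetical macros for illustration only. */

/* Fragile: silently requires the caller to have a variable that is
 * named exactly "count". */
#define DEMO_BUMP_IMPLICIT()     (count += 1)

/* Safer: the variable being modified is part of the visible interface,
 * so the caller may name it anything. */
#define DEMO_BUMP_EXPLICIT(var)  ((var) += 1)

static int demo_bump(void)
{
    int count = 0;   /* must be called "count" for DEMO_BUMP_IMPLICIT */
    int total = 10;  /* any name works with DEMO_BUMP_EXPLICIT */

    DEMO_BUMP_IMPLICIT();
    DEMO_BUMP_EXPLICIT(total);

    return count + total;
}
```

Renaming C<count> in C<demo_bump> would break the first macro with a
compile error at best, or quietly bind it to some other C<count> in
scope at worst; the second macro is unaffected.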

Worse is when the equivalence between the two C<foo>'s is coincidental.
Suppose for example, that the macro declares a variable

    int foo

That works fine as long as the caller doesn't define the string C<foo>
in some way. And it might not be until years later that someone comes
along with an instance where C<foo> is used. For example a future
caller could do this:

    #define foo bar

Then that declaration of C<foo> in the macro suddenly becomes

    int bar

That could mean that something completely different happens than
intended. It is hard to debug; the macro and call may not even be in
the same file, so it would require some digging and gnashing of teeth
to figure out.

Therefore, if a macro does use variables, their names should be such
that it is very unlikely that they would collide with any caller, now
or forever. One way to do that, now being used in the perl source, is
to include the name of the macro itself as part of the name of each
variable in the macro. Suppose the macro is named C<SvPV>. Then we
could have

    int foo_svpv_ = 0;

This is harder to read than plain C<foo>, but it is pretty much
guaranteed that a caller will never naively use C<foo_svpv_> (and run
into problems).
(The lowercasing makes it clearer that this is a
variable, but assumes that there won't be two elements whose names
differ only in the case of their letters.) The trailing underscore
makes it even more unlikely to clash, as those, by convention, signify
a private variable name. (See L</Choosing legal symbol names> for
restrictions on what names you can use.)

This kind of name collision doesn't happen with the macro's formal
parameters, so they don't need to have complicated names. But there
are pitfalls when a parameter is an expression, or has some Perl
magic attached. When calling a function, C will evaluate the parameter
once, and pass the result to the function. But when calling a macro,
the parameter is copied as-is by the C preprocessor to each instance
inside the macro. This means that when evaluating a parameter having
side effects, the function and macro results differ. This is
particularly fraught when a parameter has overload magic, say it is a
tied variable that reads the next line in a file upon each evaluation.
Having it read multiple lines per call is probably not what the caller
intended. If a macro refers to a potentially overloadable parameter
more than once, it should first make a copy and then use that copy the
rest of the time. There are macros in the perl core that violate this,
but are gradually being converted, usually by changing to use inline
functions instead.

Above we said "first make a copy".
In a macro, that is easier said
than done, because macros are normally expressions, and declarations
aren't allowed in expressions. But the S<C<STMT_START> .. C<STMT_END>>
construct, described in L<perlapi|perlapi/STMT_START>, allows you to
have declarations in most contexts, as long as you don't need a return
value. If you do need a value returned, you can make the interface
such that a pointer is passed to the construct, which then stores its
result there. (Or you can use GCC brace groups. But these require a
fallback if the code will ever get executed on a platform that lacks
this non-standard extension to C. And that fallback would be another
code path, which can get out-of-sync with the brace group one, so doing
this isn't advisable.) In situations where there's no other way, Perl
does furnish L<perlintern/C<PL_Sv>> and L<perlapi/C<PL_na>> to use
(with a slight performance penalty) for some such common cases. But
beware that a call chain involving multiple macros using them will zap
the other's use. These have been very difficult to debug.

For a concrete example of these pitfalls in action, see
L<https://perlmonks.org/?node_id=11144355>.

=head2 Portability problems

The following are common causes of compilation and/or execution
failures, not common to Perl as such. The C FAQ is good bedtime
reading. Please test your changes with as many C compilers and
platforms as possible; we will, anyway, and it's nice to save oneself
from public embarrassment.


Also study L<perlport> carefully to avoid any bad assumptions about the
operating system, filesystems, character set, and so forth.

Do not assume an operating system indicates a certain compiler.

=over 4

=item *

Casting pointers to integers or casting integers to pointers

    void castaway(U8* p)
    {
      IV i = p;

or

    void castaway(U8* p)
    {
      IV i = (IV)p;

Both are bad, and broken, and unportable. Use the PTR2IV() macro that
does it right. (Likewise, there are PTR2UV(), PTR2NV(), INT2PTR(), and
NUM2PTR().)

=item *

Casting between function pointers and data pointers

Technically speaking casting between function pointers and data
pointers is unportable and undefined, although practically speaking it
seems to work. Even so, you should use the FPTR2DPTR() and DPTR2FPTR()
macros. Sometimes you can also play games with unions.

=item *

Assuming C<sizeof(int) == sizeof(long)>

There are platforms where longs are 64 bits, and platforms where ints
are 64 bits, and while we are out to shock you, even platforms where
shorts are 64 bits. This is all legal according to the C standard. (In
other words, C<long long> is not a portable way to specify 64 bits, and
C<long long> is not even guaranteed to be any wider than C<long>.)


Instead, use the definitions C<IV>, C<UV>, C<IVSIZE>, C<I32SIZE>, and
so forth. Avoid things like C<I32> because they are B<not> guaranteed
to be I<exactly> 32 bits (they are I<at least> 32 bits), nor are they
guaranteed to be C<int> or C<long>. If you explicitly need 64-bit
variables, use C<I64> and C<U64>.

=item *

Assuming one can dereference any type of pointer for any type of data

    char *p = ...;
    long pony = *(long *)p;    /* BAD */

Many platforms, quite rightly so, will give you a core dump instead of
a pony if C<p> happens not to be correctly aligned.

=item *

Lvalue casts

    (int)*p = ...;    /* BAD */

Simply not portable. Get your lvalue to be of the right type, or maybe
use temporary variables, or dirty tricks with unions.
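
Both of the last two items can often be sidestepped with a correctly
typed temporary and C<memcpy()>. The following is a minimal standalone
sketch in plain C; C<read_long_unaligned> and C<write_long_unaligned>
are hypothetical names, not Perl API functions.

```c
#include <string.h>

/* Read a long that starts at an arbitrary, possibly unaligned, byte
 * address.  Copying into a properly typed temporary avoids the
 * misaligned access that "*(long *)p" may perform. */
long read_long_unaligned(const char *p)
{
    long value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Store a long at an arbitrary byte address without casting the
 * destination into an lvalue of another type. */
void write_long_unaligned(char *p, long value)
{
    memcpy(p, &value, sizeof value);
}
```

The cost is one small copy, which compilers can usually optimize well
on platforms that permit unaligned access anyway.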

=item *

Assuming B<anything> about structs (especially the ones you don't
control, like the ones coming from the system headers)

=over 8

=item *

That a certain field exists in a struct

=item *

That no other fields exist besides the ones you know of

=item *

That a field is of certain signedness, sizeof, or type

=item *

That the fields are in a certain order

=over 8

=item *

While C guarantees the ordering specified in the struct definition,
between different platforms the definitions might differ

=back

=item *

That the C<sizeof(struct)> or the alignments are the same everywhere

=over 8

=item *

There might be padding bytes between the fields to align the fields -
the bytes can be anything

=item *

Structs are required to be aligned to the maximum alignment required by
the fields - which for native types is usually equivalent to
C<sizeof(the_field)>.

=back

=back

=item *

Assuming the character set is ASCIIish

Perl can compile and run under EBCDIC platforms. See L<perlebcdic>.
This is transparent for the most part, but because the character sets
differ, you shouldn't use numeric (decimal, octal, or hex) constants
to refer to characters. You can safely say C<'A'>, but not C<0x41>.
You can safely say C<'\n'>, but not C<\012>. However, you can use
macros defined in F<utf8.h> to specify any code point portably.
C<LATIN1_TO_NATIVE(0xDF)> is going to be the code point that means
LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
ASCII platforms it compiles without adding any extra code, so there is
zero performance hit on those). The acceptable inputs to
C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>. If your input
isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate the opposite
direction.

If you need the string representation of a character that doesn't have
a mnemonic name in C, you should add it to the list in
F<regen/unicode_constants.pl>, and have Perl create C<#define>'s for
you, based on the current platform.

Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
properly on native code points and strings.
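
As a standalone illustration of writing character tests with character
literals rather than numeric constants, here is a minimal sketch in
plain C. The function C<hex_digit_value> is a hypothetical helper made
up for this example; core code would normally reach for the F<handy.h>
macros instead.

```c
/* Return 0..15 for a hexadecimal digit, or -1 for anything else.
 * Only character literals are used, so the code does not depend on
 * the numeric values the platform's character set assigns. */
int hex_digit_value(int c)
{
    switch (c) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        /* The C standard guarantees that '0'..'9' are contiguous. */
        return c - '0';
    case 'a': case 'A': return 10;
    case 'b': case 'B': return 11;
    case 'c': case 'C': return 12;
    case 'd': case 'D': return 13;
    case 'e': case 'E': return 14;
    case 'f': case 'F': return 15;
    default:
        return -1;
    }
}
```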

Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper
case alphabetic characters. That is not true in EBCDIC. Nor for 'a'
to 'z'. But '0' - '9' is an unbroken range in both systems. Don't
assume anything about other ranges. (Note that special handling of
ranges in regular expression patterns and transliterations makes it
appear to Perl code that the aforementioned ranges are all unbroken.)

Many of the comments in the existing code ignore the possibility of
EBCDIC, and may be wrong therefore, even if the code works. This is
actually a tribute to the successful transparent insertion of being
able to handle EBCDIC without having to change pre-existing code.

UTF-8 and UTF-EBCDIC are two different encodings used to represent
Unicode code points as sequences of bytes. Macros with the same names
(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
allow the calling code to think that there is only one such encoding.
This is almost always referred to as C<utf8>, but it means the EBCDIC
version as well. Again, comments in the code may well be wrong even if
the code itself is right. For example, the concept of UTF-8
C<invariant characters> differs between ASCII and EBCDIC. On ASCII
platforms, only characters that do not have the high-order bit set
(i.e. whose ordinals are strict ASCII, 0 - 127) are invariant, and the
documentation and comments in the code may assume that, often referring
to something like, say, C<hibit>.
The situation differs and is not so
simple on EBCDIC machines, but as long as the code itself uses the
C<NATIVE_IS_INVARIANT()> macro appropriately, it works, even if the
comments are wrong.

As noted in L<perlhack/TESTING>, when writing test scripts, the file
F<t/charset_tools.pl> contains some helpful functions for writing tests
valid on both ASCII and EBCDIC platforms. Sometimes, though, a test
can't use a function and it's inconvenient to have different test
versions depending on the platform. There are 20 code points that are
the same in all 4 character sets currently recognized by Perl (the 3
EBCDIC code pages plus ISO 8859-1 (ASCII/Latin1)). These can be used
in such tests, though there is a small possibility that Perl will
become available in yet another character set, breaking your test. All
but one of these code points are C0 control characters. The most
significant controls that are the same are C<\0>, C<\r>, and C<\N{VT}>
(also specifiable as C<\cK>, C<\x0B>, C<\N{U+0B}>, or C<\013>). The
single non-control is U+00B6 PILCROW SIGN. The controls that are the
same have the same bit pattern in all 4 character sets, regardless of
the UTF8ness of the string containing them. The bit pattern for U+B6
is the same in all 4 for non-UTF8 strings, but differs in each when its
containing string is UTF-8 encoded. The only other code points that
have some sort of sameness across all 4 character sets are the pair
0xDC and 0xFC.
Together these represent upper- and lowercase LATIN
LETTER U WITH DIAERESIS, but which is upper and which is lower may be
reversed: 0xDC is the capital in Latin1 and 0xFC is the small letter,
while 0xFC is the capital in EBCDIC and 0xDC is the small one. This
factoid may be exploited in writing case insensitive tests that are the
same across all 4 character sets.

=item *

Assuming the character set is just ASCII

ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128
extra characters have different meanings depending on the locale.
Absent a locale, currently these extra characters are generally
considered to be unassigned, and this has presented some problems. This
has been changing since 5.12 so that these characters can be
considered to be Latin-1 (ISO-8859-1).

=item *

Mixing #define and #ifdef

    #define BURGLE(x) ... \
    #ifdef BURGLE_OLD_STYLE        /* BAD */
    ... do it the old way ... \
    #else
    ... do it the new way ... \
    #endif

You cannot portably "stack" cpp directives. For example in the above
you need two separate BURGLE() #defines, one for each #ifdef branch.

=item *

Adding non-comment stuff after #endif or #else

    #ifdef SNOSH
    ...
    #else !SNOSH    /* BAD */
    ...
    #endif SNOSH    /* BAD */

The #endif and #else cannot portably have anything non-comment after
them. If you want to document what is going on (which is a good idea
especially if the branches are long), use (C) comments:

    #ifdef SNOSH
    ...
    #else /* !SNOSH */
    ...
    #endif /* SNOSH */

The gcc option C<-Wendif-labels> warns about the bad variant (on by
default starting from Perl 5.9.4).

=item *

Having a comma after the last element of an enum list

    enum color {
      CERULEAN,
      CHARTREUSE,
      CINNABAR,     /* BAD */
    };

is not portable. Leave out the last comma.

Also note that whether enums are implicitly morphable to ints varies
between compilers; you might need to cast to (int).

=item *

Mixing signed char pointers with unsigned char pointers

    int foo(char *s) { ... }
    ...
    unsigned char *t = ...; /* Or U8* t = ... */
    foo(t);   /* BAD */

While this is legal practice, it is certainly dubious, and downright
fatal on at least one platform: for example VMS cc considers this a
fatal error.
One cause for people often making this mistake is that a
"naked char" and therefore dereferencing a "naked char pointer" have an
undefined signedness: it depends on the compiler and the flags of the
compiler and the underlying platform whether the result is signed or
unsigned. For this very same reason using a 'char' as an array index
is bad.

=item *

Macros that have string constants and their arguments as substrings of
the string constants

    #define FOO(n) printf("number = %d\n", n)    /* BAD */
    FOO(10);

Pre-ANSI semantics for that was equivalent to

    printf("10umber = %d\10");

which is probably not what you were expecting. Unfortunately at least
one reasonably common and modern C compiler does "real backward
compatibility" here: in AIX that is what still happens even though the
rest of the AIX compiler is very happily C89.

=item *

Using printf formats for non-basic C types

    IV i = ...;
    printf("i = %d\n", i);    /* BAD */

While this might by accident work on some platform (where IV happens to
be an C<int>), in general it cannot. IV might be something larger.
The situation is even worse with more specific types (defined by Perl's
configuration step in F<config.h>):

    Uid_t who = ...;
    printf("who = %d\n", who);    /* BAD */

The problem here is that Uid_t might be not only not C<int>-wide but it
might also be unsigned, in which case large uids would be printed as
negative values.

There is no simple solution to this because of printf()'s limited
intelligence, but for many types the right format is available with
either an 'f' or '_f' suffix, for example:

    IVdf    /* IV in decimal */
    UVxf    /* UV in hexadecimal */

    printf("i = %"IVdf"\n", i); /* The IVdf is a string constant. */

    Uid_t_f /* Uid_t in decimal */

    printf("who = %"Uid_t_f"\n", who);

Or you can try casting to a "wide enough" type:

    printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);

See L<perlguts/Formatted Printing of Size_t and SSize_t> for how to
print those.

Also remember that the C<%p> format really does require a void pointer:

    U8* p = ...;
    printf("p = %p\n", (void*)p);

The gcc option C<-Wformat> scans for such problems.

=item *

Blindly passing va_list

Not all platforms support passing va_list to further varargs (stdarg)
functions.
The right thing to do is to copy the va_list using
Perl_va_copy() if NEED_VA_COPY is defined.

=for apidoc_section $genconfig
=for apidoc Amnh||NEED_VA_COPY

=item *

Using gcc statement expressions

    val = ({...;...;...});    /* BAD */

While a nice extension, it's not portable. Historically, Perl used
them in macros if available to gain some extra speed (essentially as a
funky form of inlining), but we now support (or emulate) C99 C<static
inline> functions, so use them instead. Declare functions as
C<PERL_STATIC_INLINE> to transparently fall back to emulation where
needed.

=item *

Binding together several statements in a macro

Use the macros C<STMT_START> and C<STMT_END>.

    STMT_START {
       ...
    } STMT_END

But there can be subtle (but avoidable if you do it right) bugs
introduced with these; see L<perlapi/C<STMT_START>> for best practices
for their use.
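
To show why the wrapper matters, here is a minimal standalone sketch in
plain C. C<MY_STMT_START> and C<MY_STMT_END> are hypothetical stand-ins
defined with the conventional C<do { ... } while (0)> idiom; they are an
assumption for illustration and not a copy of Perl's own definitions.
C<SWAP_INT> and C<order_pair> are likewise invented names.

```c
/* Hypothetical stand-ins using the usual do/while(0) idiom. */
#define MY_STMT_START do
#define MY_STMT_END   while (0)

/* Swap two ints.  The macro declares a temporary, so it needs its own
 * block, and the wrapper makes the whole thing a single statement. */
#define SWAP_INT(a, b)                          \
    MY_STMT_START {                             \
        int swap_tmp_ = (a);                    \
        (a) = (b);                              \
        (b) = swap_tmp_;                        \
    } MY_STMT_END

/* Put *lo and *hi in ascending order.  Returns 1 if a swap happened.
 * The macro is followed by ';' and then 'else', which would not
 * compile if SWAP_INT expanded to a bare { ... } block. */
int order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else
        return 0;
    return 1;
}
```

Because the macro has no value, a caller that needs a result has to
receive it through an argument, as C<SWAP_INT> does here.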

=item *

Testing for operating systems or versions when you should be testing
for features

    #ifdef __FOONIX__    /* BAD */
    foo = quux();
    #endif

Unless you know with 100% certainty that quux() is only ever available
for the "Foonix" operating system B<and> that it is available B<and>
correctly working for B<all> past, present, B<and> future versions of
"Foonix", the above is very wrong. This is more correct (though still
not perfect, because the below is a compile-time check):

    #ifdef HAS_QUUX
    foo = quux();
    #endif

How does the HAS_QUUX become defined where it needs to be? Well, if
Foonix happens to be Unixy enough to be able to run the Configure
script, and Configure has been taught about detecting and testing
quux(), the HAS_QUUX will be correctly defined. In other platforms,
the corresponding configuration step will hopefully do the same.

In a pinch, if you cannot wait for Configure to be educated, or if you
have a good hunch of where quux() might be available, you can
temporarily try the following:

    #if (defined(__FOONIX__) || defined(__BARNIX__))
    # define HAS_QUUX
    #endif

    ...

    #ifdef HAS_QUUX
    foo = quux();
    #endif

But in any case, try to keep the features and operating systems
separate.
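
One way to keep that separation is to hide the feature test behind a
single wrapper, so that callers never mention an operating system at
all. The sketch below is standalone plain C under stated assumptions:
C<quux()> and C<HAS_QUUX> are the placeholders used above, while
C<my_quux>, C<my_quux_fallback> and C<get_foo> are hypothetical names
invented here. C<HAS_QUUX> is deliberately left undefined, so the
fallback branch is the one that gets compiled.

```c
/* HAS_QUUX would normally come from the configuration step. */
#ifdef HAS_QUUX
#  define my_quux()  quux()
#else
/* Fallback for platforms without quux(): report "not supported". */
static int my_quux_fallback(void)
{
    return -1;
}
#  define my_quux()  my_quux_fallback()
#endif

/* Callers test nothing themselves; they just use the wrapper. */
int get_foo(void)
{
    return my_quux();
}
```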

A good resource on the predefined macros for various operating systems,
compilers, and so forth is
L<https://sourceforge.net/p/predef/wiki/Home/>.

=item *

Assuming the contents of static memory pointed to by the return values
of Perl wrappers for C library functions doesn't change. Many C
library functions return pointers to static storage that can be
overwritten by subsequent calls to the same or related functions. Perl
has wrappers for some of these functions. Originally many of those
wrappers returned those volatile pointers. But over time almost all of
them have evolved to return stable copies. To cope with the remaining
ones, do a L<perlapi/savepv> to make a copy, thus avoiding these
problems. You will have to free the copy when you're done to avoid
memory leaks. If you don't have control over when it gets freed,
you'll need to make the copy in a mortal scalar, like so

    SvPVX(sv_2mortal(newSVpv(volatile_string, 0)))

=back

=head2 Problematic System Interfaces

=over 4

=item *

Perl strings are NOT the same as C strings: They may contain C<NUL>
characters, whereas a C string is terminated by the first C<NUL>. That
is why Perl API functions that deal with strings generally take a
pointer to the first byte and either a length or a pointer to the byte
just beyond the final one.
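
The pointer-plus-length convention can be sketched in standalone plain
C as follows. C<count_byte> is a hypothetical function written for this
illustration and is not part of the Perl API.

```c
#include <stddef.h>
#include <string.h>

/* Count the bytes equal to "wanted" in s[0] .. s[len-1].  Embedded
 * NULs are ordinary data because the caller supplies the length and
 * the loop runs to the byte just beyond the final one. */
size_t count_byte(const char *s, size_t len, char wanted)
{
    const char *end = s + len;
    size_t n = 0;

    for (; s < end; s++) {
        if (*s == wanted)
            n++;
    }
    return n;
}
```

For a buffer holding the five bytes C<a>, C<NUL>, C<b>, C<NUL>, C<a>,
C<strlen()> stops at the first C<NUL>, whereas a length-based function
like this one sees all five bytes.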

And this is the reason that many of the C library string handling
functions should not be used. They don't cope with the full generality
of Perl strings. It may be that your test cases don't have embedded
C<NUL>s, and so the tests pass, whereas there may well eventually arise
real-world cases where they fail. A lesson here is to include C<NUL>s
in your tests. Now it's fairly rare in most real world cases to get
C<NUL>s, so your code may seem to work, until one day a C<NUL> comes
along.

Here's an example. It used to be a common paradigm, for decades, in
the perl core to use S<C<strchr("list", c)>> to see if the character
C<c> is any of the ones given in C<"list">, a double-quote-enclosed
string of the set of characters that we are seeing if C<c> is one of.
As long as C<c> isn't a C<NUL>, it works. But when C<c> is a C<NUL>,
C<strchr> returns a pointer to the terminating C<NUL> in C<"list">.
This likely will result in a segfault or a security issue when the
caller uses that end pointer as the starting point to read from.

A solution to this and many similar issues is to use the C<mem>I<-foo>
C library functions instead. In this case C<memchr> can be used to see
if C<c> is in C<"list"> and works even if C<c> is C<NUL>. These
functions need an additional parameter to give the string length. In
the case of literal string parameters, perl has defined macros that
calculate the length for you. See L<perlapi/String Handling>.
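
The difference can be demonstrated with a short standalone sketch in
plain C. The set C<SEPARATORS> and both function names are hypothetical
and exist only for this illustration; the length is computed here with
C<sizeof> rather than with Perl's own helper macros.

```c
#include <string.h>

#define SEPARATORS ",;:"

/* BAD: strchr() treats the terminating NUL as part of the string, so
 * a NUL argument is reported as a member of the set. */
int is_separator_strchr(int c)
{
    return strchr(SEPARATORS, c) != NULL;
}

/* Better: memchr() searches exactly the bytes it is told to.  The
 * "- 1" drops the terminating NUL from the literal's size. */
int is_separator_memchr(int c)
{
    return memchr(SEPARATORS, c, sizeof(SEPARATORS) - 1) != NULL;
}
```

Both functions agree on every character except C<NUL>, which is
precisely the case that ordinary tests tend to miss.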

=item *

malloc(0), realloc(0), calloc(0, 0) are non-portable. To be portable
allocate at least one byte. (In general you should rarely need to work
at this low level, but instead use the various malloc wrappers.)

=item *

snprintf() - the return type is unportable. Use my_snprintf() instead.

=back

=head2 Security problems

Last but not least, here are various tips for safer coding. See also
L<perlclib> for libc/stdio replacements one should use.

=over 4

=item *

Do not use gets()

Or we will publicly ridicule you. Seriously.

=item *

Do not use tmpfile()

Use mkstemp() instead.

=item *

Do not use strcpy() or strcat() or strncpy() or strncat()

Use my_strlcpy() and my_strlcat() instead: they either use the native
implementation, or Perl's own implementation (borrowed from the public
domain implementation of INN).

=item *

Do not use sprintf() or vsprintf()

If you really want just plain byte strings, use my_snprintf() and
my_vsnprintf() instead, which will try to use snprintf() and
vsnprintf() if those safer APIs are available.
If you want something
fancier than a plain byte string, use L<C<Perl_form>()|perlapi/form> or
SVs and L<C<Perl_sv_catpvf()>|perlapi/sv_catpvf>.

Note that glibc C<printf()>, C<sprintf()>, etc. are buggy before glibc
version 2.17. They won't allow a C<%.s> format with a precision to
create a string that isn't valid UTF-8 if the current underlying locale
of the program is UTF-8. What happens is that the C<%s> and its
operand are simply skipped without any notice.
L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.

=item *

Do not use atoi()

Use grok_atoUV() instead. atoi() has ill-defined behavior on
overflows, and cannot be used for incremental parsing. It is also
affected by locale, which is bad.

=item *

Do not use strtol() or strtoul()

Use grok_atoUV() instead. strtol() or strtoul() (or their
IV/UV-friendly macro disguises, Strtol() and Strtoul(), or Atol() and
Atoul()) are affected by locale, which is bad.

=for apidoc_section $numeric
=for apidoc AmhD||Atol|const char * nptr
=for apidoc AmhD||Atoul|const char * nptr

=back

=head1 DEBUGGING

You can compile a special debugging version of Perl, which allows you
to use the C<-D> option of Perl to tell more about what Perl is doing.
But sometimes there is no alternative but to dive in with a debugger,
either to see the stack trace of a core dump (very useful in a bug
report), or trying to figure out what went wrong before the core dump
happened, or how we ended up with wrong or unexpected results.

=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

    ./Configure -d -DDEBUGGING
    make

C<-DDEBUGGING> turns on the C compiler's C<-g> flag to have it produce
debugging information which will allow us to step through a running
program, and to see which C function we are in (without the
debugging information we might see only the numerical addresses of the
functions, which is not very helpful). It will also turn on the
C<DEBUGGING> compilation symbol which enables all the internal
debugging code in Perl. There are a whole bunch of things you can debug
with this: L<perlrun|perlrun/-Dletters> lists them all, and the best
way to find out about them is to play about with them. The most useful
options are probably

    l  Context (loop) stack processing
    s  Stack snapshots (with v, displays all stacks)
    t  Trace execution
    o  Method and overloading resolution
    c  String/numeric conversions

For example

    $ perl -Dst -e '$x + 1'
    ....
    (-e:1)    gvsv(main::x)
        =>  UNDEF
    (-e:1)    const(IV(1))
        =>  UNDEF  IV(1)
    (-e:1)    add
        =>  NV(1)


Some of the functionality of the debugging code can be achieved with a
non-debugging perl by using XS modules:

    -Dr => use re 'debug'
    -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to
any debugger (many vendors call their debugger C<dbx>), but check the
manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

Or if you have a core dump:

    gdb ./perl core

You'll want to do that in your Perl source tree so the debugger can
read the source code. You should see the copyright message, followed
by the prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item * run [args]

Run the program with the given arguments.

=item * break function_name

=item * break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or
the given line in the named source file.

=item * step

Steps through the program a line at a time.

=item * next

Steps through the program a line at a time, without descending into
functions.

=item * continue

Run until the next breakpoint.

=item * finish

Run until the end of the current function, then stop again.

=item * 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item * ptype

Prints the C definition of the argument given.

    (gdb) ptype PL_op
    type = struct op {
        OP *op_next;
        OP *op_sibparent;
        OP *(*op_ppaddr)(void);
        PADOFFSET op_targ;
        unsigned int op_type : 9;
        unsigned int op_opt : 1;
        unsigned int op_slabbed : 1;
        unsigned int op_savefree : 1;
        unsigned int op_static : 1;
        unsigned int op_folded : 1;
        unsigned int op_spare : 2;
        U8 op_flags;
        U8 op_private;
    } *

=item * print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
(see later L</"gdb macro support">). You'll have to substitute them
yourself, or invoke cpp on the source code files (see
L</"The .i Targets">). So, for instance, you can't say

    print SvPV_nolen(sv)

but you have to say

    print Perl_sv_2pv_nolen(sv)

=back

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
recursively apply those macros for you.

=head2 gdb macro support

Recent versions of F<gdb> have fairly good macro support, but in order
to use it you'll need to compile perl with macro definitions included
in the debugging information.
Using F<gcc> version 3.1, this means
configuring with C<-Doptimize=-g3>. Other compilers might use a
different switch (if they support debugging macros at all).

=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions
in F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other
structures that you can't get at from Perl. Let's take an example.
We'll use the C<$x = $y + $z> we used before, but give it a bit of
context: C<$y = "6XXXX"; $z = 2.3;>. Where's a good place to stop and
poke around?

What about C<pp_add>, the function we examined earlier to implement the
C<+> operator:

    (gdb) break Perl_pp_add
    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see
L<perlguts/Internal Functions>.
With the breakpoint in place, we can
run our program:

    (gdb) run -e '$y = "6XXXX"; $z = 2.3; $x = $y + $z'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
    309         dSP; dATARGET; bool useleft; SV *svl, *svr;
    (gdb) step
    311           dPOPTOPnnrl_ul;
    (gdb)

We looked at this bit of code before, and we said that
C<dPOPTOPnnrl_ul> arranges for two C<NV>s to be placed into C<left> and
C<right> - let's slightly expand it:

    #define dPOPTOPnnrl_ul  NV right = POPn; \
                            SV *leftsv = TOPs; \
                            NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV
either directly (if C<SvNOK> is set) or by calling the C<sv_2nv>
function. C<TOPs> takes the next SV from the top of the stack - yes,
C<POPn> uses C<TOPs> - but doesn't remove it. We then use C<SvNV> to
get the NV from C<leftsv> in the same way as before - yes, C<POPn> uses
C<SvNV>.

Since we don't have an NV for C<$y>, we'll have to use C<sv_2nv> to
convert it.
If we step again, we'll find ourselves there:

    (gdb) step
    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
    1669        if (!sv)
    (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

    (gdb) print Perl_sv_dump(sv)
    SV = PV(0xa057cc0) at 0xa0675d0
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xa06a510 "6XXXX"\0
    CUR = 5
    LEN = 6
    $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

    (gdb) finish
    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
    0x462669 in Perl_pp_add () at pp_hot.c:311
    311           dPOPTOPnnrl_ul;

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
similar output to CPAN module L<B::Debug>.

=for apidoc_section $debugging
=for apidoc Amnh||PL_op

    (gdb) print Perl_op_dump(PL_op)
    {
    13  TYPE = add  ===> 14
        TARG = 1
        FLAGS = (SCALAR,KIDS)
        {
            TYPE = null  ===> (12)
              (was rv2sv)
            FLAGS = (SCALAR,KIDS)
            {
    11          TYPE = gvsv  ===> 12
                FLAGS = (SCALAR)
                GV = main::y
            }
        }

# finish this later #

=head2 Using gdb to look at specific parts of a program

With the example above, you knew to look for C<Perl_pp_add>, but what
if there were multiple calls to it all over the place, or you didn't
know what the op was you were looking for?

One way to do this is to inject a rare call somewhere near what you're
looking for. For example, you could add C<study> before your method:

    study;

And in gdb do:

    (gdb) break Perl_pp_study

And then step until you hit what you're looking for.
This works well
in a loop if you want to only break at certain iterations:

    for my $i (1..100) {
        study if $i == 50;
    }

=head2 Using gdb to look at what the parser/lexer are doing

If you want to see what perl is doing when parsing/lexing your code,
you can use C<BEGIN {}>:

    print "Before\n";
    BEGIN { study; }
    print "After\n";

And in gdb:

    (gdb) break Perl_pp_study

If you want to see what the parser/lexer is doing inside of C<if>
blocks and the like you need to be a little trickier:

    if ($x && $y && do { BEGIN { study } 1 } && $z) { ... }

=head1 SOURCE CODE STATIC ANALYSIS

Various tools exist for analysing C source code B<statically>, as
opposed to B<dynamically>, that is, without executing the code. It is
possible to detect resource leaks, undefined behaviour, type
mismatches, portability problems, code paths that would cause illegal
memory accesses, and other similar problems by just parsing the C code
and examining what the resulting graph reveals about the execution and
data flows. As a matter of fact, this is exactly how C compilers know
to give warnings about dubious code.
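
As a minimal sketch of the kind of defect such tools can find without
running anything, consider the hypothetical helper below (C<struct pair>
and C<pair_new> are made up for illustration and are not Perl core code).
On the path where the second C<malloc> fails, the function returns
without releasing the first allocation; a path-sensitive analyzer can
typically flag this as a resource leak from the control flow alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type and function for illustration only; not Perl core. */
struct pair {
    char *key;
    int value;
};

/* Returns a newly allocated pair holding a copy of key, or NULL on
 * allocation failure. */
struct pair *
pair_new(const char *key, int value)
{
    struct pair *p;

    assert(key != NULL);

    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;

    p->key = malloc(strlen(key) + 1);
    if (p->key == NULL)
        return NULL;    /* BUG: p itself is leaked on this path */

    strcpy(p->key, key);
    p->value = value;
    return p;
}
```

The fix is to C<free(p)> before the second early return. The point of
the sketch is that the leaking path is visible in the source even
though it is very hard to trigger in a test run.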

=head2 lint

The good old C code quality inspector, C<lint>, is available on several
platforms, but please be aware that there are several different
implementations of it by different vendors, which means that the flags
are not identical across different platforms.

There is a C<lint> target in the Makefile, but you may have to diddle
with the flags (see above).

=head2 Coverity

Coverity (L<https://www.coverity.com/>) is a product similar to lint.
As a testbed for their product, they periodically check several open
source projects, and they give open source developers accounts for the
defect databases.

There is a Coverity setup for the perl5 project:
L<https://scan.coverity.com/projects/perl5>

=head2 HP-UX cadvise (Code Advisor)

HP has a C/C++ static analyzer product for HP-UX called Code Advisor.
(Link not given here because the URL is horribly long and seems
horribly unstable; use the search engine of your choice to find it.)
The use of the C<cadvise_cc> recipe with C<Configure ...
-Dcc=./cadvise_cc> (see cadvise "User Guide") is recommended; as is the
use of C<+wall>.

=head2 cpd (cut-and-paste detector)

The cpd tool detects cut-and-paste coding. If one instance of the
cut-and-pasted code changes, all the other spots should probably be
changed, too.
Therefore such code should probably be turned into a
subroutine or a macro.

cpd (L<https://docs.pmd-code.org/latest/pmd_userdocs_cpd.html>) is part
of the pmd project (L<https://pmd.github.io/>). pmd was originally
written for static analysis of Java code, but later the cpd part of it
was extended to also parse C and C++.

Download the pmd-bin-X.Y.zip from the SourceForge site, extract the
pmd-X.Y.jar from it, and then run that on source code thusly:

    java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \
     --minimum-tokens 100 --files /some/where/src --language c > cpd.txt

You may run into memory limits, in which case you should use the -Xmx
option:

    java -Xmx512M ...

=head2 gcc warnings

Though much can be written about the inconsistency and coverage
problems of gcc warnings (like C<-Wall> not meaning "all the warnings",
or some common portability problems not being covered by C<-Wall>, or
C<-ansi> and C<-pedantic> both being a poorly defined collection of
warnings, and so forth), gcc is still a useful tool in keeping our
coding nose clean.

C<-Wall> is on by default.

It would be nice for C<-pedantic> to be on always, but unfortunately
it is not safe on all platforms - for example, it can cause fatal
conflicts with the system headers (Solaris being a prime example).
If Configure
C<-Dgccansipedantic> is used, the C<cflags> frontend selects
C<-pedantic> for the platforms where it is known to be safe.

The following extra flags are added:

=over 4

=item *

C<-Wendif-labels>

=item *

C<-Wextra>

=item *

C<-Wc++-compat>

=item *

C<-Wwrite-strings>

=item *

C<-Werror=pointer-arith>

=item *

C<-Werror=vla>

=back

The following flags would be nice to have but they would first need
their own Augean stablemaster:

=over 4

=item *

C<-Wshadow>

=item *

C<-Wstrict-prototypes>

=back

The C<-Wtraditional> is another example of the annoying tendency of gcc
to bundle a lot of warnings under one switch (it would be impossible to
deploy in practice because it would complain a lot) but it does contain
some warnings that would be beneficial to have available on their own,
such as the warning about string constants inside macros containing the
macro arguments: this behaved differently pre-ANSI than it does in
ANSI, and some C
compilers are still in transition, AIX being an
example.

=head2 Warnings of other C compilers

Other C compilers (yes, there B<are> other C compilers than gcc) often
have their "strict ANSI" or "strict ANSI with some portability
extensions" modes on, like for example the Sun Workshop has its C<-Xa>
mode on (though implicitly), or the DEC (these days, HP...) has its
C<-std1> mode on.

=head1 MEMORY DEBUGGERS

B<NOTE 1>: Running under older memory debuggers such as Purify,
valgrind or Third Degree greatly slows down the execution: seconds
become minutes, minutes become hours. For example as of Perl 5.8.1,
the F<ext/Encode/t/Unicode.t> test takes extraordinarily long to
complete under e.g. Purify, Third Degree, and valgrind. Under valgrind
it takes more than six hours, even on a snappy computer. Said test
must be doing something that is quite unfriendly for memory debuggers.
If you don't feel like waiting, you can simply kill the perl process.
Roughly, valgrind slows down execution by a factor of 10,
AddressSanitizer by a factor of 2.

B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to set the
environment variable C<PERL_DESTRUCT_LEVEL> to 2. For example, like
this:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...

B<NOTE 3>: There are known memory leaks when there are compile-time
errors within C<eval> or C<require>; seeing C<S_doeval> in the call
stack is a good sign of these. Fixing these leaks is non-trivial,
unfortunately, but they must be fixed eventually.

B<NOTE 4>: L<DynaLoader> will not clean up after itself completely
unless Perl is built with the Configure option
C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>.

=head2 valgrind

The valgrind tool can be used to find both memory leaks and illegal
heap memory accesses. As of version 3.3.0, Valgrind only supports
Linux on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and x86-64.
The special "test.valgrind" target can be used to run the tests under
valgrind. Found errors and memory leaks are logged in files named
F<testfile.valgrind> and by default output is displayed inline.

Example usage:

    make test.valgrind

Since valgrind adds significant overhead, tests will take much longer
to run. The valgrind tests support being run in parallel to help with
this:

    TEST_JOBS=9 make test.valgrind

Note that the above two invocations will be very verbose as reachable
memory and leak-checking is enabled by default.
If you want to just
see pure errors, try:

    VG_OPTS='-q --leak-check=no --show-reachable=no' TEST_JOBS=9 \
        make test.valgrind

Valgrind also provides a cachegrind tool, invoked on perl as:

    VG_OPTS=--tool=cachegrind make test.valgrind

As system libraries (most notably glibc) are also triggering errors,
valgrind lets you suppress such errors using suppression files. The
default suppression file that comes with valgrind already catches a lot
of them. Some additional suppressions are defined in F<t/perl.supp>.

To get valgrind and for more information see L<https://valgrind.org/>.

=head2 AddressSanitizer

AddressSanitizer ("ASan") consists of a compiler instrumentation module
and a run-time C<malloc> library. ASan is available for a variety of
architectures, operating systems, and compilers (see project link
below). It checks for unsafe memory usage, such as use after free and
buffer overflow conditions, and is fast enough that you can easily
compile your debugging or optimized perl with it. Modern versions of
ASan check for memory leaks by default on most platforms, otherwise
(e.g. x86_64 OS X) this feature can be enabled via
C<ASAN_OPTIONS=detect_leaks=1>.
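
To make "buffer overflow conditions" concrete, here is a small sketch
of the sort of defect ASan is designed to report. The function
C<read_slot_unchecked> is hypothetical and not Perl core code; it
deliberately omits a bounds check, so a call with C<idx> equal to C<n>
reads one element past the end of a heap allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical demo function, not Perl core code.
 * Fills a heap array with 0..n-1 and returns the element at idx.
 * Returns -1 if the allocation fails. */
int
read_slot_unchecked(size_t n, size_t idx)
{
    int *buf;
    int result;
    size_t i;

    assert(n > 0);

    buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;

    for (i = 0; i < n; i++)
        buf[i] = (int)i;

    /* Missing check: idx >= n reads outside the allocation. */
    result = buf[idx];

    free(buf);
    return result;
}
```

In-range calls such as C<read_slot_unchecked(4, 2)> behave normally.
When the code is compiled with C<-fsanitize=address>, a call such as
C<read_slot_unchecked(4, 4)> should be stopped with a
heap-buffer-overflow report, whereas an uninstrumented build may
silently return garbage.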

To build perl with AddressSanitizer, your Configure invocation should
look like:

    sh Configure -des -Dcc=clang \
       -Accflags=-fsanitize=address -Aldflags=-fsanitize=address \
       -Alddlflags=-shared\ -fsanitize=address \
       -fsanitize-blacklist=`pwd`/asan_ignore

where these arguments mean:

=over 4

=item * -Dcc=clang

This should be replaced by the full path to your clang executable if it
is not in your path.

=item * -Accflags=-fsanitize=address

Compile perl and extensions sources with AddressSanitizer.

=item * -Aldflags=-fsanitize=address

Link the perl executable with AddressSanitizer.

=item * -Alddlflags=-shared\ -fsanitize=address

Link dynamic extensions with AddressSanitizer. You must manually
specify C<-shared> because using C<-Alddlflags=-shared> will prevent
Configure from setting a default value for C<lddlflags>, which usually
contains C<-shared> (at least on Linux).

=item * -fsanitize-blacklist=`pwd`/asan_ignore

AddressSanitizer will ignore functions listed in the C<asan_ignore>
file. (This file should contain a short explanation of why each of the
functions is listed.)

=back

See also L<https://github.com/google/sanitizers/wiki/AddressSanitizer>.

=head2 Dr Memory

Dr. Memory is a tool similar to valgrind which is usable on Windows
and Linux.

It supports heap checking like C<memcheck> from valgrind. There are
also other tools included.

See L<https://drmemory.org/>.

=head1 PROFILING

Depending on your platform there are various ways of profiling Perl.

There are two commonly used techniques of profiling executables:
I<statistical time-sampling> and I<basic-block counting>.

The first method periodically takes samples of the CPU program counter,
and since the program counter can be correlated with the code generated
for functions, we get a statistical view of which functions the program
is spending its time in. The caveats are that very small/fast
functions have a lower probability of showing up in the profile, and
that periodically interrupting the program (this is usually done rather
frequently, in the scale of milliseconds) imposes an additional
overhead that may skew the results. The first problem can be
alleviated by running the code for longer (in general this is a good
idea for profiling); the second problem is usually kept in check by the
profiling tools themselves.

The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end. For example, a conditional jump
ends one basic block and starts another. Basic block profiling usually
works by I<instrumenting> the code by adding I<enter basic block #nnnn>
book-keeping code to the generated code. During the execution of the
code the basic block counters are then updated appropriately. The
caveat is that the added extra code can skew the results: again, the
profiling tools usually try to factor their own effects out of the
results.

=head2 Gprof Profiling

I<gprof> is a profiling tool available on many Unix platforms which
uses I<statistical time-sampling>. You can build a profiled version of
F<perl> by compiling using gcc with the flag C<-pg>. Either edit
F<config.sh> or re-run F<Configure>. Running the profiled version of
Perl will create an output file called F<gmon.out> which contains the
profiling data collected during the execution.

Quick hint:

    $ sh Configure -des -Dusedevel -Accflags='-pg' \
        -Aldflags='-pg' -Alddlflags='-pg -shared' \
        && make perl
    $ ./perl ... # creates gmon.out in current directory
    $ gprof ./perl > out
    $ less out

(you probably need to add C<-shared> to the C<-Alddlflags> line until RT
#118199 is resolved)

The F<gprof> tool can then display the collected data in various ways.
Usually F<gprof> understands the following options:

=over 4

=item * -a

Suppress statically defined functions from the profile.

=item * -b

Suppress the verbose descriptions in the profile.

=item * -e routine

Exclude the given routine and its descendants from the profile.

=item * -f routine

Display only the given routine and its descendants in the profile.

=item * -s

Generate a summary file called F<gmon.sum> which then may be given to
subsequent gprof runs to accumulate data over several runs.

=item * -z

Display routines that have zero usage.

=back

For more detailed explanation of the available commands and output
formats, see your own local documentation of F<gprof>.

=head2 GCC gcov Profiling

I<Basic block profiling> is officially available in gcc 3.0 and later.
You can build a profiled version of F<perl> by compiling using gcc with
the flags C<-fprofile-arcs -ftest-coverage>.
Either edit F<config.sh>
or re-run F<Configure>.

Quick hint:

    $ sh Configure -des -Dusedevel -Doptimize='-g' \
        -Accflags='-fprofile-arcs -ftest-coverage' \
        -Aldflags='-fprofile-arcs -ftest-coverage' \
        -Alddlflags='-fprofile-arcs -ftest-coverage -shared' \
        && make perl
    $ rm -f regexec.c.gcov regexec.gcda
    $ ./perl ...
    $ gcov regexec.c
    $ less regexec.c.gcov

(you probably need to add C<-shared> to the C<-Alddlflags> line until RT
#118199 is resolved)

Running the profiled version of Perl will cause profile output to be
generated. For each source file an accompanying F<.gcda> file will be
created.

To display the results you use the I<gcov> utility (which should be
installed if you have gcc 3.0 or newer installed). F<gcov> is run on
source code files, like this

    gcov sv.c

which will cause F<sv.c.gcov> to be created. The F<.gcov> files
contain the source code annotated with relative frequencies of
execution indicated by "#" markers. If you want to generate F<.gcov>
files for all profiled object files, you can run something like this:

    for file in `find . -name \*.gcno`
    do sh -c "cd `dirname $file` && gcov `basename $file .gcno`"
    done

Useful options of F<gcov> include C<-b> which will summarise the basic
block, branch, and function call coverage, and C<-c> which instead of
relative frequencies will use the actual counts. For more information
on the use of F<gcov> and basic block profiling with gcc, see the
latest GNU CC manual. As of gcc 4.8, this is at
L<https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro>.

=head2 callgrind profiling

callgrind is a valgrind tool for profiling source code. Paired with
kcachegrind (a Qt based UI), it gives you an overview of where code is
taking up time, as well as the ability to examine callers, call trees,
and more. One of its benefits is you can use it on perl and XS modules
that have not been compiled with debugging symbols.

If perl is compiled with debugging symbols (C<-g>), you can view the
annotated source and click around, much like L<Devel::NYTProf>'s HTML
output.

For basic usage:

    valgrind --tool=callgrind ./perl ...

By default it will write output to F<callgrind.out.PID>, but you can
change that with C<--callgrind-out-file=...>.

To view the data, do:

    kcachegrind callgrind.out.PID

If you'd prefer to view the data in a terminal, you can use
F<callgrind_annotate>.
In its basic form:

    callgrind_annotate callgrind.out.PID | less

Some useful options are:

=over 4

=item * --threshold

Percentage of counts (of primary sort event) we are interested in. The
default is 99%; setting it to 100% can show entries that otherwise seem
to be missing.

=item * --auto

Annotate all source files containing functions that helped reach the
event count threshold.

=back

=head2 C<profiler> profiling (Cygwin)

Cygwin allows for C<gprof> profiling and C<gcov> coverage testing, but
this only profiles the main executable.

You can use the C<profiler> tool to perform sample based profiling; it
requires no special preparation of the executables beyond debugging
symbols.

This produces sampling data which can be processed with C<gprof>.

There is L<limited
documentation|https://www.cygwin.com/cygwin-ug-net/profiler.html> on
the Cygwin web site.

=head2 Visual Studio Profiling

You can use the Visual Studio profiler to profile perl if you've built
perl with MSVC, even though we build perl at the command-line. You
will need to build perl with C<CFG=Debug> or C<CFG=DebugSymbols>.

The Visual Studio profiler is a sampling profiler.

See L<the Visual Studio
documentation|https://github.com/MicrosoftDocs/visualstudio-docs/blob/main/docs/profiling/beginners-guide-to-performance-profiling.md>
to get started.

=head1 MISCELLANEOUS TRICKS

=head2 PERL_DESTRUCT_LEVEL

If you want to run any of the tests yourself manually using e.g.
valgrind, please note that by default perl B<does not> explicitly clean
up all the memory it has allocated (such as global memory arenas) but
instead lets the C<exit()> of the whole program "take care" of such
allocations, also known as "global destruction of objects".

There is a way to tell perl to do complete cleanup: set the environment
variable C<PERL_DESTRUCT_LEVEL> to a non-zero value. The F<t/TEST>
wrapper sets this to 2, and this is what you need to do too if you
don't want to see the "global leaks". For example, to run under
valgrind:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib t/foo/bar.t

(Note: the mod_perl Apache module uses this environment variable for
its own purposes and extends its semantics. Refer to L<the mod_perl
documentation|https://perl.apache.org/docs/> for more information.
Also, spawned threads do the equivalent of setting this variable to the
value 1.)

If, at the end of a run, you get the message I<N scalars leaked>, you
can recompile with C<-DDEBUG_LEAKING_SCALARS> (C<Configure
-Accflags=-DDEBUG_LEAKING_SCALARS>), which will cause the addresses of
all those leaked SVs to be dumped along with details as to where each
SV was originally allocated. This information is also displayed by
L<Devel::Peek>. Note that the extra details recorded with each SV
increase memory usage, so it shouldn't be used in production
environments. It also converts C<new_SV()> from a macro into a real
function, so you can use your favourite debugger to discover where
those pesky SVs were allocated.

If you see that you're leaking memory at runtime, but neither valgrind
nor C<-DDEBUG_LEAKING_SCALARS> will find anything, you're probably
leaking SVs that are still reachable and will be properly cleaned up
during destruction of the interpreter. In such cases, using the C<-Dm>
switch can point you to the source of the leak. If the executable was
built with C<-DDEBUG_LEAKING_SCALARS>, C<-Dm> will output SV
allocations in addition to memory allocations. Each SV allocation has
a distinct serial number that will be written on creation and
destruction of the SV. So if you're executing the leaking code in a
loop, you need to look for SVs that are created, but never destroyed
between each cycle. If such an SV is found, set a conditional
breakpoint within C<new_SV()> and make it break only when
C<PL_sv_serial> is equal to the serial number of the leaking SV.
Then you will catch the interpreter in exactly the state where the
leaking SV is allocated, which is sufficient in many cases to find the
source of the leak.

As C<-Dm> is using the PerlIO layer for output, it will by itself
allocate quite a bunch of SVs, which are hidden to avoid recursion. You
can bypass the PerlIO layer if you use the SV logging provided by
C<-DPERL_MEM_LOG> instead.

=for apidoc_section $debugging
=for apidoc Amnh||PL_sv_serial

=head2 Leaked SV spotting: sv_mark_arenas() and sv_sweep_arenas()

These functions exist only on C<DEBUGGING> builds. The first marks all
live SVs which can be found in the SV arenas with the C<SVf_BREAK> flag.
The second lists any such SVs which don't have the flag set, and resets
the flag on the rest. They are intended to identify SVs which are being
created, but not freed, between two points in code. They can be used
either by temporarily adding calls to them in the relevant places in the
code, or by calling them directly from a debugger.

For example, suppose the following code was found to be leaking:

    while (1) { eval '\(1..3)' }

A F<gdb> session on a threaded perl might look something like this:

    $ gdb ./perl
    (gdb) break Perl_pp_entereval
    (gdb) run -e'while (1) { eval q{\(1..3)} }'
    ...
    Breakpoint 1, Perl_pp_entereval ....
    (gdb) call Perl_sv_mark_arenas(my_perl)
    (gdb) continue
    ...
    Breakpoint 1, Perl_pp_entereval ....
    (gdb) call Perl_sv_sweep_arenas(my_perl)
    Unmarked SV: 0xaf23a8: AV()
    Unmarked SV: 0xaf2408: IV(1)
    Unmarked SV: 0xaf2468: IV(2)
    Unmarked SV: 0xaf24c8: IV(3)
    Unmarked SV: 0xace6c8: PV("AV()"\0)
    Unmarked SV: 0xace848: PV("IV(1)"\0)
    (gdb)

Here, at the start of the first call to pp_entereval(), all existing SVs
are marked. Then at the start of the second call, we list all the SVs
which have since been created but not yet freed. It is quickly clear
that an array and its three elements are likely not being freed, perhaps
as a result of a bug during constant folding. The final two SVs are just
temporaries created during the debugging output and can be ignored.

This trick relies on the C<SVf_BREAK> flag not otherwise being used. This
flag is typically used only during global destruction, but also sometimes
for a mark and sweep operation when looking for common elements on the two
sides of a list assignment. The presence of the flag can also alter the
behaviour of some specific actions in the core, such as choosing whether to
copy or to COW a string SV. So turning it on can occasionally alter the
behaviour of code slightly.

=head2 PERL_MEM_LOG

If compiled with C<-DPERL_MEM_LOG> (C<-Accflags=-DPERL_MEM_LOG>), both
memory and SV allocations go through logging functions, which is handy
for breakpoint setting.

Unless C<-DPERL_MEM_LOG_NOIMPL> (C<-Accflags=-DPERL_MEM_LOG_NOIMPL>) is
also compiled, the logging functions read $ENV{PERL_MEM_LOG} to
determine whether to log the event, and if so how:

    $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
    $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
    $ENV{PERL_MEM_LOG} =~ /c/           Additionally log C backtrace for
                                        new_SV events
    $ENV{PERL_MEM_LOG} =~ /t/           Include timestamp in log
    $ENV{PERL_MEM_LOG} =~ /^(\d+)/      Write to FD given (default is 2)

Memory logging is somewhat similar to C<-Dm> but is independent of
C<-DDEBUGGING>, and at a higher level; all uses of Newx(), Renew(), and
Safefree() are logged with the caller's source code file and line
number (and C function name, if supported by the C compiler). In
contrast, C<-Dm> is directly at the point of C<malloc()>. SV logging
is similar.

Since the logging doesn't use PerlIO, all SV allocations are logged and
no extra SV allocations are introduced by enabling the logging. If
compiled with C<-DDEBUG_LEAKING_SCALARS>, the serial number for each SV
allocation is also logged.
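
As a rough sketch of how the flags combine, assuming F<./perl> was
built with C<-Accflags=-DPERL_MEM_LOG>, the following asks for memory
and SV logging on file descriptor 3 and redirects that descriptor to a
file. The one-liner and the file name F<mem.log> are arbitrary
placeholders, not anything the build provides:

    env PERL_MEM_LOG=3ms ./perl -e 'my @a = (1..3)' 3>mem.log

Going by the table above, appending C<t> to the value (C<3mst>) should
add a timestamp to each entry.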

The C<c> option uses the C<Perl_c_backtrace> facility, and therefore
additionally requires the Configure C<-Dusecbacktrace> compile flag in
order to access it.

=head2 DDD over gdb

Those debugging perl with the DDD frontend over gdb may find the
following useful:

You can extend the data conversion shortcuts menu, so for example you
can display an SV's IV value with one click, without doing any typing.
To do that simply edit the F<~/.ddd/init> file and add after:

    ! Display shortcuts.
    Ddd*gdbDisplayShortcuts: \
    /t ()   // Convert to Bin\n\
    /d ()   // Convert to Dec\n\
    /x ()   // Convert to Hex\n\
    /o ()   // Convert to Oct\n\

the following two lines:

    ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
    ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

so now you can do ivx and pvx lookups, or you can plug in the sv_peek
"conversion":

    Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(The my_perl is for threaded builds.) Just remember that every line
but the last one should end with \n\

Alternatively edit the init file interactively via: 3rd mouse button ->
New Display -> Edit Menu

Note: you can define up to 20 conversion shortcuts in the gdb section.

=head2 C backtrace

On some platforms Perl supports retrieving the C level backtrace
(similar to what symbolic debuggers like gdb do).

The backtrace returns the stack trace of the C call frames, with the
symbol names (function names), the object names (like "perl"), and if
it can, also the source code locations (file:line).

The supported platforms are Linux and OS X (some *BSD might work at
least partly, but they have not yet been tested).

This feature hasn't been tested with multiple threads, but it will only
show the backtrace of the thread doing the backtracing.

The feature needs to be enabled with C<Configure -Dusecbacktrace>.

C<-Dusecbacktrace> also enables keeping the debug information when
compiling/linking (often: C<-g>). Many compilers/linkers do support
having both optimization and keeping the debug information. The debug
information is needed for the symbol names and the source locations.

Static functions might not be visible to the backtrace.

Source code locations, even if available, can often be missing or
misleading if the compiler has e.g. inlined code. The optimizer can
make matching the source code and the object code quite challenging.

=over 4

=item Linux

You B<must> have the BFD (-lbfd) library installed, otherwise C<perl>
will fail to link. The BFD is usually distributed as part of the GNU
binutils.

Summary: C<Configure ... -Dusecbacktrace> and you need C<-lbfd>.

=item OS X

The source code locations are supported B<only> if you have the
Developer Tools installed. (BFD is B<not> needed.)

Summary: C<Configure ... -Dusecbacktrace> and installing the Developer
Tools would be good.

=back

Optionally, for trying out the feature, you may want to enable
automatic dumping of the backtrace just before a warning or croak (die)
message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
for Configure.

Unless the above additional feature is enabled, nothing about the
backtrace functionality is visible, except for the Perl/XS level.

Furthermore, even if you have enabled this feature to be compiled, you
need to enable it at runtime with an environment variable:
C<PERL_C_BACKTRACE_ON_ERROR=10>. It must be an integer higher than
zero, giving the desired frame count.
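
For instance, assuming F<./perl> was configured with both
C<-Dusecbacktrace> and C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>, a run
along these lines should dump up to ten C frames just before the
warning is printed (the one-liner is only a placeholder):

    env PERL_C_BACKTRACE_ON_ERROR=10 ./perl -e 'warn "oops"'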

Retrieving the backtrace from Perl level (using for example an XS
extension) would be much less exciting than one would hope: normally
you would see C<runops>, C<entersub>, and not much else. This API is
intended to be called B<from within> the Perl implementation, not from
Perl level execution.

The C API for the backtrace is as follows:

=over 4

=item get_c_backtrace

=item free_c_backtrace

=item get_c_backtrace_dump

=item dump_c_backtrace

=back

=head2 Poison

If you see in a debugger a memory area mysteriously full of 0xABABABAB
or 0xEFEFEFEF, you may be seeing the effect of the Poison() macros, see
L<perlclib>.

=head2 Read-only optrees

Under ithreads the optree is read only. If you want to enforce this,
to check for write accesses from buggy code, compile with
C<-Accflags=-DPERL_DEBUG_READONLY_OPS> to enable code that allocates op
memory via C<mmap>, and sets it read-only when it is attached to a
subroutine. Any write access to an op results in a C<SIGBUS> and abort.

This code is intended for development only, and may not be portable
even to all Unix variants. Also, it is an 80% solution, in that it
isn't able to make all ops read only.
Specifically it does not apply
to op slabs belonging to C<BEGIN> blocks.

However, as an 80% solution it is still effective, as it has caught
bugs in the past.

=head2 When is a bool not a bool?

There wasn't necessarily a standard C<bool> type on compilers prior to
C99, and so some workarounds were created. The C<TRUE> and C<FALSE>
macros are still available as alternatives for C<true> and C<false>.
And the C<cBOOL> macro was created to correctly cast to a true/false
value in all circumstances, but should no longer be necessary. Using
S<C<(bool)> I<expr>> should now always work.

There are no plans to remove any of C<TRUE>, C<FALSE>, or C<cBOOL>.

=head2 Finding unsafe truncations

You may wish to run C<Configure> with something like

    -Accflags='-Wconversion -Wno-sign-conversion -Wno-shorten-64-to-32'

or your compiler's equivalent to make it easier to spot any unsafe
truncations that show up.

=head2 The .i Targets

You can expand the macros in a F<foo.c> file by saying

    make foo.i

which will expand the macros using cpp. Don't be scared by the
results.

=head1 AUTHOR

This document was originally written by Nathan Torkington, and is
maintained by the perl5-porters mailing list.