=encoding utf8

=for comment
Consistent formatting of this file is achieved with:
  perl ./Porting/podtidy pod/perlhacktips.pod

=head1 NAME

perlhacktips - Tips for Perl core C code hacking

=head1 DESCRIPTION

This document will help you learn the best way to go about hacking on
the Perl core C code.  It covers common problems, debugging, profiling,
and more.

If you haven't read L<perlhack> and L<perlhacktut> yet, you might want
to do that first.

=head1 COMMON PROBLEMS

Perl source now permits some specific C99 features which we know are
supported by all platforms, but mostly plays by ANSI C89 rules.  You
don't care about some particular platform having broken Perl?  I hear
there is still a strong demand for J2EE programmers.

=head2 Perl environment problems

=over 4

=item *

Not compiling with threading

Compiling with threading (C<-Duseithreads>) completely rewrites the
function prototypes of Perl.  You had better try your changes with that.
Related to this is the difference between "Perl_-less" and "Perl_-ly"
APIs, for example:

  Perl_sv_setiv(aTHX_ ...);
  sv_setiv(...);

The first one explicitly passes in the context, which is needed for
e.g. threaded builds.  The second one does that implicitly; do not get
them mixed.  If you are not passing in an C<aTHX_>, you will need to do
a C<dTHX> as the first thing in the function.

See L<perlguts/"How multiple interpreters and concurrency are
supported"> for further discussion about context.

=item *

Not compiling with C<-DDEBUGGING>

The DEBUGGING define exposes more code to the compiler, therefore more
ways for things to go wrong.  You should try it.

=item *

Introducing (non-read-only) globals

Do not introduce any modifiable globals, truly global or file static.
They are bad form and complicate multithreading and other forms of
concurrency.  The right way is to introduce them as new interpreter
variables; see F<intrpvar.h> (at the very end for binary
compatibility).

Introducing read-only (const) globals is okay, as long as you verify
with e.g. C<nm libperl.a|egrep -v ' [TURtr] '> (if your C<nm> has
BSD-style output) that the data you added really is read-only.  (If it
is, it shouldn't show up in the output of that command.)

If you want to have static strings, make them constant:

  static const char etc[] = "...";

If you want to have arrays of constant strings, note carefully the
right combination of C<const>s:

    static const char * const yippee[] =
        {"hi", "ho", "silver"};

=item *

Not exporting your new function

Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
function that is part of the public API (the shared Perl library) to be
explicitly marked as exported.  See the discussion about F<embed.pl> in
L<perlguts>.

=item *

Exporting your new function

The new shiny result of either genuine new functionality or your
arduous refactoring is now ready and correctly exported.  So what could
possibly go wrong?

Maybe simply that your function did not need to be exported in the
first place.  Perl has a long and not so glorious history of exporting
functions that it should not have.

If the function is used only inside one source code file, make it
static.  See the discussion about F<embed.pl> in L<perlguts>.

If the function is used across several files, but intended only for
Perl's internal use (and this should be the common case), do not export
it to the public API.  See the discussion about F<embed.pl> in
L<perlguts>.

=back
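
To make the threading item above more concrete, here is a standalone
mock-up of the context-passing convention.  It is only a sketch:
C<MockInterp>, C<Mock_sv_setiv>, C<mock_sv_setiv>,
C<mock_current_interp> and the two C<set_*> functions are hypothetical
names, and the C<pTHX_>, C<aTHX_> and C<dTHX> definitions are
simplified stand-ins for what a threaded build uses, not the real ones
from F<perl.h>.

```c

    /* Hypothetical, simplified mock of a threaded build; the real
       definitions in perl.h differ in detail. */
    typedef struct MockInterp {
        int setiv_calls;            /* some per-interpreter state */
    } MockInterp;

    /* Stand-in for the per-thread "current interpreter" dTHX fetches. */
    static MockInterp *mock_current_interp;

    #define pTHX_  MockInterp *my_perl,      /* declare the context */
    #define aTHX_  my_perl,                  /* pass it along */
    #define dTHX   MockInterp *my_perl = mock_current_interp

    /* "Perl_-ly" form: the context is an explicit first argument. */
    static void Mock_sv_setiv(pTHX_ int *sv, int iv)
    {
        my_perl->setiv_calls++;
        *sv = iv;
    }

    /* "Perl_-less" form: silently passes whatever my_perl is in scope. */
    #define mock_sv_setiv(sv, iv)  Mock_sv_setiv(aTHX_ (sv), (iv))

    /* Receives the context, so the short form works directly. */
    static void set_with_context(pTHX_ int *sv)
    {
        mock_sv_setiv(sv, 42);
    }

    /* Receives no context: without the dTHX line, my_perl would be
       undeclared here and this function would not compile. */
    static void set_without_context(int *sv)
    {
        dTHX;
        mock_sv_setiv(sv, 7);
    }

```

In an unthreaded build the real C<aTHX_> expands to nothing, so a
forgotten C<dTHX> goes unnoticed there and only breaks threaded builds,
which is one more reason to test with C<-Duseithreads>.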

=head2 C99

Starting with Perl 5.35.5 we permit some C99 features in the core C
source.  However, code in dual-life extensions still needs to be C89
only, because it needs to compile against earlier versions of Perl
running on older platforms.  Also note that our headers need to also be
valid as C++, because XS extensions written in C++ need to include
them, hence I<member structure initialisers> can't be used in headers.

C99 support is still far from complete on all platforms we currently
support.  As a baseline we can only assume C89 semantics with the
specific C99 features described below, which we've verified work
everywhere.  It's fine to probe for additional C99 features and use
them where available, provided there is also a fallback for compilers
that don't support the feature.  For example, we use C11 thread-local
storage when available, but fall back to POSIX thread-specific data
APIs otherwise, and we use C<char> for booleans if C<< <stdbool.h> >>
isn't available.
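
As a sketch of the probe-and-fall-back pattern for booleans, assuming a
probe has already recorded its result: C<MY_HAS_STDBOOL> is a
hypothetical symbol (not one that F<Configure> defines), and C<my_bool>
and C<is_even> are illustrative names.

```c

    /* Hypothetical probe result; a real build would have a
       Configure-style test decide this and write it to a header. */
    #define MY_HAS_STDBOOL 1

    #if MY_HAS_STDBOOL
    #  include <stdbool.h>
    typedef bool my_bool;
    #else
    typedef char my_bool;           /* fallback: a char holding 0 or 1 */
    #endif

    /* A comparison yields 0 or 1, so it is safe for either definition. */
    static my_bool is_even(int n)
    {
        return (my_bool)(n % 2 == 0);
    }

```

With the C<char> fallback, store only normalised values such as the
result of a comparison or C<!!x>; converting, say, C<256> to C<char>
typically yields 0, whereas converting it to a real C<bool> yields 1.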

Code can use (and rely on) the following C99 features being present:

=over

=item *

mixed declarations and code

=item *

64 bit integer types

For consistency with the existing source code, use the typedefs C<I64>
and C<U64>, instead of using C<long long> and C<unsigned long long>
directly.

=item *

variadic macros

    void greet(const char *file, unsigned int line, const char *format, ...);
    #define logged_greet(...) greet(__FILE__, __LINE__, __VA_ARGS__)

Note that C<__VA_OPT__> is only standardized as of C23 and C++20;
before that it was a gcc extension, so don't rely on it.

=item *

declarations in for loops

    for (const char *p = message; *p; ++p) {
        putchar(*p);
    }

=item *

member structure initialisers

But not in headers, as support was only added to C++ relatively
recently.

Hence this is fine in C and XS code, but not headers:

    struct message {
        char *action;
        char *target;
    };

    struct message mcguffin = {
        .target = "member structure initialisers",
        .action = "Built"
    };

You cannot use the similar syntax for compound literals, since we also
build perl using C++ compilers:

    /* this is fine */
    struct message m = {
        .target = "some target",
        .action = "some action"
    };
    /* this is not valid in C++ */
    m = (struct message){
        .target = "some target",
        .action = "some action"
    };

While structure designators are usable, the related array designators
are not, since they aren't supported by C++ at all.

=item *

flexible array members

This is standards conformant:

    struct greeting {
        unsigned int len;
        char message[];
    };

However, the source code already uses the "unwarranted chumminess with
the compiler" hack in many places:

    struct greeting {
        unsigned int len;
        char message[1];
    };

Strictly, accessing beyond C<message[0]> B<is> undefined behaviour,
but this has been a commonly used hack since K&R times, and using it
hasn't been a practical issue anywhere (in the perl source or any other
common C code).  Hence it's unclear what we would gain from actively
changing to the C99 approach.

=item *

C<//> comments

All compilers we tested support their use.  Not all humans we tested
support their use.

=back
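
The flexible array member item above shows only the declaration.  A
minimal sketch of allocating and filling such a struct follows;
C<greeting_new> is a hypothetical helper, and C<struct greeting> is the
C99 version from that item.

```c

    #include <stdlib.h>
    #include <string.h>

    struct greeting {
        unsigned int len;
        char message[];             /* C99 flexible array member */
    };

    /* Allocate a greeting holding a copy of text, including its NUL.
       Returns NULL if the allocation fails. */
    static struct greeting *greeting_new(const char *text)
    {
        size_t n = strlen(text);
        /* sizeof covers len (plus any padding); the flexible member's
           storage is requested on top of it.  The cast keeps C++
           compilers happy. */
        struct greeting *g =
            (struct greeting *)malloc(sizeof(struct greeting) + n + 1);

        if (g == NULL)
            return NULL;
        g->len = (unsigned int)n;
        memcpy(g->message, text, n + 1);    /* copies the NUL as well */
        return g;
    }

```

Requesting C<sizeof(struct greeting)> plus the payload may over-allocate
by a few padding bytes, which is harmless; the caller releases the
whole object with a single C<free()>.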

Code explicitly should not use any other C99 features.  For example:

=over 4

=item *

variable length arrays

Not supported by B<any> version of MSVC, and this is not going to
change.

Even "variable" length arrays where the variable is a C<const> object
initialised with a constant are compile errors under MSVC, since in C
(unlike C++) such a variable is not a constant expression.

=item *

C99 types in C<< <stdint.h> >>

Use C<PERL_INT_FAST8_T> etc. as defined in F<handy.h>.

=item *

C99 format strings in C<< <inttypes.h> >>

C<snprintf> in the VMS libc only added support for C<PRIdN> etc. very
recently, meaning that there are live supported installations without
it, or without formats such as C<%zu>.

(perl's C<sv_catpvf> etc. use parser code in F<sv.c>, which supports
the C<z> modifier, along with perl-specific formats such as C<SVf>.)

=back
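
A sketch of avoiding a variable length array: in C, unlike C++, a
C<const int> variable is not an integer constant expression, so
C<const int n = 16; char buf[n];> inside a function still declares a
VLA.  An enumeration constant (or a C<#define>) is a true constant
expression.  C<ULONG_DEC_BUF> and C<ulong_to_dec> are hypothetical
names; the hand-rolled conversion also happens to avoid any dependence
on C<printf>-style format support.

```c

    #include <stddef.h>

    /* Upper bound on the decimal digits of an unsigned long, plus a NUL:
       each byte contributes fewer than 3 decimal digits.  sizeof is a
       constant expression, so this enum is usable as an array bound. */
    enum { ULONG_DEC_BUF = 3 * sizeof(unsigned long) + 1 };

    /* Write v in decimal to out, which must hold ULONG_DEC_BUF bytes.
       Returns the number of digits written, excluding the NUL. */
    static size_t ulong_to_dec(unsigned long v, char *out)
    {
        char tmp[ULONG_DEC_BUF];    /* fixed bound: no VLA needed */
        size_t n = 0, i;

        do {                        /* least significant digit first */
            tmp[n++] = (char)('0' + v % 10);
            v /= 10;
        } while (v != 0);

        for (i = 0; i < n; i++)     /* reverse into the caller's buffer */
            out[i] = tmp[n - 1 - i];
        out[n] = '\0';
        return n;
    }

```

The bound is computed at compile time from C<sizeof(unsigned long)>, so
it adapts to platforms where C<long> is 32 or 64 bits.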

If you want to use a C99 feature not listed above, then you need to do
one of the following:

=over 4

=item *

Probe for it in F<Configure>, set a variable in F<config.sh>, and add
fallback logic in the headers for platforms which don't have it.

=item *

Write test code and verify that it works on platforms we need to
support, before relying on it unconditionally.

=back

You will likely want to repeat the same plan as we used to get the
current C99 feature set.  See the message at
L<https://markmail.org/thread/odr4fjrn72u2fkpz> for the C99 probes we
used before.  Note that the two most "fussy" compilers appear to be MSVC
and the vendor compiler on VMS.  To date all the *nix compilers have
been far more flexible in what they support.

On *nix platforms, F<Configure> attempts to set compiler flags
appropriately.  All vendor compilers that we tested defaulted to C99 (or
C11) support.  However, older versions of gcc default to C89, or permit
I<most> C99 (with warnings), but forbid I<declarations in for loops>
unless C<-std=gnu99> is added.  The alternative C<-std=c99> B<might>
seem better, but using it on some platforms can prevent
C<< <unistd.h> >> from declaring some prototypes, which breaks the
build.  gcc's C<-ansi> flag implies C<-std=c89>, so we can no longer set
that, hence the Configure option C<-gccansipedantic> now only adds
C<-pedantic>.

The Perl core source code files (the ones at the top level of the
source code distribution) are automatically compiled with as many as
possible of the C<-std=gnu99>, C<-pedantic>, and a selection of C<-W>
flags (see F<cflags.SH>).  Files in F<ext/>, F<dist/>, F<cpan/>, etc.
are compiled with the same flags as the installed perl would use to
compile XS extensions.

Basically, it's safe to assume that F<Configure> and F<cflags.SH> have
picked the best combination of flags for the version of gcc on the
platform, and attempting to add more flags related to enforcing a C
dialect will cause problems either locally, or on other systems that
the code is shipped to.

We believe that the C99 support in gcc 3.1 is good enough for us, but
we don't have a 19-year-old gcc handy to check this :-) If you have
ancient vendor compilers that don't default to C99, the flags you might
want to try are:

=over 4

=item AIX

C<-qlanglvl=stdc99>

=item HP/UX

C<-AC99>

=item Solaris

C<-xc99>

=back

=head2 Symbol Names and Namespace Pollution

=head3 Choosing legal symbol names

C reserves for its implementation any symbol whose name begins with an
underscore followed immediately by either an uppercase letter C<[A-Z]>
or another underscore, and, at file scope, any symbol beginning with an
underscore at all.  C++ further reserves any symbol containing two
consecutive underscores, and likewise reserves in the global namespace
any symbol beginning with an underscore, not just ones followed by a
capital.  We care about C++ because header files (F<*.h>) need to be
compilable by it, and some people do all their development using a C++
compiler.

The consequences of failing to follow these rules are probably none.
Unless you stumble on a name that the implementation uses, things will
work.  Indeed, the perl core has more than a few instances of using
implementation-reserved symbols.  (These are gradually being changed.)
But your code might stop working any time that the implementation
decides to use a name you had already chosen, potentially many years
before.

It's best then to:

=over

=item B<Don't begin a symbol name with an underscore>; (I<e.g.>, don't
use C<_FOOBAR>)

=item B<Don't use two consecutive underscores in a symbol name>;
(I<e.g.>, don't use C<FOO__BAR>)

=back
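
The rules above are mechanical enough to express in code.  This is an
illustration only: C<name_is_reserved> is a hypothetical helper, not
part of any perl tooling, and it ignores the POSIX reservations
mentioned below.

```c

    #include <ctype.h>
    #include <string.h>

    /* Nonzero if name matches one of the reserved patterns:
         - leading '_' followed by an uppercase letter or another '_'
           (reserved everywhere, in both C and C++);
         - any leading '_' when the symbol is at file/global scope
           (reserved there in both languages);
         - "__" anywhere in the name (additionally reserved by C++). */
    static int name_is_reserved(const char *name, int file_scope)
    {
        if (name[0] == '_') {
            if (name[1] == '_' || isupper((unsigned char)name[1]))
                return 1;
            if (file_scope)
                return 1;
        }
        return strstr(name, "__") != NULL;
    }

```

For example, C<_foobar> is acceptable as a local variable but not as a
file-scope symbol, while C<foo_svpv_> passes in both cases.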

POSIX also reserves many symbols.  See Section 2.2.2 in
L<https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html>.
Perl also has conflicts with that.

Perl reserves for its use any symbol beginning with C<Perl>, C<perl>,
or C<PL_>.  Any time you introduce a macro into a header file that
doesn't follow that convention, you are creating the possibility of a
namespace clash with an existing XS module, unless you restrict it by,
say,

 #ifdef PERL_CORE
 #  define my_symbol
 #endif

There are many symbols in header files that aren't of this form and
which are accessible from the XS namespace, intentionally or not: just
about anything in F<config.h>, for example.

Having to use one of these prefixes detracts from the readability of
the code, and hasn't been an actual issue for non-trivial names.  Things
like perl defining its own C<MAX> macro have been problematic, but they
were quickly discovered, and a S<C<#ifdef PERL_CORE>> guard added.

So there's no rule imposed about using such symbols, just be aware of
the issues.

=head3 Choosing good symbol names

Ideally, a symbol name should correctly and precisely describe its
intended purpose.  But there is a tension between that and getting
names that are overly long and hence awkward to type and read.
Metaphors could be helpful (a poetic name), but those tend to be
culturally specific, and may not translate for someone whose native
language isn't English, or who comes from a different cultural
background.  Besides, the talent of writing poetry seems to be rare
among programmers.

Certain symbol names don't reflect their purpose, but are nonetheless
fine to use because of long-standing conventions.  These often
originated in the field of Mathematics, where C<i> and C<j> are
frequently used as subscripts, and C<n> as a population count.  Since
at least the 1950s, computer programs have used C<i>, I<etc.> as loop
variables.

Our guidance is to choose a name that reasonably describes the purpose,
and to comment its declaration more precisely.

One certainly shouldn't use misleading or ambiguous names.  C<last_foo>
could mean either the final C<foo> or the previous C<foo>, and so could
be confusing to the reader, or even to the writer coming back to the
code after a few months of working on something else.  Sometimes the
programmer has a particular line of thought in mind, and it doesn't
occur to them that ambiguity is present.

There are probably still many off-by-1 bugs around because the name
L<perlapi/C<av_len>> doesn't correspond to what other I<-len>
constructs mean, such as L<perlapi/C<sv_len>>: it returns the highest
index, not the number of elements.  Awkward (and controversial)
synonyms that conveyed its true meaning were created to use instead
(L<perlapi/C<av_top_index>>).  Eventually, though, someone had the
better idea to create a new name to signify what most people think
C<-len> signifies.  So L<perlapi/C<av_count>> was born.  And we wish it
had been thought up much earlier.

=head2 Writing safer macros

Macros are used extensively in the Perl core for such things as hiding
internal details from the caller, so that the caller doesn't have to be
concerned about them.  For example, most lines of code don't need to
know if they are running on a threaded versus unthreaded perl.  That
detail is mostly hidden automatically.

It is often better to use an inline function instead of a macro.
Inline functions are immune to name collisions with the caller, and
don't magnify problems when called with parameters that are expressions
with side effects.  There was a time when one might choose a macro over
an inline function because compiler support for inline functions was
quite limited.  Some compilers would actually only inline the first two
or three encountered in a compilation.  But those days are long gone,
and inline functions are fully supported in modern compilers.
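
The side-effect problem is easy to demonstrate.  In this sketch,
C<next_value> is a hypothetical stand-in for an argument with side
effects (think of a tied variable that reads a line on every fetch),
and C<SQUARE_MACRO> and C<square_inline> are illustrative names.
Perl's own sources spell inline functions with C<PERL_STATIC_INLINE>;
plain C99 C<static inline> is used here so the example stands alone.

```c

    static int fetch_count;

    /* Each call has a visible side effect: it bumps fetch_count. */
    static int next_value(void)
    {
        return ++fetch_count;
    }

    /* The argument's text is pasted in twice, so it is evaluated twice. */
    #define SQUARE_MACRO(x)  ((x) * (x))

    /* The argument is evaluated exactly once, before the call. */
    static inline int square_inline(int x)
    {
        return x * x;
    }

```

The macro fetches twice and multiplies two different values (1 and 2),
whereas the inline function fetches once and squares the single value
it got.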

Nevertheless, there are situations where a function won't do, and a
macro is required.  One example is when a parameter can be any of
several types.  A function has to be declared with a single explicit
type for each parameter, but a macro's parameters are untyped.

Or maybe the code involved is so trivial that a function would be just
complicating overkill, such as when the macro simply creates a mnemonic
name for some constant value.
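
To illustrate the several-types case: a comparison macro works
unchanged for any arithmetic type, whereas a function has to commit to
one parameter type and converts everything else to it.  C<MY_MAX> and
C<max_int> are hypothetical names, and C<MY_MAX> still evaluates one of
its arguments twice, so the side-effect caveats above still apply.

```c

    /* Works for int, long, double, ... because nothing is declared. */
    #define MY_MAX(a, b)  ((a) > (b) ? (a) : (b))

    /* A function has to pick a single parameter type; double arguments
       are converted (truncated) to int before the comparison. */
    static int max_int(int a, int b)
    {
        return a > b ? a : b;
    }

```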

If you do choose to use a non-trivial macro, be aware that there are
several avoidable pitfalls that can occur.  Keep in mind that a macro
is expanded within the lexical context of each place in the source it
is called.  If you have a token C<foo> in the macro and the source
happens also to have C<foo>, the meaning of the macro's C<foo> will
become that of the caller's.  Sometimes that is exactly the behavior
you want, but be aware that this tends to be confusing later on.  It
effectively turns C<foo> into a reserved word for any code that calls
the macro, and this fact is usually neither documented nor considered.
It is safer to pass C<foo> as a parameter, so that C<foo> remains
freely available to the caller and the macro interface is explicitly
specified.

Worse is when the equivalence between the two C<foo>'s is coincidental.
Suppose, for example, that the macro declares a variable

 int foo;

That works fine as long as the caller doesn't define C<foo> in some
way.  And it might not be until years later that someone comes along
with an instance where C<foo> is used.  For example, a future caller
could do this:

 #define foo  bar

Then that declaration of C<foo> in the macro suddenly becomes

 int bar;

That could mean that something completely different happens than
intended.  It is hard to debug; the macro and call may not even be in
the same file, so it would require some digging and gnashing of teeth
to figure out.

Therefore, if a macro does use variables, their names should be such
that it is very unlikely that they would collide with any caller, now
or forever.  One way to do that, now being used in the perl source, is
to include the name of the macro itself as part of the name of each
variable in the macro.  Suppose the macro is named C<SvPV>.  Then we
could have

 int foo_svpv_ = 0;

This is harder to read than plain C<foo>, but it is pretty much
guaranteed that a caller will never naively use C<foo_svpv_> (and run
into problems).  (The lowercasing makes it clearer that this is a
variable, but assumes that there won't be two elements whose names
differ only in the case of their letters.)  The trailing underscore
makes a clash even more unlikely, as a trailing underscore, by
convention, signifies a private variable name.  (See
L</Choosing legal symbol names> for restrictions on what names you can
use.)
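
Applying that naming scheme, a sketch follows.  C<SUM_INTS> and
C<sum_first> are hypothetical, and the C<do { ... } while (0)> wrapper
plays the role that C<STMT_START> and C<STMT_END> play in the perl
source.  Because the macro's internal variables are named after the
macro, the caller remains free to pass an expression involving its own
C<i>.

```c

    /* Hypothetical macro: store the sum of the first n ints of arr in
       result.  Internal variables carry the macro's name plus a trailing
       underscore, so they cannot capture the caller's identifiers.
       (n is copied once; arr and result are used more than once.) */
    #define SUM_INTS(arr, n, result)                          \
        do {                                                  \
            int i_sum_ints_;                                  \
            int n_sum_ints_ = (n);                            \
            (result) = 0;                                     \
            for (i_sum_ints_ = 0; i_sum_ints_ < n_sum_ints_;  \
                 i_sum_ints_++)                               \
                (result) += (arr)[i_sum_ints_];               \
        } while (0)

    /* The caller happens to use a variable named i.  Had the macro used
       a plain "int i" internally, "(n)" would expand to the macro's own,
       uninitialised i instead of this parameter. */
    static int sum_first(const int *v, int i)
    {
        int total;
        SUM_INTS(v, i, total);
        return total;
    }

```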

This kind of name collision doesn't happen with the macro's formal
parameters, so they don't need to have complicated names.  But there
are pitfalls when a parameter is an expression, or has some Perl magic
attached.  When calling a function, C will evaluate the parameter
once, and pass the result to the function.  But when calling a macro,
the parameter is copied as-is by the C preprocessor to each instance
inside the macro.  This means that when evaluating a parameter having
side effects, the function and macro results differ.  This is
particularly fraught when a parameter has magic attached, say it is a
tied variable that reads the next line in a file upon each evaluation.
Having it read multiple lines per call is probably not what the caller
intended.  If a macro refers to a potentially magical or overloaded
parameter more than once, it should first make a copy and then use that
copy the rest of the time.  There are macros in the perl core that
violate this, but they are gradually being converted, usually by
changing to use inline functions instead.

Above we said "first make a copy".  In a macro, that is easier said
than done, because macros are normally expressions, and declarations
aren't allowed in expressions.  But the S<C<STMT_START> .. C<STMT_END>>
construct, described in L<perlapi|perlapi/STMT_START>, allows you to
have declarations in most contexts, as long as you don't need a return
value.  If you do need a value returned, you can make the interface
such that a pointer is passed to the construct, which then stores its
result there.  (Or you can use GCC brace groups.  But these require a
fallback if the code will ever get executed on a platform that lacks
this non-standard extension to C.  And that fallback would be another
code path, which can get out of sync with the brace group one, so doing
this isn't advisable.)  In situations where there's no other way, Perl
does furnish L<perlintern/C<PL_Sv>> and L<perlapi/C<PL_na>> to use
(with a slight performance penalty) for some such common cases.  But
beware that in a call chain involving multiple macros that use them,
one macro's use will zap another's.  These have been very difficult to
debug.
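
A sketch of the copy-first, store-through-a-pointer technique.
C<MY_STMT_START> and C<MY_STMT_END> are stand-ins written with the
common C<do { ... } while (0)> idiom rather than perl's own
C<STMT_START> and C<STMT_END>, and C<CLAMP_TO> and C<read_next> are
hypothetical.  The argument is evaluated once into a local variable,
and since the block cannot yield a value, the result goes through the
caller's pointer.

```c

    #define MY_STMT_START  do
    #define MY_STMT_END    while (0)

    static int read_count;

    /* Stand-in for a magical argument: each evaluation is a new fetch. */
    static int read_next(void)
    {
        return ++read_count;
    }

    /* Clamp x to [lo, hi] and store it in *out_ptr.  x is evaluated
       exactly once, into x_clamp_to_; lo and hi are assumed to be
       side-effect-free constants. */
    #define CLAMP_TO(x, lo, hi, out_ptr)            \
        MY_STMT_START {                             \
            int x_clamp_to_ = (x);                  \
            if (x_clamp_to_ < (lo))                 \
                x_clamp_to_ = (lo);                 \
            else if (x_clamp_to_ > (hi))            \
                x_clamp_to_ = (hi);                 \
            *(out_ptr) = x_clamp_to_;               \
        } MY_STMT_END

```

Because the whole expansion is a single statement, C<CLAMP_TO(...);> is
also safe as the body of an unbraced C<if> that is followed by an
C<else>.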

For a concrete example of these pitfalls in action, see
L<https://perlmonks.org/?node_id=11144355>.

=head2 Portability problems

The following are common causes of compilation and/or execution
failures, not specific to Perl as such.  The C FAQ is good bedtime
reading.  Please test your changes with as many C compilers and
platforms as possible; we will, anyway, and it's nice to save oneself
from public embarrassment.

Also study L<perlport> carefully to avoid any bad assumptions about the
operating system, filesystems, character set, and so forth.

Do not assume an operating system indicates a certain compiler.
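
One way to follow that advice is to test the compiler's own predefined
macros instead of the operating system's.  In this sketch
(C<compiler_family> is a hypothetical helper), C<_WIN32> would not
distinguish MSVC from MinGW gcc or clang, and clang also defines
C<__GNUC__>, so the order of the tests matters.

```c

    /* Identify the compiler from its own predefined macros, not from the
       target OS: MinGW gcc and clang also define _WIN32, and clang
       defines __GNUC__ (and, as clang-cl, _MSC_VER) too, so it is
       checked first. */
    static const char *compiler_family(void)
    {
    #if defined(__clang__)
        return "clang";
    #elif defined(_MSC_VER)
        return "msvc";
    #elif defined(__GNUC__)
        return "gcc-compatible";
    #else
        return "unknown";
    #endif
    }

```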
562898184e3Ssthen
563898184e3Ssthen=over 4
564898184e3Ssthen
565898184e3Ssthen=item *
566898184e3Ssthen
567898184e3SsthenCasting pointers to integers or casting integers to pointers
568898184e3Ssthen
569898184e3Ssthen    void castaway(U8* p)
570898184e3Ssthen    {
571898184e3Ssthen      IV i = p;
572898184e3Ssthen
573898184e3Ssthenor
574898184e3Ssthen
575898184e3Ssthen    void castaway(U8* p)
576898184e3Ssthen    {
577898184e3Ssthen      IV i = (IV)p;
578898184e3Ssthen
579898184e3SsthenBoth are bad, and broken, and unportable.  Use the PTR2IV() macro that
580898184e3Ssthendoes it right.  (Likewise, there are PTR2UV(), PTR2NV(), INT2PTR(), and
581898184e3SsthenNUM2PTR().)

=item *

Casting between function pointers and data pointers

Technically speaking, casting between function pointers and data
pointers is unportable and undefined.  In practice it seems to work,
but you should still use the FPTR2DPTR() and DPTR2FPTR() macros.
Sometimes you can also play games with unions.
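
One portable way to play games with unions is to never convert between
the two kinds of pointer at all.  Instead, store whichever one you have
in a tagged union.  The following is a minimal sketch, and the type and
function names are hypothetical:

```c

    #include <stddef.h>

    /* Hypothetical slot holding either a data pointer or a function
     * pointer.  The tag records which member was last stored, so a
     * function pointer is never read back as a data pointer. */
    enum slot_kind { SLOT_DATA, SLOT_FUNC };

    struct slot {
        enum slot_kind kind;
        union {
            void *data;
            int (*func)(int);
        } u;
    };

    static void slot_set_data(struct slot *s, void *data)
    {
        s->kind = SLOT_DATA;
        s->u.data = data;
    }

    static void slot_set_func(struct slot *s, int (*func)(int))
    {
        s->kind = SLOT_FUNC;
        s->u.func = func;
    }

    /* Calls the stored function, or returns -1 if the slot holds data. */
    static int slot_call(const struct slot *s, int arg)
    {
        if (s->kind != SLOT_FUNC || s->u.func == NULL)
            return -1;
        return s->u.func(arg);
    }

    /* Example callback used to exercise the slot. */
    static int add_one(int x)
    {
        return x + 1;
    }

```

The tag keeps this portable, because each pointer is only ever read
back through the member it was stored in.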

=item *

Assuming C<sizeof(int) == sizeof(long)>

There are platforms where longs are 64 bits, and platforms where ints
are 64 bits, and while we are out to shock you, even platforms where
shorts are 64 bits.  This is all legal according to the C standard. (In
other words, C<long long> is not a portable way to specify 64 bits, and
C<long long> is not even guaranteed to be any wider than C<long>.)

Instead, use the definitions C<IV>, C<UV>, C<IVSIZE>, C<I32SIZE>, and
so forth.  Avoid things like C<I32> because they are B<not> guaranteed
to be I<exactly> 32 bits (they are I<at least> 32 bits), nor are they
guaranteed to be C<int> or C<long>.  If you explicitly need 64-bit
variables, use C<I64> and C<U64>.

=item *

Assuming one can dereference any type of pointer for any type of data

  char *p = ...;
  long pony = *(long *)p;    /* BAD */

Many platforms, quite rightly so, will give you a core dump instead of
a pony if C<p> happens not to be correctly aligned.
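
When you genuinely have to read a C<long> from an arbitrary byte offset,
a common portable approach is to copy the bytes into a properly aligned
object with C<memcpy()>.  The helper name in this sketch is
hypothetical:

```c

    #include <string.h>

    /* Hypothetical helper: reads a long from a possibly misaligned
     * address.  memcpy() has no alignment requirements, so this is safe
     * on strict-alignment platforms.  Many compilers reduce it to a
     * plain load where unaligned access is permitted. */
    static long read_long_unaligned(const char *p)
    {
        long value;

        memcpy(&value, p, sizeof value);
        return value;
    }

```

The value is still in the platform's native byte order.  This fixes
alignment only, not endianness.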

=item *

Lvalue casts

  (int)*p = ...;    /* BAD */

Simply not portable.  Get your lvalue to be of the right type, or maybe
use temporary variables, or dirty tricks with unions.

=item *

Assuming B<anything> about structs (especially the ones you don't
control, like the ones coming from the system headers)

=over 8

=item *

That a certain field exists in a struct

=item *

That no other fields exist besides the ones you know of

=item *

That a field is of a certain signedness, size, or type

=item *

That the fields are in a certain order

=over 8

=item *

While C guarantees the ordering specified in the struct definition,
between different platforms the definitions might differ

=back

=item *

That the C<sizeof(struct)> or the alignments are the same everywhere

=over 8

=item *

There might be padding bytes between the fields to align the fields -
the bytes can be anything

=item *

Structs are required to be aligned to the maximum alignment required by
the fields - which for native types is usually equivalent to
C<sizeof(the_field)>.

=back

=back
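
Here is a small sketch of what this means in practice, using a
hypothetical struct.  Compare structs field by field rather than with
C<memcmp()>, and let the compiler report offsets instead of hard-coding
them:

```c

    #include <stddef.h>

    /* Hypothetical struct.  Most ABIs insert padding after 'tag' so that
     * 'value' is suitably aligned.  Portable code may rely on neither the
     * amount of padding nor the contents of the padding bytes. */
    struct sample {
        char tag;
        double value;
    };

    /* Compare field by field.  memcmp() on whole structs would also
     * compare the padding bytes, which can hold arbitrary garbage. */
    static int sample_equal(const struct sample *a, const struct sample *b)
    {
        return a->tag == b->tag && a->value == b->value;
    }

```

Where you need a field's position, C<offsetof(struct sample, value)>
from F<stddef.h> gives it for the platform at hand.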

=item *

Assuming the character set is ASCIIish

Perl can compile and run under EBCDIC platforms.  See L<perlebcdic>.
This is transparent for the most part, but because the character sets
differ, you shouldn't use numeric (decimal, octal, or hex) constants
to refer to characters.  You can safely say C<'A'>, but not C<0x41>.
You can safely say C<'\n'>, but not C<'\012'>.  However, you can use
macros defined in F<utf8.h> to specify any code point portably.
C<LATIN1_TO_NATIVE(0xDF)> is going to be the code point that means
LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
ASCII platforms it compiles without adding any extra code, so there is
zero performance hit on those).  The acceptable inputs to
C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>.  If your input
isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate in the opposite
direction.

If you need the string representation of a character that doesn't have
a mnemonic name in C, you should add it to the list in
F<regen/unicode_constants.pl>, and have Perl create C<#define>'s for
you, based on the current platform.

Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
properly on native code points and strings.

Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26
uppercase alphabetic characters.  That is not true in EBCDIC.  Nor for
'a' to 'z'.  But '0' - '9' is an unbroken range in both systems.  Don't
assume anything about other ranges.  (Note that special handling of
ranges in regular expression patterns and transliterations makes it
appear to Perl code that the aforementioned ranges are all unbroken.)
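
For example, code that wants the position of an uppercase letter in the
alphabet should not compute C<c - 'A'>.  The minimal portable sketch
below looks the character up in an explicit string instead.  The
function name is hypothetical.  The sketch uses C<memchr()> with an
explicit length so that a C<NUL> is not "found" at the string's
terminator:

```c

    #include <string.h>

    /* Hypothetical helper: returns 0..25 for 'A'..'Z', or -1 otherwise.
     * Unlike c - 'A', this does not assume the uppercase letters are
     * contiguous, so it also works on EBCDIC. */
    static int upper_letter_index(char c)
    {
        static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        const char *hit = memchr(letters, c, sizeof letters - 1);

        return hit ? (int)(hit - letters) : -1;
    }

```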

Many of the comments in the existing code ignore the possibility of
EBCDIC, and may therefore be wrong, even if the code works.  This is
actually a tribute to how transparently the ability to handle EBCDIC
was added, without having to change pre-existing code.

UTF-8 and UTF-EBCDIC are two different encodings used to represent
Unicode code points as sequences of bytes.  Macros with the same names
(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
allow the calling code to think that there is only one such encoding.
This is almost always referred to as C<utf8>, but it means the EBCDIC
version as well.  Again, comments in the code may well be wrong even if
the code itself is right.  For example, the concept of UTF-8
C<invariant characters> differs between ASCII and EBCDIC.  On ASCII
platforms, only characters that do not have the high-order bit set
(i.e.  whose ordinals are strict ASCII, 0 - 127) are invariant, and the
documentation and comments in the code may assume that, often referring
to something like, say, C<hibit>.  The situation differs and is not so
simple on EBCDIC machines, but as long as the code itself uses the
C<NATIVE_IS_INVARIANT()> macro appropriately, it works, even if the
comments are wrong.

As noted in L<perlhack/TESTING>, when writing test scripts, the file
F<t/charset_tools.pl> contains some helpful functions for writing tests
valid on both ASCII and EBCDIC platforms.  Sometimes, though, a test
can't use a function and it's inconvenient to have different test
versions depending on the platform.  There are 20 code points that are
the same in all 4 character sets currently recognized by Perl (the 3
EBCDIC code pages plus ISO 8859-1 (ASCII/Latin1)).  These can be used
in such tests, though there is a small possibility that Perl will
become available in yet another character set, breaking your test.  All
but one of these code points are C0 control characters.  The most
significant controls that are the same are C<\0>, C<\r>, and C<\N{VT}>
(also specifiable as C<\cK>, C<\x0B>, C<\N{U+0B}>, or C<\013>).  The
single non-control is U+00B6 PILCROW SIGN.  The controls that are the
same have the same bit pattern in all 4 character sets, regardless of
the UTF8ness of the string containing them.  The bit pattern for U+B6
is the same in all 4 for non-UTF8 strings, but differs in each when its
containing string is UTF-8 encoded.  The only other code points that
have some sort of sameness across all 4 character sets are the pair
0xDC and 0xFC.  Together these represent upper- and lowercase LATIN
LETTER U WITH DIAERESIS, but which is upper and which is lower may be
reversed: 0xDC is the capital in Latin1 and 0xFC is the small letter,
while 0xFC is the capital in EBCDIC and 0xDC is the small one.  This
factoid may be exploited in writing case-insensitive tests that are the
same across all 4 character sets.

=item *

Assuming the character set is just ASCII

ASCII is a 7-bit encoding, but bytes have 8 bits in them.  The 128
extra characters have different meanings depending on the locale.
Absent a locale, currently these extra characters are generally
considered to be unassigned, and this has presented some problems.
This has been changed starting in 5.12 so that these characters can be
considered to be Latin-1 (ISO-8859-1).

=item *

Mixing #define and #ifdef

  #define BURGLE(x) ... \
  #ifdef BURGLE_OLD_STYLE        /* BAD */
  ... do it the old way ... \
  #else
  ... do it the new way ... \
  #endif

You cannot portably "stack" cpp directives.  For example in the above
you need two separate BURGLE() #defines, one for each #ifdef branch.
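
Here is a minimal sketch of the fix.  The macro bodies are placeholders
standing in for the old and new ways:

```c

    /* Choose between two complete macro definitions, instead of putting
     * #ifdef inside one definition's continuation lines.  The bodies are
     * hypothetical placeholders. */
    #ifdef BURGLE_OLD_STYLE
    #  define BURGLE(x) ((x) * 2)     /* placeholder for the old way */
    #else
    #  define BURGLE(x) (2 * (x))     /* placeholder for the new way */
    #endif

```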

=item *

Adding non-comment stuff after #endif or #else

  #ifdef SNOSH
  ...
  #else !SNOSH    /* BAD */
  ...
  #endif SNOSH    /* BAD */

The #endif and #else cannot portably have anything non-comment after
them.  If you want to document what is going on (which is a good idea
especially if the branches are long), use (C) comments:

  #ifdef SNOSH
  ...
  #else /* !SNOSH */
  ...
  #endif /* SNOSH */

The gcc option C<-Wendif-labels> warns about the bad variant (on by
default starting from Perl 5.9.4).

=item *

Having a comma after the last element of an enum list

  enum color {
    CERULEAN,
    CHARTREUSE,
    CINNABAR,     /* BAD */
  };

is not portable.  Leave out the last comma.

Also note that whether enums are implicitly morphable to ints varies
between compilers; you might need an explicit C<(int)> cast.

=item *

Mixing signed char pointers with unsigned char pointers

  int foo(char *s) { ... }
  ...
  unsigned char *t = ...; /* Or U8* t = ... */
  foo(t);   /* BAD */

While many compilers accept this (often with only a warning), it is
certainly dubious, and downright fatal on at least one platform: for
example, VMS cc considers this a fatal error.  One reason people often
make this mistake is that a "naked char", and therefore the result of
dereferencing a "naked char pointer", has an implementation-defined
signedness: whether the result is signed or unsigned depends on the
compiler, the compiler's flags, and the underlying platform.  For this
very same reason, using a 'char' as an array index is bad.
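
When a byte from a C<char> string has to be used as an array index, or
passed to the F<ctype.h> functions, convert it to C<unsigned char>
first.  Here is a sketch with a hypothetical byte-counting helper:

```c

    #include <limits.h>
    #include <stddef.h>

    /* Hypothetical helper: counts how often each byte value occurs in
     * s[0..len-1].  Each byte is converted to unsigned char before it is
     * used as an index.  With a plain char, bytes >= 0x80 would give a
     * negative index wherever char happens to be signed. */
    static void count_bytes(const char *s, size_t len,
                            size_t counts[UCHAR_MAX + 1])
    {
        size_t i;

        for (i = 0; i <= UCHAR_MAX; i++)
            counts[i] = 0;
        for (i = 0; i < len; i++)
            counts[(unsigned char)s[i]]++;
    }

```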

=item *

Macros that have string constants and their arguments as substrings of
the string constants

  #define FOO(n) printf("number = %d\n", n)    /* BAD */
  FOO(10);

Pre-ANSI semantics for that were equivalent to

  printf("10umber = %d\10", 10);

which is probably not what you were expecting.  Unfortunately, at least
one reasonably common and modern C compiler does "real backward
compatibility" here: in AIX that is what still happens, even though the
rest of the AIX compiler is very happily C89.

=item *

Using printf formats for non-basic C types

   IV i = ...;
   printf("i = %d\n", i);    /* BAD */

While this might by accident work on some platform (where IV happens to
be an C<int>), in general it cannot.  IV might be something larger.
The situation is even worse with more specific types (defined by Perl's
configuration step in F<config.h>):

   Uid_t who = ...;
   printf("who = %d\n", who);    /* BAD */

The problem here is that Uid_t might not only not be C<int>-wide, but
might also be unsigned, in which case large uids would be printed as
negative values.

There is no simple solution to this because of printf()'s limited
intelligence, but for many types the right format is available as a
macro with either an 'f' or '_f' suffix, for example:

   IVdf /* IV in decimal */
   UVxf /* UV in hexadecimal */

   printf("i = %"IVdf"\n", i); /* The IVdf is a string constant. */

   Uid_t_f /* Uid_t in decimal */

   printf("who = %"Uid_t_f"\n", who);

Or you can try casting to a "wide enough" type:

   printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);

See L<perlguts/Formatted Printing of Size_t and SSize_t> for how to
print those.

Also remember that the C<%p> format really does require a void pointer:

   U8* p = ...;
   printf("p = %p\n", (void*)p);

The gcc option C<-Wformat> scans for such problems.
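
Standard C99 uses the same trick for its own types.  F<inttypes.h>
provides format macros such as C<PRId64>, and converting to
C<intmax_t> or C<uintmax_t> with the C<j> length modifier is the
standard "wide enough" fallback.  Here is a sketch for code outside the
core, with hypothetical helper names:

```c

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* PRId64 expands to the right conversion for int64_t on this
     * platform, much as IVdf does for IV inside the core. */
    static int format_i64(char *buf, size_t size, int64_t v)
    {
        return snprintf(buf, size, "i = %" PRId64, v);
    }

    /* For an unsigned type of unknown width (a uid_t, say), convert to
     * the widest unsigned type and use the matching 'j' modifier. */
    static int format_id(char *buf, size_t size, uintmax_t id)
    {
        return snprintf(buf, size, "who = %ju", id);
    }

```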

=item *

Blindly passing va_list

Not all platforms support passing va_list to further varargs (stdarg)
functions.  The right thing to do is to copy the va_list using
Perl_va_copy() if NEED_VA_COPY is defined.
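
In plain C99 code, the underlying facility is C<va_copy()> from
F<stdarg.h>.  The sketch below shows the classic case, where one
argument list must be walked twice: once to measure and once to format.
The function name is hypothetical:

```c

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper: formats into a freshly malloc()ed string.
     * The first vsnprintf() consumes 'ap', so the second pass uses a
     * copy made with va_copy().  Reusing 'ap' there would be undefined
     * behavior. */
    static char *format_alloc(const char *fmt, ...)
    {
        va_list ap, ap2;
        char *buf = NULL;
        int len;

        va_start(ap, fmt);
        va_copy(ap2, ap);
        len = vsnprintf(NULL, 0, fmt, ap);
        if (len >= 0) {
            buf = malloc((size_t)len + 1);
            if (buf != NULL)
                vsnprintf(buf, (size_t)len + 1, fmt, ap2);
        }
        va_end(ap2);
        va_end(ap);
        return buf;
    }

```

The caller owns the returned string and must free() it.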

=for apidoc_section $genconfig
=for apidoc Amnh||NEED_VA_COPY

=item *

Using gcc statement expressions

   val = ({...;...;...});    /* BAD */

While a nice extension, it's not portable.  Historically, Perl used
them in macros if available to gain some extra speed (essentially as a
funky form of inlining), but we now support (or emulate) C99 C<static
inline> functions, so use them instead. Declare functions as
C<PERL_STATIC_INLINE> to transparently fall back to emulation where
needed.
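
For instance, a "maximum" helper that might once have been written as a
statement-expression macro can be a C<static inline> function, which
also evaluates each argument exactly once.  Inside the core you would
declare it with C<PERL_STATIC_INLINE>, as described above.  This plain
C99 sketch, with a hypothetical name, uses C<static inline> directly:

```c

    /* A statement-expression macro might have read
     *     #define MAX_INT(a, b) ({ int a_ = (a), b_ = (b); a_ > b_ ? a_ : b_; })
     * The inline function below is portable C99, type-checks its
     * arguments, and evaluates each of them exactly once. */
    static inline int max_int(int a, int b)
    {
        return a > b ? a : b;
    }

```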

=item *

Binding together several statements in a macro

Use the macros C<STMT_START> and C<STMT_END>.

   STMT_START {
      ...
   } STMT_END

But there can be subtle (but avoidable if you do it right) bugs
introduced with these; see L<perlapi/C<STMT_START>> for best practices
for their use.
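
Statement wrappers of this kind are commonly built on the
C<do { ... } while (0)> idiom.  The sketch below uses that idiom
directly, with a hypothetical C<SWAP_INTS()> macro, to show why the
wrapper matters.  The expansion is exactly one statement that still
needs the caller's semicolon, so it behaves correctly as the body of an
unbraced C<if>/C<else>:

```c

    /* Hypothetical macro: swaps two int lvalues.  The do/while (0)
     * wrapper makes the expansion a single statement. */
    #define SWAP_INTS(a, b)      \
        do {                     \
            int swap_tmp_ = (a); \
            (a) = (b);           \
            (b) = swap_tmp_;     \
        } while (0)

    /* Returns 1 if it had to swap, 0 if the pair was already ordered.
     * Without the wrapper, the 'else' below would not compile. */
    static int order_pair(int *lo, int *hi)
    {
        if (*lo > *hi)
            SWAP_INTS(*lo, *hi);
        else
            return 0;
        return 1;
    }

```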

=item *

Testing for operating systems or versions when you should be testing
for features

  #ifdef __FOONIX__    /* BAD */
  foo = quux();
  #endif

Unless you know with 100% certainty that quux() is only ever available
for the "Foonix" operating system B<and> that it is available B<and>
correctly working for B<all> past, present, B<and> future versions of
"Foonix", the above is very wrong.  This is more correct (though still
not perfect, because the below is a compile-time check):

  #ifdef HAS_QUUX
  foo = quux();
  #endif

How does HAS_QUUX become defined where it needs to be?  Well, if
Foonix happens to be Unixy enough to be able to run the Configure
script, and Configure has been taught about detecting and testing
quux(), HAS_QUUX will be correctly defined.  On other platforms, the
corresponding configuration step will hopefully do the same.

In a pinch, if you cannot wait for Configure to be educated, or if you
have a good hunch of where quux() might be available, you can
temporarily try the following:

  #if (defined(__FOONIX__) || defined(__BARNIX__))
  # define HAS_QUUX
  #endif

  ...

  #ifdef HAS_QUUX
  foo = quux();
  #endif

But in any case, try to keep the features and operating systems
separate.

A good resource on the predefined macros for various operating systems,
compilers, and so forth is
L<https://sourceforge.net/p/predef/wiki/Home/>.

=item *

Assuming the contents of static memory pointed to by the return values
of Perl wrappers for C library functions don't change

Many C library functions return pointers to static storage that can be
overwritten by subsequent calls to the same or related functions.  Perl
has wrappers for some of these functions.  Originally many of those
wrappers returned those volatile pointers.  But over time almost all of
them have evolved to return stable copies.  To cope with the remaining
ones, do a L<perlapi/savepv> to make a copy, thus avoiding these
problems.  You will have to free the copy when you're done to avoid
memory leaks.  If you don't have control over when it gets freed,
you'll need to make the copy in a mortal scalar, like so:

 SvPVX(sv_2mortal(newSVpv(volatile_string, 0)))
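
The same hazard applies to plain C library calls.  In the sketch below,
a hypothetical C<static_label()> stands in for a library function that
returns a pointer into a static buffer.  A hypothetical
C<copy_string()> plays the role that L<perlapi/savepv> plays in the
core:

```c

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in for a C library function (think of ctime()) that returns
     * a pointer into a static buffer, overwritten by the next call. */
    static const char *static_label(int n)
    {
        static char buf[32];

        snprintf(buf, sizeof buf, "label-%d", n);
        return buf;
    }

    /* Returns a malloc()ed copy of s; the caller must free() it. */
    static char *copy_string(const char *s)
    {
        size_t len = strlen(s) + 1;
        char *copy = malloc(len);

        if (copy != NULL)
            memcpy(copy, s, len);
        return copy;
    }

```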

=back

=head2 Problematic System Interfaces

=over 4

=item *

Perl strings are NOT the same as C strings:  They may contain C<NUL>
characters, whereas a C string is terminated by the first C<NUL>.  That
is why Perl API functions that deal with strings generally take a
pointer to the first byte and either a length or a pointer to the byte
just beyond the final one.

And this is the reason that many of the C library string handling
functions should not be used.  They don't cope with the full generality
of Perl strings.  It may be that your test cases don't have embedded
C<NUL>s, and so the tests pass, whereas there may well eventually arise
real-world cases where they fail.  A lesson here is to include C<NUL>s
in your tests.  Now it's fairly rare in most real world cases to get
C<NUL>s, so your code may seem to work, until one day a C<NUL> comes
along.

Here's an example.  For decades, it was a common paradigm in the perl
core to use S<C<strchr("list", c)>> to see if the character C<c> is any
of the ones given in C<"list">, a double-quote-enclosed string of the
characters to test C<c> against.  As long as C<c> isn't a C<NUL>, it
works.  But when C<c> is a C<NUL>, C<strchr> returns a pointer to the
terminating C<NUL> in C<"list">.  This likely will result in a segfault
or a security issue when the caller uses that end pointer as the
starting point to read from.

A solution to this and many similar issues is to use the C<mem>I<-foo>
C library functions instead.  In this case C<memchr> can be used to see
if C<c> is in C<"list"> and works even if C<c> is C<NUL>.  These
functions need an additional parameter to give the string length.  In
the case of literal string parameters, perl has defined macros that
calculate the length for you.  See L<perlapi/String Handling>.
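
Here is a sketch of the C<memchr()> version for code outside the core.
C<char_in_list()> and C<CHAR_IN_LITERAL()> are hypothetical names.
Inside the core, use the macros from L<perlapi/String Handling>
instead:

```c

    #include <stdbool.h>
    #include <string.h>

    /* Is the byte c one of the len bytes in list?  strchr(list, c) would
     * answer "yes" for c == '\0', because it finds the terminating NUL.
     * memchr() only searches the bytes it is told about. */
    static bool char_in_list(char c, const char *list, size_t len)
    {
        return memchr(list, c, len) != NULL;
    }

    /* Hypothetical convenience macro for string literals.  sizeof
     * counts the terminating NUL, hence the "- 1".  The "" lit ""
     * concatenation only compiles if lit is a string literal. */
    #define CHAR_IN_LITERAL(c, lit) \
        char_in_list((c), "" lit "", sizeof(lit) - 1)

```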

=item *

malloc(0), realloc(p, 0), and calloc(0, 0) are non-portable.  To be
portable, allocate at least one byte.  (In general you should rarely
need to work at this low level, but instead use the various malloc
wrappers.)
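
If you do have to call the allocator directly, you can wrap the "at
least one byte" rule once.  The wrapper name in this sketch is
hypothetical:

```c

    #include <stdlib.h>

    /* Hypothetical wrapper.  malloc(0) may return either NULL or a
     * unique pointer depending on the platform.  Asking for at least one
     * byte gives the same behavior everywhere, so NULL always means the
     * allocation failed. */
    static void *malloc_at_least_one(size_t size)
    {
        return malloc(size ? size : 1);
    }

```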

=item *

snprintf() - the return value is unportable.  Use my_snprintf() instead.
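
For code outside the core, a defensive way to use C<snprintf()> is to
treat both a negative result and a result of at least the buffer size
as failure.  C99 implementations report truncation one way, and some
older ones the other.  Here is a sketch with a hypothetical helper:

```c

    #include <stdarg.h>
    #include <stdio.h>

    /* Hypothetical helper: returns 0 if the formatted text fit in buf,
     * or -1 on error or truncation.  C99 vsnprintf() returns the length
     * the full output would have had, while some older implementations
     * return a negative value when the buffer is too small. */
    static int format_checked(char *buf, size_t size, const char *fmt, ...)
    {
        va_list ap;
        int n;

        va_start(ap, fmt);
        n = vsnprintf(buf, size, fmt, ap);
        va_end(ap);
        return (n < 0 || (size_t)n >= size) ? -1 : 0;
    }

```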

=back

=head2 Security problems

Last but not least, here are various tips for safer coding. See also
L<perlclib> for libc/stdio replacements one should use.

=over 4

=item *

Do not use gets()

Or we will publicly ridicule you.  Seriously.

=item *

Do not use tmpfile()

Use mkstemp() instead.

=item *

Do not use strcpy() or strcat() or strncpy() or strncat()

Use my_strlcpy() and my_strlcat() instead: they either use the native
implementation, or Perl's own implementation (borrowed from the public
domain implementation of INN).
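
For reference, here is a sketch of the BSD C<strlcpy()> semantics that
my_strlcpy() provides.  It always C<NUL>-terminates (given a non-zero
size) and returns the length of the source, so truncation is easy to
detect.  The function name is hypothetical; in the core, use
my_strlcpy() itself:

```c

    #include <string.h>

    /* Hypothetical sketch of strlcpy() semantics.  Copies at most
     * size - 1 bytes of src into dst and NUL-terminates when size > 0.
     * Returns strlen(src); a return value >= size means truncation. */
    static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
    {
        size_t src_len = strlen(src);

        if (size > 0) {
            size_t n = src_len < size - 1 ? src_len : size - 1;

            memcpy(dst, src, n);
            dst[n] = '\0';
        }
        return src_len;
    }

```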

=item *

Do not use sprintf() or vsprintf()

If you really want just plain byte strings, use my_snprintf() and
my_vsnprintf() instead, which will try to use snprintf() and
vsnprintf() if those safer APIs are available.  If you want something
fancier than a plain byte string, use L<C<Perl_form()>|perlapi/form> or
SVs and L<C<Perl_sv_catpvf()>|perlapi/sv_catpvf>.

Note that glibc C<printf()>, C<sprintf()>, etc. are buggy before glibc
version 2.17.  They won't allow a C<%.s> format with a precision to
create a string that isn't valid UTF-8 if the current underlying locale
of the program is UTF-8.  What happens is that the C<%s> and its
operand are simply skipped without any notice.  See
L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.

=item *

Do not use atoi()

Use grok_atoUV() instead.  atoi() has undefined behavior on overflow,
and cannot be used for incremental parsing.  It is also affected by
locale, which is bad.
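
The sketch below uses a hypothetical C<parse_ulong()> to illustrate
the properties that make a parser such as grok_atoUV() preferable to
atoi().  It accepts only the digits C<0> through C<9>, so the locale
plays no part.  It detects overflow instead of invoking undefined
behavior.  It also reports where parsing stopped, so the caller can
continue from there.  Its interface is not that of grok_atoUV():

```c

    #include <limits.h>
    #include <stdbool.h>

    /* Hypothetical parser for an unsigned decimal number at the start of
     * s.  On success, stores the value in *out, sets *endp to the first
     * unparsed byte, and returns true.  Returns false if there is no
     * leading digit or the value would exceed ULONG_MAX.  '0'..'9' are
     * contiguous in both ASCII and EBCDIC. */
    static bool parse_ulong(const char *s, unsigned long *out,
                            const char **endp)
    {
        unsigned long value = 0;
        const char *p = s;

        if (*p < '0' || *p > '9')
            return false;
        for (; *p >= '0' && *p <= '9'; p++) {
            unsigned digit = (unsigned)(*p - '0');

            if (value > (ULONG_MAX - digit) / 10)
                return false;               /* would overflow */
            value = value * 10 + digit;
        }
        *out = value;
        *endp = p;
        return true;
    }

```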

=item *

Do not use strtol() or strtoul()

Use grok_atoUV() instead.  strtol() or strtoul() (or their
IV/UV-friendly macro disguises, Strtol() and Strtoul(), or Atol() and
Atoul()) are affected by locale, which is bad.

=for apidoc_section $numeric
=for apidoc AmhD||Atol|const char * nptr
=for apidoc AmhD||Atoul|const char * nptr

=back

=head1 DEBUGGING

You can compile a special debugging version of Perl, which allows you
to use the C<-D> option of Perl to tell more about what Perl is doing.
But sometimes there is no alternative but to dive in with a debugger,
either to see the stack trace of a core dump (very useful in a bug
report), or to figure out what went wrong before the core dump
happened, or how we ended up with wrong or unexpected results.

=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

    ./Configure -d -DDEBUGGING
    make

C<-DDEBUGGING> turns on the C compiler's C<-g> flag to have it produce
debugging information which will allow us to step through a running
program, and to see which C function we are in (without the debugging
information we might see only the numerical addresses of the
functions, which is not very helpful).  It will also turn on the
C<DEBUGGING> compilation symbol which enables all the internal
debugging code in Perl.  There are a whole bunch of things you can debug
with this: L<perlrun|perlrun/-Dletters> lists them all, and the best
way to find out about them is to play about with them.  The most useful
options are probably

    l  Context (loop) stack processing
    s  Stack snapshots (with v, displays all stacks)
    t  Trace execution
    o  Method and overloading resolution
    c  String/numeric conversions

For example

    $ perl -Dst -e '$x + 1'
    ....
    (-e:1)	gvsv(main::x)
        =>  UNDEF
    (-e:1)	const(IV(1))
        =>  UNDEF  IV(1)
    (-e:1)	add
        =>  NV(1)

Some of the functionality of the debugging code can be achieved with a
non-debugging perl by using XS modules:

    -Dr => use re 'debug'
    -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to
any debugger (many vendors call their debugger C<dbx>), but check the
manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

Or if you have a core dump:

    gdb ./perl core

You'll want to do that in your Perl source tree so the debugger can
read the source code.  You should see the copyright message, followed
by the prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item * run [args]

Run the program with the given arguments.

=item * break function_name

=item * break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or
the given line in the named source file.

=item * step

Steps through the program a line at a time.

=item * next

Steps through the program a line at a time, without descending into
functions.

=item * continue

Run until the next breakpoint.

=item * finish

Run until the end of the current function, then stop again.

=item * 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item * ptype

Prints the C definition of the argument given.

  (gdb) ptype PL_op
  type = struct op {
      OP *op_next;
      OP *op_sibparent;
      OP *(*op_ppaddr)(void);
      PADOFFSET op_targ;
      unsigned int op_type : 9;
      unsigned int op_opt : 1;
      unsigned int op_slabbed : 1;
      unsigned int op_savefree : 1;
      unsigned int op_static : 1;
      unsigned int op_folded : 1;
      unsigned int op_spare : 2;
      U8 op_flags;
      U8 op_private;
  } *

=item * print

Execute the given C code and print its results.  B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
1266898184e3Ssthen(see later L</"gdb macro support">).  You'll have to substitute them
1267898184e3Ssthenyourself, or to invoke cpp on the source code files (see L</"The .i
1268898184e3SsthenTargets">) So, for instance, you can't say
1269898184e3Ssthen
1270898184e3Ssthen    print SvPV_nolen(sv)
1271898184e3Ssthen
1272898184e3Ssthenbut you have to say
1273898184e3Ssthen
1274898184e3Ssthen    print Perl_sv_2pv_nolen(sv)
1275898184e3Ssthen
1276898184e3Ssthen=back
1277898184e3Ssthen
1278898184e3SsthenYou may find it helpful to have a "macro dictionary", which you can
1279898184e3Ssthenproduce by saying C<cpp -dM perl.c | sort>.  Even then, F<cpp> won't
1280898184e3Ssthenrecursively apply those macros for you.
1281898184e3Ssthen
1282898184e3Ssthen=head2 gdb macro support
1283898184e3Ssthen
1284898184e3SsthenRecent versions of F<gdb> have fairly good macro support, but in order
1285898184e3Ssthento use it you'll need to compile perl with macro definitions included
1286898184e3Ssthenin the debugging information.  Using F<gcc> version 3.1, this means
1287898184e3Ssthenconfiguring with C<-Doptimize=-g3>.  Other compilers might use a
1288898184e3Ssthendifferent switch (if they support debugging macros at all).
1289898184e3Ssthen
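As a minimal sketch of the difference (a toy file rather than perl
code; C<struct toy_sv>, C<toy_sv_2pv_nolen()> and C<TOY_SvPV_nolen> are
hypothetical names), the file below imitates perl's habit of wrapping a
real function in a fast-path macro.  Compiled with plain C<-g>, gdb
only knows about the function; compiled with C<gcc -g3 -O0>, it can
also show and expand the macro.

```c

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical stand-in for an SV: a string and its length. */
    struct toy_sv {
        const char *pv;
        size_t      cur;
    };

    /* A real function: gdb can call it in any -g build. */
    const char *toy_sv_2pv_nolen(const struct toy_sv *sv)
    {
        return sv->pv ? sv->pv : "";
    }

    /* Fast-path macro in the style of SvPV_nolen(): gdb only knows
       about it if this file was compiled with -g3. */
    #define TOY_SvPV_nolen(sv) ((sv)->pv ? (sv)->pv : toy_sv_2pv_nolen(sv))

    /* Somewhere to stop in gdb while "sv" is in scope. */
    size_t toy_len(const struct toy_sv *sv)
    {
        return strlen(TOY_SvPV_nolen(sv));
    }

```

While stopped inside C<toy_len()> in a C<-g3> build, C<info macro
TOY_SvPV_nolen> should report where the macro is defined, C<macro
expand TOY_SvPV_nolen(sv)> should print its expansion, and C<print
TOY_SvPV_nolen(sv)> should work directly.  In a plain C<-g> build, only
C<print toy_sv_2pv_nolen(sv)> does.
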
=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions
in F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other
structures that you can't get at from Perl.  Let's take an example.
We'll use the C<$x = $y + $z> example we used before, but give it a
bit of context: C<$y = "6XXXX"; $z = 2.3;>.  Where's a good place to
stop and poke around?

What about C<pp_add>, the function we examined earlier that implements
the C<+> operator?

    (gdb) break Perl_pp_add
    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see
L<perlguts/Internal Functions>.  With the breakpoint in place, we can
run our program:

    (gdb) run -e '$y = "6XXXX"; $z = 2.3; $x = $y + $z'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
    1396    dSP; dATARGET; bool useleft; SV *svl, *svr;
    (gdb) step
    311           dPOPTOPnnrl_ul;
    (gdb)

We looked at this bit of code before, and we said that
C<dPOPTOPnnrl_ul> arranges for two C<NV>s to be placed into C<left> and
C<right> - let's slightly expand it:

 #define dPOPTOPnnrl_ul  NV right = POPn; \
                         SV *leftsv = TOPs; \
                         NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV
either directly (if C<SvNOK> is set) or by calling the C<sv_2nv>
function.  C<TOPs> takes the next SV from the top of the stack - yes,
C<POPn> uses C<TOPs> - but doesn't remove it.  We then use C<SvNV> to
get the NV from C<leftsv> in the same way as before - yes, C<POPn> uses
C<SvNV>.

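To make the order of operations concrete, here is a self-contained toy
model in plain C.  It is not perl's real stack: C<struct toy_sv>,
C<struct toy_stack>, C<toy_sv_2nv()> and the C<TOY_USE_LEFT> test are
simplified, hypothetical stand-ins.  What it does mirror is the
sequence described above: pop the right operand, peek at (but don't
pop) the left one, and get each NV either from the cache or via the
slow conversion function.

```c

    #include <stdlib.h>

    /* Hypothetical, much simplified SV: a cached number and/or a string. */
    struct toy_sv {
        int         nok;   /* like SvNOK: nv holds a valid number */
        double      nv;
        const char *pv;
    };

    /* Slow path in the spirit of sv_2nv(): convert the leading numeric
       part of the string and cache the result. */
    static double toy_sv_2nv(struct toy_sv *sv)
    {
        sv->nv  = sv->pv ? strtod(sv->pv, NULL) : 0.0;  /* "6XXXX" -> 6 */
        sv->nok = 1;
        return sv->nv;
    }

    /* Like SvNV(): use the cached NV if there is one.  A function rather
       than a macro, so an argument such as items[sp--] is evaluated once. */
    static double toy_SvNV(struct toy_sv *sv)
    {
        return sv->nok ? sv->nv : toy_sv_2nv(sv);
    }

    /* Stand-in for USE_LEFT(); the real macro also looks at op flags. */
    #define TOY_USE_LEFT(sv) ((sv)->nok || (sv)->pv != NULL)

    struct toy_stack {
        struct toy_sv *items[16];
        int            sp;          /* index of the top item */
    };

    /* Mirrors dPOPTOPnnrl_ul: pop the right operand, peek at the left one
       (it stays on the stack, to be overwritten by the result). */
    double toy_pp_add(struct toy_stack *st)
    {
        double right = toy_SvNV(st->items[st->sp--]);    /* POPn */
        struct toy_sv *leftsv = st->items[st->sp];       /* TOPs */
        double left = TOY_USE_LEFT(leftsv) ? toy_SvNV(leftsv) : 0.0;
        return left + right;
    }

```

With C<$y> modelled as the string C<"6XXXX"> and C<$z> as the cached NV
2.3, C<toy_pp_add()> takes the slow path for the left operand only,
returns about 8.3, and leaves the stack one item shorter.
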
Since we don't have an NV for C<$y>, we'll have to use C<sv_2nv> to
convert it.  If we step again, we'll find ourselves there:

    (gdb) step
    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
    1669        if (!sv)
    (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

    (gdb) print Perl_sv_dump(sv)
    SV = PV(0xa057cc0) at 0xa0675d0
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xa06a510 "6XXXX"\0
    CUR = 5
    LEN = 6
    $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

    (gdb) finish
    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
    0x462669 in Perl_pp_add () at pp_hot.c:311
    311           dPOPTOPnnrl_ul;

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>.  This'll give us
output similar to that of the CPAN module L<B::Debug>.

=for apidoc_section $debugging
=for apidoc Amnh||PL_op

    (gdb) print Perl_op_dump(PL_op)
    {
    13  TYPE = add  ===> 14
        TARG = 1
        FLAGS = (SCALAR,KIDS)
        {
            TYPE = null  ===> (12)
              (was rv2sv)
            FLAGS = (SCALAR,KIDS)
            {
    11          TYPE = gvsv  ===> 12
                FLAGS = (SCALAR)
                GV = main::y
            }
        }

# finish this later #

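The same idea works for your own C data structures: give them a dump
function that is compiled into the program, and gdb can call it while
the program is stopped, with no macros to expand by hand.  A minimal
sketch; C<struct toy_op> and C<toy_op_dump()> are hypothetical and only
loosely modelled on the op chain above.

```c

    #include <stdio.h>

    /* Hypothetical node in an execution chain, vaguely like an OP. */
    struct toy_op {
        const char    *name;
        struct toy_op *next;
    };

    /* Print the chain to stderr and return the number of nodes.
       From gdb: "call toy_op_dump(op)" or "print toy_op_dump(op)". */
    int toy_op_dump(const struct toy_op *op)
    {
        int n = 0;
        for (; op != NULL; op = op->next)
            fprintf(stderr, "%2d  TYPE = %s\n", ++n, op->name);
        return n;
    }

```

Writing to C<stderr> inside the function, rather than taking a C<FILE *>
argument, means the call does not depend on gdb being able to resolve
C<stderr> itself, which it may not manage without debugging information
for the C library.
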
=head2 Using gdb to look at specific parts of a program

With the example above, you knew to look for C<Perl_pp_add>, but what
if there were multiple calls to it all over the place, or you didn't
know which op you were looking for?

One way to do this is to inject a rare call somewhere near what you're
looking for.  For example, you could add C<study> just before the code
you're interested in:

    study;

And in gdb do:

    (gdb) break Perl_pp_study

And then step until you hit what you're looking for.  This works well
in a loop if you only want to break at certain iterations:

    for my $i (1..100) {
        study if $i == 50;
    }

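If the code you're chasing is C rather than Perl, the same trick works
with a do-nothing function added temporarily.  In this sketch,
C<debug_hook()> and C<sum_with_hook()> are hypothetical;
C<__attribute__((noinline))> is a GCC/Clang extension that stops the
call from being inlined away, and the counter gives the function a side
effect so the call isn't deleted either.

```c

    /* Hypothetical hook: exists only so gdb has a symbol to break on. */
    static unsigned long debug_hook_hits;

    #if defined(__GNUC__)
    __attribute__((noinline))
    #endif
    void debug_hook(void)
    {
        debug_hook_hits++;          /* side effect: keeps the call alive */
    }

    long sum_with_hook(int n, int stop_at)
    {
        long total = 0;
        int i;
        for (i = 1; i <= n; i++) {
            if (i == stop_at)
                debug_hook();       /* "break debug_hook" stops only here */
            total += i;
        }
        return total;
    }

```

Alternatively, gdb can do the filtering without any source change,
using a conditional breakpoint such as C<break FILE:LINE if i == 50>,
where C<FILE:LINE> is the line you care about.
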
=head2 Using gdb to look at what the parser/lexer are doing

If you want to see what perl is doing when parsing/lexing your code,
you can use C<BEGIN {}>:

    print "Before\n";
    BEGIN { study; }
    print "After\n";

And in gdb:

    (gdb) break Perl_pp_study

If you want to see what the parser/lexer is doing inside C<if> blocks
and the like, you need to be a little trickier:

    if ($x && $y && do { BEGIN { study } 1 } && $z) { ... }

=head1 SOURCE CODE STATIC ANALYSIS

Various tools exist for analysing C source code B<statically>, as
opposed to B<dynamically>, that is, without executing the code.  It is
possible to detect resource leaks, undefined behaviour, type
mismatches, portability problems, code paths that would cause illegal
memory accesses, and other similar problems by just parsing the C code
and looking at what the resulting graph tells about the execution and
data flows.  As a matter of fact, this is exactly how C compilers know
to give warnings about dubious code.

=head2 lint

The good old C code quality inspector, C<lint>, is available on several
platforms, but please be aware that there are several different
implementations of it by different vendors, which means that the flags
are not identical across different platforms.

There is a C<lint> target in the Makefile, but you may have to diddle
with the flags (see above).

=head2 Coverity

Coverity (L<https://www.coverity.com/>) is a product similar to lint.
As a testbed for their product, they periodically check several open
source projects, and they give open source developers accounts to the
defect databases.

There is a Coverity setup for the perl5 project:
L<https://scan.coverity.com/projects/perl5>

=head2 HP-UX cadvise (Code Advisor)

HP has a C/C++ static analyzer product for HP-UX called Code Advisor.
(Link not given here because the URL is horribly long and seems
horribly unstable; use the search engine of your choice to find it.)
The use of the C<cadvise_cc> recipe with C<Configure ...
-Dcc=./cadvise_cc> (see cadvise "User Guide") is recommended, as is the
use of C<+wall>.

=head2 cpd (cut-and-paste detector)

The cpd tool detects cut-and-paste coding.  If one instance of the
cut-and-pasted code changes, all the other spots should probably be
changed, too.  Therefore such code should probably be turned into a
subroutine or a macro.

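For example, if cpd reports two pasted copies of the same clamping loop
(hypothetical code, not from the perl sources), the usual fix is to
keep a single copy in a helper and call it from both places:

```c

    #include <stddef.h>

    /* One shared helper instead of two pasted copies of the same loop. */
    static void clamp_array(int *a, size_t n, int lo, int hi)
    {
        for (size_t i = 0; i < n; i++) {
            if (a[i] < lo)
                a[i] = lo;
            else if (a[i] > hi)
                a[i] = hi;
        }
    }

    /* Former copy #1 */
    void clamp_percentages(int *pct, size_t n) { clamp_array(pct, n, 0, 100); }

    /* Former copy #2 */
    void clamp_bytes(int *b, size_t n) { clamp_array(b, n, 0, 255); }

```

A later fix to the loop, say a different boundary rule, then only has
to be made once.
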
cpd (L<https://docs.pmd-code.org/latest/pmd_userdocs_cpd.html>) is part
of the pmd project (L<https://pmd.github.io/>).  pmd was originally
written for static analysis of Java code, but later the cpd part of it
was extended to also parse C and C++.

Download the pmd-bin-X.Y.zip from the SourceForge site, extract the
pmd-X.Y.jar from it, and then run that on source code thusly:

  java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \
   --minimum-tokens 100 --files /some/where/src --language c > cpd.txt

You may run into memory limits, in which case you should use the
C<-Xmx> option:

  java -Xmx512M ...

=head2 gcc warnings

Though much can be written about the inconsistency and coverage
problems of gcc warnings (like C<-Wall> not meaning "all the warnings",
or some common portability problems not being covered by C<-Wall>, or
C<-ansi> and C<-pedantic> both being a poorly defined collection of
warnings, and so forth), gcc is still a useful tool in keeping our
coding nose clean.

C<-Wall> is on by default.

It would be nice for C<-pedantic> to always be on, but unfortunately
it is not safe on all platforms; for example, it can cause fatal
conflicts with the system headers (Solaris being a prime example).  If
Configure C<-Dgccansipedantic> is used, the C<cflags> frontend selects
C<-pedantic> for the platforms where it is known to be safe.

The following extra flags are added:

=over 4

=item *

C<-Wendif-labels>

=item *

C<-Wextra>

=item *

C<-Wc++-compat>

=item *

C<-Wwrite-strings>

=item *

C<-Werror=pointer-arith>

=item *

C<-Werror=vla>

=back

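As a rough illustration (a hypothetical helper, not taken from the perl
sources), the function below is written to stay clean under those
flags: it keeps string literals behind C<const char *>
(C<-Wwrite-strings>), does pointer arithmetic on C<char *> rather than
C<void *> (C<-Werror=pointer-arith>), allocates with C<malloc()>
instead of a variable-length array (C<-Werror=vla>), and casts
C<malloc()>'s result so the code is also valid C++ (C<-Wc++-compat>).

```c

    #include <stdlib.h>
    #include <string.h>

    /* Join two strings into a freshly allocated buffer; caller frees it. */
    char *join2(const char *a, const char *b)    /* const: -Wwrite-strings */
    {
        size_t la = strlen(a), lb = strlen(b);
        /* heap, not a VLA (-Werror=vla); explicit cast (-Wc++-compat) */
        char *buf = (char *)malloc(la + lb + 1);
        if (buf == NULL)
            return NULL;
        memcpy(buf, a, la);
        memcpy(buf + la, b, lb + 1);   /* char * arithmetic, copies the NUL */
        return buf;
    }

```

You can check it with something like C<gcc -Wall -Wextra -Wc++-compat
-Wwrite-strings -Werror=pointer-arith -Werror=vla -c join2.c> (the file
name is arbitrary).
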
The following flags would be nice to have but they would first need
their own Augean stablemaster:

=over 4

=item *

C<-Wshadow>

=item *

C<-Wstrict-prototypes>

=back

C<-Wtraditional> is another example of the annoying tendency of gcc to
bundle a lot of warnings under one switch (it would be impossible to
deploy in practice because it would complain a lot), but it does
contain some warnings that would be beneficial to have available on
their own, such as the warning about string constants inside macros
containing the macro arguments: this behaved differently pre-ANSI than
it does in ANSI, and some C compilers are still in transition, AIX
being an example.

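A minimal illustration of that macro behaviour (the macro and function
names are made up for this sketch): traditional preprocessors could
replace a parameter name that appears inside a string literal in the
macro body, while ISO C never does this and provides the C<#>
stringizing operator instead.

```c

    /* Relies on traditional (pre-ANSI) behaviour, where x could be
       replaced even inside the string literal.  ISO C leaves "x" alone;
       this is the construct gcc -Wtraditional warns about. */
    #define OLD_STR(x) "x"

    /* The ANSI way: the # operator turns the argument into a string. */
    #define STR(x) #x

    const char *old_style(void)  { return OLD_STR(hello); }   /* "x" */
    const char *ansi_style(void) { return STR(hello); }       /* "hello" */

```
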
=head2 Warnings of other C compilers

Other C compilers (yes, there B<are> other C compilers than gcc) often
have their "strict ANSI" or "strict ANSI with some portability
extensions" modes on; for example, Sun Workshop has its C<-Xa> mode on
(though implicitly), and the DEC (these days, HP...) compiler has its
C<-std1> mode on.

=head1 MEMORY DEBUGGERS

B<NOTE 1>: Running under older memory debuggers such as Purify,
valgrind or Third Degree greatly slows down the execution: seconds
become minutes, minutes become hours.  For example, as of Perl 5.8.1,
the F<ext/Encode/t/Unicode.t> test takes extraordinarily long to
complete under e.g. Purify, Third Degree, and valgrind.  Under valgrind
it takes more than six hours, even on a snappy computer.  Said test
must be doing something that is quite unfriendly for memory debuggers.
If you don't feel like waiting, you can simply kill the perl process.
Roughly, valgrind slows down execution by a factor of 10,
AddressSanitizer by a factor of 2.

B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to set the
environment variable C<PERL_DESTRUCT_LEVEL> to 2.  For example, like
this:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...

B<NOTE 3>: There are known memory leaks when there are compile-time
errors within C<eval> or C<require>; seeing C<S_doeval> in the call
stack is a good sign of these.  Fixing these leaks is non-trivial,
unfortunately, but they must be fixed eventually.

B<NOTE 4>: L<DynaLoader> will not clean up after itself completely
unless Perl is built with the Configure option
C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>.

=head2 valgrind

The valgrind tool can be used to find both memory leaks and illegal
heap memory accesses.  As of version 3.3.0, Valgrind only supports
Linux on x86, x86-64 and PowerPC, and Darwin (OS X) on x86 and x86-64.
The special "test.valgrind" target can be used to run the tests under
valgrind.  Found errors and memory leaks are logged in files named
F<testfile.valgrind> and by default output is displayed inline.

Example usage:

    make test.valgrind

Since valgrind adds significant overhead, tests will take much longer
to run.  The valgrind tests support being run in parallel to help with
this:

    TEST_JOBS=9 make test.valgrind

Note that the above two invocations will be very verbose, as reachable
memory and leak-checking are enabled by default.  If you want to just
see pure errors, try:

    VG_OPTS='-q --leak-check=no --show-reachable=no' TEST_JOBS=9 \
        make test.valgrind

Valgrind also provides a cachegrind tool, invoked on perl as:

    VG_OPTS=--tool=cachegrind make test.valgrind

As system libraries (most notably glibc) are also triggering errors,
valgrind allows you to suppress such errors using suppression files.
The default suppression file that comes with valgrind already catches a
lot of them.  Some additional suppressions are defined in
F<t/perl.supp>.

To get valgrind, and for more information, see L<https://valgrind.org/>.

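To get a feel for valgrind's reports before pointing it at perl, you
can try a deliberately leaky toy like the one below (hypothetical code,
unrelated to perl).  C<copy_len_leaky()> allocates a buffer and never
frees it; C<copy_len_fixed()> is the corrected version.

```c

    #include <stdlib.h>
    #include <string.h>

    /* Leaky: the copy is made, measured, and then forgotten. */
    size_t copy_len_leaky(const char *s)
    {
        char *copy = (char *)malloc(strlen(s) + 1);
        if (copy == NULL)
            return 0;
        strcpy(copy, s);
        return strlen(copy);        /* BUG: copy is never freed */
    }

    /* Fixed: same result, no leak. */
    size_t copy_len_fixed(const char *s)
    {
        char *copy = (char *)malloc(strlen(s) + 1);
        size_t n;
        if (copy == NULL)
            return 0;
        strcpy(copy, s);
        n = strlen(copy);
        free(copy);
        return n;
    }

```

Call C<copy_len_leaky()> from a small C<main()>, compile with C<-g>,
and run the result under C<valgrind --leak-check=full>; the leak
summary should then point at the C<malloc()> call in
C<copy_len_leaky()>.
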
=head2 AddressSanitizer

AddressSanitizer ("ASan") consists of a compiler instrumentation module
and a run-time C<malloc> library.  ASan is available for a variety of
architectures, operating systems, and compilers (see project link
below).  It checks for unsafe memory usage, such as use after free and
buffer overflow conditions, and is fast enough that you can easily
compile your debugging or optimized perl with it.  Modern versions of
ASan check for memory leaks by default on most platforms; elsewhere
(e.g. x86_64 OS X) this feature can be enabled via
C<ASAN_OPTIONS=detect_leaks=1>.

To build perl with AddressSanitizer, your Configure invocation should
look like:

    sh Configure -des -Dcc=clang \
       -Accflags=-fsanitize=address\ -fsanitize-blacklist=`pwd`/asan_ignore \
       -Aldflags=-fsanitize=address \
       -Alddlflags=-shared\ -fsanitize=address

where these arguments mean:

=over 4

=item * -Dcc=clang

This should be replaced by the full path to your clang executable if it
is not in your path.

=item * -Accflags=-fsanitize=address

Compile perl and extensions sources with AddressSanitizer.

=item * -Aldflags=-fsanitize=address

Link the perl executable with AddressSanitizer.

=item * -Alddlflags=-shared\ -fsanitize=address

Link dynamic extensions with AddressSanitizer.  You must manually
specify C<-shared> because using C<-Alddlflags> at all will prevent
Configure from setting a default value for C<lddlflags>, which usually
contains C<-shared> (at least on Linux).

=item * -fsanitize-blacklist=`pwd`/asan_ignore

AddressSanitizer will ignore functions listed in the C<asan_ignore>
file.  This is a compiler option, so it is passed as part of
C<ccflags> above.  (The file should contain a short explanation of why
each of the functions is listed.)

=back

See also L<https://github.com/google/sanitizers/wiki/AddressSanitizer>.

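A minimal sketch of the kind of bug ASan reports (hypothetical code):
C<dup_buggy()> forgets the byte for the terminating NUL, so its last
write lands one byte past the end of the heap block, while
C<dup_fixed()> allocates C<n + 1> bytes.

```c

    #include <stdlib.h>
    #include <string.h>

    /* BUG: no room for the terminating NUL, so buf[n] is out of bounds. */
    char *dup_buggy(const char *s)
    {
        size_t n = strlen(s);
        char *buf = (char *)malloc(n);
        if (buf == NULL)
            return NULL;
        memcpy(buf, s, n);
        buf[n] = '\0';              /* ASan reports a heap-buffer-overflow */
        return buf;
    }

    /* Fixed: allocate n + 1 bytes and copy the NUL as well. */
    char *dup_fixed(const char *s)
    {
        size_t n = strlen(s);
        char *buf = (char *)malloc(n + 1);
        if (buf == NULL)
            return NULL;
        memcpy(buf, s, n + 1);
        return buf;
    }

```

Calling C<dup_buggy()> from a throwaway C<main()> built with C<clang -g
-fsanitize=address> should stop the program with a heap-buffer-overflow
report naming the offending line; without ASan the same bug may go
unnoticed.
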
=head2 Dr Memory

Dr. Memory is a tool similar to valgrind which is usable on Windows
and Linux.

It supports heap checking like C<memcheck> from valgrind.  There are
also other tools included.

See L<https://drmemory.org/>.

=head1 PROFILING

Depending on your platform there are various ways of profiling Perl.

There are two commonly used techniques of profiling executables:
I<statistical time-sampling> and I<basic-block counting>.

The first method periodically takes samples of the CPU program counter,
and since the program counter can be correlated with the code generated
for functions, we get a statistical view of which functions the program
is spending its time in.  The caveats are that very small/fast
functions have lower probability of showing up in the profile, and that
periodically interrupting the program (this is usually done rather
frequently, in the scale of milliseconds) imposes an additional
overhead that may skew the results.  The first problem can be
alleviated by running the code for longer (in general this is a good
idea for profiling); the second problem is usually kept in check by the
profiling tools themselves.

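A toy version of statistical time-sampling for POSIX systems, as a
sketch only: instead of reading the real program counter, the
C<SIGPROF> handler below records which of two hypothetical phases the
program is in whenever the C<setitimer(ITIMER_PROF, ...)> timer fires.
All names other than the POSIX calls are made up for illustration.

```c

    #define _XOPEN_SOURCE 700       /* for sigaction() and setitimer() */
    #include <signal.h>
    #include <string.h>
    #include <sys/time.h>

    /* Which hypothetical "function" is running when a tick arrives. */
    static volatile sig_atomic_t current_phase;
    static volatile sig_atomic_t samples[2];

    static void on_tick(int sig)
    {
        (void)sig;
        samples[current_phase]++;   /* one sample for the running phase */
    }

    static void burn(long iterations)
    {
        volatile double x = 0.0;    /* volatile: keep the loop */
        long i;
        for (i = 0; i < iterations; i++)
            x += 1.0;
    }

    /* Run a light and a heavy phase; store per-phase sample counts. */
    int sample_profile(long light, long heavy, long out[2])
    {
        struct sigaction sa;
        struct itimerval tv;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_tick;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGPROF, &sa, NULL) != 0)
            return -1;

        memset(&tv, 0, sizeof tv);
        tv.it_interval.tv_usec = 1000;  /* every 1 ms of CPU time */
        tv.it_value.tv_usec = 1000;
        if (setitimer(ITIMER_PROF, &tv, NULL) != 0)
            return -1;

        current_phase = 0;
        burn(light);
        current_phase = 1;
        burn(heavy);

        memset(&tv, 0, sizeof tv);      /* all zero: stop the timer */
        setitimer(ITIMER_PROF, &tv, NULL);

        out[0] = samples[0];
        out[1] = samples[1];
        return 0;
    }

```

With the heavy phase doing ten times the work of the light one, it
should collect roughly ten times as many samples, though the exact
counts vary between runs and a very short phase may get none at all:
the first caveat above in miniature.
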
The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end.  For example, a conditional jump
ends a basic block, and each place it can jump to starts a new one.
Basic block profiling usually works by I<instrumenting> the code by
adding I<enter basic block #nnnn> book-keeping code to the generated
code.  During the execution of the code the basic block counters are
then updated appropriately.  The caveat is that the added extra code
can skew the results: again, the profiling tools usually try to factor
their own effects out of the results.

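The book-keeping an instrumenting profiler inserts can be imitated by
hand.  In this simplified sketch (C<bb_count> and C<sum_positive()> are
hypothetical), a counter is bumped at the start of three of the
function's blocks; a compiler does the equivalent automatically, and
for every block rather than just the interesting ones.

```c

    /* One counter per instrumented block: entry, loop body, taken branch. */
    static unsigned long bb_count[3];

    long sum_positive(const int *a, int n)
    {
        long total = 0;
        int i;

        bb_count[0]++;              /* block: function entry */
        for (i = 0; i < n; i++) {
            bb_count[1]++;          /* block: loop body, runs n times */
            if (a[i] > 0) {
                bb_count[2]++;      /* block: only when a[i] > 0 */
                total += a[i];
            }
        }
        return total;
    }

```

For the input C<{ 3, -1, 4, 0, 5 }>, the counters end up at 1, 5 and 3,
the kind of per-line execution counts that F<gcov> prints next to the
source.
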
=head2 Gprof Profiling

I<gprof> is a profiling tool available on many Unix platforms which
uses I<statistical time-sampling>.  You can build a profiled version of
F<perl> by compiling using gcc with the flag C<-pg>.  Either edit
F<config.sh> or re-run F<Configure>.  Running the profiled version of
Perl will create an output file called F<gmon.out> which contains the
profiling data collected during the execution.

Quick hint:

    $ sh Configure -des -Dusedevel -Accflags='-pg' \
        -Aldflags='-pg' -Alddlflags='-pg -shared' \
        && make perl
    $ ./perl ... # creates gmon.out in current directory
    $ gprof ./perl > out
    $ less out

(The C<-shared> in the C<-Alddlflags> value is probably needed until RT
#118199 is resolved.)

The F<gprof> tool can then display the collected data in various ways.
Usually F<gprof> understands the following options:

=over 4

=item * -a

Suppress statically defined functions from the profile.

=item * -b

Suppress the verbose descriptions in the profile.

=item * -e routine

Exclude the given routine and its descendants from the profile.

=item * -f routine

Display only the given routine and its descendants in the profile.

=item * -s

Generate a summary file called F<gmon.sum> which then may be given to
subsequent gprof runs to accumulate data over several runs.

=item * -z

Display routines that have zero usage.

=back

For a more detailed explanation of the available commands and output
formats, see your own local documentation of F<gprof>.

=head2 GCC gcov Profiling

I<Basic block profiling> is officially available in gcc 3.0 and later.
You can build a profiled version of F<perl> by compiling using gcc with
the flags C<-fprofile-arcs -ftest-coverage>.  Either edit F<config.sh>
or re-run F<Configure>.

Quick hint:

    $ sh Configure -des -Dusedevel -Doptimize='-g' \
        -Accflags='-fprofile-arcs -ftest-coverage' \
        -Aldflags='-fprofile-arcs -ftest-coverage' \
        -Alddlflags='-fprofile-arcs -ftest-coverage -shared' \
        && make perl
    $ rm -f regexec.c.gcov regexec.gcda
    $ ./perl ...
    $ gcov regexec.c
    $ less regexec.c.gcov

(The C<-shared> in the C<-Alddlflags> value is probably needed until RT
#118199 is resolved.)

Running the profiled version of Perl will cause profile output to be
generated.  For each source file an accompanying F<.gcda> file will be
created.

To display the results you use the I<gcov> utility (which should be
installed if you have gcc 3.0 or newer installed).  F<gcov> is run on
source code files, like this:

    gcov sv.c

which will cause F<sv.c.gcov> to be created.  The F<.gcov> files
contain the source code annotated with relative frequencies of
execution indicated by "#" markers.  If you want to generate F<.gcov>
files for all profiled object files, you can run something like this:

    for file in `find . -name \*.gcno`
    do sh -c "cd `dirname $file` && gcov `basename $file .gcno`"
    done

Useful options of F<gcov> include C<-b> which will summarise the basic
block, branch, and function call coverage, and C<-c> which instead of
relative frequencies will use the actual counts.  For more information
on the use of F<gcov> and basic block profiling with gcc, see the
latest GNU CC manual.  As of gcc 4.8, this is at
L<https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro>.

1836eac174f2Safresh1=head2 callgrind profiling
1837eac174f2Safresh1
1838*3d61058aSafresh1callgrind is a valgrind tool for profiling source code. Paired with
1839*3d61058aSafresh1kcachegrind (a Qt based UI), it gives you an overview of where code is
1840*3d61058aSafresh1taking up time, as well as the ability to examine callers, call trees,
1841*3d61058aSafresh1and more. One of its benefits is you can use it on perl and XS modules
1842*3d61058aSafresh1that have not been compiled with debugging symbols.
1843eac174f2Safresh1
1844*3d61058aSafresh1If perl is compiled with debugging symbols (C<-g>), you can view the
1845*3d61058aSafresh1annotated source and click around, much like L<Devel::NYTProf>'s HTML
1846*3d61058aSafresh1output.
1847eac174f2Safresh1
1848eac174f2Safresh1For basic usage:
1849eac174f2Safresh1
1850eac174f2Safresh1    valgrind --tool=callgrind ./perl ...
1851eac174f2Safresh1
1852*3d61058aSafresh1By default it will write output to F<callgrind.out.PID>, but you can
1853*3d61058aSafresh1change that with C<--callgrind-out-file=...>
1854eac174f2Safresh1
1855eac174f2Safresh1To view the data, do:
1856eac174f2Safresh1
1857eac174f2Safresh1    kcachegrind callgrind.out.PID
1858eac174f2Safresh1
1859eac174f2Safresh1If you'd prefer to view the data in a terminal, you can use
1860*3d61058aSafresh1F<callgrind_annotate>.  In its basic form:
1861eac174f2Safresh1
1862eac174f2Safresh1    callgrind_annotate callgrind.out.PID | less
1863eac174f2Safresh1
1864eac174f2Safresh1Some useful options are:
1865eac174f2Safresh1
1866eac174f2Safresh1=over 4
1867eac174f2Safresh1
1868eac174f2Safresh1=item * --threshold
1869eac174f2Safresh1
1870*3d61058aSafresh1Percentage of counts (of primary sort event) we are interested in. The
1871*3d61058aSafresh1default is 99%, 100% might show things that seem to be missing.
1872eac174f2Safresh1
1873eac174f2Safresh1=item * --auto
1874eac174f2Safresh1
1875*3d61058aSafresh1Annotate all source files containing functions that helped reach the
1876*3d61058aSafresh1event count threshold.
1877eac174f2Safresh1
1878eac174f2Safresh1=back

=head2 C<profiler> profiling (Cygwin)

Cygwin allows for C<gprof> profiling and C<gcov> coverage testing, but
these only cover the main executable.

You can use the C<profiler> tool to perform sample-based profiling; it
requires no special preparation of the executables beyond debugging
symbols.

This produces sampling data which can be processed with C<gprof>.

There is L<limited
documentation|https://www.cygwin.com/cygwin-ug-net/profiler.html> on
the Cygwin web site.

=head2 Visual Studio Profiling

You can use the Visual Studio profiler to profile perl if you've built
perl with MSVC, even though we build perl at the command line.  You
will need to build perl with C<CFG=Debug> or C<CFG=DebugSymbols>.

The Visual Studio profiler is a sampling profiler.

See L<the Visual Studio
documentation|https://github.com/MicrosoftDocs/visualstudio-docs/blob/main/docs/profiling/beginners-guide-to-performance-profiling.md>
to get started.

=head1 MISCELLANEOUS TRICKS

=head2 PERL_DESTRUCT_LEVEL

If you want to run any of the tests manually, using e.g. valgrind,
please note that by default perl B<does not> explicitly clean up all
the memory it has allocated (such as global memory arenas), but instead
lets the C<exit()> of the whole program "take care" of such
allocations, also known as "global destruction of objects".

There is a way to tell perl to do complete cleanup: set the environment
variable C<PERL_DESTRUCT_LEVEL> to a non-zero value.  The F<t/TEST>
wrapper sets this to 2, and you need to do the same if you don't want
to see the "global leaks".  For example, to run a test under valgrind:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib t/foo/bar.t

(Note: the mod_perl Apache module uses this environment variable for
its own purposes and extends its semantics.  Refer to L<the mod_perl
documentation|https://perl.apache.org/docs/> for more information.
Also, spawned threads do the equivalent of setting this variable to the
value 1.)

If, at the end of a run, you get the message I<N scalars leaked>, you
can recompile with C<-DDEBUG_LEAKING_SCALARS> (C<Configure
-Accflags=-DDEBUG_LEAKING_SCALARS>), which will cause the addresses of
all those leaked SVs to be dumped along with details as to where each
SV was originally allocated.  This information is also displayed by
L<Devel::Peek>.  Note that the extra details recorded with each SV
increase memory usage, so this option shouldn't be used in production
environments.  It also converts C<new_SV()> from a macro into a real
function, so you can use your favourite debugger to discover where
those pesky SVs were allocated.

If you see that you're leaking memory at runtime, but neither valgrind
nor C<-DDEBUG_LEAKING_SCALARS> finds anything, you're probably leaking
SVs that are still reachable and will be properly cleaned up during
destruction of the interpreter.  In such cases, using the C<-Dm> switch
can point you to the source of the leak.  If the executable was built
with C<-DDEBUG_LEAKING_SCALARS>, C<-Dm> will output SV allocations in
addition to memory allocations.  Each SV allocation has a distinct
serial number that will be written on creation and destruction of the
SV.  So if you're executing the leaking code in a loop, you need to
look for SVs that are created, but never destroyed, from one cycle to
the next.  If such an SV is found, set a conditional breakpoint within
C<new_SV()> and make it break only when C<PL_sv_serial> is equal to the
serial number of the leaking SV.  Then you will catch the interpreter in
exactly the state where the leaking SV is allocated, which is
sufficient in many cases to find the source of the leak.

As C<-Dm> uses the PerlIO layer for output, it will itself allocate
quite a few SVs, which are hidden to avoid recursion.  You can bypass
the PerlIO layer if you use the SV logging provided by
C<-DPERL_MEM_LOG> instead.

=for apidoc_section $debugging
=for apidoc Amnh||PL_sv_serial

=head2 Leaked SV spotting: sv_mark_arenas() and sv_sweep_arenas()

These functions exist only on C<DEBUGGING> builds.  The first marks all
live SVs which can be found in the SV arenas with the C<SVf_BREAK> flag.
The second lists any live SVs which don't have the flag set, and resets
the flag on the rest.  They are intended to identify SVs which are being
created, but not freed, between two points in code.  They can be used
either by temporarily adding calls to them in the relevant places in the
code, or by calling them directly from a debugger.

For example, suppose the following code was found to be leaking:

    while (1) { eval '\(1..3)' }

A F<gdb> session on a threaded perl might look something like this:

    $ gdb ./perl
    (gdb) break Perl_pp_entereval
    (gdb) run -e'while (1) { eval q{\(1..3)} }'
    ...
    Breakpoint 1, Perl_pp_entereval ....
    (gdb) call Perl_sv_mark_arenas(my_perl)
    (gdb) continue
    ...
    Breakpoint 1, Perl_pp_entereval ....
    (gdb) call Perl_sv_sweep_arenas(my_perl)
    Unmarked SV: 0xaf23a8: AV()
    Unmarked SV: 0xaf2408: IV(1)
    Unmarked SV: 0xaf2468: IV(2)
    Unmarked SV: 0xaf24c8: IV(3)
    Unmarked SV: 0xace6c8: PV("AV()"\0)
    Unmarked SV: 0xace848: PV("IV(1)"\0)
    (gdb)

Here, at the start of the first call to pp_entereval(), all existing SVs
are marked.  Then at the start of the second call, we list all the SVs
which have since been created but not yet freed.  It is quickly clear
that an array and its three elements are likely not being freed, perhaps
as a result of a bug during constant folding.  The final two SVs are just
temporaries created during the debugging output and can be ignored.

This trick relies on the C<SVf_BREAK> flag not otherwise being used.
This flag is typically used only during global destruction, but also
sometimes for a mark-and-sweep operation when looking for common
elements on the two sides of a list assignment.  The presence of the
flag can also alter the behaviour of some specific actions in the core,
such as choosing whether to copy or to COW a string SV.  So turning it
on can occasionally alter the behaviour of code slightly.

=head2 PERL_MEM_LOG

If compiled with C<-DPERL_MEM_LOG> (C<-Accflags=-DPERL_MEM_LOG>), both
memory and SV allocations go through logging functions, which is handy
for setting breakpoints.

Unless C<-DPERL_MEM_LOG_NOIMPL> (C<-Accflags=-DPERL_MEM_LOG_NOIMPL>) is
also specified, the logging functions read C<$ENV{PERL_MEM_LOG}> to
determine whether to log the event, and if so how:

    $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
    $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
    $ENV{PERL_MEM_LOG} =~ /c/           Additionally log C backtrace for
                                        new_SV events
    $ENV{PERL_MEM_LOG} =~ /t/           Include timestamp in log
    $ENV{PERL_MEM_LOG} =~ /^(\d+)/      Write to the given FD (default is 2)

Memory logging is somewhat similar to C<-Dm> but is independent of
C<-DDEBUGGING>, and at a higher level; all uses of Newx(), Renew(), and
Safefree() are logged with the caller's source code file and line
number (and C function name, if supported by the C compiler).  In
contrast, C<-Dm> logs directly at the point of C<malloc()>.  SV logging
is similar.

Since the logging doesn't use PerlIO, all SV allocations are logged and
no extra SV allocations are introduced by enabling the logging.  If
compiled with C<-DDEBUG_LEAKING_SCALARS>, the serial number for each SV
allocation is also logged.

The C<c> option uses the C<Perl_c_backtrace> facility, and therefore
additionally requires perl to be built with C<Configure
-Dusecbacktrace>.

=head2 DDD over gdb

Those debugging perl with the DDD frontend over gdb may find the
following useful:

You can extend the data conversion shortcuts menu, so for example you
can display an SV's IV value with one click, without doing any typing.
To do that, edit the F<~/.ddd/init> file and add, after:

  ! Display shortcuts.
  Ddd*gdbDisplayShortcuts: \
  /t ()   // Convert to Bin\n\
  /d ()   // Convert to Dec\n\
  /x ()   // Convert to Hex\n\
  /o ()   // Convert to Oct\n\

the following two lines:

  ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
  ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

so now you can do ivx and pvx lookups, or you can add the sv_peek
"conversion" there as well:

  Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(C<my_perl> is for threaded builds.)  Just remember that every line
but the last one should end with C<\n\>.

Alternatively, edit the init file interactively via:
3rd mouse button -> New Display -> Edit Menu.

Note: you can define up to 20 conversion shortcuts in the gdb section.

=head2 C backtrace

On some platforms Perl supports retrieving the C-level backtrace
(similar to what symbolic debuggers like gdb do).

The backtrace returns the stack trace of the C call frames, with the
symbol names (function names), the object names (like "perl"), and if
it can, also the source code locations (file:line).

The supported platforms are Linux and OS X (some *BSD might work at
least partly, but they have not yet been tested).

This feature hasn't been tested with multiple threads, and it will only
show the backtrace of the thread doing the backtracing.

The feature needs to be enabled with C<Configure -Dusecbacktrace>.

The C<-Dusecbacktrace> option also enables keeping the debug information
when compiling/linking (often: C<-g>).  Many compilers/linkers do
support having both optimization and keeping the debug information.
The debug information is needed for the symbol names and the source
locations.

Static functions might not be visible in the backtrace.

Source code locations, even if available, can often be missing or
misleading if the compiler has e.g. inlined code.  The optimizer can
make matching the source code and the object code quite challenging.

=over 4

=item Linux

You B<must> have the BFD (C<-lbfd>) library installed, otherwise
C<perl> will fail to link.  The BFD is usually distributed as part of
the GNU binutils.

Summary: C<Configure ... -Dusecbacktrace> and you need C<-lbfd>.

=item OS X

The source code locations are supported B<only> if you have the
Developer Tools installed.  (BFD is B<not> needed.)

Summary: C<Configure ... -Dusecbacktrace>, and installing the Developer
Tools is recommended.

=back

Optionally, for trying out the feature, you may want to enable
automatic dumping of the backtrace just before a warning or croak (die)
message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
for Configure.

Unless the above additional feature is enabled, nothing about the
backtrace functionality is visible, except for the Perl/XS level.

Furthermore, even if you have compiled this feature in, you need to
enable it at runtime with an environment variable:
C<PERL_C_BACKTRACE_ON_ERROR=10>.  Its value must be an integer greater
than zero, giving the desired frame count.

Retrieving the backtrace from Perl level (using for example an XS
extension) would be much less exciting than one would hope: normally
you would see C<runops>, C<entersub>, and not much else.  This API is
intended to be called B<from within> the Perl implementation, not from
Perl level execution.

The C API for the backtrace is as follows:

=over 4

=item get_c_backtrace

=item free_c_backtrace

=item get_c_backtrace_dump

=item dump_c_backtrace

=back

=head2 Poison

If you see in a debugger a memory area mysteriously full of 0xABABABAB
or 0xEFEFEFEF, you may be seeing the effect of the Poison() macros; see
L<perlclib>.

=head2 Read-only optrees

Under ithreads the optree is read only.  If you want to enforce this,
to check for write accesses from buggy code, compile with
C<-Accflags=-DPERL_DEBUG_READONLY_OPS> to enable code that allocates op
memory via C<mmap>, and sets it read-only when it is attached to a
subroutine.  Any write access to an op results in a C<SIGBUS> and abort.

This code is intended for development only, and may not be portable
even to all Unix variants.  Also, it is an 80% solution, in that it
isn't able to make all ops read only.  Specifically it does not apply
to op slabs belonging to C<BEGIN> blocks.

However, as an 80% solution it is still effective, as it has caught
bugs in the past.

=head2 When is a bool not a bool?

There wasn't necessarily a standard C<bool> type on compilers prior to
C99, and so some workarounds were created.  The C<TRUE> and C<FALSE>
macros are still available as alternatives for C<true> and C<false>.
The C<cBOOL> macro was created to correctly cast to a true/false value
in all circumstances, but should no longer be necessary.  Using
S<C<(bool)> I<expr>> should now always work.

There are no plans to remove any of C<TRUE>, C<FALSE>, or C<cBOOL>.

=head2 Finding unsafe truncations

You may wish to run C<Configure> with something like

    -Accflags='-Wconversion -Wno-sign-conversion -Wno-shorten-64-to-32'

or your compiler's equivalent to make it easier to spot any unsafe
truncations that show up.

=head2 The .i Targets

You can expand the macros in a F<foo.c> file by saying

    make foo.i

which will expand the macros using cpp.  Don't be scared by the
results.

=head1 AUTHOR

This document was originally written by Nathan Torkington, and is
maintained by the perl5-porters mailing list.